
Conversation

colesbury (Member) commented Nov 8, 2018

The new error message now looks like (from Python):

  RuntimeError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 11.93 GiB total capacity; 4.00 GiB already allocated; 7.33 GiB free; 179.00 KiB cached)

Summary of terms:

  "total capacity": total global memory on GPU
  "already allocated": memory allocated by the program using the
                       caching allocator
  "free": free memory as reported by the CUDA API
  "cached": memory held by the allocator but not used by the program
 
  The "allocated" amount  does not include memory allocated outside
  of the caching allocator, such as memory allocated by other programs
  or memory held by the driver.
 
  The sum of "allocated" + "free" + "cached" may be less than the
  total capacity due to memory held by the driver and usage by other
  programs.

  Note that at this point cuda_malloc_retry has already returned all
  possible "cached" memory to the driver. The only remaining "cached"
  memory is split from a larger block that is partially in-use.

This also fixes an issue where an out-of-memory error could cause an unrelated subsequent CUDA kernel launch to fail because cudaGetLastError() was not cleared.
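
To make that second fix concrete, here is a minimal standalone sketch (a hypothetical repro, not the PR's code) of why the stale error left by a failed cudaMalloc has to be cleared:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  // A failed allocation records cudaErrorMemoryAllocation as the CUDA
  // runtime's "last error"; it stays pending until something reads it.
  void* p = nullptr;
  cudaError_t err = cudaMalloc(&p, (size_t)1 << 62);  // absurd size: fails
  printf("cudaMalloc: %s\n", cudaGetErrorString(err));

  // Without this call, the next cudaGetLastError() -- for example, the
  // status check after an unrelated kernel launch -- would return the
  // stale error and make that launch appear to have failed.
  cudaGetLastError();  // clear the pending CUDA error

  printf("pending error now: %s\n", cudaGetErrorString(cudaGetLastError()));
  return 0;
}
```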

soumith (Contributor) commented Nov 8, 2018

I think "allocated" here is confusing, because it conflicts with "Tried to allocate". Would it make sense to say:

GPU 0; 11.93 GiB total capacity; 4.00 GiB already allocated; 179.00 KiB cached


  Block search_key(device, stream, size);
- auto& free_blocks = small ? large_blocks : small_blocks;
+ auto& free_blocks = small ? small_blocks : large_blocks;
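
For context, a simplified sketch of the pool selection this hunk corrects (the struct and pool names mirror the fragment above, but the layout is hypothetical, not the allocator's exact code); a small request must search the small-block pool, which is what the fixed ternary does:

```cpp
#include <cstddef>
#include <set>

// Hypothetical, simplified shape of the caching allocator's bookkeeping.
struct Block {
  int device;
  long stream;  // stands in for cudaStream_t in this sketch
  size_t size;
};

struct BlockComparator {
  bool operator()(const Block* a, const Block* b) const {
    if (a->device != b->device) return a->device < b->device;
    if (a->stream != b->stream) return a->stream < b->stream;
    return a->size < b->size;
  }
};

using BlockPool = std::set<Block*, BlockComparator>;

BlockPool large_blocks;  // cached blocks above the small-request cutoff
BlockPool small_blocks;  // cached blocks at or below the cutoff

// The corrected line: small requests search small_blocks, large requests
// search large_blocks. The pre-fix version had the two pools swapped.
BlockPool& pool_for(bool small) {
  return small ? small_blocks : large_blocks;
}
```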


cudaGetLastError(); // clear CUDA error

cudaDeviceProp prop;
AT_CUDA_CHECK(cudaGetDeviceProperties(&prop, device));
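
For illustration, a standalone sketch (plain CUDA runtime calls, with ATen's AT_CUDA_CHECK macro dropped so it compiles on its own) of where the message's "total capacity" and "free" figures come from:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  int device = 0;
  cudaSetDevice(device);

  // "total capacity": total global memory on the GPU.
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, device);

  // "free": free memory as reported by the CUDA API.
  size_t free_bytes = 0, total_bytes = 0;
  cudaMemGetInfo(&free_bytes, &total_bytes);

  const double gib = 1024.0 * 1024.0 * 1024.0;
  printf("GPU %d; %.2f GiB total capacity; %.2f GiB free\n",
         device, prop.totalGlobalMem / gib, free_bytes / gib);
  return 0;
}
```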


facebook-github-bot (Contributor) left a comment


@colesbury has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 9, 2018
Summary: (same as the PR description above)
Pull Request resolved: pytorch/pytorch#13751

Differential Revision: D13007177

Pulled By: colesbury

fbshipit-source-id: ea7121461b3f2a34646102959b45bde19f2fabab
ezyang added the merged label Jun 25, 2019