[BE] Preserve caller source location in the error message#162808

Closed
albanD wants to merge 3 commits into pytorch:main from albanD:export-D81880552

@albanD
Collaborator

@albanD albanD commented Sep 12, 2025

Summary:
Currently, C10_CUDA_CHECK reports only a source location inside CUDAException.cpp, as shown below:
```
Exception raised from c10_cuda_check_implementation at fbcode/caffe2/c10/cuda/CUDAException.cpp:44
```
which is not terribly useful.

Judging from the original diff D39619861, which introduced c10_cuda_check_implementation, it appears the original macro reported the source location correctly and moving the check into c10_cuda_check_implementation broke it.

This diff propagates the caller's source location to c10_cuda_check_implementation to fix the issue.
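
The general technique can be sketched as follows. This is a minimal illustration, not the actual PyTorch change: `Status`, `check_implementation`, `MY_CHECK`, and `failing_caller` are hypothetical names, and the message format only loosely imitates the real one. The point is that the macro expands at the call site, so the location it captures belongs to the caller, and the out-of-line helper receives that location as arguments instead of recording its own.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an error code returned by a runtime API.
enum class Status { Ok, IllegalAddress };

// Out-of-line implementation (hypothetical). It takes the caller's location
// as arguments; reading __FILE__/__LINE__ here would always name this helper
// rather than the failing call site.
inline void check_implementation(
    Status status,
    const char* file,
    const char* function,
    int line) {
  if (status == Status::Ok) {
    return;
  }
  std::ostringstream msg;
  msg << "error code " << static_cast<int>(status) << "\n"
      << "Exception raised from " << function << " at " << file << ":" << line;
  throw std::runtime_error(msg.str());
}

// The macro expands where it is used, so __FILE__, __func__ and __LINE__
// describe the caller, and are forwarded to the helper.
#define MY_CHECK(expr) \
  check_implementation((expr), __FILE__, __func__, __LINE__)

// Example caller: returns the error text produced by a failing check.
inline std::string failing_caller() {
  try {
    MY_CHECK(Status::IllegalAddress);
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "";
}
```

Calling `failing_caller()` yields a message whose last line names `failing_caller` and the file and line of the `MY_CHECK` use, rather than `check_implementation`.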

Test Plan:
CI

Observed the desired error message after the change:
```
CUDA error: an illegal memory access was encountered
Search for `cudaErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Device-side assertion tracking was not enabled by user.
Exception raised from operator() at fbcode/sigrid/predictor/aed/AedContainer.cpp:659 (most recent call first):
```

Note that the last line reports the actual caller location.
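
The function name `operator()` in that line would be consistent with the check running inside a lambda or another call operator, since that is the name `__func__` sees there. The sketch below illustrates this under the assumption of GCC or Clang (other compilers may format the name differently); `name_inside_lambda` and `name_inside_function` are hypothetical helpers, not part of the change.

```cpp
#include <string>

// Hypothetical helper: returns the function name visible inside a lambda body.
// Assumption: GCC and Clang report the lambda's call operator here.
inline std::string name_inside_lambda() {
  auto body = [] { return std::string(__func__); };
  return body();
}

// For comparison: the name visible inside an ordinary function.
inline std::string name_inside_function() {
  return std::string(__func__);
}
```

When a caller-location line names `operator()`, the file and line number are therefore the more useful part for locating the failing call.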

Rollback Plan:

Reviewed By: Raymo111

Differential Revision: D81880552
@pytorch-bot

pytorch-bot bot commented Sep 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162808

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 37a1255 with merge base 03798b0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

@albanD has exported this pull request. If you are a Meta employee, you can view the originating diff in D81880552.

pytorch-bot bot added the ciflow/trunk label Sep 12, 2025
albanD added the topic: not user facing label Sep 12, 2025
janeyx99 added the release notes: cuda label and removed the topic: not user facing label Sep 12, 2025
albanD removed the release notes: cuda label Sep 12, 2025
janeyx99 added the release notes: cuda label Sep 12, 2025
albanD added the topic: bug fixes label Sep 12, 2025
@albanD
Collaborator Author

albanD commented Sep 12, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@albanD
Collaborator Author

albanD commented Sep 12, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2022-cuda12.6-py3 / build

Details for Dev Infra team Raised by workflow job

@albanD
Collaborator Author

albanD commented Sep 15, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
Pull Request resolved: pytorch#162808
Approved by: https://github.com/janeyx99
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
Labels

ciflow/trunk, fb-exported, Merged, meta-exported, release notes: cuda, topic: bug fixes
