Skip to content

Conversation

@Aidyn-A
Copy link
Collaborator

@Aidyn-A Aidyn-A commented Nov 9, 2022

Fixes #87894

This PR adds a warning if captured graph is empty (consists of zero nodes).
The example snippet where would it be useful:

import torch

x = torch.randn(10)
z = torch.zeros(10)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    z = x * x
# Warn user

and in #87894

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 9, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88754

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 57860ee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD albanD requested a review from mcarilli November 10, 2022 17:30
@albanD albanD added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 10, 2022
TEST_WITH_ROCM or
int(torch.version.cuda.split(".")[0]) < 11, "CUDA >= 11.0 required for graphs")
def test_graph_warn_if_has_zero_nodes(self):
with warnings.catch_warnings(record=True) as caught:
Copy link
Collaborator Author

@Aidyn-A Aidyn-A Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is having issues with catching the warnings. @jjsjann123 pointed out that the possible cause is the call_guard in pybind11 https://github.com/pytorch/pytorch/blob/master/torch/csrc/cuda/Graph.cpp#L31-L39

Any ideas on how to catch this warning?
cc @ezyang @ngimel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the wrapper that clears the warning queue into Python. HANDLE_TH_ERRORS and END_HANDLE_TH_ERRORS_PYBIND check torch/csrc/README.md

cc @peterbell10

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest torch::wrap_pybind_function but it didn't work with member functions, so I've added that support and wrapped the CUDA graph functions in #88932. So, if you rebase it should start catching the warnings.

Copy link
Collaborator Author

@Aidyn-A Aidyn-A Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @peterbell10,
The test is passing now!

@ezyang
Copy link
Contributor

ezyang commented Nov 28, 2022

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 28, 2022
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Fixes pytorch#87894

This PR adds a warning if captured graph is empty (consists of zero nodes).
The example snippet where would it be useful:

```python
import torch

x = torch.randn(10)
z = torch.zeros(10)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    z = x * x
# Warn user
```

and in pytorch#87894

Pull Request resolved: pytorch#88754
Approved by: https://github.com/ezyang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CUDAGraph] Silent failure when graphs capture attempted on wrong device

6 participants