adds list_gpu_processes function #44616
Conversation
💊 CI failures summary and remediations
As of commit 9c069ee (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
    handling out-of-memory exceptions.

    Arguments:
        device (torch.device or int, optional): selected device.

    Returns
nit: I think this could accept a string, too.
Line 8 in 8d570bc:

    def _get_device_index(device: Union[Device, str, int], optional: bool = False,
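To illustrate the nit above, here is a minimal, hypothetical sketch (not PyTorch's actual `_get_device_index` helper) of how a device argument could be normalized to a plain integer index while also accepting strings like `"cuda:1"`; the function name and fallback behavior are assumptions for illustration only.

```python
# Hypothetical sketch of a device-index normalizer that accepts
# ints and strings such as "cuda" or "cuda:1" (not PyTorch's real helper).
from typing import Union


def device_index(device: Union[str, int], optional: bool = False) -> int:
    """Normalize a device argument to a plain integer index."""
    if isinstance(device, int):
        return device
    if isinstance(device, str):
        if device == "cuda":
            if optional:
                return 0  # assumed fallback to the default device
            raise ValueError("device string must include an index, e.g. 'cuda:0'")
        prefix, _, idx = device.partition(":")
        if prefix == "cuda" and idx.isdigit():
            return int(idx)
        raise ValueError(f"invalid device string: {device!r}")
    raise TypeError(f"unsupported device type: {type(device).__name__}")
```

With this shape, `device_index("cuda:1")` and `device_index(1)` resolve to the same index, which is what the reviewer's suggestion would buy the caller.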
torch/cuda/memory.py (Outdated)
| """ | ||
|
|
||
| try: | ||
| import pynvml # type: ignore |
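For context on the `pynvml` import above, here is a hedged sketch of how a `list_gpu_processes`-style helper could be built on the NVML Python bindings. This is not the exact PyTorch implementation; the error messages and the simplified device handling are assumptions, and only standard `pynvml` calls (`nvmlInit`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetComputeRunningProcesses`) are used.

```python
# Sketch of a list_gpu_processes-style helper on top of pynvml
# (the NVML Python bindings). Not PyTorch's actual implementation.
def list_gpu_processes(device: int = 0) -> str:
    """Return a human-readable summary of processes using the given GPU."""
    try:
        import pynvml  # type: ignore
    except ImportError:
        return "pynvml module not found, please install pynvml"
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return "cuda driver can't be loaded, is cuda enabled?"
    handle = pynvml.nvmlDeviceGetHandleByIndex(device)
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
    lines = [f"GPU:{device}"]
    if not procs:
        lines.append("no processes are running")
    for p in procs:
        mem_mb = p.usedGpuMemory / (1024 * 1024)
        lines.append(f"process {p.pid:>10d} uses {mem_mb:>12.3f} MB GPU memory")
    return "\n".join(lines)
```

Returning a string rather than raising keeps the helper safe to call in environments without pynvml or a CUDA driver, which matches the try/except style of the import shown in the diff.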
lint failure is real
mruberry
left a comment
Cool!
facebook-github-bot
left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: per title, to make it easier to track the creation of stray contexts:

```
python -c "import torch; a=torch.randn(1, device='cuda'); print(torch.cuda.memory.list_gpu_processes(0)); print(torch.cuda.memory.list_gpu_processes(1))"
GPU:0
process 79749 uses 601.000 MB GPU memory
GPU:1
no processes are running
```

Pull Request resolved: #44616
Reviewed By: mruberry
Differential Revision: D23675739
Pulled By: ngimel
fbshipit-source-id: ffa14cad9d7144e883de13b1c2c6817bd432f53a