Conversation

@syed-ahmed (Collaborator) commented Sep 27, 2022

This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY".

It was tested using the following commands:

```shell
python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```

which shows a memory usage of 287,047,680 bytes, vs.

```shell
CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```

which shows 666,632,192 bytes.

C++ implementation is needed for the libtorch users (otherwise it could have been a pure python functionality).

cc: @ptrblck @ngimel @malfet

@pytorch-bot (bot) commented Sep 27, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85692

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e7ccb4d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@syed-ahmed syed-ahmed marked this pull request as ready for review September 29, 2022 20:20
@syed-ahmed (Collaborator, Author) commented:

Some notes:
I discovered that syncing os.environ with the C runtime's environment-variable functions (getenv, putenv/setenv) is tricky. The CUDA_MODULE_LOADING environment variable set in CUDAHooks using setenv works, but for it to be visible on the Python side, we also need to set it in os.environ.
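The mismatch can be demonstrated in isolation (a minimal sketch using a throwaway variable name, `DEMO_VAR`; assumes a POSIX libc reachable via `ctypes.CDLL(None)`):

```python
import ctypes
import os

# Load the C runtime (POSIX; CDLL(None) exposes libc symbols to ctypes).
libc = ctypes.CDLL(None)
libc.getenv.restype = ctypes.c_char_p

# Set a variable through the C runtime, as a setenv call in C++ would.
libc.setenv(b"DEMO_VAR", b"LAZY", 1)

# The C runtime sees the new value...
print(libc.getenv(b"DEMO_VAR"))    # b'LAZY'

# ...but os.environ, a snapshot taken at interpreter startup, does not.
print(os.environ.get("DEMO_VAR"))  # None
```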

@ptrblck (Collaborator) commented Sep 30, 2022

The CUDA_MODULE_LOADING environment variable set in the CUDAHooks using setenv works but for it to be visible on the python side, we need to set it in os.environ.

Does this mean that python -m torch.utils.collect_env won't be able to report it in its current implementation?
Would it work if we replace config = os.environ.get('CUDA_MODULE_LOADING', '') with setting/reading an internal flag in addition to setenv("CUDA_MODULE_LOADING", def_value.c_str(), 1);?

@syed-ahmed (Collaborator, Author) commented Oct 3, 2022

Does this mean that python -m torch.utils.collect_env won't be able to report it in its current implementation?
Would it work if we replace config = os.environ.get('CUDA_MODULE_LOADING', '') with setting/reading an internal flag in addition to setenv("CUDA_MODULE_LOADING", def_value.c_str(), 1);?

@ptrblck No, python -m torch.utils.collect_env works in this implementation. My comment was just an observation that setenv can add or update an environment variable in a process while os.environ does not show it (since os.environ is just a copy of the environment taken before the process started).

To elaborate a bit more: I started with the changes in CUDAHooks.cpp, which set the environment variable if the user hasn't set it. I then naively tried to read the environment variable in collect_env.py and print it, but that didn't show the value of the variable, even though I could verify (using the snippet above) that the environment variable was being set. That's when I realized os.environ doesn't get updated when setenv is called from the C++ side. The solution I came up with for this PR is to set the variable in os.environ in torch/cuda/__init__.py before the __cuda_init() call. The if-else in the changes in CUDAHooks.cpp then becomes redundant when using the Python front end; however, it's still needed if you are using standalone ATen (I also manually verified the setting of the variable on the C++ side by not setting the variable in os.environ).
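The Python-side behavior can be reduced to a small helper (an illustrative sketch, not the PR's literal code; the function name is hypothetical):

```python
import os

def ensure_module_loading(env, default="LAZY"):
    # Apply the default only when the user has not already chosen a policy,
    # so a user-provided CUDA_MODULE_LOADING is never overridden.
    env.setdefault("CUDA_MODULE_LOADING", default)
    return env["CUDA_MODULE_LOADING"]

# Unset: the default is applied.
print(ensure_module_loading({}))                                   # LAZY
# User-set: the user's choice wins.
print(ensure_module_loading({"CUDA_MODULE_LOADING": "DEFAULT"}))   # DEFAULT
```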

There are other suggested solutions that use getenv via ctypes to read an environment variable set from the C++ side (e.g. https://stackoverflow.com/questions/235435/environment-variables-in-python-on-linux). I tried that in collect_env.py, but it didn't work (on Linux): I saw an empty value for the variable even though it was set. So I ended up simply updating os.environ, and figured that using ctypes like this is overkill for this PR (given you'd also have to handle other operating systems).

@anjali411 anjali411 requested review from malfet and ngimel October 3, 2022 17:24
@anjali411 anjali411 added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Oct 3, 2022
@syed-ahmed (Collaborator, Author) commented Oct 3, 2022

@ptrblck Ah, I think I misunderstood your question. Yes, in its current implementation, collect_env.py won't be able to show whether the variable was set, and your suggestion of using an internal variable in addition to the setenv would work. Would that be the preferred approach? My approach in this PR is simply to add it to os.environ before CUDA lazy init, but maybe I'm missing some downsides of that?

I also ran this on a distributed example, and each process gets the environment variable set in os.environ in __cuda_init: https://gist.github.com/syed-ahmed/10c07824c63a5a20c69208f18cce6c61, run with `torchrun --nnodes=1 --nproc_per_node=4 ddp_example.py`. When specifying CUDA_MODULE_LOADING=DEFAULT before the torchrun command, each process prints out DEFAULT instead of LAZY, confirming that the variable is overridable by the user.
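The override behavior can be checked without GPUs (a minimal sketch using subprocess in place of torchrun; the child snippet mimics the set-only-if-unset behavior and is illustrative, not the PR's code):

```python
import os
import subprocess
import sys

# Each child applies the default only when the variable is unset, then reports it.
child = ("import os; os.environ.setdefault('CUDA_MODULE_LOADING', 'LAZY'); "
         "print(os.environ['CUDA_MODULE_LOADING'])")

# Launcher environment without the variable: the child defaults to LAZY.
env_clean = {k: v for k, v in os.environ.items() if k != "CUDA_MODULE_LOADING"}
a = subprocess.run([sys.executable, "-c", child], env=env_clean,
                   capture_output=True, text=True).stdout.strip()

# Launcher environment with a user override: the child inherits and keeps it.
env_over = dict(env_clean, CUDA_MODULE_LOADING="DEFAULT")
b = subprocess.run([sys.executable, "-c", child], env=env_over,
                   capture_output=True, text=True).stdout.strip()

print(a, b)  # LAZY DEFAULT
```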

@ngimel (Collaborator) commented Oct 3, 2022

LGTM, I'll let @malfet also take a look.

@facebook-github-bot (Contributor) commented:

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla (bot) commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: syed-ahmed / name: Syed Tousif Ahmed (990be4c7de9190ac4a568e6daa5c04afbf589ab0, cf92f4af1734fdc92829b13db14c1078f49ecfe7, 7a261a6ab78b45c823ac6f0108baecb8fd2944ce)

@syed-ahmed (Collaborator, Author) commented Oct 6, 2022

@malfet I processed the EasyCLA. Please review at your convenience. We were thinking this PR might be good to include in 1.13.

@syed-ahmed syed-ahmed changed the base branch from master to release/1.13 October 7, 2022 20:59
@syed-ahmed syed-ahmed changed the title Sets CUDA_MODULE_LOADING to LAZY when not set by the user [1.13] Sets CUDA_MODULE_LOADING to LAZY when not set by the user Oct 7, 2022
@malfet (Contributor) commented Oct 7, 2022

Can you please target master and then create a cherry-pick into r1.13?
Also, please mention in the PR description that the C++ implementation is needed for libtorch users (otherwise it could have been pure Python functionality).

@syed-ahmed syed-ahmed changed the base branch from release/1.13 to master October 7, 2022 22:50
@syed-ahmed syed-ahmed changed the title [1.13] Sets CUDA_MODULE_LOADING to LAZY when not set by the user Sets CUDA_MODULE_LOADING to LAZY when not set by the user Oct 7, 2022
@syed-ahmed (Collaborator, Author) commented:

@malfet done. I updated the release tracker with the new PR.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2022
@malfet (Contributor) commented Oct 9, 2022

@pytorchbot rebase

@pytorchmergebot (Collaborator) commented:
@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased cuda-module-loading-patch onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda-module-loading-patch && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the cuda-module-loading-patch branch from 7a261a6 to e7ccb4d Compare October 9, 2022 14:22
@malfet (Contributor) commented Oct 13, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

@github-actions (Contributor) commented:

Hey @syed-ahmed.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.
