Sets CUDA_MODULE_LOADING to LAZY when not set by the user #85692
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85692.
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit e7ccb4d. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Some notes:
Does this mean that
@ptrblck To elaborate a bit more: I started with the changes in CUDAHooks.cpp, which set the environment variable if the user hasn't set it. I then naively tried to read the environment variable in collect_env.py and print it, but that didn't show the value of the variable, even though I could verify (using the snippet above) that it was being set. I then realized that os.environ does not get updated when setenv is called from the C++ side.

The solution I came up with for this PR is to set the variable in os.environ in torch/cuda/__init__.py before the __cuda_init() call. The if-else in CUDAHooks.cpp then becomes redundant when using the Python front end; however, it is still needed when using standalone ATen (I also manually verified the setting of the variable on the C++ side by not setting the variable in os.environ).

There are other suggested solutions that use getenv via ctypes to read an environment variable set from the C++ side (e.g. https://stackoverflow.com/questions/235435/environment-variables-in-python-on-linux). I tried that in collect_env.py, but it didn't work (on Linux): I saw an empty value for the variable even though it was set. So I ended up simply updating os.environ, since using ctypes like this seemed like overkill for this PR (given you'd also have to handle other OSes).
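The os.environ pitfall described above can be demonstrated directly. A minimal sketch, assuming a POSIX system with CPython; `DEMO_CUDA_VAR` is a made-up variable name for illustration, standing in for what a C++ library like CUDAHooks.cpp might set:

```python
import ctypes
import os

# os.environ is snapshotted at interpreter startup; a setenv() performed
# at the C level afterwards is not reflected in it. POSIX assumed.
libc = ctypes.CDLL(None)  # symbols of the current process's C library
libc.setenv(b"DEMO_CUDA_VAR", b"LAZY", 1)  # C-level setenv

print(os.environ.get("DEMO_CUDA_VAR"))  # None: os.environ did not see it

libc.getenv.restype = ctypes.c_char_p
print(libc.getenv(b"DEMO_CUDA_VAR"))  # b'LAZY': visible at the C level
```

This is why setting the variable from Python, in os.environ before lazy init, keeps both sides consistent: writes to os.environ are propagated down to the C environment, but not the other way around.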
@ptrblck Ah, I think I misunderstood your question. Yes, in its current implementation, collect_env.py won't be able to show whether the variable was set, and your suggestion of using an internal variable in addition to the setenv would work. Would that be the preferred approach? My approach in this PR is to simply add it to os.environ before CUDA lazy init, but maybe I'm missing some downside of it? I also ran this on a distributed example, and each process gets the environment variable set in os.environ in __cuda_init: https://gist.github.com/syed-ahmed/10c07824c63a5a20c69208f18cce6c61 ran with
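For reference, a collect_env.py-style report of the variable can only see what is in os.environ. A minimal sketch of the idea (`get_cuda_module_loading_config` is a hypothetical helper name, not the actual collect_env.py API):

```python
import os

def get_cuda_module_loading_config():
    # Hypothetical helper: reports CUDA_MODULE_LOADING only if it is
    # visible in os.environ. Because this PR sets the variable in
    # os.environ before CUDA lazy init, this reflects the effective
    # value for the Python front end; a value set only via a C++-side
    # setenv would be missed (hence the discussion above).
    return os.environ.get("CUDA_MODULE_LOADING", "N/A")
```

With the PR's approach, the helper reports "LAZY" (or whatever the user exported); without it, it falls back to "N/A" even though the C++ side may have set a default.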
LGTM, I'll let @malfet also take a look.
/easycla As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details. This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.
@malfet I processed the EasyCLA. Please review at your convenience. We were thinking this PR might be a good candidate to include in 1.13.
Can you please target master and then create a cherry-pick into r1.13?
@malfet done. I updated the release tracker with the new PR.
@pytorchbot rebase |
@pytorchbot successfully started a rebase job. Check the current status here.
Successfully rebased: 7a261a6 to e7ccb4d.
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hey @syed-ahmed. |
This PR sets CUDA_MODULE_LOADING to "LAZY" if it is not already set by the user.
It was tested using the following commands:
which shows a memory usage of: 287,047,680 bytes
vs
which shows 666,632,192 bytes.
The C++ implementation is needed for libtorch users (otherwise this could have been pure Python functionality).
cc: @ptrblck @ngimel @malfet