Conversation

@syed-ahmed (Collaborator) commented Sep 27, 2022

This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY".

It was tested using the following commands:

```shell
python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```

which shows a memory usage of 287,047,680 bytes, vs.

```shell
CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```

which shows 666,632,192 bytes.

C++ implementation is needed for the libtorch users (otherwise it could have been a pure python functionality).

cc: @ptrblck @ngimel @malfet

@pytorch-bot (bot) commented Sep 27, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85692

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e7ccb4d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@syed-ahmed syed-ahmed marked this pull request as ready for review September 29, 2022 20:20
@syed-ahmed (Collaborator, Author) commented:

Some notes:
I discovered that syncing os.environ with the C runtime's environment-variable functions (getenv, putenv/setenv) is tricky. The CUDA_MODULE_LOADING environment variable set in CUDAHooks using setenv works, but for it to be visible on the Python side, we also need to set it in os.environ.
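The mismatch can be demonstrated in isolation (a minimal sketch using a throwaway variable name, `DEMO_VAR`; assumes a POSIX libc reachable via `ctypes.CDLL(None)`):

```python
import ctypes
import os

# Load the C runtime (POSIX; CDLL(None) exposes libc symbols to ctypes).
libc = ctypes.CDLL(None)
libc.getenv.restype = ctypes.c_char_p

# Set a variable through the C runtime, as a setenv call in C++ would.
libc.setenv(b"DEMO_VAR", b"LAZY", 1)

# The C runtime sees the new value...
print(libc.getenv(b"DEMO_VAR"))    # b'LAZY'

# ...but os.environ, a snapshot taken at interpreter startup, does not.
print(os.environ.get("DEMO_VAR"))  # None
```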

@ptrblck (Collaborator) commented Sep 30, 2022

The CUDA_MODULE_LOADING environment variable set in the CUDAHooks using setenv works but for it to be visible on the python side, we need to set it in os.environ.

Does this mean that python -m torch.utils.collect_env won't be able to report it in its current implementation?
Would it work if we replace config = os.environ.get('CUDA_MODULE_LOADING', '') with setting/reading an internal flag in addition to setenv("CUDA_MODULE_LOADING", def_value.c_str(), 1);?

@syed-ahmed (Collaborator, Author) commented Oct 3, 2022

Does this mean that python -m torch.utils.collect_env won't be able to report it in its current implementation?
Would it work if we replace config = os.environ.get('CUDA_MODULE_LOADING', '') with setting/reading an internal flag in addition to setenv("CUDA_MODULE_LOADING", def_value.c_str(), 1);?

@ptrblck No, python -m torch.utils.collect_env works in this implementation. My comment was just an observation that setenv can add or update an environment variable in a process while os.environ does not show it (since os.environ is just a copy of the environment taken before the process started).

To elaborate a bit more: I started with the changes in CUDAHooks.cpp, which set the environment variable if the user hasn't set it. I then naively tried to read the environment variable in collect_env.py and print it, but that didn't show the value of the variable, even though I could verify (using the snippet above) that the environment variable was being set. That's when I realized os.environ doesn't get updated when setenv is called from the C++ side. The solution I came up with for this PR is to set the variable in os.environ in torch/cuda/__init__.py before the __cuda_init() call. The if-else in the changes in CUDAHooks.cpp then becomes redundant when using the Python front end; however, it's still needed if you are using standalone ATen (I also manually verified the setting of the variable on the C++ side by not setting the variable in os.environ).
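The Python-side behavior can be reduced to a small helper (an illustrative sketch, not the PR's literal code; the function name is hypothetical):

```python
import os

def ensure_module_loading(env, default="LAZY"):
    # Apply the default only when the user has not already chosen a policy,
    # so a user-provided CUDA_MODULE_LOADING is never overridden.
    env.setdefault("CUDA_MODULE_LOADING", default)
    return env["CUDA_MODULE_LOADING"]

# Unset: the default is applied.
print(ensure_module_loading({}))                                   # LAZY
# User-set: the user's choice wins.
print(ensure_module_loading({"CUDA_MODULE_LOADING": "DEFAULT"}))   # DEFAULT
```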

There are other suggested solutions that use getenv via ctypes to read an environment variable set from the C++ side (e.g. https://stackoverflow.com/questions/235435/environment-variables-in-python-on-linux). I tried that in collect_env.py, but it didn't work (on Linux): I saw an empty value for the variable even though it was set. So I ended up simply updating os.environ, and figured that using ctypes like this is overkill for this PR (given you'd also have to handle other operating systems).

@anjali411 anjali411 requested review from malfet and ngimel October 3, 2022 17:24
@anjali411 anjali411 added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Oct 3, 2022
@syed-ahmed (Collaborator, Author) commented Oct 3, 2022

@ptrblck Ah, I think I misunderstood your question. Yes, in its current implementation, collect_env.py won't be able to show whether the variable was set, and your suggestion of using an internal variable in addition to the setenv would work. Would that be the preferred approach? My approach in this PR is simply to add it to os.environ before CUDA lazy init, but maybe I'm missing some downsides of that?

I also ran this on a distributed example, and each process gets the environment variable set in os.environ in __cuda_init: https://gist.github.com/syed-ahmed/10c07824c63a5a20c69208f18cce6c61, run with `torchrun --nnodes=1 --nproc_per_node=4 ddp_example.py`. When specifying CUDA_MODULE_LOADING=DEFAULT before the torchrun command, each process prints out DEFAULT instead of LAZY, confirming that the variable is overridable by the user.
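The override behavior can be checked without GPUs (a minimal sketch using subprocess in place of torchrun; the child snippet mimics the set-only-if-unset behavior and is illustrative, not the PR's code):

```python
import os
import subprocess
import sys

# Each child applies the default only when the variable is unset, then reports it.
child = ("import os; os.environ.setdefault('CUDA_MODULE_LOADING', 'LAZY'); "
         "print(os.environ['CUDA_MODULE_LOADING'])")

# Launcher environment without the variable: the child defaults to LAZY.
env_clean = {k: v for k, v in os.environ.items() if k != "CUDA_MODULE_LOADING"}
a = subprocess.run([sys.executable, "-c", child], env=env_clean,
                   capture_output=True, text=True).stdout.strip()

# Launcher environment with a user override: the child inherits and keeps it.
env_over = dict(env_clean, CUDA_MODULE_LOADING="DEFAULT")
b = subprocess.run([sys.executable, "-c", child], env=env_over,
                   capture_output=True, text=True).stdout.strip()

print(a, b)  # LAZY DEFAULT
```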

@ngimel (Collaborator) commented Oct 3, 2022

LGTM, I'll let @malfet also take a look.

@facebook-github-bot (Contributor) commented:

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla (bot) commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: syed-ahmed / name: Syed Tousif Ahmed (990be4c7de9190ac4a568e6daa5c04afbf589ab0, cf92f4af1734fdc92829b13db14c1078f49ecfe7, 7a261a6ab78b45c823ac6f0108baecb8fd2944ce)

@syed-ahmed (Collaborator, Author) commented Oct 6, 2022

@malfet I processed the EasyCLA. Please review at your convenience. We were thinking this PR might be good to include in 1.13.

@syed-ahmed syed-ahmed changed the base branch from master to release/1.13 October 7, 2022 20:59
@syed-ahmed syed-ahmed changed the title Sets CUDA_MODULE_LOADING to LAZY when not set by the user [1.13] Sets CUDA_MODULE_LOADING to LAZY when not set by the user Oct 7, 2022
@malfet (Contributor) commented Oct 7, 2022

Can you please target master and then create a cherry-pick into r1.13?
Also, please mention in the PR description that the C++ implementation is needed for libtorch users (otherwise it could have been pure Python functionality).

@syed-ahmed syed-ahmed changed the base branch from release/1.13 to master October 7, 2022 22:50
@syed-ahmed syed-ahmed changed the title [1.13] Sets CUDA_MODULE_LOADING to LAZY when not set by the user Sets CUDA_MODULE_LOADING to LAZY when not set by the user Oct 7, 2022
@syed-ahmed (Collaborator, Author) commented:

@malfet done. I updated the release tracker with the new PR.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2022
@malfet (Contributor) commented Oct 9, 2022

@pytorchbot rebase

@pytorchmergebot (Collaborator) commented:
@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased cuda-module-loading-patch onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cuda-module-loading-patch && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the cuda-module-loading-patch branch from 7a261a6 to e7ccb4d Compare October 9, 2022 14:22
@malfet (Contributor) commented Oct 13, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status here.

@github-actions (Contributor) commented:

Hey @syed-ahmed.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.
