[FSDP][RFC] Enforce rank r's current device is cuda:r
#92035
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92035
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Failure as of commit 3e82ae9. NEW FAILURES: the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
IIUC, FSDP assumes that rank `r` uses `cuda:r`. In other words, FSDP assumes single-process single-device (SPSD) and CUDA devices only.

The SPSD assumption can help us simplify the `compute_device` and `device_id` code, since specifying `device_id` then becomes simply a signal of whether the user wants FSDP to move parameters on their behalf. The actual target device is fixed (namely, `cuda:r`). Since the SPSD assumption is deeply embedded in FSDP, I think this is reasonable. We can generalize to non-CUDA devices in the future while still enforcing SPSD. Feel free to let me know your thoughts.

If we go with this, we can provide earlier and cleaner error handling in case the user forgets to set the current CUDA device. See #91661.
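For concreteness, here is a minimal sketch (not from this PR) of the SPSD setup described above: each rank sets its current device to `cuda:r` before constructing FSDP, so `device_id` only signals whether FSDP should move a CPU module on the user's behalf. The `torchrun`/`LOCAL_RANK` launch and the toy module are assumptions for illustration.

```python
# Minimal sketch of the SPSD setup this RFC assumes (not code from the PR).
# Assumes launch via torchrun, which sets LOCAL_RANK for each process.
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)  # rank r's current device becomes cuda:r
    dist.init_process_group(backend="nccl")

    model = torch.nn.Linear(8, 8)  # placeholder module, still on CPU here
    # With the current device already set, device_id is just a signal that FSDP
    # should move the CPU module to the fixed target device cuda:r for the user.
    fsdp_model = FSDP(model, device_id=torch.cuda.current_device())

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```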
I think the SPSD and CUDA-device-only assumption reflects the current FSDP state. "We can provide earlier and cleaner error handling in case the user forgets to set the current CUDA device" sounds great to me, thank you!
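To illustrate the kind of earlier, cleaner error handling being discussed, a check like the following could fail fast at FSDP construction time. This is a hypothetical sketch, not FSDP's actual implementation; the helper name is made up, and it assumes a single-node job where the global rank doubles as the local device index.

```python
# Hypothetical early check illustrating the proposal; not FSDP's actual code.
# Assumes a single-node job, so the global rank is also the local device index.
import torch
import torch.distributed as dist


def _check_spsd_current_device() -> None:
    rank = dist.get_rank()
    current = torch.cuda.current_device()
    if current != rank:
        raise RuntimeError(
            f"FSDP assumes rank {rank} uses cuda:{rank} (SPSD), but the current "
            f"CUDA device is cuda:{current}. Please call "
            f"torch.cuda.set_device({rank}) before constructing FSDP."
        )
```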
ghstack-source-id: df22a11
Pull Request resolved: pytorch#92035
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`.
Stack from ghstack:
- #92035 [FSDP][RFC] Enforce rank `r`'s current device is `cuda:r`
- #91767 [FSDP] Do not clean FQNs even for `use_orig_params=True`
- #92031 [FSDP][BE] Improve `device_id` + CPU offload test
- #92028 [FSDP][BE] Rename `prefixed_param_names` -> `fqns` for consolidation