Skip to content

Conversation

@razarmehr
Copy link
Collaborator

@razarmehr razarmehr commented Feb 8, 2023

  • This PR is a prerequisite for the upcoming Memory Leak Detection PR.
  • Enable global manual seeding via torch.manual_seed() + test case
  • Add torch.mps.synchronize() to wait for MPS stream to finish + test case
  • Enable the following python interfaces for MPS:
    torch.mps.[get_rng_state(), set_rng_state(), synchronize(), manual_seed(), seed()]
  • Added some test cases in test_mps.py
  • Added mps.rst to document the torch.mps module.
  • Fixed the failure with test_public_bindings.py

Description of new files added:

  • torch/csrc/mps/Module.cpp: implements torch._C module functions for torch.mps and torch.backends.mps.
  • torch/mps/__init__.py: implements Python bindings for torch.mps module.

@pytorch-bot
Copy link

pytorch-bot bot commented Feb 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94417

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures

As of commit 2ba30c2:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Feb 8, 2023
@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Feb 8, 2023
@razarmehr
Copy link
Collaborator Author

I'm working on a fix to address CI failures on other backends caused by this PR.

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some parts of the code is re-exporting MPS properties which are already available in torch.backends.mps package and also copy-n-pastes anti-patterns from old days of CUDA.(like lazy-init)
Please request re-review when this feedback is addressed.

@razarmehr razarmehr force-pushed the MPS_Binding_Module branch 2 times, most recently from 5c1d31f to 60437cd Compare February 9, 2023 03:48
@razarmehr razarmehr requested review from albanD and malfet February 9, 2023 20:47
@albanD
Copy link
Collaborator

albanD commented Feb 10, 2023

Sounds pretty good. Thanks for the changes. Only small things.

@razarmehr razarmehr requested a review from albanD February 10, 2023 17:26
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!
Thanks for the updates.

@razarmehr
Copy link
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Tried to rebase and push PR #94417, but it was already up to date

@razarmehr
Copy link
Collaborator Author

The dynamo CI failure is unrelated to this PR and is due to following failed issue #89395.

FAIL [0.001s]: test_coalesce_reference_cycle_cpu_float64 (__main__.TestSparseCPU)

@razarmehr
Copy link
Collaborator Author

@pytorchbot merge -f "Lint and MPS tests are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

- Enable global manual seeding via torch.manual_seed() + test case
- Add torch.mps.synchronize() to wait for MPS stream to finish + test case
- Enable the following python interfaces for MPS:
torch.mps.get_rng_state()
torch.mps.set_rng_state()
torch.mps.is_available()
torch.mps.synchronize()
torch.mps.manual_seed()
torch.mps.seed()
torch.mps.is_initialized()
torch.mps.init()
- This patch fixes a regression caused by recent MPS module interface. Since we now compile torch._C.is_mps_available()
only if USE_MPS is defined, then it may cause failures on CUDA (and other devices when USE_MPS is not defined) if we upstream.
So, this patch checks if is_mps_available is implemented first and then calls it.
- Also use the unique name `default_mps_generator` to avoid conflicts with CPU default generator
- Removed USE_MPS condition when adding python binding definitions
- Replaced the global attribute setting of default_mps_generator with torch.C_._mps_get_default_generator()
- Removed redundant hasattr() checks
- Error out in MPSHooks::isOnMacOS13orNewer() if MPS not available
- Removed is_available() and is_macos13_or_newer() from torch.mps (already exists in torch.backends.mps)
This should fix the failure with test_public_bindings.py
- Remove import torch.mps from test_mps
- Add torch::mps namespace
@pytorchmergebot
Copy link
Collaborator

Successfully rebased MPS_Binding_Module onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout MPS_Binding_Module && git pull --rebase)

@razarmehr
Copy link
Collaborator Author

Thanks @huydhn. The commit 2ba30c2 should fix the problem with the bad forking problem. I tested with pytest -k test_fd_limit_exceeded test_dataloader.py, and it's all good now.
Since that dataloader test isn't part of the CI checks on the PR, I will re-merge the PR with that fix (after all CIs are green), and will monitor PyTorch HUD for any regressions.

@huydhn huydhn added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 12, 2023
@huydhn
Copy link
Contributor

huydhn commented Feb 12, 2023

You can add ciflow/trunk and it will run all MacOS tests on trunk in the PR for you, including the failed test above. It's also done automatically if the PR is merged normally with @pytorchbot merge instead of force merging

@razarmehr
Copy link
Collaborator Author

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Copy link
Collaborator

@kulinseth kulinseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@kulinseth kulinseth reopened this Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/mps Run MPS tests (subset of trunk) ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: mps Release notes category Reverted triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants