Conversation

@guangyey (Collaborator) commented Nov 25, 2022

Motivation

We need to add XPU backend support for torch.save and torch.load when the parameter _use_new_zipfile_serialization=False is used.

Solution

Our design wraps the raw storage data as a tensor and then:

  1. uses an in-place copy for H2D, and
  2. calls tensor.to() directly for D2H.

This helps us:

  1. unify the generic code path across all backends, and
  2. support every non-CPU device backend (see the sketch below).
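
For illustration only, a minimal Python sketch of the two directions (the actual change lives in PyTorch's C++ serialization code; the helper names d2h/h2d are hypothetical):

```python
import torch

def d2h(device_tensor):
    # D2H: a single dispatched .to("cpu") performs the device-to-host
    # copy; the copy operator allocates the destination CPU buffer.
    return device_tensor.to("cpu")

def h2d(cpu_tensor, device):
    # H2D: allocate the destination on the target device, then fill it
    # with an in-place dispatched copy_() instead of a manual memcpy.
    device_tensor = torch.empty_like(cpu_tensor, device=device)
    device_tensor.copy_(cpu_tensor)
    return device_tensor
```

Because .to() and copy_() dispatch on the tensor's device, the same two helpers serve CUDA, HIP, XPU, or any other registered backend.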

Additional Context

No new unit tests are needed; the existing test/test_serialization.py already covers this code change.
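
For reference, a sketch of the kind of legacy-format round-trip those tests exercise (the device selection below is illustrative; any non-CPU backend follows the same path):

```python
import io
import torch

# Illustrative device pick: fall back to CPU when no XPU is present.
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

buf = io.BytesIO()
original = torch.arange(16, device=device)
# _use_new_zipfile_serialization=False selects the legacy format whose
# D2H/H2D path this PR reworks.
torch.save(original, buf, _use_new_zipfile_serialization=False)
buf.seek(0)
loaded = torch.load(buf)
assert torch.equal(loaded.cpu(), original.cpu())
```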

@pytorch-bot bot commented Nov 25, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89679

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 97e63b4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@gujinghui (Collaborator) commented:

@ezyang Could you help review this change? Thanks a lot.

@ezyang (Contributor) commented Nov 29, 2022

I'm OK with the direction (replacing the manual memcpy with a dispatched copy). But I'd ask you to go further: there is no need to gate on HIP/XPU/CUDA; we should handle all non-CPU devices this way.
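
In other words, the per-backend gates collapse into a single dispatched call. A hypothetical before/after sketch:

```python
import torch

def storage_to_cpu(t):
    # Before (gated): per-backend branches along the lines of
    #   if t.is_cuda: ...CUDA memcpy...
    #   elif t.device.type == "xpu": ...XPU memcpy...
    # After (dispatched): one call covers every non-CPU backend, since
    # the dispatcher routes the copy to whichever device owns t.
    return t if t.device.type == "cpu" else t.to("cpu")
```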

@guangyey (Collaborator, Author) commented:

@ezyang Following your comments, I have updated the related code and this PR's description. Does it look good now?

@albanD requested a review from ezyang, November 29, 2022 14:22
@albanD added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module), Nov 29, 2022
Contributor (review comment):

This seems a bit more circuitous than is necessary. Since you're doing a dispatched copy, the operator can take care of allocating a CPU tensor for you. So you can just directly convert device_tensor to CPU, and then pull out the data pointer directly, no need to hand allocate cpu_data buffer anymore.
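
A hedged Python analogue of that suggestion (the real change is in the C++ serializer; serialize_storage and the byte extraction are illustrative):

```python
import torch

def serialize_storage(device_tensor, f):
    # No hand-allocated cpu_data buffer: the dispatched copy inside
    # .to("cpu") allocates the destination CPU tensor itself.
    cpu_tensor = device_tensor.to("cpu").contiguous()
    # Read the bytes straight out of the freshly allocated CPU buffer
    # (via NumPy here; the C++ code pulls out the data pointer).
    f.write(cpu_tensor.numpy().tobytes())
```

In the C++ version, the same idea is a single dispatched conversion to CPU followed by reading the data pointer directly, as the review comment above describes.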

Collaborator Author (reply):

Done. It looks clearer now.
@ezyang Thanks, any more comments?

@linux-foundation-easycla bot commented Nov 30, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: guangyey / name: Yu, Guangye (97e63b4)

@ezyang (Contributor) commented Nov 30, 2022

@pytorchbot merge

@pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request), Nov 30, 2022
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 additional job has failed; the first few of them: .github/workflows/trunk.yml

Details for Dev Infra team: raised by workflow job.

@ezyang (Contributor) commented Nov 30, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 additional job has failed; the first few of them: trunk

Details for Dev Infra team: raised by workflow job.

@ezyang (Contributor) commented Nov 30, 2022

@pytorchbot merge -f "idk why this failed everything looks good"

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@guangyey deleted the guangyey/save_load branch, December 1, 2022 10:01
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
neggles pushed a commit to neggles/pytorch that referenced this pull request Mar 9, 2023

Labels

  • ciflow/trunk: Trigger trunk jobs on your pull request
  • Merged
  • open source
  • triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
