Conversation

@guangyey (Collaborator) commented Nov 25, 2022

Motivation

We need to add XPU backend support for torch.save and torch.load when the parameter _use_new_zipfile_serialization=False is used.

Solution

Our design wraps the raw storage data as a tensor and then:

  1. uses an in-place copy for H2D, and
  2. calls tensor.to() directly for D2H.

This helps us:

  1. unify the generic code path across all backends, and
  2. support every non-CPU device backend (see the sketch below).
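
For illustration only, a minimal Python sketch of the two directions (the actual change lives in PyTorch's C++ serialization code; the helper names d2h/h2d are hypothetical):

```python
import torch

def d2h(device_tensor):
    # D2H: a single dispatched .to("cpu") performs the device-to-host
    # copy; the copy operator allocates the destination CPU buffer.
    return device_tensor.to("cpu")

def h2d(cpu_tensor, device):
    # H2D: allocate the destination on the target device, then fill it
    # with an in-place dispatched copy_() instead of a manual memcpy.
    device_tensor = torch.empty_like(cpu_tensor, device=device)
    device_tensor.copy_(cpu_tensor)
    return device_tensor
```

Because .to() and copy_() dispatch on the tensor's device, the same two helpers serve CUDA, HIP, XPU, or any other registered backend.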

Additional Context

No new unit tests are needed; the existing test/test_serialization.py already covers this code change.
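
For reference, a sketch of the kind of legacy-format round-trip those tests exercise (the device selection below is illustrative; any non-CPU backend follows the same path):

```python
import io
import torch

# Illustrative device pick: fall back to CPU when no XPU is present.
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

buf = io.BytesIO()
original = torch.arange(16, device=device)
# _use_new_zipfile_serialization=False selects the legacy format whose
# D2H/H2D path this PR reworks.
torch.save(original, buf, _use_new_zipfile_serialization=False)
buf.seek(0)
loaded = torch.load(buf)
assert torch.equal(loaded.cpu(), original.cpu())
```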

@pytorch-bot bot commented Nov 25, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89679

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 97e63b4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@gujinghui (Collaborator) commented:

@ezyang Could you help review this change? Thanks a lot.

@ezyang (Contributor) commented Nov 29, 2022

I'm OK with the direction (replacing the manual memcpy with a dispatched copy). But I'd ask you to go further: there is no need to gate on HIP/XPU/CUDA; we should handle all non-CPU devices this way.
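
In other words, the per-backend gates collapse into a single dispatched call. A hypothetical before/after sketch:

```python
import torch

def storage_to_cpu(t):
    # Before (gated): per-backend branches along the lines of
    #   if t.is_cuda: ...CUDA memcpy...
    #   elif t.device.type == "xpu": ...XPU memcpy...
    # After (dispatched): one call covers every non-CPU backend, since
    # the dispatcher routes the copy to whichever device owns t.
    return t if t.device.type == "cpu" else t.to("cpu")
```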

@guangyey (Collaborator, Author) commented:

@ezyang Following your comments, I have updated the related code and this PR's description. Does it look good now?

@albanD requested a review from ezyang, November 29, 2022 14:22
@albanD added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module), Nov 29, 2022
Contributor (review comment):

This seems a bit more circuitous than is necessary. Since you're doing a dispatched copy, the operator can take care of allocating a CPU tensor for you. So you can just directly convert device_tensor to CPU, and then pull out the data pointer directly, no need to hand allocate cpu_data buffer anymore.
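
A hedged Python analogue of that suggestion (the real change is in the C++ serializer; serialize_storage and the byte extraction are illustrative):

```python
import torch

def serialize_storage(device_tensor, f):
    # No hand-allocated cpu_data buffer: the dispatched copy inside
    # .to("cpu") allocates the destination CPU tensor itself.
    cpu_tensor = device_tensor.to("cpu").contiguous()
    # Read the bytes straight out of the freshly allocated CPU buffer
    # (via NumPy here; the C++ code pulls out the data pointer).
    f.write(cpu_tensor.numpy().tobytes())
```

In the C++ version, the same idea is a single dispatched conversion to CPU followed by reading the data pointer directly, as the review comment above describes.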

Collaborator Author (reply):

Done. It looks clearer now.
@ezyang Thanks, any more comments?

@linux-foundation-easycla bot commented Nov 30, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: guangyey / name: Yu, Guangye (97e63b4)

@ezyang (Contributor) commented Nov 30, 2022

@pytorchbot merge

@pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request), Nov 30, 2022
@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 additional job has failed; the first few of them: .github/workflows/trunk.yml

Details for Dev Infra team: raised by workflow job.

@ezyang (Contributor) commented Nov 30, 2022

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator):

Merge failed

Reason: 1 additional job has failed; the first few of them: trunk

Details for Dev Infra team: raised by workflow job.

@ezyang (Contributor) commented Nov 30, 2022

@pytorchbot merge -f "idk why this failed everything looks good"

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@guangyey deleted the guangyey/save_load branch, December 1, 2022 10:01
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
neggles pushed a commit to neggles/pytorch that referenced this pull request Mar 9, 2023

Labels

  • ciflow/trunk: Trigger trunk jobs on your pull request
  • Merged
  • open source
  • triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
