
Conversation

@qqaatw
Collaborator

@qqaatw qqaatw commented Dec 18, 2022

Previously, the "can slice" flag in the Placeholder constructor in OperationUtils.mm was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't account for the case where a view tensor is a sliced and then unsqueezed version of the base tensor, which results in a different number of dimensions.

For example, if we want to stack y_mps and x_mps on the last dim:

t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)

x_mps.unsqueeze(-1) is an unsqueezed, contiguous tensor with a non-zero storage offset; tensors of this kind should be sliceable without cloning their storage.
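
For reference, a minimal sketch of the layout properties involved, run on CPU purely for inspection (the MPS views in the example above have the same layout):

```
import torch

t = torch.tensor([1, 2, 3, 4])
x = t[2:].unsqueeze(-1)     # sliced, then unsqueezed view of t

print(x.storage_offset())   # 2    -> non-zero storage offset into t's storage
print(x.is_contiguous())    # True -> still expressible as a plain slice of the base storage
print(t.dim(), x.dim())     # 1 2  -> dims differ, so the old ndim-equality check rejected slicing
```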

Fixes #87856
Fixes #91065

cc @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev

@qqaatw qqaatw requested a review from kulinseth as a code owner December 18, 2022 04:31
@pytorch-bot

pytorch-bot bot commented Dec 18, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91071

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1a8786b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Dec 18, 2022
@qqaatw qqaatw changed the title from "[MPS] Fix tensor with storage offset graph gathering" to "[MPS] Fix tensor with non-zero storage offset graph gathering" Dec 18, 2022
@qqaatw
Collaborator Author

qqaatw commented Dec 19, 2022

@pytorchbot label "module: mps"

@pytorch-bot pytorch-bot bot added the module: mps Related to Apple Metal Performance Shaders framework label Dec 19, 2022
@qqaatw qqaatw force-pushed the fix_view_gather_logic branch from ffd155b to f64db8c on December 19, 2022 11:28
Contributor

@malfet malfet left a comment


This PR contains unrelated changes, and there is not much of an explanation as to why it is necessary to make a copy of the tensor view.

I.e., it probably fixes the symptom, but not the fundamental underlying problem.

res_mps = x_mps <= y_mps
res_cpu = x_cpu <= y_cpu
if operator == "<":
elif operator == "<":
Contributor

This seems unrelated to the change in question, isn't it?

Collaborator Author

Yes, it's not closely related, but I think this improvement is trivial and small enough not to cause any reading difficulty. I feel it would be more costly and unnecessary to make this improvement in a separate PR, wouldn't it?
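
For context, a hedged sketch of what the test-side tidy-up being discussed presumably looks like; the full test body is not shown in this thread, so the helper name and structure below are illustrative only:

```
# Illustrative only: the duplicated `operator == "<"` branch in the excerpt
# above is presumably meant to separate "<" from "<=".
def compare(x_mps, y_mps, x_cpu, y_cpu, operator):
    if operator == "<=":
        res_mps = x_mps <= y_mps
        res_cpu = x_cpu <= y_cpu
    elif operator == "<":
        res_mps = x_mps < y_mps
        res_cpu = x_cpu < y_cpu
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return res_mps, res_cpu
```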

@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Dec 21, 2022
qqaatw and others added 7 commits February 15, 2023 01:36
Fixes backward pass for bilinear.

Summary of changes:
- the bilinear op is able to produce **contiguous, non-view** tensors with a storage offset, such as shape=`[1, 1, 1, 1]`, `storage_offset=12`. This seems like a weird case, but it is valid, and for this type of tensor we wouldn't be able to gather/scatter since we look at the view flag (which is not set here). This change looks at `storage_offset` only, rather than the is_view flag, which is not being set (a minimal sketch follows below).
- **reduction sum** must return a zeroed-out output if it is passed an input with 0 elements (e.g. a shape of `(0, 5)`).
Pull Request resolved: pytorch#94892
Approved by: https://github.com/kulinseth
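
A hedged illustration of the two cases above (not the actual bilinear backward code; the tensor here is built artificially with Tensor.set_ just to reproduce the layout):

```
import torch

# A contiguous, non-view tensor that nevertheless carries a storage offset,
# constructed artificially via Tensor.set_():
base = torch.arange(16.0)
t = torch.tensor([])
t.set_(base.storage(), storage_offset=12, size=(1, 1, 1, 1))
print(t.is_contiguous(), t.storage_offset())   # True 12

# A reduction over an input with zero elements must produce a zeroed output
# of the reduced shape:
x = torch.zeros(0, 5)
print(torch.sum(x, dim=0))                     # tensor([0., 0., 0., 0., 0.])
```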
@qqaatw qqaatw force-pushed the fix_view_gather_logic branch from f64db8c to 70fb582 on February 16, 2023 10:40
@qqaatw
Collaborator Author

qqaatw commented Feb 16, 2023

@pytorchbot label "ciflow/trunk"

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 16, 2023
@qqaatw
Collaborator Author

qqaatw commented Feb 16, 2023

@pytorchbot label "keep-going"

@pytorch-bot pytorch-bot bot added the keep-going Don't stop on first failure, keep running tests until the end label Feb 16, 2023
if (!mpsShape) {
mpsShape = getMPSShape(_tensor);
}
}
Collaborator

Fix this.

Collaborator Author

This is purposely an indentation fix for the if (!mpsShape) { block.

@kulinseth
Collaborator

Modulo the nit, changes look good.

@qqaatw
Collaborator Author

qqaatw commented Feb 17, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-x86-64 / test (default, 1, 3, macos-12)

Details for Dev Infra team Raised by workflow job

@qqaatw
Collaborator Author

qqaatw commented Feb 17, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Feb 22, 2023
…h#91071)

Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't account for the case where a view tensor is a sliced and then unsqueezed version of the base tensor, which results in a different number of dimensions.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```
t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed, contiguous tensor with a non-zero storage offset; tensors of this kind should be sliceable without cloning their storage.

Fixes pytorch#87856
Fixes pytorch#91065

Pull Request resolved: pytorch#91071
Approved by: https://github.com/kulinseth
This was referenced Feb 22, 2023
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Feb 23, 2023
…h#91071)

Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't account for the case where a view tensor is a sliced and then unsqueezed version of the base tensor, which results in a different number of dimensions.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```
t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed, contiguous tensor with a non-zero storage offset; tensors of this kind should be sliceable without cloning their storage.

Fixes pytorch#87856
Fixes pytorch#91065

Pull Request resolved: pytorch#91071
Approved by: https://github.com/kulinseth
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Feb 23, 2023
…h#91071)

Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't account for the case where a view tensor is a sliced and then unsqueezed version of the base tensor, which results in a different number of dimensions.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```
t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed, contiguous tensor with a non-zero storage offset; tensors of this kind should be sliceable without cloning their storage.

Fixes pytorch#87856
Fixes pytorch#91065

Pull Request resolved: pytorch#91071
Approved by: https://github.com/kulinseth
atalman pushed a commit that referenced this pull request Feb 24, 2023
* [MPS] Fix the uint8 type issue with View ops kernels (#95145)

This should fix the problem in the ResNet model with image artifacts due to saturation on the int8 type, and also the incorrect class recognition reported in #86954.

Fixes #86954

Pull Request resolved: #95145
Approved by: https://github.com/kulinseth, https://github.com/DenisVieriu97

* [MPS] Fix tensor with non-zero storage offset graph gathering (#91071)

Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't account for the case where a view tensor is a sliced and then unsqueezed version of the base tensor, which results in a different number of dimensions.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```
t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed, contiguous tensor with a non-zero storage offset; tensors of this kind should be sliceable without cloning their storage.

Fixes #87856
Fixes #91065

Pull Request resolved: #91071
Approved by: https://github.com/kulinseth

* [MPS] Fix fill_ where input tensor has a storage offset (#95113)

Fixes #94390

Apart from fixing the issue above, this PR also fixes a bug where, when an input tensor can be sliced, a sliced array view is created. This array view seems to be either not writable or backed by different storage from the original tensor, causing incorrect results with the in-place `fill`.
Pull Request resolved: #95113
Approved by: https://github.com/kulinseth

* [MPS] Fix view op slicing for 2nd dim in case of 0 offset (#95381)

* Fix view op slicing for 2nd dim in case of 0 offset

Pull Request resolved: #95381
Approved by: https://github.com/razarmehr

---------

Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
Co-authored-by: Li-Huai (Allan) Lin <qqaatw@gmail.com>
Co-authored-by: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com>
pruthvistony added a commit to ROCm/pytorch that referenced this pull request May 2, 2023
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request May 3, 2023
* [MPS] Fix the uint8 type issue with View ops kernels (pytorch#95145)

This should fix the problem in the ResNet model with image artifacts due to saturation on the int8 type, and also the incorrect class recognition reported in pytorch#86954.

Fixes pytorch#86954

Pull Request resolved: pytorch#95145
Approved by: https://github.com/kulinseth, https://github.com/DenisVieriu97

* [MPS] Fix tensor with non-zero storage offset graph gathering (pytorch#91071)

Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't account for the case where a view tensor is a sliced and then unsqueezed version of the base tensor, which results in a different number of dimensions.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```
t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed, contiguous tensor with a non-zero storage offset; tensors of this kind should be sliceable without cloning their storage.

Fixes pytorch#87856
Fixes pytorch#91065

Pull Request resolved: pytorch#91071
Approved by: https://github.com/kulinseth

* [MPS] Fix fill_ where input tensor has a storage offset (pytorch#95113)

Fixes pytorch#94390

Apart from fixing the issue above, this PR also fixes a bug where, when an input tensor can be sliced, a sliced array view is created. This array view seems to be either not writable or backed by different storage from the original tensor, causing incorrect results with the in-place `fill`.
Pull Request resolved: pytorch#95113
Approved by: https://github.com/kulinseth

* [MPS] Fix view op slicing for 2nd dim in case of 0 offset (pytorch#95381)

* Fix view op slicing for 2nd dim in case of 0 offset

Pull Request resolved: pytorch#95381
Approved by: https://github.com/razarmehr

---------

Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
Co-authored-by: Li-Huai (Allan) Lin <qqaatw@gmail.com>
Co-authored-by: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com>
jhavukainen pushed a commit to kulinseth/pytorch that referenced this pull request Mar 15, 2024
…h#91071)

Previously, the "can slice" flag in the Placeholder constructor in `OperationUtils.mm` was conditioned on whether the base shape and the view shape have the same number of dimensions. This doesn't account for the case where a view tensor is a sliced and then unsqueezed version of the base tensor, which results in a different number of dimensions.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```
t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed, contiguous tensor with a non-zero storage offset; tensors of this kind should be sliceable without cloning their storage.

Fixes pytorch#87856
Fixes pytorch#91065

Pull Request resolved: pytorch#91071
Approved by: https://github.com/kulinseth

Labels

ciflow/mps Run MPS tests (subset of trunk) ciflow/trunk Trigger trunk jobs on your pull request keep-going Don't stop on first failure, keep running tests until the end Merged module: mps Related to Apple Metal Performance Shaders framework open source release notes: mps Release notes category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MPS] Incorrect results of two multiplied chunked tensors produced by unsafe_chunk
torch.stack gives wrong results on MPS

7 participants