
[Inductor][ATen] Fix stride rounding on Blockwise128x128 to accommodate for small shapes #164953

Closed
jananisriram wants to merge 1 commit into main from export-D84103213

Conversation

@jananisriram
Contributor

Summary: Fix a stride-rounding issue in Blockwise128x128 so that it accommodates small shapes. The original implementation rounded all strides to 4, which caused failures in test_fp8.py tests as well as in test_scaled_matmul_cuda.py::test_scaled_mm_vs_emulated_block_wise tests (GitHub PR #164259).
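The summary does not show the code change itself. As a rough illustration of why an unconditional stride rounding can hurt small shapes, here is a minimal pure-Python sketch. It assumes that Blockwise128x128 stores one scale per 128x128 tile and that "rounded all strides to 4" means padding the leading stride of the row-major scale tensor up to a multiple of 4. The helper names (`ceil_div`, `round_up`, `blockwise_scale_shape`, `contiguous_stride`, `always_rounded_stride`) are hypothetical and do not come from the PR; the actual fix may differ.

```python
# Hypothetical sketch, not the PR's code: scale-tensor layout for 128x128
# blockwise scaling, comparing a plain row-major stride with one whose
# leading stride is always padded to a multiple of 4 (assumed old behavior).
from __future__ import annotations

BLOCK = 128  # assumed tile size: one scale per 128x128 tile


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, e.g. ceil_div(129, 128) == 2."""
    return -(-a // b)


def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ceil_div(x, multiple) * multiple


def blockwise_scale_shape(rows: int, cols: int) -> tuple[int, int]:
    """Scale-tensor shape: one entry per (possibly partial) 128x128 tile."""
    return ceil_div(rows, BLOCK), ceil_div(cols, BLOCK)


def contiguous_stride(rows: int, cols: int) -> tuple[int, int]:
    """Plain row-major stride of the scale tensor, with no padding."""
    _, scale_cols = blockwise_scale_shape(rows, cols)
    return scale_cols, 1


def always_rounded_stride(rows: int, cols: int, align: int = 4) -> tuple[int, int]:
    """Row-major stride with the leading stride unconditionally padded to `align`."""
    _, scale_cols = blockwise_scale_shape(rows, cols)
    return round_up(scale_cols, align), 1


if __name__ == "__main__":
    for m, k in [(256, 256), (128, 384), (1024, 1024)]:
        print(
            (m, k),
            blockwise_scale_shape(m, k),
            contiguous_stride(m, k),
            always_rounded_stride(m, k),
        )
```

For 1024x1024 inputs both strides come out as (8, 1), but for 256x256 the padded stride (4, 1) no longer matches the contiguous (2, 1) layout of a 2x2 scale tensor. Under the assumptions above, that is the kind of mismatch that could plausibly break a comparison against an unpadded, emulated reference for small shapes while leaving large shapes unaffected.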

Test Plan:
test_fp8.py
test_scaled_matmul_cuda.py

Differential Revision: D84103213

@pytorch-bot

pytorch-bot bot commented Oct 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164953

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e52aa0d with merge base d2cb183:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-codesync

meta-codesync bot commented Oct 8, 2025

@jananisriram has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84103213.

@jananisriram jananisriram requested a review from slayton58 October 8, 2025 17:50
@jananisriram jananisriram added the topic: not user facing topic category label Oct 8, 2025
jananisriram added a commit that referenced this pull request Oct 8, 2025
…te for small shapes (#164953)

Summary:

Fix rounding issue on `Blockwise128x128` to accommodate for small shapes. The original implementation rounded all strides to 4, which caused failures for `test_fp8.py` tests as well as `test_scaled_matmul_cuda.py::test_scaled_mm_vs_emulated_block_wise` tests ([GitHub PR](#164259)).

Test Plan:
`test_fp8.py`
`test_scaled_matmul_cuda.py`

Differential Revision: D84103213
facebook-github-bot pushed a commit that referenced this pull request Oct 8, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 8, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 8, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 8, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 9, 2025
jananisriram added a commit that referenced this pull request Oct 9, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 9, 2025
Contributor

@slayton58 slayton58 left a comment


LGTM

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2025
…te for small shapes (#164953)

Summary:

Fix rounding issue on `Blockwise128x128` to accommodate for small shapes. The original implementation rounded all strides to 4, which caused failures for `test_fp8.py` tests as well as `test_scaled_matmul_cuda.py::test_scaled_mm_vs_emulated_block_wise` tests ([GitHub PR](#164259)).

Test Plan:
`test_fp8.py`
`test_scaled_matmul_cuda.py`

Reviewed By: slayton58

Differential Revision: D84103213
jananisriram added a commit that referenced this pull request Oct 10, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 10, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 10, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 10, 2025
Contributor

@slayton58 slayton58 left a comment


LGTM, verified tests pass clean on H100.

@jananisriram
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).



jananisriram added a commit that referenced this pull request Oct 10, 2025
jananisriram added a commit that referenced this pull request Oct 10, 2025
facebook-github-bot pushed a commit that referenced this pull request Oct 10, 2025
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
…te for small shapes (pytorch#164953)

Summary: Fix rounding issue on `Blockwise128x128` to accommodate for small shapes. The original implementation rounded all strides to 4, which caused failures for `test_fp8.py` tests as well as `test_scaled_matmul_cuda.py::test_scaled_mm_vs_emulated_block_wise` tests ([GitHub PR](pytorch#164259)).

Test Plan:
`test_fp8.py`
`test_scaled_matmul_cuda.py`

Differential Revision: D84103213

Pull Request resolved: pytorch#164953
Approved by: https://github.com/slayton58, https://github.com/eqy
@github-actions github-actions bot deleted the export-D84103213 branch November 10, 2025 02:20
5 participants