
Reland #161649, vectorize stores in cat for all dtypes #162440

Closed
ngimel wants to merge 9 commits into main from ngimel/cat_perf2

Conversation

@ngimel (Collaborator) commented Sep 9, 2025

Per title

@pytorch-bot bot commented Sep 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162440

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 82805d4 with merge base 5fd6b6a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ci-no-td (Do not run TD on this PR) label Sep 9, 2025
@facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Meta employee, you can view this in D81983680.

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Sep 9, 2025
@ngimel added the ciflow/trunk (Trigger trunk jobs on your pull request) and release notes: cuda (release notes category) labels, and removed the ciflow/trunk label Sep 9, 2025
@facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Meta employee, you can view this in D81983680.

// which requires the input tensor addresses to be aligned to a
// 16 Byte boundary.

constexpr bool isContig = stride_size == 1;
Collaborator
Do we care about stride 0 or is that considered not contiguous?

Collaborator Author
stride_size is the dimension of the strides; the stride_size template argument is set to 1 when all input and output tensors return true for .is_contiguous().
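A minimal sketch of the dispatch being discussed, assuming hypothetical names (cat_copy_kernel, aligned16) that are illustrative rather than the kernel actually touched by this PR: stride_size is a template parameter, so stride_size == 1 selects the all-contiguous specialization at compile time, and a 16-byte alignment check on the host side gates the vectorized-store variant.

#include <cstdint>

// Hypothetical sketch, not the PR's kernel: stride_size == 1 encodes
// "every input and the output is contiguous" and is known at compile time.
template <typename T, int stride_size>
__global__ void cat_copy_kernel(T* out, const T* in, int64_t n) {
  constexpr bool isContig = stride_size == 1;
  int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i >= n) return;
  if constexpr (isContig) {
    out[i] = in[i];  // contiguous fast path; a real kernel would issue vectorized stores here
  } else {
    out[i] = in[i];  // strided path would compute offsets from tensorSize/tensorStride instead
  }
}

// Host-side check before picking the 16-byte vectorized variant.
inline bool aligned16(const void* p) {
  return reinterpret_cast<uintptr_t>(p) % 16 == 0;
}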

for (int i = nDims - 1; i >= 0; --i) {
outputParam.tensorSize[i] = out.size(i);
outputParam.tensorStride[i] = out.stride(i);
if (isContig) {
Collaborator
Should be if constexpr
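Since isContig comes from the stride_size template argument, it is a compile-time constant, so the untaken branch can be discarded during compilation. A minimal sketch of the suggested change (the branch bodies are placeholders, not the actual code):

// isContig is constexpr, so the compiler prunes the dead branch entirely.
if constexpr (isContig) {
  // contiguous fast path
} else {
  // general strided path
}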

@ngimel (Collaborator, Author) commented Sep 18, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
@github-actions github-actions bot deleted the ngimel/cat_perf2 branch October 19, 2025 02:19

Labels

ci-no-td (Do not run TD on this PR), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: cuda (release notes category)
