
Fix int32 overflow in embedding_dense_backward #165095

Closed
yinghai wants to merge 8 commits into pytorch:main from yinghai:yinghai/fix_emb_bwd

Conversation

yinghai (Contributor) commented Oct 9, 2025

If `max_partial_segment` is large we can overflow `gid` and cause a bunch of IMAs (illegal memory accesses).
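For illustration, a minimal sketch of the failure mode (the kernel name, parameters, and bounds check below are hypothetical, not the actual PyTorch kernel): once the launch covers more than 2^31 threads, the 32-bit index math wraps and the derived offsets go out of bounds.

    #include <cstdint>

    __global__ void partial_segment_sketch(float* grad_weight,
                                           const float* partial_sums,
                                           int64_t num_partial_segments,
                                           int stride_warped,
                                           int stride) {
      // BUG: blockIdx.x, blockDim.x, and threadIdx.x are 32-bit, so this
      // expression is evaluated in 32-bit arithmetic. With a large enough
      // launch (num_partial_segments * stride_warped > 2^31) it wraps and
      // `gid` typically ends up negative.
      const int gid = blockIdx.x * blockDim.x + threadIdx.x;
      const int id = gid / stride_warped;            // negative segment id
      const int startFeature = gid % stride_warped;  // negative feature offset
      if (id >= num_partial_segments || startFeature >= stride) {
        return;  // the upper-bound check does not reject negative indices
      }
      // Negative offsets read and write outside the allocation -> IMA.
      grad_weight[static_cast<int64_t>(id) * stride + startFeature] +=
          partial_sums[gid];
    }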

@pytorch-bot pytorch-bot bot added the release notes: cuda release notes category label Oct 9, 2025
pytorch-bot bot commented Oct 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165095

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b7fb34d with merge base a2f29bc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@yinghai yinghai changed the title FIx int32 overflow in embedding_dense_backward Fix int32 overflow in embedding_dense_backward Oct 9, 2025
ngimel (Collaborator) commented Oct 9, 2025

Test?

Review comment from a collaborator on the added test decorators:

    @onlyCUDA
    @dtypes(torch.bfloat16,)

decorate it with @largeTensorTest to avoid OOMs

@ngimel ngimel added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2025
@Aidyn-A Aidyn-A added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2025
Review comment from a collaborator on the indexing change:

    -  const int gid = blockIdx.x * blockDim.x + threadIdx.x;
    +  const int64_t gid = blockIdx.x * blockDim.x + threadIdx.x;
       const int id = gid / stride_warped;
       const int startFeature = gid % stride_warped;

One of these values needs to be cast to 64-bit, otherwise the math is still done in 32-bit; see e.g. #142010 and #164705.
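To make the point concrete, here is a small sketch (the helper names are illustrative, not code from this PR): the declared type of `gid` does not change how the right-hand side is evaluated, because `blockIdx.x`, `blockDim.x`, and `threadIdx.x` are all 32-bit values.

    #include <cstdint>

    // Still wraps: the multiply and add are evaluated in 32-bit arithmetic,
    // and only the already-wrapped result is widened to 64-bit.
    __device__ int64_t global_id_still_wraps() {
      const int64_t gid = blockIdx.x * blockDim.x + threadIdx.x;
      return gid;
    }

    // Correct: casting one operand first makes the whole expression
    // evaluate in 64-bit arithmetic.
    __device__ int64_t global_id_fixed() {
      const int64_t gid =
          static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
      return gid;
    }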

yinghai (Author) replied:

Thanks, updated.

Collaborator replied:

ugh interesting that the test isn't catching it

@yinghai yinghai requested a review from eqy October 10, 2025 03:12
@ngimel ngimel added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 10, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Oct 10, 2025
@Aidyn-A Aidyn-A added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 10, 2025
ngimel (Collaborator) commented Oct 10, 2025

@pytorchbot merge

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@yinghai yinghai deleted the yinghai/fix_emb_bwd branch October 10, 2025 22:10
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
If `max_partial_segment` is large we can overflow `gid` and cause a bunch of IMA.
Pull Request resolved: pytorch#165095
Approved by: https://github.com/ngimel, https://github.com/eqy

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, release notes: cuda (release notes category)


6 participants