RowwiseMoments: use float as acc type for bfloat16 inputs #84405

mingfeima · 2022-09-01T07:35:32Z

Originally, utils::RowwiseMoments<BFloat16> still accumulated in BFloat16,
which is not only slow but also introduces additional rounding errors.

This patch accumulates in float for bfloat16 inputs:
each bfloat16 vec (size 16) is converted to two float vecs (size 8 each),
which are accumulated into the m1 (mean) and m2 (rstd) vecs, both of which are float vecs.
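
To illustrate why the accumulator type matters, here is a minimal pure-Python sketch, not the kernel code from this patch. It emulates bfloat16 and float32 rounding with `struct` and runs a scalar Welford pass over a row of bfloat16-representable values. The helper names (`round_to_bf16`, `round_to_f32`, `rowwise_moments`, `make_row`) are hypothetical. The use of Welford updates and the treatment of m2 as a running sum of squared deviations are assumptions of this sketch. The vector layout (one 16-lane bfloat16 vec split into two 8-lane float vecs) is not modelled.

```python
import struct


def round_to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_bf16(x):
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 keeps the top 16 bits of a float32. NaN/inf are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias; ties go to even
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rowwise_moments(row, acc_round):
    """Scalar Welford pass; every accumulator update is rounded by acc_round."""
    mean, m2 = 0.0, 0.0
    for n, x in enumerate(row, start=1):
        delta = acc_round(x - mean)
        mean = acc_round(mean + acc_round(delta / n))
        m2 = acc_round(m2 + acc_round(delta * acc_round(x - mean)))
    return mean, m2 / len(row)  # (mean, biased variance)


def make_row(size=1024):
    # 10.0, 10.25, ..., 11.5 are all exactly representable in bfloat16,
    # so the inputs themselves carry no rounding error.
    return [round_to_bf16(10.0 + 0.25 * (i % 7)) for i in range(size)]


if __name__ == "__main__":
    row = make_row()
    exact_mean = sum(row) / len(row)
    mean_bf16, var_bf16 = rowwise_moments(row, round_to_bf16)
    mean_f32, var_f32 = rowwise_moments(row, round_to_f32)
    print("exact mean      :", exact_mean)
    print("bf16 accumulator:", mean_bf16, var_bf16)
    print("f32 accumulator :", mean_f32, var_f32)
```

With a bfloat16 accumulator, the per-element update `delta / n` soon falls below half the spacing between adjacent bfloat16 values near the running mean, so later elements stop changing the accumulator. A float accumulator has 16 more mantissa bits and avoids this stagnation.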

No effect on float performance; bfloat16 performance improves:

avx512 single socket:

before: LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 0.210 ms; bf16: 0.770 ms
after:  LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 0.215 ms; bf16: 0.178 ms

avx512 single core:

before: LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 2.661 ms; bf16: 12.267 ms
after:  LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 2.618 ms; bf16: 2.309 ms

avx2 single socket:

before: LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 0.540 ms; bf16: 2.030 ms
after:  LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 0.527 ms; bf16: 0.458 ms

avx2 single core:

before: LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 4.349 ms; bf16: 19.252 ms
after:  LayerNorm((1024,), eps=1e-05, elementwise_affine=True) : 32x128x1024: fp32: 4.416 ms; bf16: 3.524 ms
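
For a quick read of these numbers, the speedups can be computed directly from the reported timings. The snippet below is only a convenience sketch; the dictionary layout and function names are illustrative and not part of the patch.

```python
# Timings in ms copied from the benchmark lines above: (before, after).
BF16_MS = {
    "avx512 single socket": (0.770, 0.178),
    "avx512 single core": (12.267, 2.309),
    "avx2 single socket": (2.030, 0.458),
    "avx2 single core": (19.252, 3.524),
}
FP32_MS = {
    "avx512 single socket": (0.210, 0.215),
    "avx512 single core": (2.661, 2.618),
    "avx2 single socket": (0.540, 0.527),
    "avx2 single core": (4.349, 4.416),
}


def speedup(before_ms, after_ms):
    """Ratio > 1 means the patched build is faster."""
    return before_ms / after_ms


def summarize(timings):
    """Map each configuration to its before/after speedup, rounded to 2 places."""
    return {cfg: round(speedup(b, a), 2) for cfg, (b, a) in timings.items()}


if __name__ == "__main__":
    for cfg, ratio in summarize(BF16_MS).items():
        print(f"{cfg}: bf16 {ratio}x")
    for cfg, ratio in summarize(FP32_MS).items():
        print(f"{cfg}: fp32 {ratio}x")
```

On these figures, bf16 is roughly 4.3x to 5.5x faster after the patch, while fp32 stays within about 3% of its previous timing.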

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10

To fix #77507.

facebook-github-bot · 2022-09-01T07:35:39Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84405
✖️ Python docs build was skipped
✖️ C++ docs build was skipped
❓Need help or want to give feedback on the CI? Visit our office hours

✅ No Failures (0 Pending)

As of commit ef70f20 (more details on the Dr. CI page):

mingfeima · 2022-09-01T07:39:40Z

replacement of #81850

pytorch-bot · 2022-09-08T05:38:42Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84405

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 459b074:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2022-10-04T00:28:41Z

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

linux-foundation-easycla · 2022-10-04T00:28:51Z

The committers listed above are authorized under a signed CLA.

✅ login: mingfeima / name: Ma Mingfei (ef70f20, 20c7b76, 7a94ba5, f10375f)

jgong5

Is the change covered by UT?

mingfeima · 2022-11-29T04:26:45Z

> Is the change covered by UT?

Yes, the existing UTs are enough.

@VitalyFedyunin

mingfeima · 2022-12-01T01:57:18Z

@pytorchbot merge

pytorchmergebot · 2022-12-01T01:58:54Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

mingfeima mentioned this pull request Sep 1, 2022

fix RowwiseMoments vectorization issue on CPU #84404

Closed

facebook-github-bot added the cla signed label Sep 1, 2022

mingfeima marked this pull request as draft September 1, 2022 07:36

This was referenced Sep 1, 2022

add mixed data type support for LayerNorm #81851

Closed

add mixed data type support for GroupNorm #81852

Closed

pytorchbot added the open source label Sep 1, 2022

This was referenced Sep 1, 2022

Optimize the performance of binary_kernel_reduce for welford #84184

Closed

optimize the performance of binary_kernel_reduce for welford using Ro… #84467

Closed

yanbing-j added the intel This tag is for PR from Intel label Sep 7, 2022

mingfeima marked this pull request as ready for review September 21, 2022 05:58

mingfeima added 2 commits September 27, 2022 10:54

zhuhaozhe closed this Oct 20, 2022

zhuhaozhe reopened this Oct 20, 2022

github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Nov 28, 2022

mingfeima added the topic: not user facing topic category label Nov 28, 2022

mingfeima requested a review from jgong5 November 28, 2022 04:52

jgong5 approved these changes Nov 28, 2022

mingfeima added 2 commits November 29, 2022 13:18

mingfeima added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 30, 2022

pytorchmergebot added the Merged label Dec 1, 2022

pytorchmergebot closed this in 6372f11 Dec 1, 2022

facebook-github-bot deleted the gh/mingfeima/87/head branch June 8, 2023 18:01
