
@mingfeima (Collaborator) commented Jul 21, 2022

Stack from ghstack:

  1. If the user runs a bfloat16 model with AMP, `torch.autocast` will
    keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
    in float while the input/output are in bfloat16.

  2. If the user explicitly casts the model to bfloat16,
    the input/output and `gamma`/`beta` will all be in bfloat16.

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10
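
A minimal sketch (not part of the PR itself) of the two dtype combinations described above, using `GroupNorm` as the affected module; the shapes and group counts are arbitrary and only illustrate the mixed bfloat16/float32 case versus the all-bfloat16 case:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_bf16 = torch.randn(8, 32, 16, 16).to(torch.bfloat16)  # bfloat16 activations

# Case 1 (AMP-style): gamma/beta stay in float32 while the activations are
# bfloat16, so the norm kernel receives a mixed-dtype call.
gn = nn.GroupNorm(num_groups=4, num_channels=32)          # weight/bias in float32
y1 = F.group_norm(x_bf16, 4, gn.weight, gn.bias)

# Case 2 (explicit cast): input/output and gamma/beta are all bfloat16.
gn_bf16 = nn.GroupNorm(4, 32).to(torch.bfloat16)
y2 = gn_bf16(x_bf16)

print(y1.dtype, gn.weight.dtype)         # torch.bfloat16 torch.float32
print(y2.dtype, gn_bf16.weight.dtype)    # torch.bfloat16 torch.bfloat16
```

In the first case the kernel has to handle bfloat16 activations against float32 `gamma`/`beta`, which is the mixed-dtype CPU path discussed in this PR.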

@facebook-github-bot (Contributor) commented Jul 21, 2022


❌ 1 New Failures

As of commit 624f804 (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details)

2022-09-01T11:12:13.8946754Z AssertionError: te...than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.8932744Z =================================== FAILURES ===================================
2022-09-01T11:12:13.8936464Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-09-01T11:12:13.8941186Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-09-01T11:12:13.8942504Z Traceback (most recent call last):
2022-09-01T11:12:13.8942919Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 325, in test_python_ref
2022-09-01T11:12:13.8944395Z     self._ref_test_helper(lambda: TorchRefsMode(strict=True), device, dtype, op)
2022-09-01T11:12:13.8944806Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 308, in _ref_test_helper
2022-09-01T11:12:13.8945336Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-09-01T11:12:13.8945935Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-09-01T11:12:13.8946206Z     raise self.failureException(msg)
2022-09-01T11:12:13.8946754Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.9171705Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops.xml -
2022-09-01T11:12:13.9174156Z =========================== short test summary info ============================
2022-09-01T11:12:13.9174684Z FAILED test_ops.py::TestCommonCPU::test_python_ref__refs_native_layer_norm_cpu_float32
2022-09-01T11:12:13.9178815Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-09-01T11:12:13.9179407Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-09-01T11:12:13.9210278Z = 1 failed, 6894 passed, 4798 skipped, 105 xfailed, 121 warnings, 2 rerun in 349.85s (0:05:49) =
2022-09-01T11:12:14.0678482Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-09-01T11:12:15.3915917Z Traceback (most recent call last):
2022-09-01T11:12:15.3916583Z   File "test/run_test.py", line 1065, in <module>
2022-09-01T11:12:15.3918129Z     main()

This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Jul 21, 2022
ghstack-source-id: 31fb4af
Pull Request resolved: #81852
mingfeima added a commit that referenced this pull request Jul 22, 2022
ghstack-source-id: 9a20d8f
Pull Request resolved: #81852
@mingfeima mingfeima added the intel This tag is for PR from Intel label Jul 22, 2022
@yanbing-j yanbing-j added the intel priority matters to intel architecture from performance wise label Jul 27, 2022
@yanbing-j yanbing-j removed the intel priority matters to intel architecture from performance wise label Aug 24, 2022
@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Sep 1, 2022
mingfeima added a commit that referenced this pull request Sep 1, 2022
ghstack-source-id: d6c20b1
Pull Request resolved: #81852
@pytorch-bot (bot) commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81852

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d7f5ccd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


mingfeima added a commit that referenced this pull request Sep 8, 2022
ghstack-source-id: e9198d5
Pull Request resolved: #81852
@mingfeima mingfeima requested a review from frank-wei September 8, 2022 05:58
brycedrennan added a commit to brycedrennan/imaginAIry that referenced this pull request Sep 22, 2022
Seems to be caused by incompatible types in group_norm when we use autocast.

Patch group_norm to cast the weights to the same type as the inputs

From what I can understand, all the other repos just switch to full precision instead
of addressing this. I think this would make things slower, but I'm not sure, so maybe
the patching solution I'm doing is better?

pytorch/pytorch#81852
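
For reference, a rough sketch of the workaround that commit message describes, i.e. wrapping `torch.nn.functional.group_norm` so the weights are cast to the input's dtype; this is an illustrative monkeypatch, not the actual imaginAIry code:

```python
import torch
import torch.nn.functional as F

_orig_group_norm = F.group_norm

def _patched_group_norm(input, num_groups, weight=None, bias=None, eps=1e-5):
    # Cast gamma/beta to the input's dtype so bfloat16/float16 activations
    # produced under autocast no longer hit a dtype-mismatch error.
    if weight is not None and weight.dtype != input.dtype:
        weight = weight.to(input.dtype)
    if bias is not None and bias.dtype != input.dtype:
        bias = bias.to(input.dtype)
    return _orig_group_norm(input, num_groups, weight, bias, eps)

# nn.GroupNorm.forward calls F.group_norm at call time, so it picks this up.
F.group_norm = _patched_group_norm
```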
CaoE added a commit that referenced this pull request Dec 9, 2022
This PR is cherry-picked from #84404 ~ #81852.

CaoE added a commit that referenced this pull request Dec 13, 2022
…and GroupNorm"
This PR is cherry-picked from #84404 ~ #81852.
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
@mingfeima mingfeima requested review from jgong5 and malfet December 13, 2022 05:42
@mingfeima mingfeima added the intel priority matters to intel architecture from performance wise label Dec 13, 2022
mingfeima added a commit that referenced this pull request Dec 13, 2022
ghstack-source-id: 02e74b3
Pull Request resolved: #81852
@mingfeima (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
shreyanb98, 842974287, Xirider, husthyc, aruntonic, ...

Details for Dev Infra team | Raised by workflow job

@malfet (Contributor) left a comment


Small nit on the perf: we introduce an unnecessary `if (mixedtype)` check for all dtypes but BFloat16. Perhaps one can refactor this code as:

template<scalar_t>
void CallGroupNorm(...)

and specialize it for BFloat16 to call `mixed_type`.

mingfeima added a commit that referenced this pull request Dec 19, 2022
ghstack-source-id: 31172bb
Pull Request resolved: #81852
@mingfeima (Collaborator, Author)

> Small nit on the perf: we introduce an unnecessary `if (mixedtype)` check for all dtypes but BFloat16. Perhaps one can refactor this code as:
>
> template<scalar_t>
> void CallGroupNorm(...)
>
> and specialize it for BFloat16 to call `mixed_type`.

We will add `mixed_type` support for float16 in the near future for the upcoming Xeon platform (Granite Rapids).

Also, I noticed that `vec_scalar_t` is now a duplicate; I will have it replaced with `opmath_t`.

@mingfeima (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/83/head branch June 8, 2023 18:01
mattstern31 added a commit to mattstern31/imagin-AIry-Python that referenced this pull request Nov 11, 2023
superlucky19971023 added a commit to superlucky19971023/imaginAIry that referenced this pull request Aug 18, 2024
PyPilot-ai pushed a commit to PyPilot-ai/PyPilot-ai that referenced this pull request Apr 9, 2025
PyPilot-ai added a commit to PyPilot-ai/PyPilot-ai that referenced this pull request Apr 13, 2025
tune-project1 added a commit to tune-project1/Viziax that referenced this pull request Apr 13, 2025
PyPilot-ai added a commit to PyPilot-ai/surveylogic that referenced this pull request Apr 14, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, intel priority (matters to intel architecture from performance wise), intel (This tag is for PR from Intel), Merged, module: cpu (CPU specific problem, e.g., perf, algorithm), open source, release notes: nn (release notes category)

Projects

Status: Done


10 participants