
@mingfeima (Collaborator) commented Jul 21, 2022

Stack from ghstack:

  1. If the user runs a bfloat16 model with AMP, `torch.autocast` will
    keep the module parameters in the accumulation dtype, which leaves `gamma` and `beta`
    in float while the input/output are in bfloat16.

  2. If the user explicitly casts the model to bfloat16,
    the input/output and `gamma`/`beta` will all be in bfloat16.

cc @VitalyFedyunin @jgong5 @XiaobingSuper @sanchitintel @ashokei @jingxu10
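
A minimal sketch (not part of the PR itself) of the two dtype combinations described above, using `GroupNorm` as the affected module; the shapes and group counts are arbitrary and only illustrate the mixed bfloat16/float32 case versus the all-bfloat16 case:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x_bf16 = torch.randn(8, 32, 16, 16).to(torch.bfloat16)  # bfloat16 activations

# Case 1 (AMP-style): gamma/beta stay in float32 while the activations are
# bfloat16, so the norm kernel receives a mixed-dtype call.
gn = nn.GroupNorm(num_groups=4, num_channels=32)          # weight/bias in float32
y1 = F.group_norm(x_bf16, 4, gn.weight, gn.bias)

# Case 2 (explicit cast): input/output and gamma/beta are all bfloat16.
gn_bf16 = nn.GroupNorm(4, 32).to(torch.bfloat16)
y2 = gn_bf16(x_bf16)

print(y1.dtype, gn.weight.dtype)         # torch.bfloat16 torch.float32
print(y2.dtype, gn_bf16.weight.dtype)    # torch.bfloat16 torch.bfloat16
```

In the first case the kernel has to handle bfloat16 activations against float32 `gamma`/`beta`, which is the mixed-dtype CPU path discussed in this PR.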

@facebook-github-bot (Contributor) commented Jul 21, 2022


❌ 1 New Failures

As of commit 624f804 (more details on the Dr. CI page):

  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Test" (full log | diagnosis details)

2022-09-01T11:12:13.8946754Z AssertionError: te...than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.8932744Z =================================== FAILURES ===================================
2022-09-01T11:12:13.8936464Z ______ TestCommonCPU.test_python_ref__refs_native_layer_norm_cpu_float32 _______
2022-09-01T11:12:13.8941186Z [gw1] linux -- Python 3.7.13 /opt/conda/bin/python
2022-09-01T11:12:13.8942504Z Traceback (most recent call last):
2022-09-01T11:12:13.8942919Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 325, in test_python_ref
2022-09-01T11:12:13.8944395Z     self._ref_test_helper(lambda: TorchRefsMode(strict=True), device, dtype, op)
2022-09-01T11:12:13.8944806Z   File "/var/lib/jenkins/workspace/test/test_ops.py", line 308, in _ref_test_helper
2022-09-01T11:12:13.8945336Z     self.assertTrue(ref_distance <= torch_distance, msg=msg)
2022-09-01T11:12:13.8945935Z   File "/opt/conda/lib/python3.7/unittest/case.py", line 705, in assertTrue
2022-09-01T11:12:13.8946206Z     raise self.failureException(msg)
2022-09-01T11:12:13.8946754Z AssertionError: tensor(False) is not true : Reference result was farther (1.7186287103232445e-05) from the precise computation than the torch result was (1.666915112918943e-05)!
2022-09-01T11:12:13.9171705Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops.xml -
2022-09-01T11:12:13.9174156Z =========================== short test summary info ============================
2022-09-01T11:12:13.9174684Z FAILED test_ops.py::TestCommonCPU::test_python_ref__refs_native_layer_norm_cpu_float32
2022-09-01T11:12:13.9178815Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2022-09-01T11:12:13.9179407Z !!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
2022-09-01T11:12:13.9210278Z = 1 failed, 6894 passed, 4798 skipped, 105 xfailed, 121 warnings, 2 rerun in 349.85s (0:05:49) =
2022-09-01T11:12:14.0678482Z Skip info is located in the xml test reports, please either go to s3 or the hud to download them
2022-09-01T11:12:15.3915917Z Traceback (most recent call last):
2022-09-01T11:12:15.3916583Z   File "test/run_test.py", line 1065, in <module>
2022-09-01T11:12:15.3918129Z     main()

This comment was automatically generated by Dr. CI.

mingfeima added a commit that referenced this pull request Jul 21, 2022
ghstack-source-id: 31fb4af
Pull Request resolved: #81852
mingfeima added a commit that referenced this pull request Jul 22, 2022
ghstack-source-id: 9a20d8f
Pull Request resolved: #81852
@mingfeima mingfeima added the intel This tag is for PR from Intel label Jul 22, 2022
@yanbing-j yanbing-j added the intel priority matters to intel architecture from performance wise label Jul 27, 2022
@yanbing-j yanbing-j removed the intel priority matters to intel architecture from performance wise label Aug 24, 2022
@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Sep 1, 2022
mingfeima added a commit that referenced this pull request Sep 1, 2022
ghstack-source-id: d6c20b1
Pull Request resolved: #81852
@pytorch-bot (bot) commented Sep 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/81852

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d7f5ccd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


mingfeima added a commit that referenced this pull request Sep 8, 2022
ghstack-source-id: e9198d5
Pull Request resolved: #81852
@mingfeima mingfeima requested a review from frank-wei September 8, 2022 05:58
brycedrennan added a commit to brycedrennan/imaginAIry that referenced this pull request Sep 22, 2022
Seems to be caused by incompatible types in group_norm when we use autocast.

Patch group_norm to cast the weights to the same type as the inputs

From what I can understand, all the other repos just switch to full precision instead
of addressing this. I think this would make things slower, but I'm not sure, so maybe
the patching solution I'm doing is better?

pytorch/pytorch#81852
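
For reference, a rough sketch of the workaround that commit message describes, i.e. wrapping `torch.nn.functional.group_norm` so the weights are cast to the input's dtype; this is an illustrative monkeypatch, not the actual imaginAIry code:

```python
import torch
import torch.nn.functional as F

_orig_group_norm = F.group_norm

def _patched_group_norm(input, num_groups, weight=None, bias=None, eps=1e-5):
    # Cast gamma/beta to the input's dtype so bfloat16/float16 activations
    # produced under autocast no longer hit a dtype-mismatch error.
    if weight is not None and weight.dtype != input.dtype:
        weight = weight.to(input.dtype)
    if bias is not None and bias.dtype != input.dtype:
        bias = bias.to(input.dtype)
    return _orig_group_norm(input, num_groups, weight, bias, eps)

# nn.GroupNorm.forward calls F.group_norm at call time, so it picks this up.
F.group_norm = _patched_group_norm
```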
CaoE added a commit that referenced this pull request Dec 9, 2022
This PR is cherry-picked from #84404 ~ #81852.

CaoE added a commit that referenced this pull request Dec 13, 2022
…and GroupNorm"
This PR is cherry-picked from #84404 ~ #81852.
cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10
@mingfeima mingfeima requested review from jgong5 and malfet December 13, 2022 05:42
@mingfeima mingfeima added the intel priority matters to intel architecture from performance wise label Dec 13, 2022
mingfeima added a commit that referenced this pull request Dec 13, 2022
ghstack-source-id: 02e74b3
Pull Request resolved: #81852
@mingfeima (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge failed

Reason: Approval needed from one of the following (Rule 'superuser'):
shreyanb98, 842974287, Xirider, husthyc, aruntonic, ...

Details for Dev Infra team | Raised by workflow job

@malfet (Contributor) left a comment


Small nit on the perf: we introduce an unnecessary `if (mixedtype)` check for all dtypes but BFloat16. Perhaps one can refactor this code as:

template<scalar_t>
void CallGroupNorm(...)

and specialize it for BFloat16 to call `mixed_type`.

mingfeima added a commit that referenced this pull request Dec 19, 2022
ghstack-source-id: 31172bb
Pull Request resolved: #81852
@mingfeima (Collaborator, Author)

> Small nit on the perf: we introduce an unnecessary `if (mixedtype)` check for all dtypes but BFloat16. Perhaps one can refactor this code as:
>
> template<scalar_t>
> void CallGroupNorm(...)
>
> and specialize it for BFloat16 to call `mixed_type`.

We will add `mixed_type` support for float16 in the near future for the upcoming Xeon platform (Granite Rapids).

Also, I noticed that `vec_scalar_t` is now a duplicate; I will have it replaced with `opmath_t`.

@mingfeima (Collaborator, Author)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/83/head branch June 8, 2023 18:01
mattstern31 added a commit to mattstern31/imagin-AIry-Python that referenced this pull request Nov 11, 2023
superlucky19971023 added a commit to superlucky19971023/imaginAIry that referenced this pull request Aug 18, 2024
PyPilot-ai pushed a commit to PyPilot-ai/PyPilot-ai that referenced this pull request Apr 9, 2025
PyPilot-ai added a commit to PyPilot-ai/PyPilot-ai that referenced this pull request Apr 13, 2025
tune-project1 added a commit to tune-project1/Viziax that referenced this pull request Apr 13, 2025
PyPilot-ai added a commit to PyPilot-ai/surveylogic that referenced this pull request Apr 14, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), cla signed, intel priority (matters to intel architecture from performance wise), intel (This tag is for PR from Intel), Merged, module: cpu (CPU specific problem, e.g., perf, algorithm), open source, release notes: nn (release notes category)

Projects

Status: Done


10 participants