[quant][graphmode] Fp16 quant support - match numerics with eager mode #41049
Conversation
💊 CI failures summary and remediations
As of commit 31c0a60 (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
❄️ 7 failures tentatively classified as flaky, but reruns have not yet been triggered to confirm.
// We don't need to insert cast operators for activation tensors for fp16
// quant.
Can we filter this in a different place? E.g., so that we don't insert an observer for activation tensors in the first place?
Is it possible for the user to not specify an activation observer in the qconfig? Does the prepare_jit pass ensure observers aren't inserted for activation tensors in that case?
I mean doing checks like this: https://codebrowser.bddppq.com/pytorch/pytorch/torch/csrc/jit/passes/quantization/insert_observers.cpp.html#1187
That may have issues, since for FP16 quant we don't specify a dtype anywhere in the qconfig. We set the quant type to dynamic, so there is no way to distinguish int8 dynamic quant from fp16 dynamic quant. Hence I was wondering if not specifying any activation observer (since we don't want it observed) would work here.
Looks like right now we are checking for a NoopObserver to decide to do fp16 quantization. This sounds like a hack; can we expose fp16 as an argument to the API?
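For context, a rough sketch of the kind of qconfig being discussed, where the fp16 intent is carried only by the observer attached to the weight rather than by an explicit flag on the API (class and observer names follow the torch.quantization API of this era and may differ in later releases):

```python
import torch
from torch.quantization import QConfigDynamic
from torch.quantization.observer import NoopObserver, default_dynamic_quant_observer

# Illustrative only: dynamic quantization where "fp16" has to be inferred from
# the NoopObserver tagged with torch.float16 on the weight -- the check being
# called a hack above.
float16_dynamic_qconfig = QConfigDynamic(
    activation=default_dynamic_quant_observer,
    weight=NoopObserver.with_args(dtype=torch.float16),
)
```

Nothing in this config says "fp16" other than the observer's dtype, which is why the passes end up special-casing the observer type.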
I feel it makes more sense to expose this in the top-level API; why don't we do that?
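For comparison, the eager-mode entry point already exposes fp16 as a dtype argument; the suggestion here is to surface something similar for graph mode instead of inferring it from the observer. A minimal usage sketch (the model is just a placeholder):

```python
import torch
import torch.nn as nn

# Eager-mode dynamic quantization: fp16 is requested explicitly via `dtype`,
# no observer inspection needed.
model = nn.Sequential(nn.Linear(16, 8))
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.float16
)
```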
Stack from ghstack:
Summary:
In eager mode there is no cast operator for the activation tensor in the fbgemm fp16 operator, so remove it from graph mode.
For the weight tensor we handle saturation by clipping the values to the fp16 range.
This makes the numerics match between the debug model and the final quantized model.
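A minimal numeric sketch of what the quantized path is expected to compute (my illustration, not the actual JIT pass or the fbgemm kernel; the function name is made up):

```python
import torch
import torch.nn.functional as F

def linear_dynamic_fp16_reference(x, weight, bias=None):
    # Weight: saturate to the fp16 representable range, then store at fp16
    # precision (the clipping described above).
    fp16_min = torch.finfo(torch.float16).min
    fp16_max = torch.finfo(torch.float16).max
    w_fp16 = weight.clamp(fp16_min, fp16_max).to(torch.float16)
    # Activation: left in fp32 -- no cast operator is inserted for it,
    # matching the eager-mode fbgemm fp16 linear.
    return F.linear(x, w_fp16.float(), bias)
```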
Test Plan:
python test/test_quantization.py test_linear_dynamic_fp16
Reviewers:
Subscribers:
Tasks:
Tags: