[quant][graphmode] Fp16 quant support - match numerics with eager mode #41049
Conversation
💊 CI failures summary and remediations
As of commit 31c0a60 (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
❄️ 7 failures tentatively classified as flaky, but reruns have not yet been triggered to confirm.
// We don't need to insert cast operators for activation tensors for fp16
// quant.
Can we filter this in a different place? E.g., so that we don't insert an observer for activation tensors in the first place?
Is it possible for the user to not specify an activation observer in the qconfig? Does the prepare_jit pass ensure observers aren't inserted for activation tensors in that case?
I mean doing checks like this: https://codebrowser.bddppq.com/pytorch/pytorch/torch/csrc/jit/passes/quantization/insert_observers.cpp.html#1187
That may have issues, since for FP16 quant we don't specify a dtype anywhere in the qconfig. We set the quant type to dynamic, so there is no way to distinguish int8 dynamic quant from fp16 dynamic quant. Hence I was wondering if not specifying any activation observer (since we don't want it observed) would work here.
Looks like right now we are checking for a NoopObserver to decide to do fp16 quantization. This sounds like a hack; can we expose fp16 as an argument to the API?
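For context, a rough sketch of the kind of qconfig being discussed, where the fp16 intent is carried only by the observer attached to the weight rather than by an explicit flag on the API (class and observer names follow the torch.quantization API of this era and may differ in later releases):

```python
import torch
from torch.quantization import QConfigDynamic
from torch.quantization.observer import NoopObserver, default_dynamic_quant_observer

# Illustrative only: dynamic quantization where "fp16" has to be inferred from
# the NoopObserver tagged with torch.float16 on the weight -- the check being
# called a hack above.
float16_dynamic_qconfig = QConfigDynamic(
    activation=default_dynamic_quant_observer,
    weight=NoopObserver.with_args(dtype=torch.float16),
)
```

Nothing in this config says "fp16" other than the observer's dtype, which is why the passes end up special-casing the observer type.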
I feel it makes more sense to expose this in the top-level API; why don't we do that?
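For comparison, the eager-mode entry point already exposes fp16 as a dtype argument; the suggestion here is to surface something similar for graph mode instead of inferring it from the observer. A minimal usage sketch (the model is just a placeholder):

```python
import torch
import torch.nn as nn

# Eager-mode dynamic quantization: fp16 is requested explicitly via `dtype`,
# no observer inspection needed.
model = nn.Sequential(nn.Linear(16, 8))
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.float16
)
```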
Stack from ghstack:
Summary:
In eager mode there is no cast operator for the activation tensor in the fbgemm fp16 operator, so remove it from graph mode.
For the weight tensor we handle saturation by clipping the values to the fp16 range.
This makes the numerics match between the debug model and the final quantized model.
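A minimal numeric sketch of what the quantized path is expected to compute (my illustration, not the actual JIT pass or the fbgemm kernel; the function name is made up):

```python
import torch
import torch.nn.functional as F

def linear_dynamic_fp16_reference(x, weight, bias=None):
    # Weight: saturate to the fp16 representable range, then store at fp16
    # precision (the clipping described above).
    fp16_min = torch.finfo(torch.float16).min
    fp16_max = torch.finfo(torch.float16).max
    w_fp16 = weight.clamp(fp16_min, fp16_max).to(torch.float16)
    # Activation: left in fp32 -- no cast operator is inserted for it,
    # matching the eager-mode fbgemm fp16 linear.
    return F.linear(x, w_fp16.float(), bias)
```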
Test Plan:
python test/test_quantization.py test_linear_dynamic_fp16
Reviewers:
Subscribers:
Tasks:
Tags: