
Conversation

Contributor

@jerryzh168 jerryzh168 commented Mar 17, 2022

Stack from ghstack (oldest at bottom):

Summary:
If an input is used multiple times by modules that are dynamically quantized:

```
x -- linear1
 \-- linear2
```
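
As a concrete illustration, here is a minimal sketch of a model with this shape going through the FX dynamic quantization flow. The module and variable names are illustrative only, and the exact prepare_fx signature differs across PyTorch versions:
```python
import torch
import torch.nn as nn
from torch.ao.quantization import default_dynamic_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class SharedInput(nn.Module):
    """Illustrative module: one input feeds two dynamically quantized linears."""
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(8, 8)
        self.linear2 = nn.Linear(8, 8)

    def forward(self, x):
        # the same tensor x is consumed by both linears
        return self.linear1(x) + self.linear2(x)

model = SharedInput().eval()
qconfig_dict = {"": default_dynamic_qconfig}  # dynamically quantize everything
example_inputs = (torch.randn(1, 8),)
prepared = prepare_fx(model, qconfig_dict, example_inputs)
converted = convert_fx(prepared)
```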

We insert quantize_per_tensor_dynamic and dequantize for the input, and a duplication pass then copies the dequantize op once per user so that each pattern can be matched independently:

```
x - quantize_per_tensor_dynamic - dequantize1 - linear1
                               \- dequantize2 - linear2
```
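
The duplication pass boils down to giving every user of a shared dequantize node its own private copy. Below is a simplified sketch of that rewiring on a torch.fx graph; the real pass in torch.ao.quantization.fx handles more node kinds and edge cases:
```python
import torch.fx

def duplicate_dequantize(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    for node in list(gm.graph.nodes):
        # dequantize can also appear as a call_function; this sketch only
        # handles the call_method form for brevity
        is_dequant = node.op == "call_method" and node.target == "dequantize"
        if is_dequant and len(node.users) > 1:
            # give each user its own private copy of the dequantize node so
            # every (dequantize -> linear) pair can be matched independently
            for user in list(node.users):
                with gm.graph.inserting_before(user):
                    clone = gm.graph.node_copy(node)
                user.replace_input_with(node, clone)
            gm.graph.erase_node(node)
    gm.recompile()
    return gm
```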

But the lowering code also has a check that skips the pattern when quantize_per_tensor_dynamic is used by multiple nodes, so neither pattern is recognized. To recover both patterns in this case we need to duplicate quantize_per_tensor_dynamic as well:

```
x - quantize_per_tensor_dynamic1 -- dequantize1 -- linear1
 \- quantize_per_tensor_dynamic2 -- dequantize2 -- linear2
```

so that they can be fused into dynamic linear:

```
x - linear_dynamic1
 \- linear_dynamic2
```
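
To make the skipped-pattern check above concrete, here is a hedged sketch, not the actual lowering code, of the kind of users check that causes the lowering to reject shared nodes:
```python
import torch
import torch.fx

def matches_dynamic_linear_pattern(linear_node: torch.fx.Node) -> bool:
    """Sketch of matching (quantize_per_tensor_dynamic -> dequantize -> linear)."""
    dq = linear_node.args[0]
    if not (dq.op == "call_method" and dq.target == "dequantize"):
        return False
    q = dq.args[0]
    if not (q.op == "call_function"
            and q.target == torch.quantize_per_tensor_dynamic):
        return False
    # the check this PR works around: a node shared by several consumers
    # cannot be folded into a single fused linear, so the match is rejected
    # unless each intermediate node has exactly one user
    return len(dq.users) == 1 and len(q.users) == 1
```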

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_dynamic_linear_input_multiple_use
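
For reference, a rough sketch of the kind of assertion such a test can make, assuming it simply counts the dynamically quantized linears in the converted model from the first snippet; the authoritative test lives in test/test_quantization.py, and the nnqd import path varies across PyTorch versions:
```python
import torch.ao.nn.quantized.dynamic as nnqd  # torch.nn.quantized.dynamic on older releases

# `converted` comes from the first snippet above
dynamic_linears = [m for m in converted.modules() if isinstance(m, nnqd.Linear)]
assert len(dynamic_linears) == 2, "both linears should lower to dynamic quantized Linear"
```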


Differential Revision: D34952755


pytorch-bot bot commented Mar 17, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/4910aa12640408fc3ab97f04882446e46a9e977b/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Contributor

facebook-github-bot commented Mar 17, 2022


💊 CI failures summary and remediations

As of commit 27a8e0d (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jerryzh168 jerryzh168 requested review from andrewor14 and vkuzo March 17, 2022 06:22
Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

```
# run the weight observer
weight_observer_module()

# this method is temporary and will be removed soon
```
Contributor

Is the real fix removing the check for whether dequantize nodes have multiple users in the lowering code? Is there a reason why we don't just directly change that instead of adding a temporary fix?

Contributor Author

@jerryzh168 jerryzh168 Mar 17, 2022

Yes, the real fix is to stop duplicating dequantize and remove the check; I'm planning to talk to you in today's sync.

Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jerryzh168 jerryzh168 added the release notes: quantization and topic: bug fixes labels Mar 18, 2022
facebook-github-bot pushed a commit that referenced this pull request Mar 18, 2022
…ple times (#74364)

Summary:
Pull Request resolved: #74364


Imported from OSS

Reviewed By: yixin94

Differential Revision: D34952755

fbshipit-source-id: a950159fd6a661e84faf0baf1692f6783904cfb3
@facebook-github-bot facebook-github-bot deleted the gh/jerryzh168/747/head branch March 22, 2022 14:17
shahofblah pushed a commit that referenced this pull request Mar 25, 2022
…ple times (#74364)

Summary:
Pull Request resolved: #74364


Imported from OSS

Reviewed By: yixin94

Differential Revision: D34952755

fbshipit-source-id: a950159fd6a661e84faf0baf1692f6783904cfb3
(cherry picked from commit 8a68968)
