
Conversation

@salilsdesai
Contributor

@salilsdesai commented Mar 24, 2022

Summary: We don't want to create and destroy a new context with each multiplication
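
A minimal sketch of the idea, using hypothetical names (`Context`, `runMatmul`, `quantizedMatmul`) rather than the actual ATen/QNNPACK symbols touched by this diff: the expensive backend context is created once per thread and reused across calls, instead of being created and destroyed inside every quantized matmul.

```cpp
#include <memory>

// Stand-in for the backend state that is costly to construct and tear down.
struct Context {
  Context() { /* expensive backend initialization */ }
};

// Per-call work that only needs an already-initialized context.
void runMatmul(Context& /*ctx*/ /*, quantized inputs ... */) {}

void quantizedMatmul(/* quantized inputs ... */) {
  // Created lazily on first use in each thread, then reused for every
  // subsequent call on that thread; destroyed only at thread exit.
  thread_local std::unique_ptr<Context> ctx = std::make_unique<Context>();
  runMatmul(*ctx);
}
```

Holding the cached context in a `thread_local` rather than a single shared static also keeps concurrently running threads from contending over mutable backend state.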

Test Plan:
From fbcode:
buck test caffe2/test:quantization -- test_qmatmul

Performance Improvement

Benchmarking was done on a model which performs matmuls of the same shapes and counts as the Transformer model, as determined in D30901505

Notebook in which Benchmarking was performed: https://www.internalfb.com/intern/anp/view/?id=1582075&revision_id=1891629751047842

Improvement from this diff alone
~9.71% Reduction in Latency

  • Non Thread Local Contexts (before this diff, D35087184 v2): 8.5410ms
  • Thread Local Contexts (this diff, v12): 7.7113ms

FP32 Matmul vs Quantized Matmul, Overall Improvement from this diff stack
56% reduction in latency compared to FP32 Matmul, 71% reduction in latency compared to Naive QMatmul

  • FP32 Matmul: 17.4910ms
  • Quantized Matmul (after this diff): 7.7113ms
  • Naive Quantized Matmul (dequantize → fp32matmul → quantize; see the sketch below): 26.8639ms
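
For reference, the naive baseline in the last bullet follows a pattern like the sketch below (illustrative ATen/C++ with a hypothetical `naive_qmatmul` helper and assumed output scale/zero-point arguments, not the actual benchmark code):

```cpp
#include <ATen/ATen.h>

// dequantize -> fp32 matmul -> quantize: the baseline measured above
at::Tensor naive_qmatmul(const at::Tensor& qa, const at::Tensor& qb,
                         double out_scale, int64_t out_zero_point) {
  at::Tensor fa = qa.dequantize();     // quantized -> fp32
  at::Tensor fb = qb.dequantize();
  at::Tensor fc = at::matmul(fa, fb);  // ordinary fp32 matmul
  // fp32 -> quantized, using the caller-supplied output quantization params
  return at::quantize_per_tensor(fc, out_scale, out_zero_point, at::kQUInt8);
}
```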

Reviewed By: kimishpatel

Differential Revision: D34756288

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 24, 2022

💊 CI failures summary and remediations

As of commit cf49eb1 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D34756288


…ch#74676)

Summary:
Pull Request resolved: pytorch#74676

We don't want to create and destroy a new context with each multiplication

Test Plan:
From fbcode:
```buck test caffe2/test:quantization -- test_qmatmul```

# Performance Improvement
*Benchmarking was done on a model which performs matmuls of the same shapes and counts as the Transformer model, as determined in D30901505*

*Notebook in which Benchmarking was performed: https://www.internalfb.com/intern/anp/view/?id=1582075&revision_id=1891629751047842*

**Improvement from this diff alone**
~9.71% Reduction in Latency
- Non Thread Local Contexts (before this diff, D35087184 v2): [8.5410ms](https://www.internalfb.com/intern/aibench/details/661728682381311)
- Thread Local Contexts (this diff, v12): [7.7113ms](https://www.internalfb.com/intern/aibench/details/956655867696198)

**FP32 Matmul vs Quantized Matmul, Overall Improvement from this diff stack**
56% reduction in latency compared to FP32 Matmul, 71% reduction in latency compared to Naive QMatmul
- FP32 Matmul: [17.4910ms](https://www.internalfb.com/intern/aibench/details/875394396322469)
- Quantized Matmul (after this diff): [7.7113ms](https://www.internalfb.com/intern/aibench/details/956655867696198)
- Naive Quantized Matmul (dequantize → fp32matmul → quantize): [26.8639ms](https://www.internalfb.com/intern/aibench/details/52181682131461)

Reviewed By: kimishpatel

Differential Revision: D34756288

fbshipit-source-id: 27c46645f1084a07974dbe2be9b52c15f539928b

facebook-github-bot pushed a commit that referenced this pull request Mar 25, 2022
Summary:
Pull Request resolved: #74676

fbshipit-source-id: b000658152cf71b4185dcd34a3cccc71b4cec1f0
@github-actions
Contributor

Hey @salilsdesai.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.
