[PyTorch Edge] Use Parallelization in Internal Quantized Matmul #73247
Split up multiplication over outer dimensions

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
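As a rough illustration of what "split up multiplication over outer dimensions" can mean, the sketch below treats a batched matmul as a list of independent 2-D multiplications and hands each one to a worker thread. It is not the code from this PR: the function names (`matmul_2d`, `batched_matmul_parallel`), the `num_workers` parameter, and the use of Python's `ThreadPoolExecutor` are all assumptions chosen for illustration, and the integer nested lists merely stand in for quantized tensors.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_2d(a, b):
    """Naive (M x K) @ (K x N) integer matmul on nested lists (hypothetical helper)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def batched_matmul_parallel(a_batch, b_batch, num_workers=4):
    """Compute a_batch[i] @ b_batch[i] for every outer index i.

    Each outer index is an independent 2-D matmul, so the work can be
    partitioned across workers without any synchronization between them.
    """
    if len(a_batch) != len(b_batch):
        raise ValueError("outer (batch) dimensions must match")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order, so result[i] corresponds to batch i.
        return list(pool.map(matmul_2d, a_batch, b_batch))


if __name__ == "__main__":
    a = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
    b = [[[5, 6], [7, 8]], [[9, 8], [7, 6]]]
    print(batched_matmul_parallel(a, b))
```

In standard CPython, threads running pure-Python arithmetic do not execute in parallel because of the global interpreter lock, so this sketch shows only how the work is partitioned, not the speedup a native implementation would see.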
Summary:
Pull Request resolved: #73247

Split up multiplication over outer dimensions

ghstack-source-id: 151250864

Test Plan:
From fbcode:
```
buck test caffe2/test:quantization -- test_qmatmul
```

Performance Improvement Summary:
For matmuls used by Transformer Model
- This diff makes qmatmul ~53% faster than the preceding diff (Ruy without parallelization)
- This entire diff stack makes qmatmul ~75% faster than the naive implementation (see below for details)

**Detailed Benchmarking Results:**

*Benchmarking was done on a model which performs matmuls of the same shapes and counts as Transformer Model, as determined in D30901505*

*Notebook in which benchmarking was performed: https://www.internalfb.com/intern/anp/view/?id=1582075&revision_id=537916317667891*

- Ruy QMatMul, Parallelization within PyTorch (this diff, v5): [7.5257ms](https://www.internalfb.com/intern/aibench/details/621856970876663)
- Ruy QMatMul, No Parallelization (D33735479, v18): [16.0261ms](https://www.internalfb.com/intern/aibench/details/867786467365069)
- Naive QMatMul (on master branch (base of D33332098), v22): [30.9919ms](https://www.internalfb.com/intern/aibench/details/418359955621359)

Experiments using Ruy Threadpool (which ended up being bad; abandoning):
- Ruy QMatMul, with Ruy Threadpool 4 threads (D34110676, v1): [59.8889ms](https://www.internalfb.com/intern/aibench/details/487293857402229)
- Ruy QMatMul, Parallelization within PyTorch and with Ruy Threadpool 4 threads (D34111050, v1): [624.8932 ms (?!)](https://www.internalfb.com/intern/aibench/details/330231112631355)

Reviewed By: kimishpatel

Differential Revision: D34012771

fbshipit-source-id: 79d137f295b05812968ab53fdf9798606f3f4e63
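The "~53%" and "~75%" figures in the summary appear to be reductions in measured latency relative to each baseline. The short check below recomputes them from the three reported timings under that assumption. The helper names `latency_reduction_pct` and `speedup` are hypothetical and only serve this illustration.

```python
# Timings in milliseconds, copied from the benchmarking results in the summary.
PARALLEL_RUY_MS = 7.5257   # Ruy QMatMul, parallelization within PyTorch (this diff)
SERIAL_RUY_MS = 16.0261    # Ruy QMatMul, no parallelization
NAIVE_MS = 30.9919         # Naive QMatMul


def latency_reduction_pct(baseline_ms, new_ms):
    """Percentage by which latency drops when going from baseline to new."""
    return (baseline_ms - new_ms) / baseline_ms * 100.0


def speedup(baseline_ms, new_ms):
    """How many times faster the new timing is than the baseline."""
    return baseline_ms / new_ms


if __name__ == "__main__":
    for label, base in (("vs. Ruy without parallelization", SERIAL_RUY_MS),
                        ("vs. naive implementation", NAIVE_MS)):
        print(f"{label}: {latency_reduction_pct(base, PARALLEL_RUY_MS):.1f}% lower latency, "
              f"{speedup(base, PARALLEL_RUY_MS):.2f}x speedup")
```

Read this way, the reported percentages correspond to roughly a 2x speedup over unparallelized Ruy and roughly a 4x speedup over the naive implementation.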