salilsdesai (Contributor) commented Feb 22, 2022

Split up multiplication over outer dimensions
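
A minimal sketch of what splitting a batched matmul over its outer dimensions can look like, assuming the outer (batch) dimensions have already been flattened into one leading dimension so that each index is an independent 2D multiplication. The names `matmul_2d` and `batched_matmul` are hypothetical and are not taken from this PR; the sketch uses plain Python lists and a standard-library thread pool only to illustrate the decomposition, not the quantized kernel or PyTorch's own parallel primitives.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_2d(a, b):
    """Multiply an M x K matrix by a K x N matrix, both given as lists of rows."""
    cols_b = list(zip(*b))  # columns of b, so each dot product is row-by-column
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def batched_matmul(a_batch, b_batch, num_workers=4):
    """Run one independent 2D matmul per entry of the flattened outer dimension.

    Hypothetical helper: a_batch[i] and b_batch[i] are the i-th 2D slices of the
    two operands. Each slice pair is submitted as a separate task.
    """
    if len(a_batch) != len(b_batch):
        raise ValueError("outer dimensions must match")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map returns results in input order, so slice i stays at index i.
        return list(pool.map(matmul_2d, a_batch, b_batch))


# Two independent 2x2 multiplications (outer dimension of size 2).
a = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
b = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]
result = batched_matmul(a, b)
print(result)
```

Because the slices do not depend on each other, they can be handed to separate workers without synchronization beyond collecting the results. Pure-Python threads will not show a real speedup here because of the interpreter lock; the point is only the shape of the work split.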

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)

[ghstack-poisoned]

pytorch-bot (bot) commented Feb 22, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/8b7f9a0b5c53fb116dffeb7b5e1a68f3a40993d2/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

| Workflows | Labels (bold enabled) | Status |
| --- | --- | --- |
| **Triggered Workflows** | | |
| linux-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default | ✅ triggered |
| linux-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| linux-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| linux-binary-manywheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| linux-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk | ✅ triggered |
| linux-bionic-rocm4.5-py3.7 | ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk | ✅ triggered |
| linux-docs | ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-vulkan-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7-no-ops | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| macos-arm64-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default | ✅ triggered |
| macos-arm64-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| macos-binary-conda | ciflow/binaries, ciflow/binaries_conda, ciflow/default | ✅ triggered |
| macos-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| macos-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| macos-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| windows-binary-libtorch-cxx11-abi | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| windows-binary-libtorch-pre-cxx11 | ciflow/binaries, ciflow/binaries_libtorch, ciflow/default | ✅ triggered |
| windows-binary-wheel | ciflow/binaries, ciflow/binaries_wheel, ciflow/default | ✅ triggered |
| **Skipped Workflows** | | |
| caffe2-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| docker-builds | ciflow/all, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-arm64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-arm64-custom-ops | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-arm64-metal | ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled | 🚫 skipped |
| ios-12-5-1-x86-64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk | 🚫 skipped |
| linux-docs-push | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| linux-xenial-cuda11.3-py3.7-gcc7-no-ops | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-arm64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-lite-interpreter-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-11-py3-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| parallelnative-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| periodic-win-vs2019-cuda11.5-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build | ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| pytorch-xla-linux-bionic-py3.7-clang8 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla | 🚫 skipped |

facebook-github-bot (Contributor) commented Feb 22, 2022

💊 CI failures summary and remediations

As of commit cee0a2d (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

salilsdesai added a commit that referenced this pull request Feb 22, 2022
Split up multiplication over outer dimensions

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)

ghstack-source-id: 149691337
Pull Request resolved: #73247
salilsdesai added a commit that referenced this pull request Feb 25, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 150002588

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 1, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 150241218

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 2, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 150355543

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 7, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 150726277

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 8, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 150840723

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 9, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 150916673

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 10, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 151003681

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 10, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 151028027

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 11, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 151138390

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 12, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 151217389

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
salilsdesai added a commit that referenced this pull request Mar 14, 2022
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 151250864

Differential Revision: [D34012771](https://our.internmc.facebook.com/intern/diff/D34012771/)
facebook-github-bot pushed a commit that referenced this pull request Mar 14, 2022
Summary:
Pull Request resolved: #73247

Split up multiplication over outer dimensions
ghstack-source-id: 151250864

Test Plan:
From fbcode:
```
buck test caffe2/test:quantization -- test_qmatmul
```

Performance Improvement Summary:
For the matmuls used by the Transformer Model:
- This diff reduces qmatmul latency by ~53% relative to the preceding diff (Ruy without parallelization): 16.0261ms to 7.5257ms, about 2.1x faster
- This entire diff stack reduces qmatmul latency by ~75% relative to the naive implementation: 30.9919ms to 7.5257ms, about 4.1x faster
(see below for details)

**Detailed Benchmarking Results:**
*Benchmarking done on a model which performs matmuls of the same shapes and counts as Transformer Model, as determined in D30901505*

*Notebook in which Benchmarking was performed: https://www.internalfb.com/intern/anp/view/?id=1582075&revision_id=537916317667891*

- Ruy QMatMul, Parallelization within PyTorch (this diff, v5): [7.5257ms](https://www.internalfb.com/intern/aibench/details/621856970876663)
- Ruy QMatMul, No Parallelization (D33735479, v18): [16.0261ms](https://www.internalfb.com/intern/aibench/details/867786467365069)
- Naive QMatMul (on master branch (base of D33332098), v22): [30.9919ms](https://www.internalfb.com/intern/aibench/details/418359955621359)
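
As a worked check of how the summary percentages follow from these three timings, here is a short sketch. It reads "X% faster" as a reduction in latency, which is an interpretation rather than something stated in the summary; the dictionary keys and helper names are hypothetical labels chosen for illustration.

```python
# Latencies reported in the benchmark list above, in milliseconds.
timings_ms = {
    "ruy_parallel": 7.5257,   # Ruy QMatMul, parallelization within PyTorch
    "ruy_serial": 16.0261,    # Ruy QMatMul, no parallelization
    "naive": 30.9919,         # Naive QMatMul
}


def latency_reduction_pct(before_ms, after_ms):
    """Percentage of the original latency that was removed."""
    return 100.0 * (before_ms - after_ms) / before_ms


def speedup(before_ms, after_ms):
    """How many times faster the new latency is than the old one."""
    return before_ms / after_ms


after = timings_ms["ruy_parallel"]
for name in ("ruy_serial", "naive"):
    before = timings_ms[name]
    print(
        f"{name} -> ruy_parallel: "
        f"{latency_reduction_pct(before, after):.1f}% lower latency, "
        f"{speedup(before, after):.2f}x speedup"
    )
```

Under that reading, the reduction from the non-parallel Ruy version comes out at about 53% and the reduction from the naive version at about 75.7%, which matches the rounded figures in the summary.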

Experiments using the Ruy thread pool (these performed worse and were abandoned):
- Ruy QMatMul, with Ruy Threadpool 4 threads (D34110676, v1): [59.8889ms](https://www.internalfb.com/intern/aibench/details/487293857402229)
- Ruy QMatMul, Parallelization within PyTorch and with Ruy Threadpool 4 threads (D34111050, v1): [624.8932ms (?!)](https://www.internalfb.com/intern/aibench/details/330231112631355)

Reviewed By: kimishpatel

Differential Revision: D34012771

fbshipit-source-id: 79d137f295b05812968ab53fdf9798606f3f4e63

@github-actions (Contributor)

Hey @salilsdesai.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@facebook-github-bot facebook-github-bot deleted the gh/salilsdesai/17/head branch March 18, 2022 14:17