[SDPA] Call _sdp_attention in nn.functional.mha #89470
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89470. Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit 41b3f91. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 32b4332 to 603616c.
// Scale q,k before matmul for stability see https://tinyurl.com/sudb9s96 for math
const double scaling_factor = ::sqrt(::sqrt(static_cast<double>(embed_size)));
const auto embed_size = SymFloat(query_.sym_size(-1));
// const double scaling_factor = ::sqrt(::sqrt(static_cast<double>(embed_size)));
TODO: different options I was working through; these need to be removed before landing, but I'm curious whether this is the best way to do things.
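For context, a minimal sketch (not the PR's actual code, and with made-up shapes) of why pre-scaling q and k by the fourth root of the head dimension is numerically equivalent to the usual division of the score matrix by sqrt(d), while keeping intermediate magnitudes smaller:

```python
import torch

# Hypothetical shapes for illustration: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 16, 64)
k = torch.randn(2, 8, 16, 64)
d = q.size(-1)

# Conventional scaling: form q @ k^T, then divide by sqrt(d).
scores_post = (q @ k.transpose(-2, -1)) / d**0.5

# Pre-scaling each operand by d**0.25 gives the same result, since
# (q / d**0.25) @ (k / d**0.25)^T == (q @ k^T) / sqrt(d),
# but the intermediate values stay smaller.
scale = d**0.25
scores_pre = (q / scale) @ (k / scale).transpose(-2, -1)

assert torch.allclose(scores_post, scores_pre, atol=1e-5)
```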
return os;
}

SymFloat SymFloat::sqrt() const {
Needs double-checking and placement guidance.
attn_output_weights = softmax(attn_output_weights, dim=-1)
if dropout_p > 0.0:
    attn_output_weights = dropout(attn_output_weights, p=dropout_p)
if attn_mask.size(0) == 1:
Hacky... the API between SDP and nn.functional.mha has a decent amount of impedance mismatch, requiring all this transposing and viewing fluff. Ideally this would also work with nested tensors out of the box, but I need to do a once-over of this forward to understand the gap.
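To make the mismatch concrete, a rough sketch with hypothetical sizes (and using the later public scaled_dot_product_attention name rather than the private _scaled_dot_product_attention this PR wires up) of the view/reshape round-trip between the flattened (N * H, L, E) layout used with torch.bmm inside multi_head_attention_forward and the (N, H, L, E) layout the SDPA kernels expect:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: batch N, heads H, sequence length L, head dim E.
N, H, L, E = 2, 8, 16, 64

# Flattened (N * H, L, E) layout, as used with torch.bmm in the existing path.
q = torch.randn(N * H, L, E)
k = torch.randn(N * H, L, E)
v = torch.randn(N * H, L, E)

# SDPA expects (N, H, L, E), so the call site has to view on the way in...
attn = F.scaled_dot_product_attention(
    q.view(N, H, L, E), k.view(N, H, L, E), v.view(N, H, L, E)
)

# ...and flatten back on the way out for the rest of the function.
attn = attn.reshape(N * H, L, E)
```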
@BowenBao @abock Sorry for pinging you directly, but I am getting a test failure, which can be found on the hud: https://hud.pytorch.org/pr/89470. I am not sure what the next steps are to enable ONNX support. I tried reading through the wiki but didn't find anything very fruitful. Any guidance would be much appreciated.
Force-pushed from 603616c to 22acd66.
TIL we have a c10::Join
c10/core/SymFloat.h (Outdated)
Have you tried writing something like:
namespace std {
template <>
SymFloat sqrt<SymFloat>(const SymFloat& self) {
...
}
}
Not sure if this would work exactly, but maybe we can get std::sqrt() to Just Work on symfloats, so we don't need any code changes in the future
I can't seem to match the function template specialization to a function template. I tried this for both std:: and c10::complex.
@mikekgfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
torch/nn/functional.py (Outdated)
The transpose() calls necessary to pull this off might cost us significant overhead.
There's an impedance mismatch not just between functional.MHA and sdp_attention, but also between nn.MHA and functional.MHA.
nn.MHA treats batch_first as the preferred format, but functional.MHA appears to only support non-batch_first. I see two options:
1 - pass in the batch_first variable (somewhat non-preferred, because we might end up with control flow as a function of an input, namely batch_first).
2 - have an inverted-polarity MHA that prefers batch_first and use that one internally, then have nn.MHA and a legacy-compatibility functional.MHA call the new implementation, with a possible path to eventually deprecating the old interface (see the sketch below).
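A rough sketch of option 2 with hypothetical function names, a single head, no projections, and the later public scaled_dot_product_attention API standing in for the real internals: the batch-first implementation does the work, and the legacy seq-first entry point becomes a thin transpose wrapper.

```python
import torch
import torch.nn.functional as F

def _mha_batch_first(q, k, v):
    # Hypothetical internal implementation preferring batch-first (N, L, E)
    # inputs; single-head and no projections, purely for illustration.
    return F.scaled_dot_product_attention(q, k, v)

def mha_seq_first(q, k, v):
    # Legacy (L, N, E) interface kept as a thin compatibility wrapper:
    # transpose in, run the batch-first path, transpose back out.
    out = _mha_batch_first(q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1))
    return out.transpose(0, 1)

x = torch.randn(16, 2, 64)  # (L, N, E)
assert mha_seq_first(x, x, x).shape == x.shape
```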
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@cpuhrsch has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from e10a728 to d62aa79.
@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met. cc VitalyFedyunin ngimel
X-link: pytorch/pytorch#89470
Reviewed By: cpuhrsch
Differential Revision: D41625335
Pulled By: drisspg
fbshipit-source-id: dcf79e9d51d1bb0b1649f6bf0a8b0e2869170874

Summary: Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met. cc VitalyFedyunin ngimel
Pull Request resolved: pytorch#89470
Reviewed By: cpuhrsch
Differential Revision: D41625335
Pulled By: drisspg
fbshipit-source-id: 460b5cadfbd2e8f8a21fb46bce92fe831984ee02
Force-pushed from 858d7d4 to f4690a6.
This pull request was exported from Phabricator. Differential Revision: D41625335
Summary: Pull Request resolved: pytorch#6038
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met. cc VitalyFedyunin ngimel
X-link: pytorch/pytorch#89470
Reviewed By: cpuhrsch
Differential Revision: D41625335
Pulled By: drisspg
fbshipit-source-id: 7bfe67cbc52d545faa0eefa7600f39a1685d01e4
This pull request was exported from Phabricator. Differential Revision: D41625335
Force-pushed from f4690a6 to dce73fa.
Summary: X-link: pytorch/glow#6038
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met. cc VitalyFedyunin ngimel
Pull Request resolved: pytorch#89470
Reviewed By: cpuhrsch
Differential Revision: D41625335
Pulled By: drisspg
fbshipit-source-id: cd7d010a6c325618e0df9dd75246a291451c8021

Summary: X-link: pytorch/glow#6038
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met. cc VitalyFedyunin ngimel
Pull Request resolved: pytorch#89470
Reviewed By: mostafaelhoushi, cpuhrsch
Differential Revision: D41625335
Pulled By: drisspg
fbshipit-source-id: 1723c11739fc73963bd9be8dc04f45e5abda79c0

Summary: Pull Request resolved: pytorch#6038
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met. cc VitalyFedyunin ngimel
X-link: pytorch/pytorch#89470
Reviewed By: mostafaelhoushi, cpuhrsch
Differential Revision: D41625335
Pulled By: drisspg
fbshipit-source-id: 44947eb40c48a8530d84155c0b11020278155bd8
This pull request was exported from Phabricator. Differential Revision: D41625335
Force-pushed from dce73fa to 41b3f91.
Summary: Pull Request resolved: #6038
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met. cc VitalyFedyunin ngimel
X-link: pytorch/pytorch#89470
Reviewed By: mostafaelhoushi, cpuhrsch
Differential Revision: D41625335
Pulled By: drisspg
fbshipit-source-id: c3ce8e1fbec25af249e6c8c8cda3086fdddaf558
@pytorchbot merge -f 'Landed internally'
(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Summary
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met.
Pull Request resolved: pytorch#89470
Approved by: https://github.com/cpuhrsch, https://github.com/mikekgfb
…orch#89847)" This reverts commit b9afa92. Reverted pytorch#89847 on behalf of https://github.com/jeanschmidt due to Need to revert this commit as it is causing conflict when reverting pytorch#89470
This reverts commit 4d7ec30. Reverted pytorch#89470 on behalf of https://github.com/jeanschmidt due to breaking internal builds
# Summary
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met.
Pull Request resolved: pytorch#89470
Approved by: https://github.com/cpuhrsch, https://github.com/mikekgfb
Summary
Replaces the inline block of code in nn.functional.mha with `_scaled_dot_product_attention`. This function allows the fused kernels to be called if all the required input conditions are met.
cc @VitalyFedyunin @ngimel
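For reference, a minimal sketch of the substitution this summary describes, using the later public torch.nn.functional.scaled_dot_product_attention name (the PR itself calls the private _scaled_dot_product_attention) and made-up shapes:

```python
import math
import torch
import torch.nn.functional as F

# Made-up (N, H, L, E) inputs.
q = torch.randn(2, 8, 16, 64)
k = torch.randn(2, 8, 16, 64)
v = torch.randn(2, 8, 16, 64)

# Essence of the inline block being replaced: explicit softmax(q k^T / sqrt(E)) v.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
ref = torch.softmax(scores, dim=-1) @ v

# The consolidated call; the dispatcher can select a fused kernel
# (flash / memory-efficient attention) when the inputs qualify.
out = F.scaled_dot_product_attention(q, k, v)

assert torch.allclose(ref, out, atol=1e-4)
```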