[RELAND] [cuDNN] Add a new optimized cuDNN RNN algorithm for small RNN hidden_size #73211
Conversation
…MALL_H algorithm (pytorch#72089)" This reverts commit c93d6f9.
What about the regular persistent algorithm? Should it bail out when inputs are packed? Does it (and if so, where? It's not in the algo selection function)?
The other persistent algo already bails out if the input is packed (and it only triggers for fp16 data).
I reworked the checks, hopefully good now.
Does PERSIST_STATIC_SMALL_H support arbitrary hidden and input sizes? For the regular persist algorithm there are checks that those are multiples of 128 or 64 or something.
No idea. @xwang233 did you discuss this at all with cudnn people during the original PR? |
Only small hidden sizes are supported and those are already in the PR code. All input sizes should be supported.
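Taken together, the conditions discussed in this thread (small hidden size only, bail out for packed variable-length batches, and the open question about double precision) can be sketched as a hypothetical dispatch guard. This is illustrative only: the function name, parameters, and threshold value are assumptions, and the real check lives in PyTorch's C++ cuDNN RNN backend, not in Python.

```python
# Hypothetical sketch of the algorithm-selection guard discussed in this
# thread. Names and the threshold are illustrative; the real limits are
# per-GPU and per-dtype and live in PyTorch's cuDNN backend.

def use_persist_static_small_h(hidden_size, is_packed_variable_length, dtype):
    """Return True if the small-hidden-size persistent algorithm may be tried."""
    SMALL_H_MAX = 128  # illustrative cap; "only small hidden sizes are supported"

    if is_packed_variable_length:
        # The PR adds a bailout: CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H appears
        # not to support variable sequence lengths in a packed batch.
        return False
    if dtype == "float64":
        # Open review question: double precision may also need a bailout.
        return False
    return hidden_size <= SMALL_H_MAX
```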
@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
…N hidden_size (#73211)

Summary: #62143 was reverted (#72089) because, when running native tests internally with cudnn and GPUs such that `CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H` was used, we hit some `CUDNN_STATUS_NOT_SUPPORTED` errors.

Based on https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#features-of-rnn-functions and experiments, I strongly suspect the errors were because `CUDNN_RNN_ALGO_PERSIST_STATIC_SMALL_H` doesn't support variable sequence lengths in the batch. This PR restores #62143 and adds a bailout condition if the input is a packed batch that might have different sequence lengths per element.

Question for review: Do we also need to add a bailout condition if the input is double precision?

Pull Request resolved: #73211
Reviewed By: ejguan
Differential Revision: D34688016
Pulled By: ngimel
fbshipit-source-id: e7335c4701dabc7d0b36ebdb6414c4353a71ee91
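For context on the bailout condition: a "packed batch" is the representation produced by `torch.nn.utils.rnn.pack_padded_sequence`, which stores a `batch_sizes` vector where `batch_sizes[t]` is the number of sequences still active at time step `t`. Sequence lengths vary across the batch exactly when that vector is not constant. A minimal stdlib-only sketch of that check (the helper name is hypothetical, not PyTorch code):

```python
# A packed batch has uniform sequence lengths iff every entry of
# batch_sizes equals the first one (all sequences end at the same step).
# Assumes a non-empty batch_sizes list; illustrative helper only.

def has_variable_lengths(batch_sizes):
    return any(b != batch_sizes[0] for b in batch_sizes)

# Lengths [3, 3, 3] -> batch_sizes [3, 3, 3]: uniform, SMALL_H is usable.
# Lengths [3, 2, 1] -> batch_sizes [3, 2, 1]: variable, must bail out.
```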
Hey @mcarilli. |