Speed up CUDA kernel launch when block/thread extents are statically known #42899
// TODO: eventually, codegen these calculations and make them part of the
// module.
for (size_t i = 0; i < gpu_block_extents.size(); i++) {
  auto extent = dynamic_cast<const IntImm*>(gpu_block_extents[i]);
It's almost certainly an int, but supporting more dtypes doesn't cost us much here:
if (gpu_block_extents[i]->isConstant()) {
  gpu_block_extents_v[i] = immediateAs<int>(gpu_block_extents[i]);
}
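To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. It assumes simplified stand-ins for the IR types: `Expr`, `IntImm`, `LongImm`, `Var`, the `kDynamicExtent` sentinel, and `resolveStaticExtents` are all hypothetical and written only for illustration, and `isConstant()` and `immediateAs<T>()` are re-implemented here rather than taken from the real tensor-expression headers. It shows why checking `isConstant()` accepts any immediate dtype, whereas a `dynamic_cast` to `IntImm` alone would treat a 64-bit constant extent as dynamic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the IR node types (not the real classes).
struct Expr {
  virtual ~Expr() = default;
  virtual bool isConstant() const { return false; }
};

struct IntImm : Expr {
  explicit IntImm(int v) : value(v) {}
  bool isConstant() const override { return true; }
  int value;
};

struct LongImm : Expr {
  explicit LongImm(int64_t v) : value(v) {}
  bool isConstant() const override { return true; }
  int64_t value;
};

// A symbolic extent whose value is only known at launch time.
struct Var : Expr {};

// Simplified re-implementation: convert any supported immediate to T.
template <typename T>
T immediateAs(const Expr* e) {
  if (auto i = dynamic_cast<const IntImm*>(e)) {
    return static_cast<T>(i->value);
  }
  if (auto l = dynamic_cast<const LongImm*>(e)) {
    return static_cast<T>(l->value);
  }
  assert(false && "immediateAs called on a non-immediate expression");
  return T{};
}

// Assumed sentinel meaning "must be evaluated at launch time".
constexpr int kDynamicExtent = -1;

// Precompute every extent that is statically known; leave the rest
// marked as dynamic so the launch path can evaluate only those.
std::vector<int> resolveStaticExtents(const std::vector<const Expr*>& extents) {
  std::vector<int> resolved(extents.size(), kDynamicExtent);
  for (std::size_t i = 0; i < extents.size(); i++) {
    if (extents[i]->isConstant()) {
      resolved[i] = immediateAs<int>(extents[i]);
    }
  }
  return resolved;
}
```

With this shape, the per-launch path only has to evaluate the entries still marked `kDynamicExtent`; how the real code represents "not statically known" may differ.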
…statically known" Differential Revision: [D23078708](https://our.internmc.facebook.com/intern/diff/D23078708) [ghstack-poisoned]
💊 CI failures summary and remediations
As of commit 49a3618 (more details on the Dr. CI page):
✅ None of the CI failures appear to be your fault 💚
🚧 1 fixed upstream failure: These were probably caused by upstream breakages that were already fixed.
Please rebase on the
@bertmaher merged this pull request in 1adeed2.
Stack from ghstack:
Differential Revision: D23078708