[Models] Use in-place adds in Idefics2Vision #23932
Conversation
Code Review
This pull request introduces a performance optimization in Idefics2VisionTransformer by replacing out-of-place additions with in-place additions. The changes in Idefics2VisionEmbeddings and Idefics2EncoderLayer are correct and safe, as they operate on locally-scoped tensors, avoiding side effects. This modification should lead to reduced memory allocation and improved throughput as described in the pull request, which is a valuable improvement for inference performance. The changes are well-implemented. As you suggested in the description, applying this optimization to other models in the codebase would likely be beneficial as well.
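For context on why the review calls out locally-scoped tensors, here is a small illustrative sketch (the function names are hypothetical, not from the PR): an in-place add is only safe when the tensor being mutated was created inside the current scope, so no tensor the caller still holds gets changed underneath it.

```python
import torch

def risky_inplace(x: torch.Tensor) -> torch.Tensor:
    # Mutates the caller's tensor: if `x` is still referenced elsewhere
    # (e.g. cached, reused as a residual, or needed by autograd), the
    # in-place add silently changes that value too.
    x += 1.0
    return x

def safe_inplace(x: torch.Tensor) -> torch.Tensor:
    # `y` is allocated inside this function, so mutating it in place
    # cannot affect any tensor the caller still holds a reference to.
    y = x * 2.0
    y += 1.0
    return y
```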
Thanks for the optimization. Let's open a separate PR with a benchmark for each model.
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Head branch was pushed to by a user without write access
Force-pushed from 4c086b6 to c727b73
Rebased to re-trigger CI...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Purpose
This changes the `Idefics2VisionTransformer` model to use in-place adds where possible. This is relevant since vision encoders currently run outside of torch.compile, so additions like this won't be automatically optimised away. See also #18922.

Related to: #23884
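For illustration, a minimal sketch of the kind of change this makes, assuming a simplified encoder-layer forward pass (the actual vLLM `Idefics2EncoderLayer` code differs, and the module arguments here are hypothetical): the out-of-place residual adds are replaced with in-place adds on tensors that are only referenced locally.

```python
import torch
import torch.nn as nn

def encoder_layer_forward(hidden_states: torch.Tensor,
                          self_attn: nn.Module,
                          layer_norm1: nn.Module,
                          layer_norm2: nn.Module,
                          mlp: nn.Module) -> torch.Tensor:
    # Attention block: previously `hidden_states = residual + hidden_states`,
    # which allocates a fresh tensor for every add.
    residual = hidden_states
    hidden_states = layer_norm1(hidden_states)
    hidden_states = self_attn(hidden_states)
    hidden_states += residual  # in-place: reuses the attention output buffer

    # MLP block: same pattern.
    residual = hidden_states
    hidden_states = layer_norm2(hidden_states)
    hidden_states = mlp(hidden_states)
    hidden_states += residual  # in-place add on a locally created tensor
    return hidden_states
```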
Test Plan
Test Result
In an internal benchmark with many image inputs this has a surprisingly big performance impact. I'm seeing a 5.5%–6.2% increase in throughput with the `openbmb/MiniCPM-V-4_5` model running on an L40S GPU.

@DarkLight1337 let me know if you'd like me to do a bit of search and replace and apply similar changes in the other model definitions as well, which might also benefit text-only models when run in eager mode.