-
Notifications
You must be signed in to change notification settings - Fork 31.5k
Optimize to_py_obj for python-native numeric lists and scalars
#36885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the |
|
This seems like a good idea, but I worry that there might be some edge cases where it fails because it only tests the first element! One idea would be to use a conversion like
Then we can guarantee that the input list/tuple was a flat list of numbers and just return the original list. It would guarantee correctness without needing to recursively call |
|
@Rocketknight1 Great idea. Thanks for the suggestion! I ran some benchmarks comparing different approaches (using the same example code, but with an increased number of iterations). Here are the results:
As you see, the Also, instead of checking whether the array has 1 dimension and returning Let me know if you have any feedback or further suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this seems good now! I made one small suggestion, so let me know what you think and then we can merge this.
src/transformers/utils/generic.py
Outdated
| """ | ||
| Convert a TensorFlow tensor, PyTorch tensor, Numpy array or python list to a python list. | ||
| """ | ||
| if is_py_number(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if is_py_number(obj): | |
| if isinstance(obj, (int, float)): |
Since we're only using the function once, we can just inline it directly here. I think it's understandable!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you merge this you should also delete is_py_number()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied in a04e338. Also added some test code for to_py_obj here.
|
Also, one more thought: I think this should still work even if |
|
This looks good to me now! Ping me whenever you're ready for me to merge it @n0gu-furiosa |
|
@Rocketknight1 Everything’s ready on my end. Please feel free to merge whenever you get a chance. Thanks in advance! |
|
thanks 🤗 |
…gingface#36885) * Optimize to_py_obj for python-native numeric lists and scalars * Fix bug that tuple is not converted to list * Try np.array for more robust type checking * Apply review and add tests for to_py_obj
…gingface#36885) * Optimize to_py_obj for python-native numeric lists and scalars * Fix bug that tuple is not converted to list * Try np.array for more robust type checking * Apply review and add tests for to_py_obj
What does this PR do?
This PR optimizes
to_py_objby adding early returns for python-native numeric scalars and 1D lists/tuples of numbers. This avoids unnecessary recursive conversions, which can significantly impact performance ofdecode().Fixes #36872
In the provided example from #36872, the runtime decreased from approximately 11 seconds to 0.8 seconds.
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@Rocketknight1 @ArthurZucker