bpo-38015: Replace get_small_int() function with macro version #15718
sir-sigurd wants to merge 1 commit into python:master
This produces a tiny speed-up in the handling of small ints in PyLong_FromLong() by avoiding unnecessary casts.
452940f to 0209249
gnprice left a comment
This is really quite significantly more complex to read than the function version is, and I think that cost is much more than the payoff in this case.
Making is_small_int into a macro (GH-15710) was quite clean, because the body is a simple expression that doesn't have to change. The difference is only at the boundary of the function/macro, where a regular C-level function gets nice things like proper type-checking and visibility to debuggers and other tools. We give those up, but that's not too bad (partly because, again, the function-or-macro is so tiny).
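To make the function/macro boundary concrete, here is a minimal sketch of the same one-expression check written both ways. The names `is_small_int_fn` and `IS_SMALL_INT` and the bounds are assumptions chosen for illustration; this is not quoted from GH-15710.

```c
#include <stdbool.h>

/* Illustrative bounds for the small-int range; assumed for this sketch. */
#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

/* Function form: the argument is converted to long at the call boundary,
   the compiler type-checks the call, and debuggers can see the function. */
static inline bool
is_small_int_fn(long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

/* Macro form: the body is the same expression, with the argument
   parenthesized. No conversion happens at the boundary, and no
   type-checking either. */
#define IS_SMALL_INT(ival) \
    (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)
```

The body is identical in both versions, which is why the conversion costs so little in readability here.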
But in this case there's significantly more inside the body, and turning it into a macro involves contorting it in all the usual ways involved in complex macros. That's quite a steep cost in code complexity. Such a cost is only worth it when there's a solid payoff. From the issue thread, the payoff here is "small speedup in tightly-focused microbenchmark, on some compilers".
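As a rough illustration of those contortions, consider a hypothetical multi-statement helper written as a function and then as a macro. The cache, the hit counter, and the names `get_cached_fn` and `GET_CACHED` are invented for this sketch; this is not the code from this PR.

```c
#include <stddef.h>

/* Hypothetical cache, for illustration only. */
#define CACHE_MIN (-5)
#define CACHE_MAX 256
static int cache[CACHE_MAX - CACHE_MIN + 1];
static size_t cache_hits;

/* Function form: ordinary locals and an ordinary return value. */
static inline int *
get_cached_fn(long ival)
{
    size_t index = (size_t)(ival - CACHE_MIN);
    cache_hits++;
    return &cache[index];
}

/* Macro form of the same logic. It needs:
   - a do { ... } while (0) wrapper so it behaves as a single statement,
   - an out-parameter, because a statement macro cannot return a value,
   - a local copy of the argument so it is evaluated exactly once,
   - an unusual local name to reduce the chance of clashing with the caller. */
#define GET_CACHED(ival, out)                                   \
    do {                                                        \
        long gc_ival_ = (ival);                                 \
        cache_hits++;                                           \
        (out) = &cache[(size_t)(gc_ival_ - CACHE_MIN)];         \
    } while (0)
```

A call site changes from `p = get_cached_fn(n);` to `GET_CACHED(n, p);`, and every later change to the body has to preserve the wrapper, the line continuations, and the single-evaluation copy.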
Remember that among the costs of making the code more complex is that it's harder to change -- for example, to make other optimizations. So the sum of a lot of micro-optimizations isn't just a small total speedup -- if they each add complexity as much as this one does, they can also have the effect of moving other, bigger possible optimizations from "hard but doable" to "infeasible", so that the end state is slower than if we hadn't done the micro-optimizations at all.
I've seen other PRs you've made that have a much higher ratio of payoff to complexity-cost than this one. I think more of those would be great. 🙂
More generally, if you profile CPython running some ordinary Python code, there are lots of things that could potentially be optimized with the right insight or effort. Ideally that's the direction to start from in finding things to optimize, so you know the piece you're working on is connected to what matters for performance in practice.
https://bugs.python.org/issue38015