gh-91247: improve performance of list and tuple repeat (with specialization for n=1)#91482
Conversation
05e2b70 to
5445cfb
Compare
|
@sweeneyde Could you have a look at this ticket? The approach is similar to #31856 and #31999 which you reviewed. |
1491705 to
66cc8ae
Compare
|
@eendebakpt Sorry this slipped under my radar @vstinner, would you mind reviewing? I think the change is good in principle but I want another reviewer and to make sure the header file changes are appropriate. |
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Misc/NEWS.d/next/Core and Builtins/2022-03-22-13-12-27.bpo-47091.tJcy-P.rst
Outdated
Show resolved
Hide resolved
…91.tJcy-P.rst Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
Co-authored-by: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
sweeneyde
left a comment
There was a problem hiding this comment.
Some minor nits, but this is looking good. Thanks for your patience and persistence on this.
Co-authored-by: Dennis Sweeney <36520290+sweeneyde@users.noreply.github.com>
|
I verified a benchmark on a Windows machine since this might be a bit compiler-specific. Looks great! |
|
🤖 New build scheduled with the buildbot fleet by @sweeneyde for commit 516f091 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
| *dest++ = *src++; | ||
| } | ||
|
|
||
| _Py_memory_repeat((char *)np->ob_item, sizeof(PyObject *)*output_size, |
There was a problem hiding this comment.
Note to self: this cannot overflow because list_new_prealloc uses PyMem_New, and so this multiplication already succeeded at some point.
Compared to current main branch there are 3 improvements
This PR keeps the specializations for repeats of lists and tuples with length 1. An alternative PR without specialization for n=1 (#32045) was rejected because the specializations might yield a speedup.
For performance comparisons to main and the version without specializations and more detailed discussion see issue #91247.
The inefficient pointer chasing can be seen in the following performance benchmark:
(results obtained with
runner.timeit(name=f"list({n}) repeat {r}", stmt=f"x=a*{r}", setup=f'a=list(range({n}))'))