bpo-45225: use map function instead of genexpr in capwords #28342
Merged
rhettinger merged 2 commits into python:main from speedrun-program:patch-1 on Sep 16, 2021
In string.py, the capwords function passes str.join a generator expression, but the map function
could be used instead. This is how capwords is currently written:
--------------------
def capwords(s, sep=None):
    """
    docstring text
    """
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))
--------------------
This is how capwords could be written:
--------------------
def capwords(s, sep=None):
    """
    docstring text
    """
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))
--------------------
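As a quick sanity check that the rewrite does not change behavior for str inputs, the two versions can be compared directly. This is only a sketch: the names capwords_current and capwords_new follow the performance test code in this PR, and the sample inputs are hypothetical, chosen to cover the default separator and an explicit one.

```python
def capwords_current(s, sep=None):
    # Generator-expression version, as currently written in string.py.
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_new(s, sep=None):
    # Proposed version: map applies str.capitalize to each piece.
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


# Hypothetical sample inputs (not taken from the CPython test suite).
samples = [
    ("", None),
    ("hello world", None),
    ("  leading and   repeated   spaces ", None),
    ("a-b-c", "-"),
    ("mIxEd cAsE", None),
]

for s, sep in samples:
    # Both versions should produce the same string for every sample.
    assert capwords_current(s, sep) == capwords_new(s, sep)

print("both versions agree on all samples")
```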
These are the benefits:
1. Faster performance; the time saved grows with the number of pieces the str is split into.
2. Very slightly smaller .py and .pyc file sizes.
3. Source code is slightly more concise.
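One plausible explanation for the first two benefits is that a generator expression compiles to its own nested code object, which the interpreter resumes once per item, whereas map calls str.capitalize directly without extra Python-level bytecode per item. The sketch below inspects the compiled functions to show that difference; nested_code_objects is a hypothetical helper written for this illustration, not part of string.py.

```python
import types


def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))


def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))


def nested_code_objects(func):
    # Hypothetical helper: collect code objects stored among the
    # function's constants (a genexpr shows up here as "<genexpr>").
    return [c for c in func.__code__.co_consts if isinstance(c, types.CodeType)]


# The generator-expression version carries a nested code object.
print([c.co_name for c in nested_code_objects(capwords_current)])
# The map version carries none.
print([c.co_name for c in nested_code_objects(capwords_new)])
```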
This is the performance test code:
--------------------
from timeit import timeit

setup = """
def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))

tests = ["a " * 10**n for n in range(9)]
tests.append("a " * (10**9 // 2)) # I only have 16GB of RAM
"""

print("empty str without map:", timeit(setup=setup, stmt="x = capwords_current('')", number=1))
print("empty str with map :", timeit(setup=setup, stmt="x = capwords_new('')", number=1))
for n in range(9):
    print("- " * 20)
    print(f"10**{n} without map:", timeit(setup=setup, stmt=f"x = capwords_current(tests[{n}])", number=1))
    print(f"10**{n} with map :", timeit(setup=setup, stmt=f"x = capwords_new(tests[{n}])", number=1))
print("- " * 20)
print("10**9 // 2 without map:", timeit(setup=setup, stmt="x = capwords_current(tests[9])", number=1))
print("10**9 // 2 with map :", timeit(setup=setup, stmt="x = capwords_new(tests[9])", number=1))
print("done")
--------------------
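Because the script above runs each statement once (number=1), individual timings can be noisy, especially for the small inputs. A possible variation, shown here only as a sketch, uses timeit.repeat and takes the minimum over several repeats. The input size, the repeat counts, and the helper name best_time are assumptions made for this illustration; they are not part of the original test.

```python
from timeit import repeat

setup = """
def capwords_current(s, sep=None):
    return (sep or ' ').join(x.capitalize() for x in s.split(sep))

def capwords_new(s, sep=None):
    return (sep or ' ').join(map(str.capitalize, s.split(sep)))

text = "a " * 10**3  # assumed input size, kept small so the sketch runs quickly
"""


def best_time(stmt, repeats=5, number=100):
    # Hypothetical helper: run `number` calls per repeat and report the
    # per-call time of the fastest repeat, which is least affected by noise.
    return min(repeat(stmt=stmt, setup=setup, repeat=repeats, number=number)) / number


current = best_time("capwords_current(text)")
new = best_time("capwords_new(text)")
print(f"without map: {current:.3e} s per call")
print(f"with map   : {new:.3e} s per call")
```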
These are the results of a performance test:
--------------------
empty str without map: 2.0000000000020002e-05
empty str with map : 1.8100000000020877e-05
- - - - - - - - - - - - - - - - - - - -
10**0 without map: 1.6600000000033255e-05
10**0 with map : 1.650000000008589e-05
- - - - - - - - - - - - - - - - - - - -
10**1 without map: 2.0399999999920482e-05
10**1 with map : 1.889999999993286e-05
- - - - - - - - - - - - - - - - - - - -
10**2 without map: 5.489999999985784e-05
10**2 with map : 4.6400000000001995e-05
- - - - - - - - - - - - - - - - - - - -
10**3 without map: 0.00026530000000013487
10**3 with map : 0.0001765000000002459
- - - - - - - - - - - - - - - - - - - -
10**4 without map: 0.0026298000000002375
10**4 with map : 0.0014880999999999922
- - - - - - - - - - - - - - - - - - - -
10**5 without map: 0.023361799999999988
10**5 with map : 0.016615499999999894
- - - - - - - - - - - - - - - - - - - -
10**6 without map: 0.24672029999999978
10**6 with map : 0.1923338999999995
- - - - - - - - - - - - - - - - - - - -
10**7 without map: 2.562209
10**7 with map : 1.8905919000000004
- - - - - - - - - - - - - - - - - - - -
10**8 without map: 26.3537843
10**8 with map : 18.781561099999998
- - - - - - - - - - - - - - - - - - - -
10**9 // 2 without map: 349.0668948
10**9 // 2 with map : 312.15139230000005
done
--------------------
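To make the figures above easier to compare, the relative speedup can be computed as the time without map divided by the time with map. The sketch below does this with the numbers copied from the results listing; since each figure comes from a single run, the ratios should be read as rough indications only.

```python
# (label, seconds without map, seconds with map), copied from the results above.
results = [
    ("empty str", 2.0000000000020002e-05, 1.8100000000020877e-05),
    ("10**0", 1.6600000000033255e-05, 1.650000000008589e-05),
    ("10**1", 2.0399999999920482e-05, 1.889999999993286e-05),
    ("10**2", 5.489999999985784e-05, 4.6400000000001995e-05),
    ("10**3", 0.00026530000000013487, 0.0001765000000002459),
    ("10**4", 0.0026298000000002375, 0.0014880999999999922),
    ("10**5", 0.023361799999999988, 0.016615499999999894),
    ("10**6", 0.24672029999999978, 0.1923338999999995),
    ("10**7", 2.562209, 1.8905919000000004),
    ("10**8", 26.3537843, 18.781561099999998),
    ("10**9 // 2", 349.0668948, 312.15139230000005),
]

# A ratio above 1 means the map version was faster in that run.
ratios = {label: without_map / with_map for label, without_map, with_map in results}

for label, ratio in ratios.items():
    print(f"{label:>10}: {ratio:.2f}x")
```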
rhettinger approved these changes on Sep 16, 2021
These are the results of a performance test using %timeit in ipython:
%timeit x = capwords_current("")
835 ns ± 15.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_new("")
758 ns ± 35.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_current(tests[0])
977 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_new(tests[0])
822 ns ± 30 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit x = capwords_current(tests[1])
3.07 µs ± 88.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit x = capwords_new(tests[1])
2.17 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit x = capwords_current(tests[2])
28 µs ± 896 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x = capwords_new(tests[2])
19.4 µs ± 352 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit x = capwords_current(tests[3])
236 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x = capwords_new(tests[3])
153 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit x = capwords_current(tests[4])
2.12 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = capwords_new(tests[4])
1.5 ms ± 9.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit x = capwords_current(tests[5])
23.8 ms ± 1.38 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit x = capwords_new(tests[5])
15.6 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit x = capwords_current(tests[6])
271 ms ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[6])
192 ms ± 807 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit x = capwords_current(tests[7])
2.66 s ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[7])
1.95 s ± 26.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_current(tests[8])
25.9 s ± 80.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[8])
18.4 s ± 123 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_current(tests[9])
6min 17s ± 29 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit x = capwords_new(tests[9])
5min 36s ± 24.8 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
https://bugs.python.org/issue45225