
bpo-47012: speed up iteration of bytes and bytearray#31867

Merged: sweeneyde merged 7 commits into python:main from kumaraditya303:speed-iter on Mar 23, 2022

kumaraditya303 (Contributor) commented on Mar 14, 2022:

Benchmark:

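from time import perf_counter  # standard-library high-resolution clock

from pyperf import Runner

def bench_bytes(loops, length):
    # 10-byte pattern repeated `length` times
    src = b'helloworld' * length
    t0 = perf_counter()
    for _ in range(loops):
        for i in src:  # each step yields an int in range(256)
            pass
    return perf_counter() - t0

def bench_bytearray(loops, length):
    # 5-byte pattern: half as many items as bench_bytes for the same `length`,
    # so absolute times are comparable only within a type, not across types
    src = bytearray(b'hello' * length)
    t0 = perf_counter()
    for _ in range(loops):
        for i in src:
            pass
    return perf_counter() - t0

runner = Runner()
for n in [10_000, 100_000]:
    # pyperf calls each function as func(loops, n) and expects the elapsed time back
    runner.bench_time_func(f"bytes {n}", bench_bytes, n)
    runner.bench_time_func(f"bytearray {n}", bench_bytearray, n)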

Results:

bytes 10000: Mean +- std dev: [base] 829 us +- 38 us -> [patch] 677 us +- 44 us: 1.23x faster
bytearray 10000: Mean +- std dev: [base] 523 us +- 34 us -> [patch] 360 us +- 19 us: 1.45x faster
bytes 100000: Mean +- std dev: [base] 8.33 ms +- 0.38 ms -> [patch] 6.89 ms +- 0.75 ms: 1.21x faster
bytearray 100000: Mean +- std dev: [base] 5.19 ms +- 0.23 ms -> [patch] 3.61 ms +- 0.23 ms: 1.44x faster

Geometric mean: 1.33x faster
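Every item produced by iterating a bytes or bytearray object is an int in range(256). That is why the review below turns on CPython's small-int cache, which (as a CPython implementation detail) preallocates the ints from -5 through 256. The following quick check assumes CPython and shows that both container types hand back the very same cached int objects for each byte value. Other implementations such as PyPy make no identity guarantee.

```python
# Assumes CPython: ints from -5 to 256 are preallocated "small ints", so equal
# values in that range are the same object. This is not a language guarantee.
data = bytes(range(256))

from_bytes = list(data)                  # drives the bytes iterator
from_bytearray = list(bytearray(data))   # drives the bytearray iterator

# Every item is an int in the byte range 0..255.
assert all(isinstance(v, int) and 0 <= v <= 255 for v in from_bytes)

# On CPython, each yielded int is the identical cached object for both types.
same_objects = all(a is b for a, b in zip(from_bytes, from_bytearray))
print(same_objects)
```

This identity holds on CPython both before and after the patch. The change is about reaching the cached objects faster, not about whether they are cached.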

https://bugs.python.org/issue47012

kumaraditya303 changed the title from "speed up iteration of bytes and bytearray" to "bpo-47012: speed up iteration of bytes and bytearray" on Mar 14, 2022
kumaraditya303 marked this pull request as ready for review on March 14, 2022 at 11:14

sweeneyde (Member) commented:

I agree with @animalize that it would be safest to include a preprocessor directive, whether that's with separate #if _PY_NSMALLPOSINTS > 255/#else code, or with a #error directive.

I'm not sure why anyone would compile with fewer than 256 cached small ints. @vstinner previously added this code:

// _PyLong_GetZero() and _PyLong_GetOne() must always be available
#if _PY_NSMALLPOSINTS < 2
#  error "_PY_NSMALLPOSINTS must be greater than 1"
#endif

@vstinner, would there be any downside to requiring all of (0, 1, ..., 255) be in the small int cache?

vstinner (Member) commented:

I don't think anyone has ever tuned _PY_NSMALLPOSINTS; the value should be hardcoded. But just for sanity, you can add a static_assert() in code that makes assumptions about its value, in case someone changes _PY_NSMALLPOSINTS in the future. For example, I added assertions to ensure that the 0 and 1 singletons always exist.
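The assumption being guarded can be modeled in a few lines of Python. This is a hypothetical sketch, not CPython's C code. NSMALLNEGINTS and NSMALLPOSINTS stand in for _PY_NSMALLNEGINTS and _PY_NSMALLPOSINTS and are assumed here to be 5 and 257, the values CPython 3.11 uses. from_unsigned_char is an illustrative helper name. The module-level assert plays the role of the #if/#error or static_assert() check: if the table ever held fewer than 256 non-negative entries, indexing it by a byte value could run off the end.

```python
# Hypothetical model of CPython's small-int cache; not CPython source.
NSMALLNEGINTS = 5    # assumed: cached values start at -5
NSMALLPOSINTS = 257  # assumed: cached values stop just below 257

# Analogue of the compile-time guard: the cache must hold 0..255 so that
# any byte value can be served straight from it.
assert NSMALLPOSINTS > 255, "small-int cache must cover every byte value"

# Table layout: index 0 holds -NSMALLNEGINTS, index NSMALLNEGINTS holds 0.
SMALL_INTS = list(range(-NSMALLNEGINTS, NSMALLPOSINTS))


def from_unsigned_char(byte: int) -> int:
    """Return the cached entry for a byte value (hypothetical helper)."""
    if not 0 <= byte <= 255:
        raise ValueError("not a byte value")
    # Offset past the negative entries, mirroring how the C table is indexed.
    return SMALL_INTS[NSMALLNEGINTS + byte]


print([from_unsigned_char(b) for b in b"hi"])
```

Checking the bound once, up front, lets the per-item path skip any range test on the cache size. That is the trade-off the guard makes explicit.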

kumaraditya303 (Contributor, Author) commented:

@sweeneyde I have added the compilation guard. FYI, though: changing _PY_NSMALLPOSINTS would break deepfreeze and the module-freezing infrastructure, so it is not really configurable, not to mention that it is declared in an internal header.

6 participants