CPython has a fast path for compact integers in binary add/sub, but wide exact ints still go through the generic long arithmetic path even when both operands fit in int64_t.
This issue proposes adding a separate fast path for exact PyLong operands that fit in signed 64-bit integers, while preserving the existing compact-int path.
Suggested implementation:
- Keep the current compact-int specialization unchanged.
- Add a separate wide-int path for exact ints that fit in int64_t.
- Preserve current behavior for overflow, subclasses, and other non-exact-int cases.
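The path selection described above can be modeled in pure Python. This is a minimal sketch, not the proposed C implementation: the names `is_compact`, `fits_int64`, and `choose_path` are hypothetical, and "compact" is assumed to mean that the magnitude fits in a single internal digit (`sys.int_info.bits_per_digit` bits).

```python
import sys

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def is_compact(x: int) -> bool:
    # Assumption: a compact int has a magnitude that fits in one internal digit.
    return abs(x) < (1 << sys.int_info.bits_per_digit)


def fits_int64(x: int) -> bool:
    return INT64_MIN <= x <= INT64_MAX


def choose_path(a: object, b: object, subtract: bool = False) -> str:
    """Return the path this model would take for a + b (or a - b)."""
    # Subclasses of int (including bool) are not exact ints: generic path.
    if type(a) is not int or type(b) is not int:
        return "generic"
    # Existing specialization, left unchanged by the proposal.
    if is_compact(a) and is_compact(b):
        return "compact"
    # Proposed path: both operands and the result must fit in int64_t.
    if fits_int64(a) and fits_int64(b):
        result = a - b if subtract else a + b
        if fits_int64(result):
            return "wide"
    # Overflow and everything else keep the current generic behavior.
    return "generic"
```

In this model, `choose_path(10_000_000_000, 1)` takes the wide path, while `choose_path(2**63 - 1, 1)` falls back to the generic path because the sum no longer fits in a signed 64-bit integer.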
Motivation:
- Improve performance for wide integer add/sub without affecting the common compact-int hot path.
- Avoid adding new opcodes in the compact-int path.
- Fit within the current interpreter and specialization structure.
Benchmark evidence:
- I prototyped this locally with a benchmark covering compact and wide add/sub cases.
- Wide cases improved substantially, while compact cases remained effectively flat.
- Representative interpreter-only results with JIT disabled:
  - add_wide: about 25% faster
  - sub_wide: about 35% faster
  - add_compact/sub_compact: effectively unchanged
Benchmark script used locally:
"""Microbenchmark compact vs wide int add/sub with pyperf.
Use this with PYTHON_JIT=0 and -S if you want a stable interpreter-only run:
PYTHON_JIT=0 ./python.exe -S Tools/scripts/bench_wide_int_pyperf.py
"""
from __future__ import annotations
import pyperf
def bench_add_compact() -> int:
a = 1
b = 2
return a + b
def bench_add_wide() -> int:
a = 10_000_000_000
b = 1
return a + b
def bench_sub_compact() -> int:
a = 1
b = 2
return a - b
def bench_sub_wide() -> int:
a = 10_000_000_000
b = 1
return a - b
def main() -> None:
runner = pyperf.Runner()
runner.bench_func("add_compact", bench_add_compact)
runner.bench_func("add_wide", bench_add_wide)
runner.bench_func("sub_compact", bench_sub_compact)
runner.bench_func("sub_wide", bench_sub_wide)
if __name__ == "__main__":
main()
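As a sanity check on the benchmark inputs, the following sketch estimates how many internal digits each operand occupies, to confirm that the "wide" operand is multi-digit while still fitting in int64_t. The helpers `digit_count` and `report` are hypothetical and not part of the script above; the digit width is read from `sys.int_info.bits_per_digit`, so the exact counts depend on the build.

```python
import sys


def digit_count(x: int) -> int:
    # Ceiling of bit_length / bits_per_digit: internal digits needed for |x|.
    bits = sys.int_info.bits_per_digit
    return -(-abs(x).bit_length() // bits)


def report() -> None:
    # Operands used by the benchmark functions.
    for value in (1, 2, 10_000_000_000):
        kind = "compact" if digit_count(value) <= 1 else "wide"
        in_range = -(2**63) <= value <= 2**63 - 1
        print(f"{value}: {digit_count(value)} digit(s), {kind}, fits int64: {in_range}")


if __name__ == "__main__":
    report()
```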
Linked PRs