@@ -1319,14 +1319,17 @@ This section covers specific optimizations independent of the
13191319Faster CPython
13201320==============
13211321
1322- CPython 3.11 is on average `25% faster <https://github.com/faster-cpython/ideas#published-results >`_
1323- than CPython 3.10 when measured with the
1322+ CPython 3.11 is an average of
1323+ `25% faster <https://github.com/faster-cpython/ideas#published-results >`_
1324+ than CPython 3.10 as measured with the
13241325`pyperformance <https://github.com/python/pyperformance >`_ benchmark suite,
1325- and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
1326- could be up to 10-60% faster .
1326+ when compiled with GCC on Ubuntu Linux.
1327+ Depending on your workload, the overall speedup could be 10-60%.
13271328
1328- This project focuses on two major areas in Python: faster startup and faster
1329- runtime. Other optimizations not under this project are listed in `Optimizations `_.
1329+ This project focuses on two major areas in Python:
1330+ :ref: `whatsnew311-faster-startup ` and :ref: `whatsnew311-faster-runtime `.
1331+ Optimizations not covered by this project are listed separately under
1332+ :ref: `whatsnew311-optimizations `.
13301333
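The published figures come from pyperformance. As a rough, hedged way to see how one of your own pure-Python workloads compares across interpreters, the same script can be timed under both versions with the standard :mod:`timeit` module; the ``workload`` function below is only an arbitrary placeholder, not a benchmark from the suite:

.. code-block:: python

   # Run this file with both python3.10 and python3.11 and compare the output.
   # This is a crude illustration only; the 25% figure above comes from the
   # pyperformance suite, not from micro-benchmarks like this one.
   import sys
   import timeit

   def workload():
       # Arbitrary pure-Python placeholder work.
       return sum(i * i for i in range(10_000))

   print(sys.version.split()[0], timeit.timeit(workload, number=2_000))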
13311334
13321335.. _whatsnew311-faster-startup :
@@ -1339,8 +1342,8 @@ Faster Startup
13391342Frozen imports / Static code objects
13401343^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13411344
1342- Python caches bytecode in the :ref: `__pycache__<tut-pycache> ` directory to
1343- speed up module loading.
1345+ Python caches :term: ` bytecode ` in the :ref: `__pycache__ <tut-pycache >`
1346+ directory to speed up module loading.
13441347
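As a small illustrative sketch (``spam.py`` is just a placeholder file name), the location of that cache for a given source file can be queried with :func:`importlib.util.cache_from_source`:

.. code-block:: python

   import importlib.util

   # The .pyc path Python would use to cache the compiled bytecode of spam.py,
   # for example __pycache__/spam.cpython-311.pyc.
   print(importlib.util.cache_from_source("spam.py"))
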
13451348Previously in 3.10, Python module execution looked like this:
13461349
@@ -1349,8 +1352,9 @@ Previously in 3.10, Python module execution looked like this:
13491352 Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
13501353
13511354 In Python 3.11, the core modules essential for Python startup are "frozen".
1352- This means that their code objects (and bytecode) are statically allocated
1353- by the interpreter. This reduces the steps in module execution process to this:
1355+ This means that their :ref: `codeobjects ` (and bytecode)
1356+ are statically allocated by the interpreter.
1357+ This reduces the steps in the module execution process to:
13541358
13551359.. code-block :: text
13561360
@@ -1359,7 +1363,7 @@ by the interpreter. This reduces the steps in module execution process to this:
13591363 Interpreter startup is now 10-15% faster in Python 3.11. This has a big
13601364impact on short-running programs using Python.
13611365
1362- (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
1366+ (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in many issues.)
13631367
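As a hedged illustration, whether a particular startup module was loaded from the frozen set can be checked through its module spec; the exact set of frozen modules depends on how the interpreter was built and on the ``-X frozen_modules`` option:

.. code-block:: python

   import importlib.util

   # Modules loaded from the frozen set report "frozen" as their origin.
   # Which modules are frozen varies between builds and command-line options.
   for name in ("abc", "codecs", "zipimport"):
       spec = importlib.util.find_spec(name)
       print(f"{name}: origin={spec.origin}")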
13641368
13651369.. _whatsnew311-faster-runtime :
@@ -1372,17 +1376,19 @@ Faster Runtime
13721376Cheaper, lazy Python frames
13731377^^^^^^^^^^^^^^^^^^^^^^^^^^^
13741378
1375- Python frames are created whenever Python calls a Python function. This frame
1376- holds execution information. The following are new frame optimizations:
1379+ Python frames, holding execution information,
1380+ are created whenever Python calls a Python function.
1381+ The following are new frame optimizations:
13771382
13781383- Streamlined the frame creation process.
13791384- Avoided memory allocation by generously re-using frame space on the C stack.
13801385- Streamlined the internal frame struct to contain only essential information.
13811386 Frames previously held extra debugging and memory management information.
13821387
1383- Old-style frame objects are now created only when requested by debuggers or
1384- by Python introspection functions such as ``sys._getframe `` or
1385- ``inspect.currentframe ``. For most user code, no frame objects are
1388+ Old-style :ref: `frame objects <frame-objects >`
1389+ are now created only when requested by debuggers
1390+ or by Python introspection functions such as :func: `sys._getframe ` and
1391+ :func: `inspect.currentframe `. For most user code, no frame objects are
13861392created at all. As a result, nearly all Python function calls have sped
13871393up significantly. We measured a 3-7% speedup in pyperformance.
13881394
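A minimal sketch of the introspection case mentioned above: a full frame object is only materialized when something explicitly asks for it, for example via :func:`inspect.currentframe`:

.. code-block:: python

   import inspect

   def where_am_i():
       # Asking for the current frame forces CPython to materialize a full
       # frame object for this call; ordinary calls skip that work entirely.
       frame = inspect.currentframe()
       return frame.f_code.co_name, frame.f_lineno

   print(where_am_i())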
@@ -1403,10 +1409,11 @@ In 3.11, when CPython detects Python code calling another Python function,
14031409it sets up a new frame, and "jumps" to the new code inside the new frame. This
14041410avoids calling the C interpreting function altogether.
14051411
1406- Most Python function calls now consume no C stack space. This speeds up
1407- most of such calls. In simple recursive functions like fibonacci or
1408- factorial, a 1.7x speedup was observed. This also means recursive functions
1409- can recurse significantly deeper (if the user increases the recursion limit).
1412+ Most Python function calls now consume no C stack space, speeding them up.
1413+ In simple recursive functions like fibonacci or
1414+ factorial, we observed a 1.7x speedup. This also means recursive functions
1415+ can recurse significantly deeper
1416+ (if the user increases the recursion limit with :func: `sys.setrecursionlimit `).
14101417We measured a 1-3% improvement in pyperformance.
14111418
14121419(Contributed by Pablo Galindo and Mark Shannon in :issue: `45256 `.)
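
A small sketch of the deeper-recursion point; how deep you can actually go still depends on the platform, available memory and any C code in the call chain:

.. code-block:: python

   import sys

   def countdown(n):
       return 0 if n == 0 else countdown(n - 1)

   # Pure Python-to-Python calls no longer consume C stack space in 3.11,
   # so raising the recursion limit goes further than it used to. The safe
   # depth is still platform- and workload-dependent.
   sys.setrecursionlimit(60_000)
   print(countdown(50_000))
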
@@ -1417,7 +1424,7 @@ We measured a 1-3% improvement in pyperformance.
14171424PEP 659: Specializing Adaptive Interpreter
14181425^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14191426
1420- :pep: `659 ` is one of the key parts of the faster CPython project. The general
1427+ :pep: `659 ` is one of the key parts of the Faster CPython project. The general
14211428idea is that while Python is a dynamic language, most code has regions where
14221429objects and types rarely change. This concept is known as *type stability *.
14231430
@@ -1426,17 +1433,18 @@ in the executing code. Python will then replace the current operation with a
14261433more specialized one. This specialized operation uses fast paths available only
14271434to those use cases/types, which generally outperform their generic
14281435counterparts. This also brings in another concept called *inline caching *, where
1429- Python caches the results of expensive operations directly in the bytecode.
1436+ Python caches the results of expensive operations directly in the
1437+ :term: `bytecode `.
14301438
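Those inline caches can be made visible in a disassembly: in 3.11, :func:`dis.dis` accepts a ``show_caches`` argument that displays the ``CACHE`` slots reserved after instructions (a small sketch; the exact output depends on the CPython version):

.. code-block:: python

   import dis

   def add(a, b):
       return a + b

   # The CACHE entries after BINARY_OP are the inline cache slots that the
   # specializing interpreter fills in at run time.
   dis.dis(add, show_caches=True)
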
14311439The specializer will also combine certain common instruction pairs into one
1432- superinstruction. This reduces the overhead during execution.
1440+ superinstruction, reducing the overhead during execution.
14331441
14341442Python will only specialize
14351443when it sees code that is "hot" (executed multiple times). This prevents Python
1436- from wasting time for run-once code. Python can also de-specialize when code is
1444+ from wasting time on run-once code. Python can also de-specialize when code is
14371445 too dynamic or when its use changes. Specialization is attempted periodically,
1438- and specialization attempts are not too expensive. This allows specialization
1439- to adapt to new circumstances.
1446+ and specialization attempts are not too expensive,
1447+ allowing specialization to adapt to new circumstances.
14401448
14411449(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
14421450See :pep: `659 ` for more information. Implementation by Mark Shannon and Brandt
@@ -1449,32 +1457,32 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
14491457| Operation | Form | Specialization | Operation speedup | Contributor(s) |
14501458| | | | (up to) | |
14511459+===============+====================+=======================================================+===================+===================+
1452- | Binary | ``x+x; x*x; x-x; `` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
1453- | operations | | such as `` int ``, `` float ``, and `` str `` take custom | | Dong-hee Na, |
1454- | | | fast paths for their underlying types. | | Brandt Bucher, |
1460+ | Binary | ``x + x `` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
1461+ | operations | | such as :class: ` int `, :class: ` float ` and :class: ` str ` | | Dong-hee Na, |
1462+ | | `` x - x `` | take custom fast paths for their underlying types. | | Brandt Bucher, |
14551463| | | | | Dennis Sweeney |
1464+ | | ``x * x `` | | | |
14561465+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1457- | Subscript | ``a[i] `` | Subscripting container types such as `` list ``, | 10-25% | Irit Katriel, |
1458- | | | `` tuple `` and `` dict `` directly index the underlying | | Mark Shannon |
1459- | | | data structures. | | |
1466+ | Subscript | ``a[i] `` | Subscripting container types such as :class: ` list `, | 10-25% | Irit Katriel, |
1467+ | | | :class: ` tuple ` and :class: ` dict ` directly index | | Mark Shannon |
1468+ | | | the underlying data structures. | | |
14601469| | | | | |
1461- | | | Subscripting custom `` __getitem__ `` | | |
1470+ | | | Subscripting custom :meth: ` ~object. __getitem__ ` | | |
14621471| | | is also inlined similar to :ref: `inline-calls `. | | |
14631472+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14641473| Store | ``a[i] = z `` | Similar to subscripting specialization above. | 10-25% | Dennis Sweeney |
14651474| subscript | | | | |
14661475+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14671476| Calls | ``f(arg) `` | Calls to common builtin (C) functions and types such | 20% | Mark Shannon, |
1468- | | ``C(arg) `` | as ``len `` and ``str `` directly call their underlying | | Ken Jin |
1469- | | | C version. This avoids going through the internal | | |
1470- | | | calling convention. | | |
1471- | | | | | |
1477+ | | | as :func: `len ` and :class: `str ` directly call their | | Ken Jin |
1478+ | | ``C(arg) `` | underlying C version. This avoids going through the | | |
1479+ | | | internal calling convention. | | |
14721480+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1473- | Load | ``print `` | The object's index in the globals/builtins namespace | [1 ]_ | Mark Shannon |
1474- | global | `` len `` | is cached. Loading globals and builtins require | | |
1475- | variable | | zero namespace lookups. | | |
1481+ | Load | ``print `` | The object's index in the globals/builtins namespace | [#load-global ]_ | Mark Shannon |
1482+ | global | | is cached. Loading globals and builtins require | | |
1483+ | variable | `` len `` | zero namespace lookups. | | |
14761484+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1477- | Load | ``o.attr `` | Similar to loading global variables. The attribute's | [2 ]_ | Mark Shannon |
1485+ | Load | ``o.attr `` | Similar to loading global variables. The attribute's | [#load-attr ]_ | Mark Shannon |
14781486| attribute | | index inside the class/object's namespace is cached. | | |
14791487| | | In most cases, attribute loading will require zero | | |
14801488| | | namespace lookups. | | |
@@ -1486,14 +1494,15 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
14861494| Store | ``o.attr = z `` | Similar to load attribute optimization. | 2% | Mark Shannon |
14871495| attribute | | | in pyperformance | |
14881496+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1489- | Unpack | ``*seq `` | Specialized for common containers such as ``list `` | 8% | Brandt Bucher |
1490- | Sequence | | and ``tuple ``. Avoids internal calling convention. | | |
1497+ | Unpack | ``*seq `` | Specialized for common containers such as | 8% | Brandt Bucher |
1498+ | Sequence | | :class: `list ` and :class: `tuple `. | | |
1499+ | | | Avoids internal calling convention. | | |
14911500+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14921501
1493- .. [1 ] A similar optimization already existed since Python 3.8. 3.11
1494- specializes for more forms and reduces some overhead.
1502+ .. [#load-global ] A similar optimization has existed since Python 3.8.
1503+ 3.11 specializes for more forms and reduces some overhead.
14951504
1496- .. [2 ] A similar optimization already existed since Python 3.10.
1505+ .. [#load-attr ] A similar optimization has existed since Python 3.10.
14971506 3.11 specializes for more forms. Furthermore, all attribute loads should
14981507 be sped up by :issue: `45947 `.
14991508
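One way to observe specializations like those listed in the table above is :func:`dis.dis` with its new ``adaptive`` argument: once a function has become "hot", the generic instructions are shown replaced by their specialized forms. The exact instruction names and the warm-up threshold are CPython 3.11 implementation details and may change:

.. code-block:: python

   import dis

   def add_ints(a, b):
       return a + b

   # Warm the function up so the adaptive interpreter specializes it.
   for _ in range(1_000):
       add_ints(1, 2)

   # With adaptive=True, BINARY_OP should now appear in a specialized form
   # such as BINARY_OP_ADD_INT (exact names are implementation details).
   dis.dis(add_ints, adaptive=True)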
@@ -1503,49 +1512,72 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
15031512Misc
15041513----
15051514
1506- * Objects now require less memory due to lazily created object namespaces. Their
1507- namespace dictionaries now also share keys more freely.
1515+ * Objects now require less memory due to lazily created object namespaces.
1516+ Their namespace dictionaries now also share keys more freely.
15081517 (Contributed by Mark Shannon in :issue: `45340 ` and :issue: `40116 `.)
15091518
1519+ * "Zero-cost" exceptions are implemented, eliminating the cost
1520+ of :keyword: `try ` statements when no exception is raised; see the timing sketch after this list.
1521+ (Contributed by Mark Shannon in :issue: `40222 `.)
1522+
15101523* A more concise representation of exceptions in the interpreter reduced the
15111524 time required for catching an exception by about 10%.
15121525 (Contributed by Irit Katriel in :issue: `45711 `.)
15131526
1527+ * :mod: `re `'s regular expression matching engine has been partially refactored,
1528+ and now uses computed gotos (or "threaded code") on supported platforms. As a
1529+ result, Python 3.11 executes the `pyperformance regular expression benchmarks
1530+ <https://pyperformance.readthedocs.io/benchmarks.html#regex-dna> `_ up to 10%
1531+ faster than Python 3.10.
1532+ (Contributed by Brandt Bucher in :gh: `91404 `.)
1533+
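A minimal timing sketch of the zero-cost-exceptions item in the list above: when no exception is raised, the ``try`` wrapper should add essentially no overhead. The numbers are illustrative only and will vary by machine:

.. code-block:: python

   import timeit

   def plain(n=1_000):
       total = 0
       for i in range(n):
           total += i
       return total

   def with_try(n=1_000):
       total = 0
       for i in range(n):
           try:
               total += i
           except ValueError:
               pass
       return total

   # On 3.11 the two timings should be very close, because the try block
   # costs nothing on the non-raising path.
   print("plain   :", timeit.timeit(plain, number=5_000))
   print("with try:", timeit.timeit(with_try, number=5_000))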
15141534
15151535.. _whatsnew311-faster-cpython-faq :
15161536
15171537FAQ
15181538---
15191539
1520- | Q: How should I write my code to utilize these speedups?
1521- |
1522- | A: You don't have to change your code. Write Pythonic code that follows common
1523- best practices. The Faster CPython project optimizes for common code
1524- patterns we observe.
1525- |
1526- |
1527- | Q: Will CPython 3.11 use more memory?
1528- |
1529- | A: Maybe not. We don't expect memory use to exceed 20% more than 3.10.
1530- This is offset by memory optimizations for frame objects and object
1531- dictionaries as mentioned above.
1532- |
1533- |
1534- | Q: I don't see any speedups in my workload. Why?
1535- |
1536- | A: Certain code won't have noticeable benefits. If your code spends most of
1537- its time on I/O operations, or already does most of its
1538- computation in a C extension library like numpy, there won't be significant
1539- speedup. This project currently benefits pure-Python workloads the most.
1540- |
1541- | Furthermore, the pyperformance figures are a geometric mean. Even within the
1542- pyperformance benchmarks, certain benchmarks have slowed down slightly, while
1543- others have sped up by nearly 2x!
1544- |
1545- |
1546- | Q: Is there a JIT compiler?
1547- |
1548- | A: No. We're still exploring other optimizations.
1540+ .. _faster-cpython-faq-my-code :
1541+
1542+ How should I write my code to utilize these speedups?
1543+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1544+
1545+ Write Pythonic code that follows common best practices;
1546+ you don't have to change your code.
1547+ The Faster CPython project optimizes for common code patterns we observe.
1548+
1549+
1550+ .. _faster-cpython-faq-memory :
1551+
1552+ Will CPython 3.11 use more memory?
1553+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1554+
1555+ Maybe not; we don't expect memory use to be more than 20% higher than in 3.10.
1556+ This is offset by memory optimizations for frame objects and object
1557+ dictionaries as mentioned above.
1558+
1559+
1560+ .. _faster-cpython-ymmv :
1561+
1562+ I don't see any speedups in my workload. Why?
1563+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1564+
1565+ Certain code won't have noticeable benefits. If your code spends most of
1566+ its time on I/O operations, or already does most of its
1567+ computation in a C extension library like NumPy, there won't be significant
1568+ speedups. This project currently benefits pure-Python workloads the most.
1569+
1570+ Furthermore, the pyperformance figures are a geometric mean. Even within the
1571+ pyperformance benchmarks, certain benchmarks have slowed down slightly, while
1572+ others have sped up by nearly 2x!
1573+
1574+
1575+ .. _faster-cpython-jit :
1576+
1577+ Is there a JIT compiler?
1578+ ^^^^^^^^^^^^^^^^^^^^^^^^
1579+
1580+ No. We're still exploring other optimizations.
15491581
15501582
15511583.. _whatsnew311-faster-cpython-about :