Victor Stinner blog — https://vstinner.github.io/

PEP 782 – Add PyBytesWriter C API — 2025-10-17 — Victor Stinner

<a class="reference external image-reference" href="https://en.wikipedia.org/wiki/Arrietty"> <img alt="The Secret World of Arrietty" src="https://vstinner.github.io/images/arrietty.jpg" /> </a> <p>In the Python C API, I dislike APIs that modify immutable objects, such as <tt class="docutils literal">_PyBytes_Resize()</tt>. I designed a whole new <tt class="docutils literal">PyBytesWriter</tt> API to replace the <tt class="docutils literal">_PyBytes_Resize()</tt> function. As usual in Python, it took multiple iterations and one year to design the API and reach an agreement.</p> <p>Picture: <em>The Secret World of Arrietty by Hayao Miyazaki</em>.</p> <div class="section" id="original-private-pybyteswriter-api"> <h2>Original private _PyBytesWriter API</h2> <p>In 2016 (Python 3.6), I designed a private <tt class="docutils literal">_PyBytesWriter</tt> API to create <tt class="docutils literal">bytes</tt> objects in an efficient way, especially by overallocating a buffer.
See my article <a class="reference external" href="https://vstinner.github.io/pybyteswriter.html">Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce strings in CPython</a> about this API (and other similar APIs).</p> <p>In July 2023 (Python 3.13), I moved the private <tt class="docutils literal">_PyBytesWriter</tt> API to the internal C API. See the article <a class="reference external" href="https://vstinner.github.io/remove-c-api-funcs-313.html">Remove private C API functions</a>.</p> </div> <div class="section" id="first-public-api-attempt"> <h2>First public API attempt</h2> <p>In June 2024, Marc-André Lemburg <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/70#issuecomment-2158541011">asked</a> to make the private <tt class="docutils literal">_PyBytesWriter</tt> API public.</p> <p>In July, I wrote a first public API attempt: <a class="reference external" href="https://github.com/python/cpython/pull/121726">PR gh-121726</a>. API:</p> <div class="highlight"><pre><span></span><span class="n">PyBytesWriter</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Create</span><span class="p">(</span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">str</span><span class="p">)</span> <span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Finish</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">str</span><span class="p">)</span> <span class="kt">void</span><span class="w"> </span><span class="n">PyBytesWriter_Discard</span><span
class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">)</span> <span class="kt">int</span><span class="w"> </span><span class="n">PyBytesWriter_Prepare</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> <span class="kt">int</span><span class="w"> </span><span class="n">PyBytesWriter_WriteBytes</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">bytes</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> </pre></div> <p>Example creating the string <tt class="docutils literal">&quot;abc&quot;</tt>:</p> <div class="highlight"><pre><span></span><span class="n">PyObject</span><span class="o">*</span> <span class="nf">create_abc</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">str</span><span class="p">;</span> <span class="w"> </span><span class="n">PyBytesWriter</span><span 
class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyBytesWriter_Create</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">str</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">writer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">memcpy</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;abc&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span> <span class="w"> </span><span class="n">str</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyBytesWriter_Finish</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">str</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>With a <tt class="docutils literal">PyBytesWriter_Prepare(writer, &amp;str, size)</tt> to preallocate the buffer.</p> <p>The implementation was fully based on the private structure:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="n">PyObject</span><span class="w"> </span><span class="o">*</span><span class="n">buffer</span><span class="p">;</span> <span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">allocated</span><span class="p">;</span> <span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">min_size</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">use_bytearray</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">overallocate</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">use_small_buffer</span><span class="p">;</span> <span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">small_buffer</span><span class="p">[</span><span class="mi">512</span><span class="p">];</span> <span class="p">}</span><span class="w"> </span><span class="n">_PyBytesWriter</span><span class="p">;</span> </pre></div> <p>In August, I created a <a class="reference external" href="https://github.com/capi-workgroup/decisions/issues/39">C API Working Group decision</a>. Sadly, this API didn't convince the C API WG, which found the <tt class="docutils literal">Prepare()</tt> API confusing and the <em>str</em> variable hard to use.</p> <p>In October, I closed the decision issue:</p> <blockquote> It seems like this API is too low-level and too error-prone. I prefer to abandon promoting this API as a public API for now.
We can revisit this API later if needed.</blockquote> </div> <div class="section" id="second-public-api-attempt"> <h2>Second public API attempt</h2> <p>In February 2025, I gave a try to a second public API: <a class="reference external" href="https://github.com/python/cpython/issues/129813">issue gh-129813</a> and <a class="reference external" href="https://github.com/python/cpython/pull/129814">PR gh-129814</a>. API:</p> <div class="highlight"><pre><span></span><span class="kt">void</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Create</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">**</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">alloc</span><span class="p">)</span> <span class="kt">void</span><span class="w"> </span><span class="n">PyBytesWriter_Discard</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">)</span> <span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Finish</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span> <span class="kt">void</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Extend</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span 
class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">extend</span><span class="p">)</span> <span class="kt">void</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_WriteBytes</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">bytes</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> <span class="kt">void</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Format</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">format</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span> </pre></div> <p>The API now uses <tt class="docutils literal">void*</tt> instead of <tt class="docutils literal">char*</tt> for the buffer and I added <tt class="docutils literal">PyBytesWriter_Format()</tt> function.</p> <p>Example creating the string <tt class="docutils literal">&quot;abc&quot;</tt>:</p> <div class="highlight"><pre><span></span><span class="n">PyObject</span><span 
class="o">*</span> <span class="nf">create_abc</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">;</span> <span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyBytesWriter_Create</span><span class="p">(</span><span class="o">&amp;</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">buf</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">memcpy</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;abc&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span> <span class="w"> </span><span class="n">buf</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyBytesWriter_Finish</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>The API is 
similar to the first version, but <tt class="docutils literal">PyBytesWriter_Create()</tt> now returns a <tt class="docutils literal">void*</tt> instead of the <tt class="docutils literal">PyBytesWriter*</tt>.</p> <p>A <tt class="docutils literal">buf = PyBytesWriter_Extend(writer, buf, str_size)</tt> call preallocates the buffer.</p> <p>The implementation now uses a new, simpler dedicated structure (fewer members):</p> <div class="highlight"><pre><span></span><span class="k">struct</span><span class="w"> </span><span class="nc">PyBytesWriter</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">small_buffer</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span> <span class="w"> </span><span class="n">PyObject</span><span class="w"> </span><span class="o">*</span><span class="n">obj</span><span class="p">;</span> <span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">use_bytearray</span><span class="p">;</span> <span class="p">};</span> </pre></div> <p>This time, I followed <strong>Petr Viktorin</strong>'s advice and created a <a class="reference external" href="https://discuss.python.org/t/add-pybyteswriter-public-c-api/81182">discussion on Discourse</a>. Again, other developers didn't like the API: they found it confusing.</p> <p>In March, I gave up again and closed my PR:</p> <blockquote> It seems like most developers are confused by the API which requires to pass writer and buf to most functions.
I abandon this API.</blockquote> </div> <div class="section" id="third-public-api-pep-782"> <h2>Third public API: PEP 782</h2> <p>Following <strong>Antoine Pitrou</strong>'s link, I had a look at <a class="reference external" href="https://github.com/apache/arrow/blob/b3d218c819283bafe973dc7deb5214324f4a68b2/cpp/src/arrow/buffer_builder.h#L176-L191">Arrow C++ BufferBuilder API</a>. Antoine helped me to design a better API using size and without the <tt class="docutils literal">void *buf</tt> parameter.</p> <p>At the end of March, I wrote <a class="reference external" href="https://peps.python.org/pep-0782/">PEP 782 – Add PyBytesWriter C API</a> and created a <a class="reference external" href="https://discuss.python.org/t/pep-782-add-pybyteswriter-c-api/86617">new discussion on the PEP</a>.</p> <p>Example creating the string <tt class="docutils literal">&quot;abc&quot;</tt>:</p> <div class="highlight"><pre><span></span><span class="n">PyObject</span><span class="w"> </span><span class="o">*</span> <span class="nf">create_abc</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyBytesWriter_Create</span><span class="p">(</span><span class="mi">3</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">writer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyBytesWriter_GetData</span><span class="p">(</span><span class="n">writer</span><span class="p">);</span> <span class="w"> </span><span class="n">memcpy</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;abc&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyBytesWriter_Finish</span><span class="p">(</span><span class="n">writer</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>A <tt class="docutils literal">PyBytesWriter_Resize(writer, size)</tt> call preallocates the buffer. The <em>size</em> is now absolute, rather than relative.</p> <p>The mandatory <tt class="docutils literal">void *buf</tt> parameter was replaced with the <tt class="docutils literal">PyBytesWriter_GetData()</tt> function.</p> <p>In May, I submitted the PEP to the Steering Council. In September, the Steering Council <a class="reference external" href="https://discuss.python.org/t/pep-782-add-pybyteswriter-c-api/86617/15">approved PEP 782</a>!
(Yeah, it took them 4 months to take a decision.)</p> </div> <div class="section" id="final-api"> <h2>Final API</h2> <div class="highlight"><pre><span></span><span class="n">PyBytesWriter</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Create</span><span class="p">(</span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> <span class="kt">void</span><span class="w"> </span><span class="n">PyBytesWriter_Discard</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">)</span> <span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_Finish</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">)</span> <span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_FinishWithSize</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> <span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_FinishWithPointer</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span> <span class="kt">void</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_GetData</span><span class="p">(</span><span 
class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">)</span> <span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">PyBytesWriter_GetSize</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">)</span> <span class="kt">int</span><span class="w"> </span><span class="n">PyBytesWriter_WriteBytes</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">bytes</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> <span class="kt">int</span><span class="w"> </span><span class="n">PyBytesWriter_Format</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">format</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span> <span class="kt">int</span><span class="w"> </span><span class="n">PyBytesWriter_Resize</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> <span class="kt">int</span><span class="w"> </span><span
class="n">PyBytesWriter_Grow</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">)</span> <span class="kt">void</span><span class="o">*</span><span class="w"> </span><span class="n">PyBytesWriter_GrowAndUpdatePointer</span><span class="p">(</span><span class="n">PyBytesWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">size</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span> </pre></div> <p>See the <a class="reference external" href="https://docs.python.org/dev/c-api/bytes.html#pybyteswriter">documentation</a>.</p> </div> <div class="section" id="implementation"> <h2>Implementation</h2> <p>In September, I implemented the <tt class="docutils literal">PyBytesWriter</tt> API in the main branch (future Python 3.15) with documentation and tests.</p> <p>I also modified code using soft deprecated APIs, <tt class="docutils literal">PyBytes_FromStringAndSize(NULL, size)</tt> and <tt class="docutils literal">_PyBytes_Resize()</tt>, to use the new <tt class="docutils literal">PyBytesWriter</tt> API instead. When doing these conversions, I ran benchmarks to check that there is no significant impact on performance. 
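</p>
<p>These conversions all follow a similar pattern. A minimal sketch using the functions of the final API (<tt class="docutils literal">read_some()</tt> is a hypothetical data producer, not a real CPython function; the code assumes the in-development Python 3.15 headers): allocate an upper bound, write into the buffer, then finish with the actual size:</p>

```c
#include <Python.h>

/* Hypothetical producer filling buf with at most max_size bytes,
 * returning the number of bytes written or -1 on error. */
extern Py_ssize_t read_some(char *buf, Py_ssize_t max_size);

/* Sketch: build a bytes object whose final size is only known
 * after writing, using the PEP 782 PyBytesWriter API. */
PyObject *
read_bytes(Py_ssize_t max_size)
{
    PyBytesWriter *writer = PyBytesWriter_Create(max_size);
    if (writer == NULL) {
        return NULL;
    }
    char *buf = PyBytesWriter_GetData(writer);

    // Write up to max_size bytes; get back the number written
    Py_ssize_t n = read_some(buf, max_size);
    if (n < 0) {
        // On error, the writer must be discarded explicitly
        PyBytesWriter_Discard(writer);
        return NULL;
    }

    // Truncate the result to the number of bytes actually written
    return PyBytesWriter_FinishWithSize(writer, n);
}
```
<p>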
Examples of benchmarks:</p> <ul class="simple"> <li><a class="reference external" href="https://github.com/python/cpython/pull/138829#issuecomment-3288634427">os.read(1)</a></li> <li><a class="reference external" href="https://github.com/python/cpython/pull/138955#issuecomment-3293995606">io.FileIO.read(1)</a></li> <li><a class="reference external" href="https://github.com/python/cpython/pull/138954#issuecomment-3293959389">io.BufferedReader.read1(1)</a></li> <li><a class="reference external" href="https://github.com/python/cpython/pull/138874#issuecomment-3288721255">utf8_encoder()</a></li> </ul> <p>For example, I abandoned these two changes:</p> <ul class="simple"> <li><a class="reference external" href="https://github.com/python/cpython/pull/138840#issuecomment-3288640248">bytes_concat: 1.4x slower</a></li> <li><a class="reference external" href="https://github.com/python/cpython/pull/138957">_json: 1.10x slower</a></li> </ul> <p>Later, other people joined the party and found more opportunities for <tt class="docutils literal">PyBytesWriter</tt>, with great optimizations:</p> <ul class="simple"> <li><a class="reference external" href="https://github.com/python/cpython/pull/139976">zstd: 10-30% speedup for decompression</a></li> <li><a class="reference external" href="https://github.com/python/cpython/pull/140150">Python parser, concatenate bytes strings: 4x faster</a></li> <li><a class="reference external" href="https://github.com/python/cpython/pull/140139">io.RawIOBase.readall(): 4x faster</a></li> </ul> </div>

PEP 757 – C API to import-export Python integers — 2025-09-11 — Victor Stinner

<a class="reference external image-reference" href="https://en.wikipedia.org/wiki/Ponyo"> <img alt="Ponyo movie" src="https://vstinner.github.io/images/ponyo.jpg" /> </a> <p>Designing an API can take time. This article describes the design of the C API to import and export Python integers. It took place between August 2023 and December 2024. In total, the discussions gathered more than 448 messages!</p> <p>The API is a thin abstraction on top of CPython implementation details to access integer internals. The API has an O(1) complexity: in practice, no memory is copied (at least, in the current CPython implementation).</p> <p>Picture: <em>Ponyo movie by Hayao Miyazaki</em>.</p> <div class="section" id="python-3-13-alpha-1-removes-pylong-new"> <h2>Python 3.13 alpha 1 removes _PyLong_New()</h2> <p>In August 2023, I <a class="reference external" href="https://github.com/python/cpython/pull/108604">removed</a> the <tt class="docutils literal">_PyLong_New()</tt> function as part of <a class="reference external" href="https://discuss.python.org/t/c-api-my-plan-to-clarify-private-vs-public-functions-in-python-3-13/30131">My plan to clarify private vs public C API functions in Python 3.13</a>.</p> <p>In October, Python 3.13.0 alpha 1 was released without this function. <strong>Sergey B Kirpichev</strong> reported that the gmpy2 project uses <tt class="docutils literal">_PyLong_New()</tt> and asked how to replace the removed function.
He created <a class="reference external" href="https://github.com/python/cpython/issues/111415">issue gh-111415</a>: Consider restoring _PyLong_New() function as public.</p> </div> <div class="section" id="python-3-13-alpha-2-restores-pylong-new"> <h2>Python 3.13 alpha 2 restores _PyLong_New()</h2> <p>In November, the private <tt class="docutils literal">_PyLong_New()</tt> function was restored in Python 3.13 alpha 2, which was released on November 22.</p> </div> <div class="section" id="add-public-function-pylong-getdigits"> <h2>Add public function PyLong_GetDigits()</h2> <p>In June 2024, <strong>Sergey B Kirpichev</strong> opened the <a class="reference external" href="https://github.com/capi-workgroup/decisions/issues/31">[C API Working Group] decision issue #31</a>: Add public function <tt class="docutils literal">PyLong_GetDigits()</tt>. API:</p> <pre class="literal-block"> const digits* PyLong_GetDigits(PyObject* obj, Py_ssize_t *ndigits) </pre> <p>I disliked this API since it is too close to the exact implementation. The API cannot be implemented in an efficient way if implementation details change.</p> <p>For example, in the future, CPython might adopt tagged pointers for small integers and so no longer have a concrete array of digits.</p> <p>There was a call for a different API to address these issues.</p> </div> <div class="section" id="pylong-export-and-pylong-import-functions"> <h2>PyLong_Export() and PyLong_Import() functions</h2> <div class="section" id="first-api"> <h3>First API</h3> <p>In July, I created pull request <a class="reference external" href="https://github.com/python/cpython/pull/121339">gh-121339</a> to propose a different API. Later, I opened the <a class="reference external" href="https://github.com/capi-workgroup/decisions/issues/35">[C API Working Group] decision issue #35</a> (which got 51 messages): Add import-export API for Python int objects.
API:</p> <pre class="literal-block">
// Layout API
typedef struct PyLongLayout {
    uint8_t bits_per_digit;
    uint8_t digit_size;
    int8_t word_endian;
    int8_t array_endian;
} PyLongLayout;

const PyLongLayout PyLong_LAYOUT;

// Export API
typedef struct PyLong_DigitArray {
    PyObject *obj;
    int negative;
    Py_ssize_t ndigits;
    const Py_digit *digits;
} PyLong_DigitArray;

int PyLong_AsDigitArray(PyObject *obj, PyLong_DigitArray *array)
void PyLong_FreeDigitArray(PyLong_DigitArray *array)

// Import API
PyLongWriter* PyLongWriter_Create(int negative, Py_ssize_t ndigits, Py_digit **digits)
PyObject* PyLongWriter_Finish(PyLongWriter *writer)
</pre> </div> <div class="section" id="api-changes"> <h3>API changes</h3> <p>The <a class="reference external" href="https://github.com/python/cpython/pull/121339">pull request</a> got 277 messages (!) between July 2024 and February 2025. The API names were discussed at length, and better names were proposed.</p> <p>The <tt class="docutils literal">PyLong_LAYOUT</tt> constant was replaced with the <tt class="docutils literal">PyLong_GetNativeLayout()</tt> function to have a better ABI.</p> <p>The dependency on the <tt class="docutils literal">Py_digit</tt> type was removed.
The <tt class="docutils literal">Py_digit*</tt> type was replaced with <tt class="docutils literal">void*</tt> to support digits of arbitrary size.</p> <p><tt class="docutils literal">PyLong_Export.obj</tt> became private: <tt class="docutils literal">obj</tt> was renamed to <tt class="docutils literal">reserved</tt>, and its specific type <tt class="docutils literal">PyObject*</tt> became the opaque <tt class="docutils literal">Py_uintptr_t</tt> type.</p> <p>The <tt class="docutils literal">PyLongWriter_Discard()</tt> function was added to handle errors.</p> </div> </div> <div class="section" id="pep-757"> <h2>PEP 757</h2> <p>In September, <strong>Sergey</strong> and I wrote <a class="reference external" href="https://peps.python.org/pep-0757/">PEP 757 – C API to import-export Python integers</a>: the <a class="reference external" href="https://discuss.python.org/t/pep-757-c-api-to-import-export-python-integers/63895">discussion</a> got 80 messages.</p> <p>It was proposed to use a union for PyLongExport with a <tt class="docutils literal">kind</tt> member to select the format (small integer or digit array). The idea was later abandoned.</p> <p>For small integers, there is no guarantee that they will be stored as a digit array in the future. The <tt class="docutils literal">PyLongExport.value</tt> member (<tt class="docutils literal">int64_t</tt>) was added to store the value of small integers.</p> <p>There were two open questions:</p> <ul class="simple"> <li>Should we add <tt class="docutils literal">digits_order</tt> and <tt class="docutils literal">endian</tt> members to <tt class="docutils literal">sys.int_info</tt> and remove <tt class="docutils literal">PyLong_GetNativeLayout()</tt>?
The <tt class="docutils literal">PyLong_GetNativeLayout()</tt> function returns a C structure which is more convenient to use in C than <tt class="docutils literal">sys.int_info</tt> which uses Python objects.</li> <li>Should we use an anonymous union?</li> </ul> <p>It was decided to leave <tt class="docutils literal">sys.int_info</tt> unchanged and keep <tt class="docutils literal">PyLong_GetNativeLayout()</tt>.</p> <p>It was also decided to avoid an anonymous union, to prevent any risk of compatibility issues with old C versions.</p> <p><a class="reference external" href="https://peps.python.org/pep-0757/#benchmarks">Benchmarks</a> measured the abstraction cost: it is between 1.04x slower and 1.27x faster, which means that the abstraction has no significant impact on performance. In short (geometric mean):</p> <ul class="simple"> <li>Export: 1.05x faster</li> <li>Import: 1.03x slower</li> </ul> </div> <div class="section" id="overwhelmed"> <h2>Overwhelmed</h2> <p>After months of discussions and much back and forth on the API, <a class="reference external" href="https://discuss.python.org/t/pep-757-c-api-to-import-export-python-integers/63895/66">I got overwhelmed</a> and came close to giving up.
Fortunately, I didn't give up.</p> </div> <div class="section" id="c-api-working-group-and-steering-council"> <h2>C API Working Group and Steering Council</h2> <p>In October, I opened a C API Working Group vote on PEP 757: <a class="reference external" href="https://github.com/capi-workgroup/decisions/issues/45">decision issue #45</a>, which got 40 messages.</p> <p>On November 28, 2024, the C API WG accepted the PEP and I <a class="reference external" href="https://github.com/python/steering-council/issues/264">submitted the PEP</a> to the Steering Council.</p> <p>Ten days later, on December 8, the Steering Council accepted PEP 757 as well!</p> </div> <div class="section" id="final-api"> <h2>Final API</h2> <p>After many iterations, the final API is:</p> <pre class="literal-block"> // Layout API typedef struct PyLongLayout { uint8_t bits_per_digit; uint8_t digit_size; int8_t digits_order; int8_t digit_endianness; } PyLongLayout; const PyLongLayout* PyLong_GetNativeLayout(void) // Export API typedef struct PyLongExport { int64_t value; uint8_t negative; Py_ssize_t ndigits; const void *digits; Py_uintptr_t _reserved; } PyLongExport; int PyLong_Export(PyObject *obj, PyLongExport *export_long) void PyLong_FreeExport(PyLongExport *export_long) // Import API PyLongWriter* PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits) PyObject* PyLongWriter_Finish(PyLongWriter *writer) void PyLongWriter_Discard(PyLongWriter *writer) </pre> <p>The <tt class="docutils literal">decimal</tt> extension and <tt class="docutils literal">Python/marshal.c</tt> have been modified to use this API.</p> </div> My Python commits: February 20252025-03-11T15:00:00+01:002025-03-11T15:00:00+01:00Victor Stinnertag:vstinner.github.io,2025-03-11:/python-commits-february-2025.html<p>Here is a report on my 18 commits merged into Python in February 2025:</p> <ul class="simple"> <li>Reorganize C API tests</li> <li>Use PyErr_FormatUnraisable()</li> <li>Reorganize includes</li> <li>C API: Remove
PySequence_Fast()</li> <li>C API: Fix function signatures</li> <li>C API: Deprecate private _PyUnicodeWriter</li> <li>Documentation</li> <li>Misc changes</li> </ul> <a class="reference external image-reference" href="https://en.wikipedia.org/wiki/Luncheon_of_the_Boating_Party"> <img alt="Le Déjeuner des Canotiers by Auguste Renoir" src="https://vstinner.github.io/images/dejeuner_canotiers.jpg" /> </a> <p>Painting: <em>Le Déjeuner des Canotiers</em> (1881) by <em>Auguste Renoir</em>.</p> <div class="section" id="gh-93649-reorganize-c-api-tests"> <h2>gh-93649: Reorganize …</h2></div><p>Here is a report on my 18 commits merged into Python in February 2025:</p> <ul class="simple"> <li>Reorganize C API tests</li> <li>Use PyErr_FormatUnraisable()</li> <li>Reorganize includes</li> <li>C API: Remove PySequence_Fast()</li> <li>C API: Fix function signatures</li> <li>C API: Deprecate private _PyUnicodeWriter</li> <li>Documentation</li> <li>Misc changes</li> </ul> <a class="reference external image-reference" href="https://en.wikipedia.org/wiki/Luncheon_of_the_Boating_Party"> <img alt="Le Déjeuner des Canotiers by Auguste Renoir" src="https://vstinner.github.io/images/dejeuner_canotiers.jpg" /> </a> <p>Painting: <em>Le Déjeuner des Canotiers</em> (1881) by <em>Auguste Renoir</em>.</p> <div class="section" id="gh-93649-reorganize-c-api-tests"> <h2>gh-93649: Reorganize C API tests</h2> <p>Tests on the C API are written in Python and C. The C part is made of a big file <tt class="docutils literal">Modules/_testcapimodule.c</tt> (4,410 lines) and 37 C files in the <tt class="docutils literal">Modules/_testcapi/</tt> directory. 
At the beginning, <tt class="docutils literal">_testcapimodule.c</tt> was the only file; there is ongoing work to split it into smaller files.</p> <p>I moved more code from <tt class="docutils literal">_testcapimodule.c</tt> into sub-files:</p> <ul class="simple"> <li>Add <tt class="docutils literal">Modules/_testcapi/frame.c</tt> file.</li> <li>Add <tt class="docutils literal">Modules/_testcapi/type.c</tt> file.</li> <li>Add <tt class="docutils literal">Modules/_testcapi/function.c</tt> file.</li> <li>Move <tt class="docutils literal">_testcapi</tt> tests to specific files.</li> </ul> <p><tt class="docutils literal">_testcapimodule.c</tt> size before/after my changes:</p> <ul class="simple"> <li>Before: <strong>4,410</strong> lines</li> <li>After: <strong>3,375 lines</strong> (-1,035 lines: 23% smaller)</li> </ul> </div> <div class="section" id="gh-129354-use-pyerr-formatunraisable"> <h2>gh-129354: Use PyErr_FormatUnraisable()</h2> <p>When an error occurs, Python usually raises an exception to let the developer decide how to handle the error. In some rare cases, exceptions cannot be raised and <a class="reference external" href="https://docs.python.org/dev/library/sys.html#sys.unraisablehook">sys.unraisablehook</a> is called instead.</p> <p>Before, many of these &quot;unraisable exceptions&quot; were logged with limited or no context. I modified these functions to explain why these errors were logged.</p> <p>Example of change:</p> <pre class="literal-block"> - PyErr_WriteUnraisable(self); + PyErr_FormatUnraisable(&quot;Exception ignored &quot; + &quot;while finalizing file %R&quot;, self); </pre> <p>Before, only the <em>self</em> object was logged with a generic error message.
Now the specific &quot;Exception ignored while finalizing file&quot; message is logged, which explains where the error comes from.</p> <p>I replaced <tt class="docutils literal">PyErr_WriteUnraisable()</tt> with <tt class="docutils literal">PyErr_FormatUnraisable()</tt> in 20 C files, and I had to update 7 related Python test files.</p> </div> <div class="section" id="gh-129539-reorganize-includes"> <h2>gh-129539: Reorganize includes</h2> <p>The <a class="reference external" href="https://github.com/python/cpython/blob/052cb717f5f97d08d2074f4118fd2c21224d3015/Modules/posixmodule.c">posixmodule.c</a> file is the biggest C file of the Python project: it is made of 18,206 lines of C code.</p> <p>It starts with 600 lines of code to include 103 header files. These lines were not well organized, leading to <a class="reference external" href="https://github.com/python/cpython/issues/129539">a bug (EX_OK symbol)</a>.</p> <p>I <a class="reference external" href="https://github.com/python/cpython/commit/df4a2f5bd74fc582d99e6a82e070058d7765f44d">reorganized these 600 lines</a> to add sections, group similar includes, and add a comment explaining why each include is needed.
For example, the <tt class="docutils literal">&lt;unistd.h&gt;</tt> header is needed to get the <tt class="docutils literal">symlink()</tt> function:</p> <pre class="literal-block"> #ifdef HAVE_UNISTD_H # include &lt;unistd.h&gt; // symlink() #endif </pre> </div> <div class="section" id="gh-91417-c-api-remove-pysequence-fast"> <h2>gh-91417, C API: Remove PySequence_Fast()</h2> <p>While digging into <a class="reference external" href="https://github.com/python/cpython/issues?q=state%3Aopen%20label%3A%22topic-C-API%22">open C API issues</a>, I found an <a class="reference external" href="https://github.com/python/cpython/issues/91417">old bug</a> (2022) about the <tt class="docutils literal">PySequence_Fast()</tt> function in the limited C API.</p> <p>The <tt class="docutils literal">PySequence_Fast()</tt> function should be used with <tt class="docutils literal">PySequence_Fast_GET_SIZE()</tt> and <tt class="docutils literal">PySequence_Fast_GET_ITEM()</tt> macros, but these macros don't work in the limited C API.</p> <p>I decided to <a class="reference external" href="https://github.com/python/cpython/commit/2ad069d906c6952250dabbffbcb882676011b310">remove PySequence_Fast()</a> and these macros from the limited C API. The function never worked with the limited C API. It was added by mistake.</p> <p>Sadly, one month later, my colleague Karolina Surma <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=2345504">discovered</a> that <a class="reference external" href="https://github.com/python/cpython/issues/130947">PyQt6 is broken by Python 3.14a5</a>: PyQt6 uses the removed <tt class="docutils literal">PySequence_Fast()</tt>! 
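For context, the guarantee that PySequence_Fast() provides can be modeled in pure Python: list and tuple objects are returned unchanged, and any other iterable is converted to a list, which is what the fast access macros rely on. This is only an illustrative sketch (the helper name is mine, not a CPython API):

```python
def sequence_fast(obj, error_message):
    # Rough Python model of the C PySequence_Fast() function:
    # return obj itself if it is already a list or a tuple,
    # otherwise build a list from any iterable.
    if isinstance(obj, (list, tuple)):
        return obj
    try:
        return list(obj)
    except TypeError:
        raise TypeError(error_message) from None

# Lists and tuples are returned as-is; other iterables are copied.
assert sequence_fast((1, 2, 3), "not iterable") == (1, 2, 3)
assert sequence_fast(iter(range(3)), "not iterable") == [0, 1, 2]
```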
I'm <a class="reference external" href="https://github.com/python/cpython/pull/130948">working on adding the function back</a>.</p> </div> <div class="section" id="gh-111178-c-api-fix-function-signatures"> <h2>gh-111178, C API: Fix function signatures</h2> <p>When Python is built with <tt class="docutils literal">clang <span class="pre">-fsanitize=undefined</span></tt>, Python fails quickly when calling functions with the wrong ABI. For example, the <tt class="docutils literal">tp_dealloc</tt> ABI is:</p> <pre class="literal-block"> void tp_dealloc(PyObject *self) </pre> <p>whereas the built-in <tt class="docutils literal">list</tt> type used the ABI:</p> <pre class="literal-block"> void list_dealloc(PyListObject *op) </pre> <p><tt class="docutils literal">PyObject*</tt> and <tt class="docutils literal">PyListObject*</tt> are not the same type, causing <a class="reference external" href="https://en.wikipedia.org/wiki/Undefined_behavior">undefined behavior</a>.</p> <p>The correct function signature is:</p> <pre class="literal-block"> void list_dealloc(PyObject *op) </pre> <p>In February, I fixed the function signature in 4 files:</p> <ul class="simple"> <li><tt class="docutils literal">symtable.c</tt></li> <li><tt class="docutils literal">namespaceobject.c</tt></li> <li><tt class="docutils literal">instruction_sequence.c</tt></li> <li><tt class="docutils literal">sliceobject.c</tt></li> </ul> <p>Since October 2023, there has been a <a class="reference external" href="https://github.com/python/cpython/issues/111178">long ongoing effort</a> to fix all function signatures. It's a lot of work.
At the end of February 2025, 97 pull requests had already been merged to fix signatures.</p> </div> <div class="section" id="gh-128863-c-api-deprecate-private-pyunicodewriter"> <h2>gh-128863, C API: Deprecate private _PyUnicodeWriter</h2> <p>I added a <a class="reference external" href="https://docs.python.org/dev/c-api/unicode.html#pyunicodewriter">new public PyUnicodeWriter C API</a> to Python 3.14. So I deprecated the old private <tt class="docutils literal">_PyUnicodeWriter</tt> C API:</p> <ul class="simple"> <li><tt class="docutils literal">_PyUnicodeWriter_Init()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_Finish()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_Dealloc()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteChar()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteStr()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteSubstring()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteASCIIString()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteLatin1String()</tt></li> </ul> <p>This deprecation was controversial and had to go through a <a class="reference external" href="https://github.com/capi-workgroup/decisions/issues/57">C API Working Group decision</a>.</p> </div> <div class="section" id="documentation"> <h2>Documentation</h2> <ul class="simple"> <li>gh-129342: <a class="reference external" href="https://github.com/python/cpython/commit/632ca568219f86679661bc288f46fa5838102ede">Explain how to replace Py_GetProgramName() in C</a></li> <li>gh-101944: <a class="reference external" href="https://github.com/python/cpython/commit/04264a286e5ddfe8ac7423f7376ca34a2ca8b7ba">Clarify PyModule_AddObjectRef() documentation</a></li> </ul> </div> <div class="section" id="misc-changes"> <h2>Misc changes</h2> <ul class="simple"> <li>gh-128911: Use the new <a class="reference external"
href="https://docs.python.org/dev/c-api/import.html#c.PyImport_ImportModuleAttr">PyImport_ImportModuleAttr()</a> function:<ul> <li>Replace <tt class="docutils literal">PyImport_ImportModule()</tt> + <tt class="docutils literal">PyObject_GetAttr()</tt> with <tt class="docutils literal">PyImport_ImportModuleAttr()</tt>.</li> <li>Replace <tt class="docutils literal">PyImport_ImportModule()</tt> + <tt class="docutils literal">PyObject_GetAttrString()</tt> with <tt class="docutils literal">PyImport_ImportModuleAttrString()</tt>.</li> </ul> </li> <li>gh-129363: <a class="reference external" href="https://github.com/python/cpython/commit/f1b81c408fb83beeee519ae4fb9d3a36dd4522b3">Add colors to tests run in sequentially mode</a>. First, write the test name without color. Then, write the test name and the result with color. Each test is displayed twice.</li> <li>gh-109959: Remove <tt class="docutils literal">test_glob.test_selflink()</tt> test. The test is not reliable, <a class="reference external" href="https://github.com/python/cpython/issues/109959#issuecomment-2577550700">it fails randomly on Linux</a>.</li> </ul> </div> PEP 741: C API to configure Python initialization2024-09-24T17:00:00+02:002024-09-24T17:00:00+02:00Victor Stinnertag:vstinner.github.io,2024-09-24:/pyconfig-pep-741.html<div class="section" id="pep-741-story"> <h2>PEP 741 story</h2> <a class="reference external image-reference" href="https://en.wikipedia.org/wiki/The_Starry_Night"> <img alt="The Starry Night (1889) by Vincent Van Gogh" src="https://vstinner.github.io/images/starry_night_van_gogh.jpg" /> </a> <p>Sometimes, writing a PEP can be a wild ride. It took two whole years between the early discussions and getting <a class="reference external" href="https://peps.python.org/pep-0741/">PEP 741</a> eventually accepted by the Steering Council. 
The API is only made of 18 functions, but it took more than 200 messages to properly design these …</p></div><div class="section" id="pep-741-story"> <h2>PEP 741 story</h2> <a class="reference external image-reference" href="https://en.wikipedia.org/wiki/The_Starry_Night"> <img alt="The Starry Night (1889) by Vincent Van Gogh" src="https://vstinner.github.io/images/starry_night_van_gogh.jpg" /> </a> <p>Sometimes, writing a PEP can be a wild ride. It took two whole years between the early discussions and getting <a class="reference external" href="https://peps.python.org/pep-0741/">PEP 741</a> eventually accepted by the Steering Council. The API is only made of 18 functions, but it took more than 200 messages to properly design these functions!</p> <p>PEP 741 is a new C API to configure the Python initialization using strings for option names. It also provides a new API to get the current runtime Python configuration.</p> <p>In 2019, I wrote <a class="reference external" href="https://peps.python.org/pep-0587/">PEP 587 – Python Initialization Configuration</a>. It was supposed to be the only API, replacing all the scattered existing APIs. Well, it seems like it wasn't complete enough and its design showed some issues, so I decided to write a new PEP 741!</p> <p>Painting: <em>The Starry Night (1889) by Vincent Van Gogh</em>.</p> </div> <div class="section" id="august-2022-cve-2020-10735-fix"> <h2>August 2022: CVE-2020-10735 fix</h2> <p>In August 2022, Gregory P. Smith opened the <a class="reference external" href="https://discuss.python.org/t/fr-allow-private-runtime-config-to-enable-extending-without-breaking-the-pyconfig-abi/18004">FR: Allow private runtime config to enable extending without breaking the PyConfig ABI</a> discussion to propose supporting configuration as text. Example:</p> <pre class="literal-block"> check_hash_pycs_mode=always unknownok:avoid_medusas_gaze=yes </pre> <p>The need was to add a new option to fix the CVE-2020-10735 vulnerability.
It will become <tt class="docutils literal">PyConfig.int_max_str_digits</tt> in Python 3.12. The problem was to add a new <tt class="docutils literal">PyConfig</tt> member without breaking the ABI in stable Python versions (such as Python 3.11). In the end, the problem was worked around by adding a separate global variable (<tt class="docutils literal">_Py_global_config_int_max_str_digits</tt>).</p> </div> <div class="section" id="august-2023-first-implementation"> <h2>August 2023: First implementation</h2> <p>In August 2023, I created <a class="reference external" href="https://github.com/python/cpython/issues/107954">an issue</a> to implement Gregory's idea. I wrote a proof-of-concept to accept configuration as text in a format similar to TOML. Example:</p> <pre class="literal-block"> # int bytes_warning = 2 # string filesystem_encoding = &quot;utf8&quot; # comment # list argv = ['python', '-c', 'code'] # you can put comments for the fun verbose = 1 # comment here as well # after, anywhere! </pre> <p>Quickly, I ran into parsing issues with quotes and escape characters such as newlines and quotes.</p> </div> <div class="section" id="october-2023"> <h2>October 2023</h2> <div class="section" id="rewrite"> <h3>Rewrite</h3> <p>I decided to write a new implementation using configuration option names as strings and values as integers, strings, or string lists. Example:</p> <pre class="literal-block"> if (PyInitConfig_SetInt(config, &quot;dev_mode&quot;, 1) &lt; 0) { goto error; } </pre> <p>The Python initialization is a complex beast. How to allocate memory when the memory allocator is not configured yet? Which encoding should be used, knowing that the locale encoding is not configured yet?</p> <p>I started with wide strings (<tt class="docutils literal">wchar_t*</tt>) and bytes strings (<tt class="docutils literal">char*</tt>).
The bytes strings should be decoded from the locale encoding, which requires preinitializing Python to configure the locale encoding.</p> </div> <div class="section" id="getter-functions"> <h3>Getter functions</h3> <p>I was asked to add getter functions such as <tt class="docutils literal">PyInitConfig_GetInt()</tt> and <tt class="docutils literal">PyInitConfig_GetStr()</tt>.</p> </div> <div class="section" id="current-configuration"> <h3>Current configuration</h3> <p>I was also asked to add functions to get the current runtime configuration. I proposed the following API:</p> <pre class="literal-block"> int PyConfig_GetInt(const char *key, int64_t *value); int PyConfig_GetStr(const char *key, PyObject **value); int PyConfig_GetStrList(const char *key, PyObject **value); </pre> <ul class="simple"> <li>Raise <tt class="docutils literal">ValueError</tt> if the key doesn't exist.</li> <li>Raise <tt class="docutils literal">TypeError</tt> if it's the wrong type.</li> <li><tt class="docutils literal">PyConfig_GetInt()</tt> raises OverflowError if the value doesn’t fit into <tt class="docutils literal">int64_t</tt>.
It cannot happen with the current implementation.</li> </ul> <p>I wrote <a class="reference external" href="https://github.com/python/cpython/pull/112609">an implementation</a> to play with the API.</p> </div> <div class="section" id="custom-options"> <h3>Custom options</h3> <p>With my proposed <tt class="docutils literal">PyInitConfig</tt> API, we can accept custom options, store them in a separate hash table, and later expose them as a dict.</p> <p>Example:</p> <pre class="literal-block"> PyInitConfig_SetInt(&quot;accept_custom_options&quot;, 1); PyInitConfig_SetStr(&quot;my_custom_key&quot;, &quot;value&quot;); </pre> <p>And later retrieve it in Python:</p> <pre class="literal-block"> my_custom_key = sys.get_config()['my_custom_key'] # str </pre> </div> </div> <div class="section" id="january-2024-create-pep-741"> <h2>January 2024: Create PEP 741</h2> <p>In January 2024, I decided to write <a class="reference external" href="https://peps.python.org/pep-0741/">PEP 741 – Python Configuration C API</a> since it became difficult to follow the discussion, which had a long history (since August 2022).
I <a class="reference external" href="https://discuss.python.org/t/pep-741-python-configuration-c-api/43637">announced PEP 741</a> and the discussion continued there.</p> <div class="section" id="specification"> <h3>Specification</h3> <p>The first proposed API.</p> <p>C API:</p> <ul class="simple"> <li><tt class="docutils literal">PyInitConfig</tt> structure</li> <li><tt class="docutils literal">PyInitConfig_Python_New()</tt></li> <li><tt class="docutils literal">PyInitConfig_Isolated_New()</tt></li> <li><tt class="docutils literal">PyInitConfig_Free(config)</tt></li> <li><tt class="docutils literal">PyInitConfig_SetInt(config, name, value)</tt></li> <li><tt class="docutils literal">PyInitConfig_SetStr(config, name, value)</tt></li> <li><tt class="docutils literal">PyInitConfig_SetWStr(config, name, value)</tt></li> <li><tt class="docutils literal">PyInitConfig_SetStrList(config, name, length, items)</tt></li> <li><tt class="docutils literal">PyInitConfig_SetWStrList(config, name, length, items)</tt></li> <li><tt class="docutils literal">Py_InitializeFromInitConfig(config)</tt></li> <li><tt class="docutils literal">PyInitConfig_Exception(config)</tt></li> <li><tt class="docutils literal">PyInitConfig_GetError(config, &amp;err_msg)</tt></li> <li><tt class="docutils literal">PyInitConfig_GetExitCode(config, &amp;exitcode)</tt></li> <li><tt class="docutils literal">Py_ExitWithInitConfig(config)</tt></li> <li><tt class="docutils literal">PyConfig_Get(name)</tt></li> <li><tt class="docutils literal">PyConfig_GetInt(name, &amp;value)</tt></li> </ul> <p>Python API:</p> <ul class="simple"> <li><tt class="docutils literal">sys.get_config(name)</tt></li> </ul> </div> <div class="section" id="discussions"> <h3>Discussions</h3> <p>It was proposed to switch to UTF-8 for strings, instead of using the locale encoding.</p> <p>It was asked not to add the PEP 741 API to the limited C API, even though multiple users had requested it.</p> <p>It was asked to get rid of the preinitialization
which causes tricky implementation issues with the locale encoding and the memory allocator.</p> </div> </div> <div class="section" id="february-2024-second-version-of-pep-741"> <h2>February 2024: Second version of PEP 741</h2> <div class="section" id="second-version"> <h3>Second version</h3> <p>In February 2024, I wrote a major second version: <a class="reference external" href="https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403">PEP 741: Python Configuration C API (second version)</a>.</p> <ul class="simple"> <li>Use UTF-8 for strings, instead of the locale encoding.</li> <li>Add locale encoding strings, such as <tt class="docutils literal">PyInitConfig_SetStrLocale()</tt>. So the API now has 3 kinds of strings.</li> <li>Remove support for custom configuration options.</li> </ul> </div> <div class="section" id="api-to-set-the-current-runtime-configuration"> <h3>API to set the current runtime configuration</h3> <p>I decided to add <tt class="docutils literal">PyConfig_Set()</tt> to <strong>set</strong> configuration options at runtime:</p> <ul class="simple"> <li>Return <tt class="docutils literal">0</tt> on success.</li> <li>Set an error in config and return <tt class="docutils literal"><span class="pre">-1</span></tt> on error.</li> </ul> <p>The problem was to decide which options should be read-only and which options can be modified.</p> <p>I decided to allow modifying options which can already be modified with an existing API. For example, the <tt class="docutils literal">argv</tt> option is read from <tt class="docutils literal">sys.argv</tt>, which can be modified. So this option can be modified with <tt class="docutils literal">PyConfig_Set()</tt>.</p> <p>I also decided to allow modifying some <tt class="docutils literal">sys.flags</tt> flags, but not all of them.
For example, it becomes possible to modify <tt class="docutils literal">bytes_warning</tt>, which is exposed as <tt class="docutils literal">sys.flags.bytes_warning</tt>.</p> </div> </div> <div class="section" id="april-2024-steering-council-feedback"> <h2>April 2024: Steering Council feedback</h2> <p>In April 2024, the Steering Council wrote that <a class="reference external" href="https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403/38">they were having a tough time evaluating PEP 741</a>.</p> <p>Their main concerns were:</p> <ul class="simple"> <li>The number of string types (3).</li> <li>The stable ABI.</li> <li>The locale encoding.</li> </ul> </div> <div class="section" id="may-2024-third-pep-version"> <h2>May 2024: Third PEP version</h2> <p>I <a class="reference external" href="https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403/62">rewrote PEP 741 (3rd major version)</a> to make it the most likely to be accepted by the Steering Council:</p> <ul class="simple"> <li>Remove string types other than UTF-8 (1 string type instead of 3).</li> <li>Exclude the API from the limited C API.</li> <li>Remove the explicit preconfiguration.</li> <li>Remove the rationale about the limited C API / stable ABI.</li> <li>Remove the &quot;Python Configuration&quot;, only keep the &quot;Isolated Configuration&quot;.</li> </ul> </div> <div class="section" id="august-2024-pep-approved"> <h2>August 2024: PEP approved</h2> <p>In August 2024, the Steering Council eventually <a class="reference external" href="https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403/88">accepted PEP 741</a>.</p> <p>Once it was approved, I merged the PEP 741 implementation. It's now available for testing in the future Python 3.14 version!</p> </div> <div class="section" id="example"> <h2>Example</h2> <p>It becomes possible to modify some <tt class="docutils literal">sys.flags</tt> which were read-only previously.
Example on Python 3.14 using the <tt class="docutils literal">_testcapi</tt> module (which must not be used in production; it is only for testing!):</p> <pre class="literal-block"> $ ./python &gt;&gt;&gt; import sys &gt;&gt;&gt; import _testcapi # BytesWarning is disabled by default &gt;&gt;&gt; b'bytes' == 'unicode' False &gt;&gt;&gt; _testcapi.config_get('bytes_warning') 0 &gt;&gt;&gt; sys.flags.bytes_warning 0 # Set bytes_warning option &gt;&gt;&gt; _testcapi.config_set('bytes_warning', 1) &gt;&gt;&gt; _testcapi.config_get('bytes_warning') 1 &gt;&gt;&gt; sys.flags.bytes_warning 1 # Comparison now emits BytesWarning &gt;&gt;&gt; b'bytes' == 'unicode' &lt;python-input-8&gt;:1: BytesWarning: Comparison between bytes and string b'bytes' == 'unicode' False </pre> </div> <div class="section" id="statistics"> <h2>Statistics</h2> <p>Statistics on Discourse threads:</p> <ul class="simple"> <li>First thread: 62 messages</li> <li>Second thread: 55 messages</li> <li>Third thread: 89 messages</li> </ul> <p>Total: <strong>206</strong> messages!</p> </div> Add PyUnicodeWriter C API2024-07-04T18:00:00+02:002024-07-04T18:00:00+02:00Victor Stinnertag:vstinner.github.io,2024-07-04:/pyunicodewriter-c-api.html<img alt="La Danse - Matisse" src="https://vstinner.github.io/images/matisse_la_danse.jpg" /> <p>In May, I designed a new C API to build a Python str object: the <a class="reference external" href="https://docs.python.org/dev/c-api/unicode.html#pyunicodewriter">PyUnicodeWriter API</a>.
Many people were involved in the design and the discussion was quite long. The C API Working Group helped to design a better and more convenient API. It took me basically a whole month to get the design done and fully implement the API.</p> <p>Painting: <a class="reference external" href="https://en.wikipedia.org/wiki/Dance_(Matisse)">La Danse</a> by Matisse (1910).</p> <div class="section" id="initial-api"> <h2>Initial API</h2> <p>Building a Python <tt class="docutils literal">str</tt> object in C is not easy. I wrote the private <tt class="docutils literal">_PyUnicodeWriter</tt> C API 9 years ago (see <a class="reference external" href="https://vstinner.github.io/pybyteswriter.html">my previous article</a>), but it's not usable outside Python since it's a private API. So I proposed to make it public.</p> <p>On May 19, I created <a class="reference external" href="https://github.com/python/cpython/issues/119182">an issue</a> and <a class="reference external" href="https://github.com/python/cpython/pull/119184">a pull request</a> to discuss the API.
The initial API was:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">PyUnicodeWriter</span><span class="w"> </span><span class="n">PyUnicodeWriter</span><span class="p">;</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="n">PyUnicodeWriter</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_Create</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_Free</span><span class="p">(</span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">);</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_Finish</span><span class="p">(</span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">);</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_SetOverallocate</span><span class="p">(</span> <span class="w"> </span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">overallocate</span><span class="p">);</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_WriteChar</span><span 
class="p">(</span> <span class="w"> </span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span> <span class="w"> </span><span class="n">Py_UCS4</span><span class="w"> </span><span class="n">ch</span><span class="p">);</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_WriteStr</span><span class="p">(</span> <span class="w"> </span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span> <span class="w"> </span><span class="n">PyObject</span><span class="w"> </span><span class="o">*</span><span class="n">str</span><span class="p">);</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_WriteSubstring</span><span class="p">(</span> <span class="w"> </span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="p">,</span> <span class="w"> </span><span class="n">PyObject</span><span class="w"> </span><span class="o">*</span><span class="n">str</span><span class="p">,</span> <span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">start</span><span class="p">,</span> <span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">stop</span><span class="p">);</span> <span class="n">PyAPI_FUNC</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">PyUnicodeWriter_WriteASCIIString</span><span class="p">(</span> <span class="w"> </span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span 
class="p">,</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">ascii</span><span class="p">,</span> <span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">len</span><span class="p">);</span> </pre></div> </div> <div class="section" id="api-changes"> <h2>API changes</h2> <div class="section" id="pyunicodewriter-writeutf8"> <h3>PyUnicodeWriter_WriteUTF8()</h3> <p>My first implementation made the bold assumption that the caller would only pass ASCII characters to <tt class="docutils literal">PyUnicodeWriter_WriteASCIIString()</tt>. It would crash if non-ASCII characters were passed by mistake. UTF-8 is more common and Python has a fast UTF-8 decoder. The first change was to replace <tt class="docutils literal">PyUnicodeWriter_WriteASCIIString()</tt> with <tt class="docutils literal">PyUnicodeWriter_WriteUTF8()</tt>.</p> </div> <div class="section" id="pyunicodewriter-writestr"> <h3>PyUnicodeWriter_WriteStr()</h3> <p>I really wanted <tt class="docutils literal">PyUnicodeWriter_WriteStr()</tt> to only accept a Python str object. Others insisted on accepting any Python object and writing <tt class="docutils literal">str(obj)</tt> instead. I changed <tt class="docutils literal">PyUnicodeWriter_WriteStr()</tt> to implement that.</p> </div> <div class="section" id="pyunicodewriter-writerepr"> <h3>PyUnicodeWriter_WriteRepr()</h3> <p>Since <tt class="docutils literal">str(obj)</tt> was there, <tt class="docutils literal">repr(obj)</tt> became the next question: should we add it? It was decided to add <tt class="docutils literal">PyUnicodeWriter_WriteRepr(obj)</tt> to write <tt class="docutils literal">repr(obj)</tt>.
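</p> <p>As an illustration, the two functions can be combined. Here is a hedged sketch, assuming a CPython version where the <tt class="docutils literal">PyUnicodeWriter</tt> API is public; <tt class="docutils literal">format_obj()</tt> is a hypothetical helper, not a CPython function:</p> <pre class="literal-block">
#include &lt;Python.h&gt;  // requires a CPython development environment

/* Hypothetical helper: build the string &quot;str(obj) =&gt; repr(obj)&quot;. */
static PyObject *
format_obj(PyObject *obj)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    // WriteStr() writes str(obj); WriteRepr() writes repr(obj)
    if (PyUnicodeWriter_WriteStr(writer, obj) &lt; 0
        || PyUnicodeWriter_WriteUTF8(writer, &quot; =&gt; &quot;, 4) &lt; 0
        || PyUnicodeWriter_WriteRepr(writer, obj) &lt; 0)
    {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    return PyUnicodeWriter_Finish(writer);
}
</pre> <p>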
It's convenient to use.</p> </div> <div class="section" id="pyunicodewriter-format"> <h3>PyUnicodeWriter_Format()</h3> <p>During the discussion, it was proposed to add many functions to write various formats. I proposed to add <tt class="docutils literal">PyUnicodeWriter_FromFormat(format, <span class="pre">...)</span></tt> similar to <tt class="docutils literal">PyUnicode_FromFormat()</tt>. It was decided to add it under the name <tt class="docutils literal">PyUnicodeWriter_Format()</tt>. Its implementation is efficient since multiple formats write directly into the writer, without having to create a temporary string object.</p> </div> <div class="section" id="pyunicodewriter-create"> <h3>PyUnicodeWriter_Create()</h3> <p>The initial version of <tt class="docutils literal">PyUnicodeWriter_Create()</tt> had no argument. I was asked to add a size parameter to preallocate the internal buffer: <tt class="docutils literal">PyUnicodeWriter_Create(size)</tt>.</p> </div> <div class="section" id="remove-pyunicodewriter-setoverallocate"> <h3>Remove PyUnicodeWriter_SetOverallocate()</h3> <p>I tried to justify that calling <tt class="docutils literal">PyUnicodeWriter_SetOverallocate(0)</tt> before the last write was a killer feature for performance, but it looked too complicated to others and it was decided to simply remove this API.</p> </div> </div> <div class="section" id="c-api-working-group-discussion"> <h2>C API Working Group discussion</h2> <p>On May 24, once most of the API was stable, I created a <a class="reference external" href="https://github.com/capi-workgroup/decisions/issues/27">decision issue</a> to submit the API to the C API Working Group.</p> <p>On June 7, the API was approved by a majority vote.</p> <p>On June 10, Marc-Andre Lemburg reopened the issue since he had concerns about the incomplete UTF-8 decoder API and the fact that the functions were not atomic: on error, the behavior was undefined.</p> <p>I modified my implementation to make all functions atomic:
either the whole string is written, or nothing is written (restoring the writer to its previous state).</p> <p>I also proposed to extend the <tt class="docutils literal">PyUnicodeWriter</tt> API once we agreed on a minimum API.</p> <p>On June 17, the issue was closed again and I merged my implementation.</p> </div> <div class="section" id="extensions"> <h2>Extensions</h2> <div class="section" id="pyunicodewriter-writewidechar"> <h3>PyUnicodeWriter_WriteWideChar()</h3> <p>I added a function to write wide strings (<tt class="docutils literal">wchar_t*</tt>), which are common on Windows.</p> </div> <div class="section" id="pyunicodewriter-decodeutf8stateful"> <h3>PyUnicodeWriter_DecodeUTF8Stateful()</h3> <p>I added a stateful UTF-8 decoder as an answer to Marc-Andre's request. API:</p> <pre class="literal-block"> int PyUnicodeWriter_DecodeUTF8Stateful( PyUnicodeWriter *writer, const char *string, Py_ssize_t length, const char *errors, Py_ssize_t *consumed); </pre> </div> <div class="section" id="pyunicodewriter-writeucs4"> <h3>PyUnicodeWriter_WriteUCS4()</h3> <p>While less common, UCS-4 strings are convenient for manipulating Unicode code points.
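</p> <p>For example, a few code points can be written directly from a <tt class="docutils literal">Py_UCS4</tt> buffer. This is a hedged sketch assuming the final public API and a <tt class="docutils literal">writer</tt> created earlier:</p> <pre class="literal-block">
// Write U+00E9 and U+1F40D into the writer from a UCS-4 buffer
Py_UCS4 buf[2] = {0x00E9, 0x1F40D};
if (PyUnicodeWriter_WriteUCS4(writer, buf, 2) &lt; 0) {
    PyUnicodeWriter_Discard(writer);
    return NULL;
}
</pre> <p>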
I added an API to support natively this string format.</p> </div> </div> <div class="section" id="documentation"> <h2>Documentation</h2> <p>Read the <a class="reference external" href="https://docs.python.org/dev/c-api/unicode.html#pyunicodewriter">PyUnicodeWriter API documentation</a>.</p> </div> <div class="section" id="example-of-contextvar-tp-repr"> <h2>Example of contextvar_tp_repr()</h2> <p>Simplified code:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="n">PyObject</span><span class="w"> </span><span class="o">*</span> <span class="nf">contextvar_tp_repr</span><span class="p">(</span><span class="n">PyContextVar</span><span class="w"> </span><span class="o">*</span><span class="n">self</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="c1">// &quot;&lt;ContextVar name=&#39;a&#39; at 0x1234567812345678&gt;&quot;</span> <span class="w"> </span><span class="n">Py_ssize_t</span><span class="w"> </span><span class="n">estimate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">43</span><span class="p">;</span> <span class="w"> </span><span class="n">PyUnicodeWriter</span><span class="w"> </span><span class="o">*</span><span class="n">writer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyUnicodeWriter_Create</span><span class="p">(</span><span class="n">estimate</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">writer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span 
class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyUnicodeWriter_WriteUTF8</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&lt;ContextVar name=&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">17</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">error</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyUnicodeWriter_WriteRepr</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">self</span><span class="o">-&gt;</span><span class="n">var_name</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">error</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyUnicodeWriter_Format</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; at %p&gt;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">error</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyUnicodeWriter_Finish</span><span class="p">(</span><span class="n">writer</span><span class="p">);</span> <span class="nl">error</span><span class="p">:</span> <span class="w"> </span><span class="n">PyUnicodeWriter_Discard</span><span class="p">(</span><span class="n">writer</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> </pre></div> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>Thanks to these great discussions, the final <tt class="docutils literal">PyUnicodeWriter</tt> API is better, more convenient, less error-prone, and maybe even a little bit more efficient!</p> <p>Thanks to everyone who was involved in these discussions!</p> </div> Status of the Python Limited C API (March 2024)2024-03-20T17:00:00+01:002024-03-20T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2024-03-20:/status-limited-c-api-march-2024.html<a class="reference external image-reference" href="https://danielazconegui.com/en/prints/ghibli-spyrited-away.html"> <img alt="Ghibli - Spirited Away" src="https://vstinner.github.io/images/ghibli-spyrited-away.jpg" /> </a> <p>In Python 3.13, I made multiple enhancements to make the limited C API more usable:</p> <ul class="simple"> <li>Add 14 functions to the limited C API.</li> <li>Make the special debug build <tt class="docutils literal">Py_TRACE_REFS</tt> compatible with the limited C API.</li> <li>Enhance Argument Clinic to generate C code using the limited C API.</li> <li>Add an …</li></ul><a class="reference external image-reference" href="https://danielazconegui.com/en/prints/ghibli-spyrited-away.html">
<img alt="Ghibli - Spirited Away" src="https://vstinner.github.io/images/ghibli-spyrited-away.jpg" /> </a> <p>In Python 3.13, I made multiple enhancements to make the limited C API more usable:</p> <ul class="simple"> <li>Add 14 functions to the limited C API.</li> <li>Make the special debug build <tt class="docutils literal">Py_TRACE_REFS</tt> compatible with the limited C API.</li> <li>Enhance Argument Clinic to generate C code using the limited C API.</li> <li>Add a convenient API to format a type's fully qualified name using the limited C API (PEP 737).</li> <li>Add <tt class="docutils literal">_testlimitedcapi</tt> extension.</li> <li>Convert 16 stdlib extensions to the limited C API.</li> </ul> <p>What's Next?</p> <ul class="simple"> <li>PEP 741: Python Configuration C API.</li> <li>Py_GetConstant().</li> <li>Cython and PyO3.</li> </ul> <p><em>Drawing: Ghibli - Spirited Away by Daniel Azconegui.</em></p> <div class="section" id="new-functions"> <h2>New Functions</h2> <p>I added 14 functions to the limited C API:</p> <ul class="simple"> <li><tt class="docutils literal">PyDict_GetItemRef()</tt></li> <li><tt class="docutils literal">PyDict_GetItemStringRef()</tt></li> <li><tt class="docutils literal">PyImport_AddModuleRef()</tt></li> <li><tt class="docutils literal">PyLong_AsInt()</tt></li> <li><tt class="docutils literal">PyMem_RawCalloc()</tt></li> <li><tt class="docutils literal">PyMem_RawFree()</tt></li> <li><tt class="docutils literal">PyMem_RawMalloc()</tt></li> <li><tt class="docutils literal">PyMem_RawRealloc()</tt></li> <li><tt class="docutils literal">PySys_Audit()</tt></li> <li><tt class="docutils literal">PySys_AuditTuple()</tt></li> <li><tt class="docutils literal">PyType_GetFullyQualifiedName()</tt></li> <li><tt class="docutils literal">PyType_GetModuleName()</tt></li> <li><tt class="docutils literal">PyWeakref_GetRef()</tt></li> <li><tt class="docutils literal">Py_IsFinalizing()</tt></li> </ul> <p>This makes code using these functions
<strong>compatible with the limited C API</strong>.</p> </div> <div class="section" id="py-trace-refs"> <h2>Py_TRACE_REFS</h2> <p>I modified the special debug build <tt class="docutils literal">Py_TRACE_REFS</tt>. Instead of adding two members to <tt class="docutils literal">PyObject</tt> to create a doubly linked list of all objects, I added a hash table to track all objects.</p> <p>Since the <tt class="docutils literal">PyObject</tt> structure is no longer modified, this special debug build is now <strong>ABI compatible</strong> with the <strong>release build</strong>! Moreover, it also becomes compatible with the <strong>limited C API</strong>!</p> </div> <div class="section" id="argument-clinic"> <h2>Argument Clinic</h2> <p>I modified Argument Clinic (AC) to generate C code compatible with the limited C API.</p> <p>First, I moved private functions used by Argument Clinic to the internal C API and modified Argument Clinic to generate <tt class="docutils literal">#include</tt> directives to get these functions. Then I modified Argument Clinic to use only the limited C API and to not generate these <tt class="docutils literal">#include</tt> directives.</p> <p>At the beginning, only some converters and only the slower <tt class="docutils literal">METH_VARARGS</tt> calling convention were supported.</p> <p>Now, more and more converters and formats are supported, and the regular efficient <tt class="docutils literal">METH_FASTCALL</tt> calling convention is used.</p> <div class="section" id="example"> <h3>Example</h3> <p>Example from the <tt class="docutils literal">grp</tt> extension:</p> <pre class="literal-block"> /*[clinic input] grp.getgrgid id: object Return the group database entry for the given numeric group ID.
</pre> <p>Python 3.12 uses the <strong>private</strong> <tt class="docutils literal">_PyArg_UnpackKeywords()</tt> function:</p> <pre class="literal-block"> args = _PyArg_UnpackKeywords(args, nargs, NULL, kwnames, &amp;_parser, 1, 1, 0, argsbuf); if (!args) { goto exit; } id = args[0]; return_value = grp_getgrgid_impl(module, id); </pre> <p>Python 3.13 now uses the public <tt class="docutils literal">PyArg_ParseTupleAndKeywords()</tt> function of the <strong>limited C API</strong>:</p> <pre class="literal-block"> if (!PyArg_ParseTupleAndKeywords(args, kwargs, &quot;O:getgrgid&quot;, _keywords, &amp;id)) goto exit; return_value = grp_getgrgid_impl(module, id); </pre> </div> </div> <div class="section" id="pep-737-format-type-name"> <h2>PEP 737: Format Type Name</h2> <p>One issue that I had with Argument Clinic was to <strong>format an error message</strong> with the limited C API. I could not use the private <tt class="docutils literal">_PyArg_BadArgument()</tt> function, nor access <tt class="docutils literal">PyTypeObject.tp_name</tt> (an opaque structure in the limited C API) to format a type name. While the limited C API provides <tt class="docutils literal">PyType_GetName()</tt> and <tt class="docutils literal">PyType_GetQualName()</tt>, it's still different from how Python formats type names in error messages.</p> <p>I proposed different APIs but there was no agreement.
So I decided to write <a class="reference external" href="https://peps.python.org/pep-0737/">PEP 737</a> &quot;C API to format a type fully qualified name&quot;.</p> <p>After four months of discussions, the <strong>Steering Council</strong> decided to accept it in Python 3.13.</p> <p>Changes:</p> <ul class="simple"> <li>Add <tt class="docutils literal">PyType_GetFullyQualifiedName()</tt> function.</li> <li>Add <tt class="docutils literal">PyType_GetModuleName()</tt> function.</li> <li>Add <tt class="docutils literal">%T</tt>, <tt class="docutils literal">%#T</tt>, <tt class="docutils literal">%N</tt> and <tt class="docutils literal">%#N</tt> formats to <tt class="docutils literal">PyUnicode_FromFormat()</tt>.</li> </ul> <p>I also proposed adding a new <tt class="docutils literal">type.__fully_qualified_name__</tt> attribute, and a few methods to format the fully qualified name of a type in Python. But the Steering Council was not convinced and asked me to <strong>remove these Python changes</strong> until someone comes up with a strong use case for this attribute and methods.</p> <p>In <strong>2018</strong>, I made a <strong>first attempt</strong>: I made a similar change, but I had to revert it.
I created a discussion on the python-dev mailing list, but we failed to reach a consensus.</p> <p>In <strong>2011</strong>, I already asked to stop the <strong>cargo cult</strong> of truncating type names, but I didn't implement my idea of proactively stopping the truncation of type names.</p> <div class="section" id="example-1"> <h3>Example</h3> <p>Example of the code generating an error message in the <tt class="docutils literal">pwd</tt> extension.</p> <p>Python 3.12 uses the <strong>private</strong> <tt class="docutils literal">_PyArg_BadArgument()</tt> function:</p> <pre class="literal-block"> _PyArg_BadArgument(&quot;getpwnam&quot;, &quot;argument&quot;, &quot;str&quot;, arg); </pre> <p>Python 3.13 now uses the new <tt class="docutils literal">%T</tt> format (PEP 737) of the <strong>limited C API</strong>:</p> <pre class="literal-block"> PyErr_Format(PyExc_TypeError, &quot;getpwnam() argument must be str, not %T&quot;, arg); </pre> </div> </div> <div class="section" id="add-testlimitedcapi-extension"> <h2>Add _testlimitedcapi extension</h2> <p>In Python 3.12, C API tests are split into two categories:</p> <ul class="simple"> <li><tt class="docutils literal">_testcapi</tt>: public C API</li> <li><tt class="docutils literal">_testinternalcapi</tt>: internal C API (<tt class="docutils literal">Py_BUILD_CORE</tt>)</li> </ul> <p>I added a third <tt class="docutils literal">_testlimitedcapi</tt> extension to test the limited C API (<tt class="docutils literal">Py_LIMITED_API</tt>).
I moved tests using the limited C API from <tt class="docutils literal">_testcapi</tt> to <tt class="docutils literal">_testlimitedcapi</tt>.</p> <p>The difference between <tt class="docutils literal">_testcapi</tt> and <tt class="docutils literal">_testlimitedcapi</tt> is that the <tt class="docutils literal">_testlimitedcapi</tt> extension is built with the <tt class="docutils literal">Py_LIMITED_API</tt> macro defined, and so can only access the limited C API.</p> </div> <div class="section" id="convert-stdlib-extensions-to-the-limited-c-api"> <h2>Convert stdlib extensions to the limited C API</h2> <p>In August 2023, I proposed: <a class="reference external" href="https://discuss.python.org/t/use-the-limited-c-api-for-some-of-our-stdlib-c-extensions/32465">Use the limited C API for some of our stdlib C extensions</a>.</p> <p>In March 2024, there are now <strong>16</strong> C extensions built with the limited C API:</p> <ul class="simple"> <li><tt class="docutils literal">_ctypes_test</tt></li> <li><tt class="docutils literal">_multiprocessing.posixshmem</tt></li> <li><tt class="docutils literal">_scproxy</tt></li> <li><tt class="docutils literal">_stat</tt></li> <li><tt class="docutils literal">_statistics</tt></li> <li><tt class="docutils literal">_testimportmultiple</tt></li> <li><tt class="docutils literal">_testlimitedcapi</tt></li> <li><tt class="docutils literal">_uuid</tt></li> <li><tt class="docutils literal">errno</tt></li> <li><tt class="docutils literal">fcntl</tt></li> <li><tt class="docutils literal">grp</tt></li> <li><tt class="docutils literal">md5</tt></li> <li><tt class="docutils literal">pwd</tt></li> <li><tt class="docutils literal">resource</tt></li> <li><tt class="docutils literal">termios</tt></li> <li><tt class="docutils literal">winsound</tt></li> </ul> <p>Other stdlib C extensions use the internal C API for various reasons or are using functions which are missing in the limited C API.
Remaining issues should be analyzed on a case-by-case basis.</p> <p>This work shows that non-trivial C extensions can be written using only the limited C API version 3.13.</p> </div> <div class="section" id="what-s-next"> <h2>What's Next?</h2> <div class="section" id="pep-741-python-configuration-c-api"> <h3>PEP 741: Python Configuration C API</h3> <p>In Python 3.8, I added the <tt class="docutils literal">PyConfig</tt> API to configure the Python initialization. Problem: it has no stable ABI and is excluded from the limited C API.</p> <p>Recently, I proposed <a class="reference external" href="https://peps.python.org/pep-0741/">PEP 741: Python Configuration C API</a> which is built on top of <tt class="docutils literal">PyConfig</tt>, provides a stable ABI, and is compatible with the limited C API. I submitted PEP 741 to the Steering Council.</p> </div> <div class="section" id="py-getconstant"> <h3>Py_GetConstant()</h3> <p>Accessing constants reads private ABI symbols. For example, the <tt class="docutils literal">Py_None</tt> API reads the private <tt class="docutils literal">_Py_NoneStruct</tt> symbol at the stable ABI level.</p> <p>I <a class="reference external" href="https://github.com/python/cpython/pull/116883">proposed</a> to change the constant implementations to use function calls instead. For example, reading <tt class="docutils literal">Py_None</tt> would call <tt class="docutils literal">Py_GetConstant(Py_CONSTANT_NONE)</tt>. The advantage is that it adds 5 more constants: zero, one, empty string, empty bytes string, and empty tuple. For example, <tt class="docutils literal">Py_GetConstant(Py_CONSTANT_ZERO)</tt> gives the number <tt class="docutils literal">0</tt> and the function cannot fail.</p> </div> <div class="section" id="cython-and-pyo3"> <h3>Cython and PyO3</h3> <p>The Cython and PyO3 projects are two big consumers of the C API.</p> <p>While Cython has an experimental build mode for the limited C API, it's still incomplete.
It would be nice to complete it to cover more use cases and more APIs.</p> <p>PyO3 can use the limited API but can still use the non-limited API for some use cases. It would be interesting to only use the limited C API. PEP 741 would be useful for that when embedding Python in Rust.</p> </div> </div> Remove private C API functions2023-12-15T23:00:00+01:002023-12-15T23:00:00+01:00Victor Stinnertag:vstinner.github.io,2023-12-15:/remove-c-api-funcs-313.html<a class="reference external image-reference" href="https://en.wikipedia.org/wiki/The_Seasons_(Mucha)"> <img alt="Mucha painting: the 4 seasons" src="https://vstinner.github.io/images/mucha_seasons.jpg" /> </a> <p>In Python 3.13 alpha 1, I removed more than 300 private C API functions. Even though I announced my plan early in July, users didn't &quot;embrace&quot; my plan and didn't agree with the rationale. I reverted 50 functions in the alpha 2 release to calm down the situation and …</p><a class="reference external image-reference" href="https://en.wikipedia.org/wiki/The_Seasons_(Mucha)"> <img alt="Mucha painting: the 4 seasons" src="https://vstinner.github.io/images/mucha_seasons.jpg" /> </a> <p>In Python 3.13 alpha 1, I removed more than 300 private C API functions. Even though I announced my plan early in July, users didn't &quot;embrace&quot; my plan and didn't agree with the rationale.
I reverted 50 functions in the alpha 2 release to calm down the situation and have more time to replace private functions with public functions.</p> <p><em>Painting: The Seasons by Czech visual artist Alphonse Mucha (1900)</em></p> <div class="section" id="remove-private-functions"> <h2>Remove private functions</h2> <p>On June 25th, I created <a class="reference external" href="https://github.com/python/cpython/issues/106084">issue gh-106084</a>: &quot;Remove private C API functions from abstract.h&quot;.</p> <blockquote> Over the years, we accumulated many <strong>private</strong> functions as part of the <strong>public</strong> C API in abstract.h header file. I propose to remove them: move them to the <strong>internal</strong> C API.</blockquote> <p>On July 1st, I created the meta <a class="reference external" href="https://github.com/python/cpython/issues/106320">issue gh-106320</a>: &quot;Remove private C API functions&quot;. The issue has 63 pull requests (a lot!), 53 comments and more than 300 events (created by commits and pull requests) which make the issue hard to navigate.</p> <p>On July 3rd, <strong>Petr Viktorin</strong> shared his concerns:</p> <blockquote> <p>Please be careful about assuming that the <strong>underscore</strong> means a function is <strong>private</strong>. AFAIK, that rule first appears for <a class="reference external" href="https://docs.python.org/3.10/c-api/stable.html#stable">3.10</a>, and was only properly formalized in <a class="reference external" href="https://peps.python.org/pep-0689/">PEP 689</a>, for Python 3.12.</p> <p>For older functions, please consider if they should be added to the unstable API. 
IMO it's better to call them “underscored” than “private”.</p> <p>See also: historical note in the <a class="reference external" href="https://devguide.python.org/developer-workflow/c-api/index.html#private-names">devguide</a>.</p> </blockquote> <p>On July 4th, <strong>Petr</strong> posted on Discourse: <a class="reference external" href="https://discuss.python.org/t/pssst-lets-treat-all-api-in-public-headers-as-public/28916">(pssst) Let's treat all API in public headers as public</a>.</p> </div> <div class="section" id="remove-more-private-functions"> <h2>Remove more private functions</h2> <p>By July 4th, I had removed <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1620749616">181 private functions</a>.</p> <p>On July 4th, I identified that <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1620773057">34 projects</a> in the PyPI top 5,000 were affected by these removals.</p> <p>On July 7th, I <a class="reference external" href="https://github.com/python/pythoncapi-compat/pull/62">added PyObject_Vectorcall()</a> to the pythoncapi-compat project.</p> <p>On July 9th, I started the discussion: <a class="reference external" href="https://discuss.python.org/t/c-api-how-much-private-is-the-private-py-identifier-api/29190">C API: How much private is the private _Py_IDENTIFIER() API?</a></p> <p>On July 13th, I asked if <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1633302147">the PyComplex API</a> should be made private or not. Petr noticed that this API was documented.</p> <p>On July 23rd, I tried to build numpy, but I was blocked by Cython, which was broken by my changes.
I created the <a class="reference external" href="https://github.com/python/cpython/issues/107076">issue gh-107076</a>: &quot;C API: Cython 3.0 uses private functions removed in Python 3.13 (numpy 1.25.1 fails to build)&quot;.</p> <p>On July 23rd, I found that the private <tt class="docutils literal">_PyTuple_Resize()</tt> function is documented. I proposed <a class="reference external" href="https://github.com/python/cpython/pull/107139">adding a new internal _PyTupleBuilder API</a> to replace <tt class="docutils literal">_PyTuple_Resize()</tt>.</p> <p>On July 23rd, I proposed: <a class="reference external" href="https://discuss.python.org/t/c-api-my-plan-to-clarify-private-vs-public-functions-in-python-3-13/30131">C API: My plan to clarify private vs public functions in Python 3.13</a>.</p> <blockquote> Private API has multiple issues: they are usually <strong>not documented</strong>, <strong>not tested</strong>, and so their <strong>behavior may change</strong> without any warning or anything. Also, they can be <strong>removed anytime</strong> without any notice.</blockquote> <ul class="simple"> <li>Phase 1: Remove as many private API as possible</li> <li>Phase 2 (Python 3.13 alpha 1): revert removals if needed to make sure that Cython, numpy and pip work.</li> <li>Phase 3 (Python 3.13 beta 1): consider reverting more removals if needed.</li> </ul> <p>On July 24th, I created the PR <a class="reference external" href="https://github.com/python/cpython/pull/107068">Remove private _PyCrossInterpreterData API</a>.
<strong>Eric Snow</strong> asked me to keep this private API since it's used by 3rd party C extensions.</p> <p>On August 24th, I created <a class="reference external" href="https://github.com/python/cpython/issues/108444">issue gh-108444</a> to add the public <tt class="docutils literal">PyLong_AsInt()</tt> function, replacing the removed <tt class="docutils literal">_PyLong_AsInt()</tt> function.</p> <p>On September 4th, I looked at the <tt class="docutils literal">_PyArg</tt> API. I started the discussion: <a class="reference external" href="https://discuss.python.org/t/use-the-limited-c-api-for-some-of-our-stdlib-c-extensions/32465">Use the limited C API for some of our stdlib C extensions</a>.</p> <p>On September 4th, <a class="reference external" href="https://discuss.python.org/t/c-api-my-plan-to-clarify-private-vs-public-functions-in-python-3-13/30131/9">I declared</a>:</p> <blockquote> I declare that the Python 3.13 <strong>season of “removing as many private C API as possible” ended</strong>! I stop here until Python 3.14.</blockquote> <p>Python 3.12 exported <strong>385</strong> private functions. After the cleanup, Python 3.13 exported only <strong>86</strong> private functions: I removed 299 functions. I closed the issue.</p> </div> <div class="section" id="python-3-13-alpha-1-negative-feedback"> <h2>Python 3.13 alpha 1 negative feedback</h2> <p>On October 13th, <strong>Python 3.13 alpha 1 was released</strong> with my changes removing around 300 private C API functions.</p> <p>On October 14th, <strong>Guido van Rossum</strong> <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1762755146">asked</a>:</p> <blockquote> Thanks for the list. Should we <strong>encourage</strong> various <strong>projects to test 3.13a1</strong>, which just came out?
Is there a way we can encourage them more?</blockquote> <p>On October 30th, <strong>Stefan Behnel</strong>, the Cython creator, posted the message: <a class="reference external" href="https://discuss.python.org/t/python-3-13-alpha-1-contains-breaking-changes-whats-the-plan/37490">Python 3.13 alpha 1 contains breaking changes, what's the plan?</a>. He also <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1772735064">commented on the issue</a>. Extract:</p> <blockquote> I just came across this issue. Let me express my general disapproval regarding deliberate breakage, which this issue appears to be entirely about. As far as I can see, none of these removals was motivated. The mere idea of removing existing API &quot;because we can&quot; is entirely foreign to me.</blockquote> <p>On October 31st, <strong>Petr</strong> asked the Steering Council: <a class="reference external" href="https://github.com/python/steering-council/issues/212">Is it OK to remove _PyObject_Vectorcall?</a> about the removal of old underscore-prefixed aliases, such as <tt class="docutils literal">_PyObject_Vectorcall</tt>. I didn't know that these names were part of <a class="reference external" href="https://peps.python.org/pep-0590/">PEP 590 – Vectorcall: a fast calling protocol for CPython</a>; nothing was written about that in the header files.</p> <p>On November 2nd, <strong>Guido</strong> <a class="reference external" href="https://github.com/python/cpython/issues/106320#issuecomment-1790832433">wrote</a> (where WG stands for C API Working Group):</p> <blockquote> <p>We can talk till we’re blue in the face but please no more action (i.e., no more moving/removing APIs) until the full WG has had a chance to discuss this and make a decision.</p> <p>(Restoring removed APIs at users’ requests is fine.)</p> </blockquote> <p>On November 3rd, <strong>Gregory P.
Smith</strong> <a class="reference external" href="https://github.com/python/cpython/issues/111481#issuecomment-1794211126">wrote</a>:</p> <blockquote> <p>I'd much prefer 'revert' for any API anyone is found using in 3.13.</p> <p>We need to treat 3.13 as a more special than usual release and aim to minimize compatibility headaches for existing project code. That way more things that build and run on 3.12 build can run on 3.13 as is or with minimal work.</p> <p>This will enable ecosystem code owners to focus on the bigger picture task of enabling existing code to be built and tested on an experimental pep703 free-threading build rather than having a pile of unrelated cleanup trivia blocking that.</p> </blockquote> <p>On November 7th, my colleague <strong>Karolina Surma</strong> posted a report: <a class="reference external" href="https://discuss.python.org/t/ongoing-packages-rebuild-with-python-3-13-in-fedora/38134">Ongoing packages' rebuild with Python 3.13 in Fedora</a>. She did great bug triage work, counting build failures per C API issue while recompiling 4000+ Python packages in Fedora with Python 3.13.</p> <p>On November 13th, <strong>Petr</strong> also identified that the private PyComplex API, such as the <tt class="docutils literal">_Py_c_sum()</tt> function, was documented. Moreover, <a class="reference external" href="https://github.com/python/cpython/issues/112019">issue gh-112019</a> was created asking to revert these APIs.</p> </div> <div class="section" id="revert-in-python-3-13-alpha-2"> <h2>Revert in Python 3.13 alpha 2</h2> <p>On November 13th, I created <a class="reference external" href="https://github.com/python/cpython/issues/112026">issue gh-112026</a>: &quot;[C API] Revert of private functions removed in Python 3.13 causing most problems&quot;.
I made 4 changes:</p> <ul class="simple"> <li>Add again <tt class="docutils literal">&lt;unistd.h&gt;</tt> include in Python.h</li> <li>Restore removed private C API</li> <li>Restore removed _PyDict_GetItemStringWithError()</li> <li>Add again _PyThreadState_UncheckedGet() function</li> </ul> <p>I selected functions by looking at bug reports, <strong>Karolina</strong>'s report, and by trying to build numpy and cffi. With my reverts, numpy built successfully, and cffi built successfully with a minor change that I reported upstream (<a class="reference external" href="https://github.com/python-cffi/cffi/pull/34">cffi: Use PyErr_FormatUnraisable() on Python 3.13</a>).</p> <p>In total, I restored <a class="reference external" href="https://github.com/python/cpython/issues/112026#issuecomment-1813191948">50 private functions</a>.</p> <p>On November 22nd, <strong>Python 3.13 alpha 2 was released</strong> with these restored functions. It seems like the situation is calmer now.</p> <p>Reverting was part of my initial plan; it had been clearly announced from the beginning. But I didn't expect that so many people would test Python 3.13 alpha 1 as soon as it was released (October)! Usually, we only start to get feedback around beta 1 (May). I had like <strong>2 weeks to fix most issues instead of 7 months</strong>. It was really stressful for me.</p> <p>I <a class="reference external" href="https://discuss.python.org/t/python-3-13-alpha-1-contains-breaking-changes-whats-the-plan/37490/29">posted a message to apologize</a> and to give the context of this work. Extract:</p> <blockquote> <p>Following the announced plan, I reverted 50 private APIs which were removed in Python 3.13 alpha 1. These APIs will be available again in the incoming Python 3.13 alpha 2 (scheduled next Tuesday).</p> <p>I <strong>planned to make Cython, numpy and cffi compatible</strong> with Python 3.13 <strong>alpha 1</strong>. Well, I missed this release.
With reverted changes, numpy 1.26.2 can be built successfully, and cffi 1.16.0 just requires a single change. So we should be good (or almost good) for Python 3.13 <strong>alpha 2</strong>.</p> <p>(...)</p> <p>I’m sorry if some people felt that this C API work was forced on them and their opinion was not taken into account. We heard you and we took your feedback into account. It took me time to adjust my plan according to early received feedback. I expected to have 6 months to work step by step. Well, I had 2 weeks instead 🙂</p> </blockquote> </div> <div class="section" id="add-public-functions"> <h2>Add public functions</h2> <p>On October 30th, I created <a class="reference external" href="https://github.com/python/cpython/issues/111481">issue gh-111481</a>: &quot;[C API] Meta issue: add new public functions with doc+tests to replace removed private functions&quot;.</p> <p>So far, I added 7 public functions to Python 3.13:</p> <ul class="simple"> <li><tt class="docutils literal">PyDict_Pop()</tt></li> <li><tt class="docutils literal">PyDict_PopString()</tt></li> <li><tt class="docutils literal">PyList_Clear()</tt></li> <li><tt class="docutils literal">PyList_Extend()</tt></li> <li><tt class="docutils literal">PyLong_AsInt()</tt></li> <li><tt class="docutils literal">Py_HashPointer()</tt></li> <li><tt class="docutils literal">Py_IsFinalizing()</tt></li> </ul> <p>More functions are coming soon; I have many open pull requests!</p> <p>Adding new functions is slower than I expected. The good part is that many people are reviewing the APIs, and that the new public APIs are way better than the old private ones: less error-prone, can be more efficient, etc.
At least, the conversion of private to public is moving steadily: functions are added one by one.</p> </div> Design the API of a new PyDict_GetItemRef() function2023-11-16T20:00:00+01:002023-11-16T20:00:00+01:00Victor Stinnertag:vstinner.github.io,2023-11-16:/c-api-dict-getitemref.html<p>Last June, I proposed adding a new <tt class="docutils literal">PyDict_GetItemRef()</tt> function to the Python 3.13 C API. Every aspect of the API design was discussed at length. I will explain how the API was designed, to finish with the future creation of the C API Working Group.</p> <img alt="Psyche Revived by Cupid's Kiss" src="https://vstinner.github.io/images/amour_psychee.jpg" /> <p>Photo: <em>Psyche Revived by Cupid's Kiss …</em></p><p>Last June, I proposed adding a new <tt class="docutils literal">PyDict_GetItemRef()</tt> function to the Python 3.13 C API. Every aspect of the API design was discussed at length. I will explain how the API was designed, to finish with the future creation of the C API Working Group.</p> <img alt="Psyche Revived by Cupid's Kiss" src="https://vstinner.github.io/images/amour_psychee.jpg" /> <p>Photo: <em>Psyche Revived by Cupid's Kiss</em> sculpture by Antonio Canova.</p> <div class="section" id="add-pyimport-addmoduleref-function"> <h2>Add PyImport_AddModuleRef() function</h2> <p>In June, while reading Python C code, I found some <a class="reference external" href="https://github.com/python/cpython/blob/8cd70eefc7f3363cfa0d43f34522c3072fa9e160/Python/import.c#L345-L369">surprising code</a>: the <tt class="docutils literal">PyImport_AddModuleObject()</tt> function creates a <strong>weak reference</strong> to the module returned by <tt class="docutils literal">import_add_module()</tt>, calls <tt class="docutils literal">Py_DECREF()</tt> on the module, and then tries to get the module back from the weak reference: it can be NULL if the reference count was one.
I expected just a <tt class="docutils literal">Py_DECREF()</tt> call, but no: complicated code involving a weak reference is needed to prevent a crash.</p> <p>So I <a class="reference external" href="https://github.com/python/cpython/issues/105922">added</a> the new <a class="reference external" href="https://docs.python.org/dev/c-api/import.html#c.PyImport_AddModuleRef">PyImport_AddModuleRef() function</a> to directly return a strong reference and avoid having to create a temporary weak reference.</p> <p>Note: The API of the new PyImport_AddModuleRef() function is <a class="reference external" href="https://github.com/python/cpython/issues/106915">still being discussed and may change in the near future</a>.</p> </div> <div class="section" id="add-pyweakref-getref-function"> <h2>Add PyWeakref_GetRef() function</h2> <p>Shortly after, I <a class="reference external" href="https://github.com/python/cpython/issues/105927">added</a> the new <a class="reference external" href="https://docs.python.org/dev/c-api/weakref.html#c.PyWeakref_GetRef">PyWeakref_GetRef() function</a>.
It is similar to <tt class="docutils literal">PyWeakref_GetObject()</tt>, but returns a strong reference instead of a borrowed reference.</p> <p>Since I listed <a class="reference external" href="https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references">Bad C API</a> in my &quot;Design a new better C API for Python&quot; project in 2018, I have been fighting against borrowed references, which cause multiple issues such as:</p> <ul class="simple"> <li>Subtle crashes in C extensions.</li> <li>Making the C API implementation in PyPy more complicated: see <a class="reference external" href="https://www.pypy.org/posts/2018/09/inside-cpyext-why-emulating-cpython-c-8083064623681286567.html">Inside cpyext: Why emulating CPython C API is so Hard</a> (2018) by Antonio Cuni.</li> <li>Unknown object lifetimes preventing optimization opportunities.</li> <li>Making the C API less regular and harder to use: some functions return a new reference, others return a borrowed reference.</li> </ul> <p>In 2020, my first attempt to <a class="reference external" href="https://github.com/python/cpython/issues/86460">add a new PyTuple_GetItemRef() function</a> was rejected.</p> </div> <div class="section" id="pydict-getitemref-easy"> <h2>PyDict_GetItemRef(): easy!</h2> <p>Since adding the <tt class="docutils literal">PyImport_AddModuleRef()</tt> and <tt class="docutils literal">PyWeakref_GetRef()</tt> functions went well (quick discussions, no major disagreement), I felt lucky and proposed <a class="reference external" href="https://github.com/python/cpython/issues/106004">adding a new PyDict_GetItemRef() function</a>. It should be easy as well, right?
The discussion started in the issue and continued in the associated <a class="reference external" href="https://github.com/python/cpython/pull/106005">pull request</a>.</p> <p>The idea of <tt class="docutils literal">PyDict_GetItemRef()</tt> is to replace the <tt class="docutils literal">PyDict_GetItem()</tt> function, which returns a borrowed reference and ignores all errors: <tt class="docutils literal">hash(key)</tt> error, <tt class="docutils literal">key == key2</tt> comparison error, <tt class="docutils literal">KeyboardInterrupt</tt>, etc.</p> <p>There is already the <tt class="docutils literal">PyDict_GetItemWithError()</tt> function which reports errors. But it returns a borrowed reference and its API has an issue: when it returns <tt class="docutils literal">NULL</tt>, the caller must check <tt class="docutils literal">PyErr_Occurred()</tt> to know if an exception is set, or if the key is missing. This problem was the <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/1">very first issue</a> created in the Problems project of the C API Working Group.</p> <p>This Problems project is a collaborative work to collect C API issues. By the way, the <a class="reference external" href="https://peps.python.org/pep-0733/">PEP 733 – An Evaluation of Python’s Public C API</a> was published on October 16 as a summary of these problems.</p> </div> <div class="section" id="pydict-getitemref-api-version-1"> <h2>PyDict_GetItemRef(): API version 1</h2> <p>I proposed the API:</p> <pre class="literal-block"> int PyDict_GetItemRef(PyObject *mp, PyObject *key, PyObject **pvalue) int PyDict_GetItemStringRef(PyObject *mp, const char *key, PyObject **pvalue) </pre> <p>Return <tt class="docutils literal">0</tt> on success, or <tt class="docutils literal"><span class="pre">-1</span></tt> on error.
Simple, right?</p> <p><strong>Gregory Smith</strong> was supportive:</p> <blockquote> I'm in favor of this because I don't think we should have public APIs that (a) require a value check + <tt class="docutils literal">PyErr_Occurred()</tt> call pattern - a frequent source of lurking bugs - or (b) return borrowed references. Yes I know we already have them, that's missing the point. The point is that with these in place, we can promote their use over the others because these are better in all respects.</blockquote> <p>Later, I discovered that the draft <a class="reference external" href="https://peps.python.org/pep-0703/">PEP 703 – Making the Global Interpreter Lock Optional in CPython</a> proposed adding a <tt class="docutils literal">PyDict_FetchItem()</tt> similar to my proposed <tt class="docutils literal">PyDict_GetItemRef()</tt> function.</p> </div> <div class="section" id="api-version-2-change-the-return-value"> <h2>API version 2: Change the Return Value</h2> <p><strong>Mark Shannon</strong> asked:</p> <blockquote> What's the rationale for not distinguishing between found and not found in the return value? 
See: <a class="reference external" href="https://github.com/python/devguide/issues/1121">Document the preferred style for API functions with three, four or five-way returns</a>.</blockquote> <p>I modified the API to return <tt class="docutils literal">1</tt> if the key is present and return <tt class="docutils literal">0</tt> if the key is missing.</p> <p>By the way, <strong>Erlend Aasland</strong> added <a class="reference external" href="https://devguide.python.org/developer-workflow/c-api/index.html#guidelines-for-expanding-changing-the-public-api">C API guidelines</a> in the Python Developer Guide (devguide) about function return values.</p> </div> <div class="section" id="function-name"> <h2>Function Name</h2> <p><strong>Serhiy Storchaka</strong> had concerns about the name:</p> <blockquote> The only problem is that functions with so similar names have completely different interface. It is pretty confusing. Would not be better to name it <tt class="docutils literal">PyDict_LookupItem</tt> or like? It may be worth to add also <tt class="docutils literal">PyMapping_LookupItem</tt> for convenience.</blockquote> <p><strong>Mark Shannon</strong> added:</p> <blockquote> <p>Can we come up with a better name than <tt class="docutils literal">PyDict_GetItemRef</tt>? I see why you are adding <tt class="docutils literal">Ref</tt> to the end, but all API functions should return new references, so it is a bit like calling the function PyDict_GetItemNotWrong.</p> <p>Obviously, the ideal name [<tt class="docutils literal">PyDict_GetItem()</tt>] is already taken. Anyone have any suggestions for a better name?</p> </blockquote> <p><strong>Sam Gross</strong> wrote:</p> <blockquote> <p>In the context of PEP 703, I think it would be better to have variations that only change one axis of the semantics (e.g., new vs. borrowed, error vs. no error) and have the naming reflect that. 
For example, PEP 703 proposes:</p> <p><tt class="docutils literal">PyDict_FetchItem</tt> for <tt class="docutils literal">PyDict_GetItem</tt> and <tt class="docutils literal">PyDict_FetchItemWithError</tt> for <tt class="docutils literal">PyDict_GetItemWithError</tt>.</p> </blockquote> <p>I created <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/52">Naming convention for new C API functions</a> to discuss the <tt class="docutils literal">Ref</tt> suffix for new functions returning a strong reference.</p> <p>PEP 703 proposes the <tt class="docutils literal">PyDict_FetchItem()</tt> name.</p> </div> <div class="section" id="first-argument-type"> <h2>First Argument Type</h2> <p><strong>Mark Shannon</strong> had concerns about the first argument type:</p> <blockquote> Using <tt class="docutils literal">PyObject*</tt> is needlessly throwing away type information.</blockquote> <p><strong>Erlend Aasland</strong> added:</p> <blockquote> Why not strongly typed, since it is a <tt class="docutils literal">PyDict_</tt> API?</blockquote> </div> <div class="section" id="pull-request-approvals-and-the-function-name-strikes-back"> <h2>Pull Request Approvals And The Function Name Strikes Back</h2> <p><strong>Erlend</strong> and <strong>Gregory</strong> approved my pull request.</p> <p><strong>Erlend</strong> wrote:</p> <blockquote> I'm approving this. A new naming scheme makes sense for a new API; I'm not sure it makes sense to try and enforce a new scheme in the current API. For now, there is already precedence of the <tt class="docutils literal">Ref</tt> suffix in the current API; I'm ok with that. Also, the current API uses <tt class="docutils literal">PyObject*</tt> all over the place. If we are to change this, we practically will end up with a completely new API; AFAICS, there is no problem with sticking to the current practice.</blockquote> <p>Then the discussion about the function name came back.
So <strong>Gregory</strong> asked the Steering Council: <a class="reference external" href="https://github.com/python/steering-council/issues/201">Should we add non-borrowed-ref public C APIs, if so, is there a naming convention?</a>. He asked two questions:</p> <ul class="simple"> <li>Q1: Should we add non-borrowed-reference public C APIs where only borrowed-reference ones exist?</li> <li>Q2: If yes to Q1, is there a preferred naming convention to use for new public C APIs that return a strong reference when the earlier APIs these would be parallel versions of only returned a borrowed reference?</li> </ul> <p>Later, <strong>Serhiy Storchaka</strong> also approved the pull request:</p> <blockquote> <p>In general, I support adding this function. The benefits:</p> <ul class="simple"> <li>Returns a strong reference. It will save from some errors and may be better for PyPy.</li> <li>Save CPU time for calling PyErr_Occurred().</li> </ul> </blockquote> <p>The PR had a total of 3 approvals.</p> </div> <div class="section" id="api-version-3-use-pydictobject"> <h2>API version 3: use PyDictObject</h2> <p>When I asked <strong>Mark</strong> again for his opinion on the API, he wrote:</p> <blockquote> I'm opposed because making ad-hoc changes like this is going to make the C-API worse, not better.</blockquote> <p>I made the change <strong>Mark</strong> asked for, changing the first parameter type from <tt class="docutils literal">PyObject*</tt> to <tt class="docutils literal">PyDictObject*</tt>. API version 3:</p> <pre class="literal-block"> int PyDict_GetItemRef(PyDictObject *op, PyObject *key, PyObject **pvalue) </pre> </div> <div class="section" id="disagreement-on-the-pydictobject-type"> <h2>Disagreement On The PyDictObject Type</h2> <p><strong>Serhiy</strong> was against the change:</p> <blockquote> I dislike using concrete struct types instead of <tt class="docutils literal">PyObject*</tt> in API, especially in public API.
Isn't there a rule forbidding this?</blockquote> <p>In May, <strong>Mark</strong> created the <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/31">The C API is weakly typed</a> discussion in the Problems project.</p> <p>During the discussion, <strong>Erlend</strong> created <a class="reference external" href="https://github.com/python/devguide/issues/1127">Document guidelines for when to use dynamically typed APIs</a> in the devguide to try to find a consensus regarding guidelines for weakly/strongly typed APIs.</p> <p>There are two questions:</p> <ul class="simple"> <li>Use the <tt class="docutils literal">PyObject*</tt> or <tt class="docutils literal">PyDictObject*</tt> type for the parameter.</li> <li>Check the type at runtime, or don't check for best performance (use an assertion in debug mode).</li> </ul> <p><strong>Serhiy</strong> wrote:</p> <blockquote> <p>It is not about runtime checking.</p> <p>It is about requiring to cast the argument to <tt class="docutils literal">PyDictObject*</tt> every time you use the function: <tt class="docutils literal"><span class="pre">PyDict_GetItemRef((PyDictObject*)foo,</span> bar, &amp;baz)</tt>.</p> <p>It is tiresome, and it is unsafe, because the compiler will not reject the code if <tt class="docutils literal">foo</tt> is <tt class="docutils literal">int</tt> or <tt class="docutils literal">const char*</tt>.</p> </blockquote> <p><strong>Gregory</strong> added:</p> <blockquote> Our C API only accepts plain <tt class="docutils literal">PyObject*</tt> as input to all our public APIs.
Otherwise user code will be littered with typecasts all over the place.</blockquote> <p><strong>Gregory</strong> removed his approval.</p> </div> <div class="section" id="revert-back-to-pyobject-type-api-version-2"> <h2>Revert: Back To PyObject Type (API Version 2)</h2> <p>Since <strong>Serhiy</strong> and <strong>Gregory</strong> were against the change, I reverted it to move back to the <tt class="docutils literal">PyObject*</tt> type. <strong>Serhiy</strong> and <strong>Erlend</strong> confirmed their approval.</p> <p>I created the issue <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/55">Design a brand new C API with new PyCAPI_ prefix where all functions respect new guidelines</a> in the Problems project to discuss the creation of a brand new API. I suggested to <strong>Mark</strong> that changing the weakly typed <tt class="docutils literal">PyObject*</tt> type to a strongly typed <tt class="docutils literal">PyDictObject*</tt> should only be considered in such a new API.</p> </div> <div class="section" id="more-changes-api-version-4"> <h2>More changes? API version 4</h2> <p><strong>Petr Viktorin</strong> joined the discussion and proposed a late change:</p> <blockquote> FWIW, here's a possible new variant: you could set result to <tt class="docutils literal">NULL</tt> in which case the result isn't stored/incref'd. And that would start a convention of how to turn a get operation into a membership test.
(And the Lookup name would fit that better.)</blockquote> <p>I didn't take <strong>Petr</strong>'s suggestion since <strong>Serhiy</strong> pointed out that there is already the <tt class="docutils literal">PyDict_Contains()</tt> function to test whether a dictionary contains a key.</p> <p><strong>Mark Shannon</strong> wrote:</p> <blockquote> If this function is to take <tt class="docutils literal">PyObject*</tt>, as <strong>Erlend</strong> seems to insist, then it shouldn't raise a <tt class="docutils literal">SystemError</tt> when passed something other than a dict. It should raise a <tt class="docutils literal">TypeError</tt>.</blockquote> <p>I modified the API (version 4) to raise <tt class="docutils literal">TypeError</tt> if the first argument is not a dictionary, instead of raising <tt class="docutils literal">SystemError</tt>.</p> </div> <div class="section" id="merge-the-change"> <h2>Merge The Change</h2> <p>After around 1 month of intense discussions, I merged my change adding the <tt class="docutils literal">PyDict_GetItemRef()</tt> function (<a class="reference external" href="https://github.com/python/cpython/commit/41ca16455188db806bfc7037058e8ecff2755e6c">commit</a>) with <a class="reference external" href="https://github.com/python/cpython/pull/106005#issuecomment-1646249360">a summary of the discussion</a>.</p> <p>I also <a class="reference external" href="https://github.com/python/pythoncapi-compat/commit/eaff3c172f94ed32ac38860c38d7a8fa27483e57">added the function to the pythoncapi-compat project</a>.</p> <p>Final API:</p> <pre class="literal-block"> int PyDict_GetItemRef(PyObject *p, PyObject *key, PyObject **result) int PyDict_GetItemStringRef(PyObject *p, const char *key, PyObject **result) </pre> <p>Documentation:</p> <ul class="simple"> <li><a class="reference external" href="https://docs.python.org/dev/c-api/dict.html#c.PyDict_GetItemRef">PyDict_GetItemRef</a></li> <li><a class="reference external"
href="https://docs.python.org/dev/c-api/dict.html#c.PyDict_GetItemStringRef">PyDict_GetItemStringRef</a></li> </ul> <p>Using the <a class="reference external" href="https://pythoncapi-compat.readthedocs.io/">pythoncapi-compat project</a>, you can use this new API right now on all Python versions!</p> </div> <div class="section" id="how-to-take-decisions"> <h2>How To Take Decisions?</h2> <p>The discussions occurred in many places:</p> <ul class="simple"> <li>My Python issue</li> <li>My Python pull request</li> <li>Multiple Problems issues</li> <li>Multiple devguide issues</li> <li>Steering Council issue</li> </ul> <p>The discussion was heated. <strong>Erlend</strong> decided to take a break:</p> <blockquote> I'm taking a break from the C API discussions; I'm removing myself from this PR for now</blockquote> <p>While the change was approved by 3 core developers, there was not strictly a consensus since <strong>Mark</strong> did not formally approve the change. Some people asked to wait until some general guidelines for new APIs were decided, <strong>before</strong> making further C API changes.</p> <p><strong>Gregory</strong> opened a Steering Council issue on July 2. I asked for an update on July 17. Three meetings later, they had not had the opportunity to discuss the question. They were busy discussing the heavy <a class="reference external" href="https://peps.python.org/pep-0703/">PEP 703 – Making the Global Interpreter Lock Optional in CPython</a>. I merged my change before the Steering Council spoke up. I proposed to revert the change if needed. On July 25, <strong>Gregory</strong> replied on behalf of the Steering Council:</p> <blockquote> The steering council chatted about non-borrowed-ref and naming conventions today. We want to <strong>delegate</strong> this to the <strong>C API working group</strong> to come back with a broader recommendation.
<strong>Irit Katriel</strong> has put together the initial draft of <a class="reference external" href="https://github.com/capi-workgroup/problems/blob/main/capi_problems.rst">An Evaluation of Python's Public C API</a> for example.</blockquote> <p>The problem was that the C API Working Group was just a GitHub organization; it was not an organized group with designated members.</p> </div> <div class="section" id="c-api-working-group"> <h2>C API Working Group</h2> <p>From October 9 to 14, there was a Core Dev Sprint in Brno (Czech Republic). I gave a talk about the C API status and my C API agenda: <a class="reference external" href="https://github.com/vstinner/talks/blob/main/2023-CoreDevSprint-Brno/c-api.pdf">slides of my C API talk</a>. At the end, I called for the creation of a formal C API Working Group to unblock the situation.</p> <p>During the sprint, after my talk, <strong>Guido van Rossum</strong> wrote <a class="reference external" href="https://peps.python.org/pep-0731/">PEP 731 – C API Working Group Charter</a> with 5 members:</p> <ul class="simple"> <li><strong>Steve Dower</strong></li> <li><strong>Irit Katriel</strong></li> <li><strong>Guido van Rossum</strong></li> <li><strong>Victor Stinner</strong> (me)</li> <li><strong>Petr Viktorin</strong></li> </ul> <p>Once the PEP was published, it was <a class="reference external" href="https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117">discussed on discuss.python.org</a>. Two weeks later, <strong>Guido</strong> submitted the PEP to the Steering Council: <a class="reference external" href="https://github.com/python/steering-council/issues/210">PEP 731 -- C API Working Group Charter</a>.</p> <p>The Steering Council has not taken a decision yet.
Previously, the Steering Council expressed their desire to delegate some C API decisions to a C API Working Group.</p> </div> My contributions to Python (July 2023)2023-07-08T23:00:00+02:002023-07-08T23:00:00+02:00Victor Stinnertag:vstinner.github.io,2023-07-08:/contrib-python-july-2023.html<p>In 2023, between May 4 and July 8, I made 144 commits in the Python main branch. In this article, I describe the most important Python contributions that I made to Python 3.12 and Python 3.13 in these months.</p> <a class="reference external image-reference" href="https://twitter.com/foxes_in_love/status/1668558475490742277"> <img alt="Foxes in Love: Cuddle" src="https://vstinner.github.io/images/foxes_in_love_cuddle.jpg" /> </a> <p><em>Drawing: Foxes in Love: Cuddle</em></p> <div class="section" id="summary"> <h2>Summary</h2> <ul class="simple"> <li>Add PyImport_AddModuleRef() and …</li></ul></div><p>In 2023, between May 4 and July 8, I made 144 commits in the Python main branch. In this article, I describe the most important Python contributions that I made to Python 3.12 and Python 3.13 in these months.</p> <a class="reference external image-reference" href="https://twitter.com/foxes_in_love/status/1668558475490742277"> <img alt="Foxes in Love: Cuddle" src="https://vstinner.github.io/images/foxes_in_love_cuddle.jpg" /> </a> <p><em>Drawing: Foxes in Love: Cuddle</em></p> <div class="section" id="summary"> <h2>Summary</h2> <ul class="simple"> <li>Add PyImport_AddModuleRef() and PyWeakref_GetRef().</li> <li>Py_INCREF() and Py_DECREF() as opaque function calls in the limited C API.</li> <li>PyList_SET_ITEM() and PyTuple_SET_ITEM() check index bounds.</li> <li>Define &quot;Soft Deprecation&quot; in PEP 387; getopt and optparse are soft deprecated.</li> <li>Document how to replace imp with importlib.</li> <li>Remove 19 stdlib modules.</li> <li>Remove locale.resetlocale() and logging.Logger.warn().</li> <li>Remove 181 private C API functions.</li> </ul> </div> <div class="section"
id="pep-594"> <h2>PEP 594</h2> <p>In Python 3.13, I removed 19 modules deprecated in Python 3.11 by PEP 594:</p> <ul class="simple"> <li>aifc</li> <li>audioop</li> <li>cgi</li> <li>cgitb</li> <li>chunk</li> <li>crypt</li> <li>imghdr</li> <li>mailcap</li> <li>nis</li> <li>nntplib</li> <li>ossaudiodev</li> <li>pipes</li> <li>sndhdr</li> <li>spwd</li> <li>sunau</li> <li>telnetlib</li> <li>uu</li> <li>xdrlib</li> </ul> <p><em>Zachary Ware</em> removed the last deprecated module, msilib, so PEP 594 is now fully implemented in Python 3.13!</p> <p>I announced the change: <a class="reference external" href="https://discuss.python.org/t/pep-594-has-been-implemented-python-3-13-removes-20-stdlib-modules/27124">PEP 594 has been implemented: Python 3.13 removes 20 stdlib modules</a>.</p> <p>Removing imghdr caused me some trouble with building the Python documentation. Sphinx used imghdr, but recent Sphinx versions no longer use it. I updated the Sphinx version to work around this issue.</p> </div> <div class="section" id="c-api-strong-reference"> <h2>C API: Strong reference</h2> <p><strong>tl;dr I added PyImport_AddModuleRef() and PyWeakref_GetRef() to Python 3.13 to return strong references, instead of borrowed references.</strong></p> <p>When I <a class="reference external" href="https://pythoncapi.readthedocs.io/">analyzed issues of the Python C API</a>, I quickly identified that the usage of borrowed references is causing a lot of trouble. By the way, I recently updated the <a class="reference external" href="https://pythoncapi.readthedocs.io/bad_api.html#functions">list of the 41 functions returning borrowed references</a>.
This issue is also tracked as <a class="reference external" href="https://github.com/capi-workgroup/problems/issues/21">Returning borrowed references is fundamentally unsafe</a> in the recently created <a class="reference external" href="https://github.com/capi-workgroup/problems/">Problems</a> project of the new C API workgroup.</p> <p>In Python 3.10, I added the <tt class="docutils literal">Py_NewRef()</tt> and <tt class="docutils literal">Py_XNewRef()</tt> functions, which have better semantics: they create a new strong reference to a Python object. I also added the <tt class="docutils literal">PyModule_AddObjectRef()</tt> function, a variant of <tt class="docutils literal">PyModule_AddObject()</tt>, which returns a strong reference. And I added the <a class="reference external" href="https://docs.python.org/dev/glossary.html#term-borrowed-reference">borrowed reference</a> and <a class="reference external" href="https://docs.python.org/dev/glossary.html#term-strong-reference">strong reference</a> terms to the glossary.</p> <p>In Python 3.13, I added two functions:</p> <ul class="simple"> <li><strong>PyImport_AddModuleRef()</strong>: variant of <tt class="docutils literal">PyImport_AddModule()</tt></li> <li><strong>PyWeakref_GetRef()</strong>: variant of <tt class="docutils literal">PyWeakref_GetObject()</tt>. 
I also deprecated the <tt class="docutils literal">PyWeakref_GetObject()</tt> and <tt class="docutils literal">PyWeakref_GET_OBJECT()</tt> functions.</li> </ul> <p>I updated pythoncapi-compat to <a class="reference external" href="https://pythoncapi-compat.readthedocs.io/en/latest/api.html#python-3-13">provide these functions to Python 3.12 and older</a>.</p> <p>I also added <tt class="docutils literal">Py_TYPE()</tt> to <tt class="docutils literal">Doc/data/refcounts.dat</tt>, a manually maintained file listing how C functions handle references.</p> <p>Now I'm working on adding <strong>PyDict_GetItemRef()</strong>, but the API and the function name are causing more friction: see the <a class="reference external" href="https://github.com/python/cpython/pull/106005">pull request</a>. Recently, the PyDict_GetItemRef() API was raised to the Steering Council: <a class="reference external" href="https://github.com/python/steering-council/issues/201">decision: Should we add non-borrowed-ref public C APIs, if so, is there a naming convention?</a></p> </div> <div class="section" id="c-api-pylist-set-item"> <h2>C API: PyList_SET_ITEM()</h2> <p><strong>tl; dr In Python 3.13, PyList_SET_ITEM() and PyTuple_SET_ITEM() now check index bounds.</strong></p> <p>In Python 3.9, <tt class="docutils literal">Include/cpython/listobject.h</tt> was created for the PyList API excluded from the limited C API. 
<tt class="docutils literal">PyList_SET_ITEM()</tt> was implemented as:</p> <pre class="literal-block"> #define PyList_SET_ITEM(op, i, v) (_PyList_CAST(op)-&gt;ob_item[i] = (v)) </pre> <p>In Python 3.10, the <a class="reference external" href="https://github.com/python/cpython/issues/74644">return value was removed to fix a bug</a> by adding a <tt class="docutils literal">(void)</tt> cast:</p> <pre class="literal-block"> #define PyList_SET_ITEM(op, i, v) ((void)(_PyList_CAST(op)-&gt;ob_item[i] = (v))) </pre> <p>In Python 3.11, <a class="reference external" href="https://peps.python.org/pep-0670/">PEP 670: Convert macros to functions in the Python C API</a> was accepted, and I converted the macro to a static inline function:</p> <pre class="literal-block"> static inline void PyList_SET_ITEM(PyObject *op, Py_ssize_t index, PyObject *value) { PyListObject *list = _PyList_CAST(op); list-&gt;ob_item[index] = value; } </pre> <p>I tried to add an assertion in <tt class="docutils literal">PyTuple_SET_ITEM()</tt> to check index bounds, but I got assertion failures when running the Python test suite, related to PyStructSequence, which inherits from PyTuple.</p> <p>Recently, I tried again. I updated the PyStructSequence API to check the index bounds differently. The tricky part is that getting the number of fields of a PyStructSequence requires getting an item from a dictionary, and <tt class="docutils literal">PyDict_GetItemWithError()</tt> can raise an exception. 
Moreover, <tt class="docutils literal">PyStructSequence_SET_ITEM()</tt> was still implemented as a macro in Python 3.12:</p> <pre class="literal-block"> #define PyStructSequence_SET_ITEM(op, i, v) PyTuple_SET_ITEM((op), (i), (v)) </pre> <p>Old PyStructSequence_SetItem() implementation:</p> <pre class="literal-block"> void PyStructSequence_SetItem(PyObject* op, Py_ssize_t i, PyObject* v) { PyStructSequence_SET_ITEM(op, i, v); } </pre> <p>New implementation:</p> <pre class="literal-block"> void PyStructSequence_SetItem(PyObject *op, Py_ssize_t index, PyObject *value) { PyTupleObject *tuple = _PyTuple_CAST(op); assert(0 &lt;= index); #ifndef NDEBUG Py_ssize_t n_fields = REAL_SIZE(op); assert(n_fields &gt;= 0); assert(index &lt; n_fields); #endif tuple-&gt;ob_item[index] = value; } </pre> <p>The <tt class="docutils literal">REAL_SIZE()</tt> macro is only available in <tt class="docutils literal">Objects/structseq.c</tt>. Exposing it in the public C API would be a bad idea. So I just converted the PyStructSequence_SET_ITEM() macro into an alias of PyStructSequence_SetItem():</p> <pre class="literal-block"> #define PyStructSequence_SET_ITEM PyStructSequence_SetItem </pre> <p>This way, PyStructSequence_SET_ITEM() and PyStructSequence_SetItem() are implemented as opaque function calls.</p> <p>So it became possible to check index bounds in PyList_SET_ITEM():</p> <pre class="literal-block"> static inline void PyList_SET_ITEM(PyObject *op, Py_ssize_t index, PyObject *value) { PyListObject *list = _PyList_CAST(op); assert(0 &lt;= index); assert(index &lt; Py_SIZE(list)); list-&gt;ob_item[index] = value; } </pre> <p>I had to modify code calling PyList_SET_ITEM() <em>before</em> setting the list size: the list_extend() and _PyList_AppendTakeRef() functions. 
The size is now set before calling PyList_SET_ITEM().</p> <p>I made a similar change to <tt class="docutils literal">PyTuple_SET_ITEM()</tt> so that it also checks the index.</p> <p>These bounds checks are implemented as assertions, enabled if Python is built in debug mode or built with assertions.</p> </div> <div class="section" id="c-api-python-3-12-py-incref"> <h2>C API: Python 3.12 Py_INCREF()</h2> <p><strong>tl; dr I changed the Py_INCREF() and Py_DECREF() implementation to opaque function calls in any version of the limited C API if Python is built in debug mode.</strong></p> <p>In Python 3.12, <a class="reference external" href="https://peps.python.org/pep-0683/">PEP 683 – Immortal Objects, Using a Fixed Refcount</a> was implemented. It made the Py_INCREF() and Py_DECREF() static inline functions even more complicated than before. For debug builds of Python, the implementation required exposing the private <tt class="docutils literal">_Py_IncRefTotal_DO_NOT_USE_THIS()</tt> and <tt class="docutils literal">_Py_DecRefTotal_DO_NOT_USE_THIS()</tt> functions in the stable ABI, even though their names say &quot;DO NOT USE THIS&quot;.</p> <p>In Python 3.10, I modified Py_INCREF() and Py_DECREF() to implement them as opaque function calls in the limited C API version 3.10 or newer if Python is built in debug mode (if the <tt class="docutils literal">Py_REF_DEBUG</tt> macro is defined). Thanks to this change, the limited C API has been supported by debug builds of Python since Python 3.10.</p> <p>In Python 3.12, I <strong>modified Py_INCREF() and Py_DECREF() to implement them as opaque function calls in all limited C API versions</strong>, not only in the limited C API version 3.10 and newer, if Python is built in debug mode. This way, implementation details are now hidden and no longer leaked in the stable ABI. 
I removed <tt class="docutils literal">_Py_NegativeRefcount()</tt> from the limited C API, and I removed <tt class="docutils literal">_Py_IncRefTotal_DO_NOT_USE_THIS()</tt> and <tt class="docutils literal">_Py_DecRefTotal_DO_NOT_USE_THIS()</tt> from the stable ABI.</p> <p>Later, I discovered that my fix broke backward compatibility with Python 3.9. My implementation used <tt class="docutils literal">_Py_IncRef()</tt> and <tt class="docutils literal">_Py_DecRef()</tt> that I added to Python 3.10. I updated the implementation to use <tt class="docutils literal">Py_IncRef()</tt> and <tt class="docutils literal">Py_DecRef()</tt> on Python 3.9 and older; these functions have been available since Python 2.4.</p> </div> <div class="section" id="c-api-py-incref-opaque-function-call"> <h2>C API: Py_INCREF() opaque function call</h2> <p><strong>tl; dr I changed the Py_INCREF() and Py_DECREF() implementation to opaque function calls in the limited C API version 3.12.</strong> (also in the regular release build, not only in the debug build)</p> <p>In Python 3.8, I converted the Py_INCREF() and Py_DECREF() macros to static inline functions. 
I already wanted to convert them to opaque function calls, but that can have a significant performance cost, so I left them as static inline functions.</p> <p>As a follow-up to my Python 3.12 Py_INCREF() fix for the debug build, I modified Py_INCREF() and Py_DECREF() in Python 3.12 to always implement them as <strong>opaque function calls in the limited C API version 3.12</strong> and newer.</p> <ul class="simple"> <li>Discussion: <a class="reference external" href="https://discuss.python.org/t/limited-c-api-implement-py-incref-and-py-decref-as-function-calls/27592">Limited C API: implement Py_INCREF() and Py_DECREF() as function calls</a></li> <li><a class="reference external" href="https://github.com/python/cpython/pull/105388">Pull request</a></li> </ul> <p>For me, it's a <strong>major enhancement</strong> to make the stable ABI more <strong>future proof</strong> by leaking fewer implementation details.</p> <p><a class="reference external" href="https://github.com/python/cpython/blob/da98ed0aa040791ef08b24befab697038c8c9fd5/Include/object.h#L613-L622">Code</a>:</p> <pre class="literal-block"> static inline Py_ALWAYS_INLINE void Py_INCREF(PyObject *op) { #if defined(Py_LIMITED_API) &amp;&amp; (Py_LIMITED_API+0 &gt;= 0x030c0000 || defined(Py_REF_DEBUG)) // Stable ABI implements Py_INCREF() as a function call on limited C API // version 3.12 and newer, and on Python built in debug mode. _Py_IncRef() // was added to Python 3.10.0a7, use Py_IncRef() on older Python versions. // Py_IncRef() accepts NULL whereas _Py_IncRef() doesn't. # if Py_LIMITED_API+0 &gt;= 0x030a00A7 _Py_IncRef(op); # else Py_IncRef(op); # endif #else ... #endif } </pre> </div> <div class="section" id="tests"> <h2>Tests</h2> <p>The Python test runner <em>regrtest</em> has specific constraints because tests are run in subprocesses, on different platforms, with custom encodings and options. 
Over the last year, an annoying regrtest bug came and went: if a subprocess standard output (stdout) cannot be decoded, the test is treated as a success! I fixed <a class="reference external" href="https://github.com/python/cpython/issues/101634">the bug</a> and made the code more reliable by treating this class of bug as a test failure.</p> <p>I fixed test_counter_optimizer() of test_capi when run twice: it now creates a new function at each call, so each run starts in a known state. Previously, the second run was in a different state since the function was already optimized.</p> <p>I cleaned up the old test_ctypes. My main goal was to remove <tt class="docutils literal">from ctypes import *</tt> to be able to use pyflakes on these tests. I found many skipped tests: I re-enabled 3 of them and removed the others. I also removed dead code.</p> <p>I removed test_xmlrpc_net: it had been skipped since 2017. The public <tt class="docutils literal">buildbot.python.org</tt> server has no XML-RPC interface anymore, and no replacement public XML-RPC server was found in 6 years.</p> <p>I fixed dangling threads in <tt class="docutils literal">test_importlib.test_side_effect_import()</tt>: the import spawns threads, so the test now waits until they complete.</p> </div> <div class="section" id="c-api-deprecate"> <h2>C API: Deprecate</h2> <p>I listed <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.13.html#pending-removal-in-python-3-14">pending C API removals</a> in the What's New in Python 3.13 document.</p> <p>I deprecated multiple APIs:</p> <ul class="simple"> <li>Py_UNICODE and PY_UNICODE_TYPE</li> <li>PyImport_ImportModuleNoBlock()</li> <li>Py_HasFileSystemDefaultEncoding</li> </ul> <p>I deprecated legacy Python initialization functions:</p> <ul class="simple"> <li>PySys_ResetWarnOptions()</li> <li>Py_GetExecPrefix()</li> <li>Py_GetPath()</li> <li>Py_GetPrefix()</li> <li>Py_GetProgramFullPath()</li> <li>Py_GetProgramName()</li> <li>Py_GetPythonHome()</li> </ul> <p>I removed 
the PyArg_Parse() deprecation. In 2007, the deprecation was added as a comment to the documentation, but the function remains relevant in Python 3.13 for some specific use cases.</p> </div> <div class="section" id="soft-deprecation"> <h2>Soft Deprecation</h2> <p><strong>tl; dr The getopt module is now soft deprecated.</strong></p> <p>I updated <a class="reference external" href="https://peps.python.org/pep-0387/">PEP 387: Backwards Compatibility Policy</a> to add <a class="reference external" href="https://peps.python.org/pep-0387/#soft-deprecation">Soft Deprecation</a>:</p> <blockquote> <p>A soft deprecation can be used when using an API which should no longer be used to write new code, but it remains safe to continue using it in existing code. The API remains documented and tested, but will not be developed further (no enhancement).</p> <p>The main difference between a “soft” and a (regular) “hard” deprecation is that the soft deprecation does not imply scheduling the removal of the deprecated API.</p> </blockquote> <p>I converted the <strong>optparse</strong> deprecation to a <strong>soft deprecation</strong>.</p> <p>I soft deprecated the <strong>getopt</strong> module: it remains available and maintained, but argparse should be preferred for new projects.</p> </div> <div class="section" id="deprecate"> <h2>Deprecate</h2> <p>I deprecated the <tt class="docutils literal">getmark()</tt>, <tt class="docutils literal">setmark()</tt> and <tt class="docutils literal">getmarkers()</tt> methods of the Wave_read and Wave_write classes. 
These methods only existed for compatibility with the aifc module, but they did nothing or always failed, and the aifc module was removed in Python 3.13.</p> <p>I also deprecated the <tt class="docutils literal">SetPointerType()</tt> and <tt class="docutils literal">ARRAY()</tt> functions of ctypes.</p> </div> <div class="section" id="c-api-remove"> <h2>C API: Remove</h2> <ul class="simple"> <li>I removed the following old functions used to configure the Python initialization, which I deprecated in Python 3.11:<ul> <li>PySys_AddWarnOptionUnicode()</li> <li>PySys_AddWarnOption()</li> <li>PySys_AddXOption()</li> <li>PySys_HasWarnOptions()</li> <li>PySys_SetArgvEx()</li> <li>PySys_SetArgv()</li> <li>PySys_SetPath()</li> <li>Py_SetPath()</li> <li>Py_SetProgramName()</li> <li>Py_SetPythonHome()</li> <li>Py_SetStandardStreamEncoding()</li> <li>_Py_SetProgramFullPath()</li> </ul> </li> <li>I also removed deprecated &quot;call&quot; functions:<ul> <li>PyCFunction_Call()</li> <li>PyEval_CallFunction()</li> <li>PyEval_CallMethod()</li> <li>PyEval_CallObject()</li> <li>PyEval_CallObjectWithKeywords()</li> </ul> </li> <li>I removed the deprecated PyEval_AcquireLock() and PyEval_InitThreads() functions.</li> <li>I removed old aliases which were kept for backwards compatibility with Python 3.8:<ul> <li>_PyObject_CallMethodNoArgs()</li> <li>_PyObject_CallMethodOneArg()</li> <li>_PyObject_CallOneArg()</li> <li>_PyObject_FastCallDict()</li> <li>_PyObject_Vectorcall()</li> <li>_PyObject_VectorcallMethod()</li> <li>_PyVectorcall_Function()</li> </ul> </li> </ul> </div> <div class="section" id="remove"> <h2>Remove</h2> <p>I removed the <strong>locale.resetlocale()</strong> function, but I failed to remove locale.getdefaultlocale() in Python 3.13: INADA-san asked me to keep it.</p> <p>I removed the untested and undocumented <strong>logging.Logger.warn()</strong> method.</p> <p>Oh, I forgot to remove the <strong>cafile</strong>, <strong>capath</strong> and <strong>cadefault</strong> parameters of the 
<strong>urllib.request.urlopen()</strong> function: it's now also done in Python 3.13. I removed similar parameters in many other modules in Python 3.12.</p> </div> <div class="section" id="cleanup"> <h2>Cleanup</h2> <p>As usual, I removed a bunch of unused imports (in the stdlib, tests and tools).</p> <p>I reimplemented the xmlrpc.client <tt class="docutils literal">_iso8601_format()</tt> function with <tt class="docutils literal">datetime.datetime.isoformat()</tt>. The timezone is ignored on purpose: the XML-RPC specification doesn't explain how to handle it, and many implementations ignore it.</p> </div> <div class="section" id="port-imp-code-to-importlib"> <h2>Port imp code to importlib</h2> <p>The importlib module was added in Python 3.1 and became the default in Python 3.3. The imp module was deprecated in Python 3.4 but was only removed in Python 3.12. Replacing imp code with importlib is not trivial: importlib has a different design and API.</p> <p>I wrote documentation on how to port imp code to importlib in <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.12.html#removed">What's New in Python 3.12</a>.</p> <p>I proposed <a class="reference external" href="https://github.com/python/cpython/pull/105755">adding an importlib.util.load_source_path() function</a>, but I understood that the devil is in the details: it's hard to decide how to handle the <tt class="docutils literal">sys.modules</tt> cache. I gave up and instead added a recipe in the What's New in Python 3.12 documentation:</p> <pre class="literal-block"> import importlib.util import importlib.machinery def load_source(modname, filename): loader = importlib.machinery.SourceFileLoader(modname, filename) spec = importlib.util.spec_from_file_location(modname, filename, loader=loader) module = importlib.util.module_from_spec(spec) # The module is always executed and not cached in sys.modules. # Uncomment the following line to cache the module. 
# sys.modules[module.__name__] = module loader.exec_module(module) return module </pre> <p>There are many projects affected by the imp removal, and porting them is not easy. See the <a class="reference external" href="https://discuss.python.org/t/how-do-i-migrate-from-imp/27885">How do I migrate from imp?</a> discussion.</p> </div> <div class="section" id="c-api-remove-private-functions"> <h2>C API: Remove private functions</h2> <p>Last but not least, in <a class="reference external" href="https://github.com/python/cpython/issues/106320">issue #106320</a>, I <strong>removed</strong> no fewer than <strong>181 private C API functions</strong>.</p> <p>As a reaction to my changes, a discussion was started to propose <a class="reference external" href="https://discuss.python.org/t/pssst-lets-treat-all-api-in-public-headers-as-public/28916">treating private functions as public functions</a>.</p> <p>I'm now working on identifying projects affected by these removals and on proposing solutions for the most commonly used removed functions, like the <tt class="docutils literal">_PyObject_Vectorcall()</tt> alias.</p> <p>The list of the 181 removed private C API functions:</p> <ul class="simple"> <li><tt class="docutils literal">_PyArg_NoKwnames()</tt></li> <li><tt class="docutils literal">_PyBytesWriter_Alloc()</tt></li> <li><tt class="docutils literal">_PyBytesWriter_Dealloc()</tt></li> <li><tt class="docutils literal">_PyBytesWriter_Finish()</tt></li> <li><tt class="docutils literal">_PyBytesWriter_Init()</tt></li> <li><tt class="docutils literal">_PyBytesWriter_Prepare()</tt></li> <li><tt class="docutils literal">_PyBytesWriter_Resize()</tt></li> <li><tt class="docutils literal">_PyBytesWriter_WriteBytes()</tt></li> <li><tt class="docutils literal">_PyCodecInfo_GetIncrementalDecoder()</tt></li> <li><tt class="docutils literal">_PyCodecInfo_GetIncrementalEncoder()</tt></li> <li><tt class="docutils literal">_PyCodec_DecodeText()</tt></li> <li><tt class="docutils 
literal">_PyCodec_EncodeText()</tt></li> <li><tt class="docutils literal">_PyCodec_Forget()</tt></li> <li><tt class="docutils literal">_PyCodec_Lookup()</tt></li> <li><tt class="docutils literal">_PyCodec_LookupTextEncoding()</tt></li> <li><tt class="docutils literal">_PyComplex_FormatAdvancedWriter()</tt></li> <li><tt class="docutils literal">_PyDeadline_Get()</tt></li> <li><tt class="docutils literal">_PyDeadline_Init()</tt></li> <li><tt class="docutils literal">_PyErr_CheckSignals()</tt></li> <li><tt class="docutils literal">_PyErr_FormatFromCause()</tt></li> <li><tt class="docutils literal">_PyErr_GetExcInfo()</tt></li> <li><tt class="docutils literal">_PyErr_GetHandledException()</tt></li> <li><tt class="docutils literal">_PyErr_GetTopmostException()</tt></li> <li><tt class="docutils literal">_PyErr_ProgramDecodedTextObject()</tt></li> <li><tt class="docutils literal">_PyErr_SetHandledException()</tt></li> <li><tt class="docutils literal">_PyException_AddNote()</tt></li> <li><tt class="docutils literal">_PyImport_AcquireLock()</tt></li> <li><tt class="docutils literal">_PyImport_FixupBuiltin()</tt></li> <li><tt class="docutils literal">_PyImport_FixupExtensionObject()</tt></li> <li><tt class="docutils literal">_PyImport_GetModuleAttr()</tt></li> <li><tt class="docutils literal">_PyImport_GetModuleAttrString()</tt></li> <li><tt class="docutils literal">_PyImport_GetModuleId()</tt></li> <li><tt class="docutils literal">_PyImport_IsInitialized()</tt></li> <li><tt class="docutils literal">_PyImport_ReleaseLock()</tt></li> <li><tt class="docutils literal">_PyImport_SetModule()</tt></li> <li><tt class="docutils literal">_PyImport_SetModuleString()</tt></li> <li><tt class="docutils literal">_PyInterpreterState_Get()</tt></li> <li><tt class="docutils literal">_PyInterpreterState_GetConfig()</tt></li> <li><tt class="docutils literal">_PyInterpreterState_GetConfigCopy()</tt></li> <li><tt class="docutils literal">_PyInterpreterState_GetMainModule()</tt></li> <li><tt 
class="docutils literal">_PyInterpreterState_HasFeature()</tt></li> <li><tt class="docutils literal">_PyInterpreterState_SetConfig()</tt></li> <li><tt class="docutils literal">_PyLong_AsTime_t()</tt></li> <li><tt class="docutils literal">_PyLong_FromTime_t()</tt></li> <li><tt class="docutils literal">_PyModule_CreateInitialized()</tt></li> <li><tt class="docutils literal">_PyOS_URandom()</tt></li> <li><tt class="docutils literal">_PyOS_URandomNonblock()</tt></li> <li><tt class="docutils literal">_PyObject_CallMethod()</tt></li> <li><tt class="docutils literal">_PyObject_CallMethodId()</tt></li> <li><tt class="docutils literal">_PyObject_CallMethodIdNoArgs()</tt></li> <li><tt class="docutils literal">_PyObject_CallMethodIdObjArgs()</tt></li> <li><tt class="docutils literal">_PyObject_CallMethodIdOneArg()</tt></li> <li><tt class="docutils literal">_PyObject_CallMethodNoArgs()</tt></li> <li><tt class="docutils literal">_PyObject_CallMethodOneArg()</tt></li> <li><tt class="docutils literal">_PyObject_CallOneArg()</tt></li> <li><tt class="docutils literal">_PyObject_FastCallDict()</tt></li> <li><tt class="docutils literal">_PyObject_HasLen()</tt></li> <li><tt class="docutils literal">_PyObject_MakeTpCall()</tt></li> <li><tt class="docutils literal">_PyObject_RealIsInstance()</tt></li> <li><tt class="docutils literal">_PyObject_RealIsSubclass()</tt></li> <li><tt class="docutils literal">_PyObject_Vectorcall()</tt></li> <li><tt class="docutils literal">_PyObject_VectorcallMethod()</tt></li> <li><tt class="docutils literal">_PyObject_VectorcallMethodId()</tt></li> <li><tt class="docutils literal">_PySequence_BytesToCharpArray()</tt></li> <li><tt class="docutils literal">_PySequence_IterSearch()</tt></li> <li><tt class="docutils literal">_PyStack_AsDict()</tt></li> <li><tt class="docutils literal">_PyThreadState_GetDict()</tt></li> <li><tt class="docutils literal">_PyThreadState_Prealloc()</tt></li> <li><tt class="docutils literal">_PyThread_CurrentExceptions()</tt></li> 
<li><tt class="docutils literal">_PyThread_CurrentFrames()</tt></li> <li><tt class="docutils literal">_PyTime_Add()</tt></li> <li><tt class="docutils literal">_PyTime_As100Nanoseconds()</tt></li> <li><tt class="docutils literal">_PyTime_AsMicroseconds()</tt></li> <li><tt class="docutils literal">_PyTime_AsMilliseconds()</tt></li> <li><tt class="docutils literal">_PyTime_AsNanoseconds()</tt></li> <li><tt class="docutils literal">_PyTime_AsNanosecondsObject()</tt></li> <li><tt class="docutils literal">_PyTime_AsSecondsDouble()</tt></li> <li><tt class="docutils literal">_PyTime_AsTimespec()</tt></li> <li><tt class="docutils literal">_PyTime_AsTimespec_clamp()</tt></li> <li><tt class="docutils literal">_PyTime_AsTimeval()</tt></li> <li><tt class="docutils literal">_PyTime_AsTimevalTime_t()</tt></li> <li><tt class="docutils literal">_PyTime_AsTimeval_clamp()</tt></li> <li><tt class="docutils literal">_PyTime_FromMicrosecondsClamp()</tt></li> <li><tt class="docutils literal">_PyTime_FromMillisecondsObject()</tt></li> <li><tt class="docutils literal">_PyTime_FromNanoseconds()</tt></li> <li><tt class="docutils literal">_PyTime_FromNanosecondsObject()</tt></li> <li><tt class="docutils literal">_PyTime_FromSeconds()</tt></li> <li><tt class="docutils literal">_PyTime_FromSecondsObject()</tt></li> <li><tt class="docutils literal">_PyTime_FromTimespec()</tt></li> <li><tt class="docutils literal">_PyTime_FromTimeval()</tt></li> <li><tt class="docutils literal">_PyTime_GetMonotonicClock()</tt></li> <li><tt class="docutils literal">_PyTime_GetMonotonicClockWithInfo()</tt></li> <li><tt class="docutils literal">_PyTime_GetPerfCounter()</tt></li> <li><tt class="docutils literal">_PyTime_GetPerfCounterWithInfo()</tt></li> <li><tt class="docutils literal">_PyTime_GetSystemClock()</tt></li> <li><tt class="docutils literal">_PyTime_GetSystemClockWithInfo()</tt></li> <li><tt class="docutils literal">_PyTime_MulDiv()</tt></li> <li><tt class="docutils 
literal">_PyTime_ObjectToTime_t()</tt></li> <li><tt class="docutils literal">_PyTime_ObjectToTimespec()</tt></li> <li><tt class="docutils literal">_PyTime_ObjectToTimeval()</tt></li> <li><tt class="docutils literal">_PyTime_gmtime()</tt></li> <li><tt class="docutils literal">_PyTime_localtime()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_ClearTraces()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_GetMemory()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_GetObjectTraceback()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_GetTraceback()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_GetTracebackLimit()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_GetTracedMemory()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_GetTraces()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_Init()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_IsTracing()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_ResetPeak()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_Start()</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_Stop()</tt></li> <li><tt class="docutils literal">_PyUnicodeTranslateError_Create()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_Dealloc()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_Finish()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_Init()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_PrepareInternal()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_PrepareKindInternal()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteASCIIString()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteChar()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteLatin1String()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteStr()</tt></li> <li><tt class="docutils literal">_PyUnicodeWriter_WriteSubstring()</tt></li> <li><tt 
class="docutils literal">_PyUnicode_AsASCIIString()</tt></li> <li><tt class="docutils literal">_PyUnicode_AsLatin1String()</tt></li> <li><tt class="docutils literal">_PyUnicode_AsUTF8String()</tt></li> <li><tt class="docutils literal">_PyUnicode_CheckConsistency()</tt></li> <li><tt class="docutils literal">_PyUnicode_Copy()</tt></li> <li><tt class="docutils literal">_PyUnicode_DecodeRawUnicodeEscapeStateful()</tt></li> <li><tt class="docutils literal">_PyUnicode_DecodeUnicodeEscapeInternal()</tt></li> <li><tt class="docutils literal">_PyUnicode_DecodeUnicodeEscapeStateful()</tt></li> <li><tt class="docutils literal">_PyUnicode_EQ()</tt></li> <li><tt class="docutils literal">_PyUnicode_EncodeCharmap()</tt></li> <li><tt class="docutils literal">_PyUnicode_EncodeUTF16()</tt></li> <li><tt class="docutils literal">_PyUnicode_EncodeUTF32()</tt></li> <li><tt class="docutils literal">_PyUnicode_EncodeUTF7()</tt></li> <li><tt class="docutils literal">_PyUnicode_Equal()</tt></li> <li><tt class="docutils literal">_PyUnicode_EqualToASCIIId()</tt></li> <li><tt class="docutils literal">_PyUnicode_EqualToASCIIString()</tt></li> <li><tt class="docutils literal">_PyUnicode_FastCopyCharacters()</tt></li> <li><tt class="docutils literal">_PyUnicode_FastFill()</tt></li> <li><tt class="docutils literal">_PyUnicode_FindMaxChar()</tt></li> <li><tt class="docutils literal">_PyUnicode_FormatAdvancedWriter()</tt></li> <li><tt class="docutils literal">_PyUnicode_FormatLong()</tt></li> <li><tt class="docutils literal">_PyUnicode_FromASCII()</tt></li> <li><tt class="docutils literal">_PyUnicode_FromId()</tt></li> <li><tt class="docutils literal">_PyUnicode_InsertThousandsGrouping()</tt></li> <li><tt class="docutils literal">_PyUnicode_JoinArray()</tt></li> <li><tt class="docutils literal">_PyUnicode_ScanIdentifier()</tt></li> <li><tt class="docutils literal">_PyUnicode_TransformDecimalAndSpaceToASCII()</tt></li> <li><tt class="docutils literal">_PyUnicode_WideCharString_Converter()</tt></li> 
<li><tt class="docutils literal">_PyUnicode_WideCharString_Opt_Converter()</tt></li> <li><tt class="docutils literal">_PyUnicode_XStrip()</tt></li> <li><tt class="docutils literal">_PyVectorcall_Function()</tt></li> <li><tt class="docutils literal">_Py_AtExit()</tt></li> <li><tt class="docutils literal">_Py_CheckFunctionResult()</tt></li> <li><tt class="docutils literal">_Py_CoerceLegacyLocale()</tt></li> <li><tt class="docutils literal">_Py_FatalErrorFormat()</tt></li> <li><tt class="docutils literal">_Py_FdIsInteractive()</tt></li> <li><tt class="docutils literal">_Py_FreeCharPArray()</tt></li> <li><tt class="docutils literal">_Py_GetConfig()</tt></li> <li><tt class="docutils literal">_Py_IsCoreInitialized()</tt></li> <li><tt class="docutils literal">_Py_IsFinalizing()</tt></li> <li><tt class="docutils literal">_Py_IsInterpreterFinalizing()</tt></li> <li><tt class="docutils literal">_Py_LegacyLocaleDetected()</tt></li> <li><tt class="docutils literal">_Py_RestoreSignals()</tt></li> <li><tt class="docutils literal">_Py_SetLocaleFromEnv()</tt></li> <li><tt class="docutils literal">_Py_VaBuildStack()</tt></li> <li><tt class="docutils literal">_Py_add_one_to_index_C()</tt></li> <li><tt class="docutils literal">_Py_add_one_to_index_F()</tt></li> <li><tt class="docutils literal">_Py_c_abs()</tt></li> <li><tt class="docutils literal">_Py_c_diff()</tt></li> <li><tt class="docutils literal">_Py_c_neg()</tt></li> <li><tt class="docutils literal">_Py_c_pow()</tt></li> <li><tt class="docutils literal">_Py_c_prod()</tt></li> <li><tt class="docutils literal">_Py_c_quot()</tt></li> <li><tt class="docutils literal">_Py_c_sum()</tt></li> <li><tt class="docutils literal">_Py_gitidentifier()</tt></li> <li><tt class="docutils literal">_Py_gitversion()</tt></li> </ul> </div> Convert macros to functions in the Python C API2022-12-12T23:00:00+01:002022-12-12T23:00:00+01:00Victor Stinnertag:vstinner.github.io,2022-12-12:/c-api-convert-macros-functions.html<a class="reference external 
image-reference" href="https://www.exemplaire-editions.fr/librairie/livre/loeil-du-cyclone"> <img alt="L'oeil du cyclone - Théo Grosjean" src="https://vstinner.github.io/images/loeil_cyclone.jpg" /> </a> <p><em>Drawing: &quot;L'oeil du cyclone&quot; by Théo Grosjean.</em></p> <div class="section" id="convert-macros-to-functions"> <h2>Convert macros to functions</h2> <p>For 4 years, between Python 3.7 (2018) and Python 3.12 (2022), I made many changes on macros in the Python C API to make the API less error prone (avoid <a class="reference external" href="https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html">macro pitfalls</a>) and better define the API …</p></div><a class="reference external image-reference" href="https://www.exemplaire-editions.fr/librairie/livre/loeil-du-cyclone"> <img alt="L'oeil du cyclone - Théo Grosjean" src="https://vstinner.github.io/images/loeil_cyclone.jpg" /> </a> <p><em>Drawing: &quot;L'oeil du cyclone&quot; by Théo Grosjean.</em></p> <div class="section" id="convert-macros-to-functions"> <h2>Convert macros to functions</h2> <p>For 4 years, between Python 3.7 (2018) and Python 3.12 (2022), I made many changes on macros in the Python C API to make the API less error prone (avoid <a class="reference external" href="https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html">macro pitfalls</a>) and better define the API (parameter types and return types, variable scope, etc.). 
<a class="reference external" href="https://peps.python.org/pep-0670/">PEP 670</a> &quot;Convert macros to functions in the Python C API&quot; describes in length the rationale of these changes.</p> <p>I moved private functions to the internal C API to reduce the C API size.</p> <p>Some changes are also related to preparing the API to make members of structures like <tt class="docutils literal">PyObject</tt> or <tt class="docutils literal">PyTypeObject</tt> private.</p> <p>Converting macros and static inline functions to regular functions hides implementation details and bends the API towards the limited C API and the stable ABI (build a C extension once, use the binary on multiple Python versions). Regular functions are usable in programming languages and use cases which cannot use C macros and C static inline functions.</p> <p>Most macros are converted to static inline functions, rather regular functions, to have no impact on performance.</p> <p>This work was made incrementally in 5 Python versions (3.8, 3.9, 3.10, 3.11 and 3.12) to limit the number of impacted projects at each Python release.</p> <p>Changing <tt class="docutils literal">Py_TYPE()</tt> and <tt class="docutils literal">Py_SIZE()</tt> macros impacted most projects. Python 3.11 contains the change. 
During the Python 3.10 development cycle, the change had to be reverted since it impacted too many projects.</p> <p>Note: I didn't modify all of the macros and functions listed in this article myself; as usual, it was a collaborative effort.</p> </div> <div class="section" id="statistics"> <h2>Statistics</h2> <p><a class="reference external" href="https://pythoncapi.readthedocs.io/stats.html">Statistics on public functions</a>:</p> <ul class="simple"> <li>Python 3.7: 893 regular functions, 315 macros.</li> <li>Python 3.12: 943 regular functions, 246 macros, 69 static inline functions.</li> </ul> <p>Cumulative changes on macros between Python 3.7 and Python 3.12 on public, private and internal APIs:</p> <ul class="simple"> <li>Converted 88 macros to static inline functions</li> <li>Converted 11 macros to regular functions</li> <li>Converted 3 static inline functions to regular functions</li> <li>Removed 47 macros</li> </ul> <p>See <a class="reference external" href="https://pythoncapi.readthedocs.io/stats.html">Statistics on the Python C API</a> for more numbers.</p> </div> <div class="section" id="python-3-12"> <h2>Python 3.12</h2> <p>Convert 39 macros to static inline functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyCell_GET()</tt></li> <li><tt class="docutils literal">PyCell_SET()</tt></li> <li><tt class="docutils literal">PyCode_GetNumFree()</tt></li> <li><tt class="docutils literal">PyDict_GET_SIZE()</tt></li> <li><tt class="docutils literal">PyFloat_AS_DOUBLE()</tt></li> <li><tt class="docutils literal">PyFunction_GET_ANNOTATIONS()</tt></li> <li><tt class="docutils literal">PyFunction_GET_CLOSURE()</tt></li> <li><tt class="docutils literal">PyFunction_GET_CODE()</tt></li> <li><tt class="docutils literal">PyFunction_GET_DEFAULTS()</tt></li> <li><tt class="docutils literal">PyFunction_GET_GLOBALS()</tt></li> <li><tt class="docutils literal">PyFunction_GET_KW_DEFAULTS()</tt></li> <li><tt class="docutils literal">PyFunction_GET_MODULE()</tt></li> <li><tt
class="docutils literal">PyInstanceMethod_GET_FUNCTION()</tt></li> <li><tt class="docutils literal">PyMemoryView_GET_BASE()</tt></li> <li><tt class="docutils literal">PyMemoryView_GET_BUFFER()</tt></li> <li><tt class="docutils literal">PyMethod_GET_FUNCTION()</tt></li> <li><tt class="docutils literal">PyMethod_GET_SELF()</tt></li> <li><tt class="docutils literal">PySet_GET_SIZE()</tt></li> <li><tt class="docutils literal">Py_UNICODE_HIGH_SURROGATE()</tt></li> <li><tt class="docutils literal">Py_UNICODE_ISALNUM()</tt></li> <li><tt class="docutils literal">Py_UNICODE_ISSPACE()</tt></li> <li><tt class="docutils literal">Py_UNICODE_IS_HIGH_SURROGATE()</tt></li> <li><tt class="docutils literal">Py_UNICODE_IS_LOW_SURROGATE()</tt></li> <li><tt class="docutils literal">Py_UNICODE_IS_SURROGATE()</tt></li> <li><tt class="docutils literal">Py_UNICODE_JOIN_SURROGATES()</tt></li> <li><tt class="docutils literal">Py_UNICODE_LOW_SURROGATE()</tt></li> <li><tt class="docutils literal">_PyGCHead_FINALIZED()</tt></li> <li><tt class="docutils literal">_PyGCHead_NEXT()</tt></li> <li><tt class="docutils literal">_PyGCHead_PREV()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_FINALIZED()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_NEXT()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_PREV()</tt></li> <li><tt class="docutils literal">_PyGC_FINALIZED()</tt></li> <li><tt class="docutils literal">_PyGC_SET_FINALIZED()</tt></li> <li><tt class="docutils literal">_PyObject_GC_IS_TRACKED()</tt></li> <li><tt class="docutils literal">_PyObject_GC_MAY_BE_TRACKED()</tt></li> <li><tt class="docutils literal">_PyObject_SIZE()</tt></li> <li><tt class="docutils literal">_PyObject_VAR_SIZE()</tt></li> <li><tt class="docutils literal">_Py_AS_GC()</tt></li> </ul> <p>Remove 5 macros:</p> <ul class="simple"> <li><tt class="docutils literal">PyUnicode_AS_DATA()</tt></li> <li><tt class="docutils literal">PyUnicode_AS_UNICODE()</tt></li> <li><tt class="docutils 
literal">PyUnicode_GET_DATA_SIZE()</tt></li> <li><tt class="docutils literal">PyUnicode_GET_SIZE()</tt></li> <li><tt class="docutils literal">PyUnicode_WSTR_LENGTH()</tt></li> </ul> <p>The following 4 macros can be used as l-values in Python 3.12:</p> <ul class="simple"> <li><tt class="docutils literal">PyList_GET_ITEM()</tt></li> <li><tt class="docutils literal">PyTuple_GET_ITEM()</tt>:</li> <li><tt class="docutils literal">PyDescr_NAME()</tt></li> <li><tt class="docutils literal">PyDescr_TYPE()</tt></li> </ul> <p>Code pattern like <tt class="docutils literal">&amp;PyTuple_GET_ITEM(tuple, 0)</tt> and <tt class="docutils literal">&amp;PyList_GET_ITEM(list, 0)</tt> is still commonly used to get a direct access to items as <tt class="docutils literal">PyObject**</tt>. <tt class="docutils literal">PyDescr_NAME()</tt> and <tt class="docutils literal">PyDescr_TYPE()</tt> are used by SWIG: see <a class="reference external" href="https://bugs.python.org/issue46538">https://bugs.python.org/issue46538</a></p> </div> <div class="section" id="python-3-11"> <h2>Python 3.11</h2> <p>Convert 33 macros to static inline functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyByteArray_AS_STRING()</tt></li> <li><tt class="docutils literal">PyByteArray_GET_SIZE()</tt></li> <li><tt class="docutils literal">PyBytes_AS_STRING()</tt></li> <li><tt class="docutils literal">PyBytes_GET_SIZE()</tt></li> <li><tt class="docutils literal">PyCFunction_GET_CLASS()</tt></li> <li><tt class="docutils literal">PyCFunction_GET_FLAGS()</tt></li> <li><tt class="docutils literal">PyCFunction_GET_FUNCTION()</tt></li> <li><tt class="docutils literal">PyCFunction_GET_SELF()</tt></li> <li><tt class="docutils literal">PyList_GET_SIZE()</tt></li> <li><tt class="docutils literal">PyList_SET_ITEM()</tt></li> <li><tt class="docutils literal">PyTuple_GET_SIZE()</tt></li> <li><tt class="docutils literal">PyTuple_SET_ITEM()</tt></li> <li><tt class="docutils literal">PyUnicode_AS_DATA()</tt></li> 
<li><tt class="docutils literal">PyUnicode_AS_UNICODE()</tt></li> <li><tt class="docutils literal">PyUnicode_CHECK_INTERNED()</tt></li> <li><tt class="docutils literal">PyUnicode_DATA()</tt></li> <li><tt class="docutils literal">PyUnicode_GET_DATA_SIZE()</tt></li> <li><tt class="docutils literal">PyUnicode_GET_LENGTH()</tt></li> <li><tt class="docutils literal">PyUnicode_GET_SIZE()</tt></li> <li><tt class="docutils literal">PyUnicode_IS_ASCII()</tt></li> <li><tt class="docutils literal">PyUnicode_IS_COMPACT()</tt></li> <li><tt class="docutils literal">PyUnicode_IS_COMPACT_ASCII()</tt></li> <li><tt class="docutils literal">PyUnicode_IS_READY()</tt></li> <li><tt class="docutils literal">PyUnicode_MAX_CHAR_VALUE()</tt></li> <li><tt class="docutils literal">PyUnicode_READ()</tt></li> <li><tt class="docutils literal">PyUnicode_READY()</tt></li> <li><tt class="docutils literal">PyUnicode_READ_CHAR()</tt></li> <li><tt class="docutils literal">PyUnicode_WRITE()</tt></li> <li><tt class="docutils literal">PyWeakref_GET_OBJECT()</tt></li> <li><tt class="docutils literal">Py_SIZE()</tt>: <tt class="docutils literal">Py_SET_SIZE()</tt> must be used to set an object size</li> <li><tt class="docutils literal">Py_TYPE()</tt>: <tt class="docutils literal">Py_SET_TYPE()</tt> must be used to set an object type</li> <li><tt class="docutils literal">_PyUnicode_COMPACT_DATA()</tt></li> <li><tt class="docutils literal">_PyUnicode_NONCOMPACT_DATA()</tt></li> </ul> <p>Convert 2 macros to regular functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyType_SUPPORTS_WEAKREFS()</tt></li> <li><tt class="docutils literal">Py_GETENV()</tt></li> </ul> <p>Remove 11 macros:</p> <ul class="simple"> <li>Moved to the internal C API:<ul> <li><tt class="docutils literal">PyHeapType_GET_MEMBERS()</tt>: renamed to <tt class="docutils literal">_PyHeapType_GET_MEMBERS()</tt></li> <li><tt class="docutils literal">_Py_InIntegralTypeRange()</tt></li> <li><tt class="docutils 
literal">_Py_IntegralTypeMax()</tt></li> <li><tt class="docutils literal">_Py_IntegralTypeMin()</tt></li> <li><tt class="docutils literal">_Py_IntegralTypeSigned()</tt></li> </ul> </li> <li><tt class="docutils literal">PyFunction_AS_FRAME_CONSTRUCTOR()</tt></li> <li><tt class="docutils literal">Py_FORCE_DOUBLE()</tt></li> <li><tt class="docutils literal">Py_OVERFLOWED()</tt></li> <li><tt class="docutils literal">Py_SET_ERANGE_IF_OVERFLOW()</tt></li> <li><tt class="docutils literal">Py_SET_ERRNO_ON_MATH_ERROR()</tt></li> <li><tt class="docutils literal">_Py_SET_EDOM_FOR_NAN()</tt></li> </ul> <p>Add <tt class="docutils literal">_Py_RVALUE()</tt> to 7 macros to disallow using them as l-value:</p> <ul class="simple"> <li><tt class="docutils literal">_PyGCHead_SET_FINALIZED()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_NEXT()</tt></li> <li><tt class="docutils literal">asdl_seq_GET()</tt></li> <li><tt class="docutils literal">asdl_seq_GET_UNTYPED()</tt></li> <li><tt class="docutils literal">asdl_seq_LEN()</tt></li> <li><tt class="docutils literal">asdl_seq_SET()</tt></li> <li><tt class="docutils literal">asdl_seq_SET_UNTYPED()</tt></li> </ul> <p>Note: the <tt class="docutils literal">PyCell_SET()</tt> macro was modified to use <tt class="docutils literal">_Py_RVALUE()</tt>, but it already used <tt class="docutils literal">(void)</tt> in Python 3.10.</p> </div> <div class="section" id="python-3-10"> <h2>Python 3.10</h2> <p>Convert 3 macros to regular functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyDescr_IsData()</tt></li> <li><tt class="docutils literal">PyExceptionClass_Name()</tt></li> <li><tt class="docutils literal">PyIter_Check()</tt></li> </ul> <p>Convert 2 macros to static inline functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyObject_TypeCheck()</tt></li> <li><tt class="docutils literal">Py_REFCNT()</tt>: <tt class="docutils literal">Py_SET_REFCNT()</tt> must be used to set an object reference count</li> 
</ul> <p>Remove 6 macros:</p> <ul class="simple"> <li><tt class="docutils literal">PyAST_Compile()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseFile()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseString()</tt></li> <li><tt class="docutils literal">PySTEntry_Check()</tt>: moved to the internal C API</li> <li><tt class="docutils literal">_PyErr_OCCURRED()</tt></li> <li><tt class="docutils literal">_PyList_ITEMS()</tt>: moved to the internal C API</li> </ul> <p>Modify 3 macros to disallow using them as l-values by adding <tt class="docutils literal">(void)</tt> cast:</p> <ul class="simple"> <li><tt class="docutils literal">PyCell_SET()</tt></li> <li><tt class="docutils literal">PyList_SET_ITEM()</tt></li> <li><tt class="docutils literal">PyTuple_SET_ITEM()</tt></li> </ul> </div> <div class="section" id="python-3-9"> <h2>Python 3.9</h2> <p>Convert 6 macros to regular functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyIndex_Check()</tt></li> <li><tt class="docutils literal">PyObject_CheckBuffer()</tt></li> <li><tt class="docutils literal">PyObject_GET_WEAKREFS_LISTPTR()</tt></li> <li><tt class="docutils literal">PyObject_IS_GC()</tt></li> <li><tt class="docutils literal">Py_EnterRecursiveCall()</tt></li> <li><tt class="docutils literal">Py_LeaveRecursiveCall()</tt></li> </ul> <p>Convert 5 macros to static inline functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyType_Check()</tt></li> <li><tt class="docutils literal">PyType_CheckExact()</tt></li> <li><tt class="docutils literal">PyType_HasFeature()</tt></li> <li><tt class="docutils literal">Py_UNICODE_COPY()</tt></li> <li><tt class="docutils literal">Py_UNICODE_FILL()</tt></li> </ul> <p>Convert 3 static inline functions to regular functions:</p> <ul class="simple"> <li><tt class="docutils literal">_Py_Dealloc()</tt></li> <li><tt class="docutils literal">_Py_ForgetReference()</tt></li> <li><tt class="docutils literal">_Py_NewReference()</tt></li> 
</ul> <p>Remove 18 macros:</p> <ul class="simple"> <li>Moved to the internal C API:<ul> <li><tt class="docutils literal">PyDoc_STRVAR_shared()</tt>:</li> <li><tt class="docutils literal">PyObject_GC_IS_TRACKED()</tt></li> <li><tt class="docutils literal">PyObject_GC_MAY_BE_TRACKED()</tt></li> <li><tt class="docutils literal">Py_AS_GC()</tt></li> <li><tt class="docutils literal">_PyGCHead_FINALIZED()</tt></li> <li><tt class="docutils literal">_PyGCHead_NEXT()</tt></li> <li><tt class="docutils literal">_PyGCHead_PREV()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_FINALIZED()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_NEXT()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_PREV()</tt></li> <li><tt class="docutils literal">_PyGC_SET_FINALIZED()</tt></li> </ul> </li> <li><tt class="docutils literal">Py_UNICODE_MATCH()</tt></li> <li><tt class="docutils literal">_Py_DEC_TPFREES()</tt></li> <li><tt class="docutils literal">_Py_INC_TPALLOCS()</tt></li> <li><tt class="docutils literal">_Py_INC_TPFREES()</tt></li> <li><tt class="docutils literal">_Py_MakeEndRecCheck()</tt></li> <li><tt class="docutils literal">_Py_MakeRecCheck()</tt></li> <li><tt class="docutils literal">_Py_RecursionLimitLowerWaterMark()</tt></li> </ul> </div> <div class="section" id="python-3-8"> <h2>Python 3.8</h2> <p>Convert 9 macros to static inline functions:</p> <ul class="simple"> <li><tt class="docutils literal">Py_DECREF()</tt></li> <li><tt class="docutils literal">Py_INCREF()</tt></li> <li><tt class="docutils literal">Py_XDECREF()</tt></li> <li><tt class="docutils literal">Py_XINCREF()</tt></li> <li><tt class="docutils literal">_PyObject_CallNoArg()</tt></li> <li><tt class="docutils literal">_PyObject_FastCall()</tt></li> <li><tt class="docutils literal">_Py_Dealloc()</tt></li> <li><tt class="docutils literal">_Py_ForgetReference()</tt></li> <li><tt class="docutils literal">_Py_NewReference()</tt></li> </ul> <p>Remove 7 macros:</p> <ul class="simple"> <li><tt 
class="docutils literal">_PyGCHead_DECREF()</tt></li> <li><tt class="docutils literal">_PyGCHead_REFS()</tt></li> <li><tt class="docutils literal">_PyGCHead_SET_REFS()</tt></li> <li><tt class="docutils literal">_PyGC_REFS()</tt></li> <li><tt class="docutils literal">_PyObject_GC_TRACK()</tt>: moved to the internal C API</li> <li><tt class="docutils literal">_PyObject_GC_UNTRACK()</tt>: moved to the internal C API</li> <li><tt class="docutils literal">_Py_CHECK_REFCNT()</tt></li> </ul> </div> Debug a Python reference leak2022-11-04T13:00:00+01:002022-11-04T13:00:00+01:00Victor Stinnertag:vstinner.github.io,2022-11-04:/debug-python-refleak.html<a class="reference external image-reference" href="https://twitter.com/djamilaknopf/status/1587441869403099136"> <img alt="Childhood memories in the countryside" src="https://vstinner.github.io/images/refleak.jpg" /> </a> <p>This morning, I got <a class="reference external" href="https://mail.python.org/archives/list/buildbot-status&#64;python.org/message/MU2EJRTFF4ZCYTDXYER7KCL3IQUM5F3T/">this email</a> from the buildbot-status mailing list:</p> <blockquote> The Buildbot has detected a new failure on builder PPC64LE Fedora Rawhide <strong>Refleaks</strong> 3.x while building Python.</blockquote> <p>I get many of buildbot failures per month (by email), but I like to debug reference leaks: they are more challenging …</p><a class="reference external image-reference" href="https://twitter.com/djamilaknopf/status/1587441869403099136"> <img alt="Childhood memories in the countryside" src="https://vstinner.github.io/images/refleak.jpg" /> </a> <p>This morning, I got <a class="reference external" href="https://mail.python.org/archives/list/buildbot-status&#64;python.org/message/MU2EJRTFF4ZCYTDXYER7KCL3IQUM5F3T/">this email</a> from the buildbot-status mailing list:</p> <blockquote> The Buildbot has detected a new failure on builder PPC64LE Fedora Rawhide <strong>Refleaks</strong> 3.x while building Python.</blockquote> <p>I get many 
buildbot failures per month (by email), but I like to debug reference leaks: they are more challenging :-) I decided to write this article to document and explain my work on maintaining Python (buildbots).</p> <p>I truncated the output of most commands in this article to make it easier to read.</p> <p>Drawing: <a class="reference external" href="https://twitter.com/djamilaknopf/status/1587441869403099136">Childhood memories in the countryside</a> by <a class="reference external" href="https://twitter.com/djamilaknopf/">Djamila Knopf</a>.</p> <div class="section" id="reproduce-the-bug"> <h2>Reproduce the bug</h2> <p>I look into the <a class="reference external" href="https://buildbot.python.org/all/#builders/300/builds/548">buildbot logs</a>:</p> <pre class="literal-block"> test_int leaked [1, 1, 1] references, sum=3 </pre> <p>Aha, interesting: the <tt class="docutils literal">test_int</tt> test leaks Python strong references; each test iteration leaks exactly one reference. Well, in short, it leaks memory.</p> <p>I build Python to check if the refleak is still there:</p> <pre class="literal-block"> git switch main make clean ./configure --with-pydebug make </pre> <p>The main branch is currently at this commit:</p> <pre class="literal-block"> $ git show main commit 2844aa6a8eb1d486b5c432f0ed33a2082998f41e (...) </pre> <p>I run the test with <tt class="docutils literal"><span class="pre">-R</span> 3:3</tt> to check for reference leaks:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_int (...) test_int leaked [1, 1, 1] references, sum=3 (...) Total duration: 4.8 sec </pre> <p>Great! It's still there: it's a real regression.
I told you, I love this kind of bug :-)</p> </div> <div class="section" id="identify-which-test-leaks-test-bisect-cmd"> <h2>Identify which test leaks (test.bisect_cmd)</h2> <pre class="literal-block"> $ ./python -m test test_int --list-cases|wc -l 42 $ wc -l Lib/test/test_int.py 885 Lib/test/test_int.py </pre> <p><tt class="docutils literal">test_int</tt> has only 42 methods and takes 4.8 seconds to run (with <tt class="docutils literal"><span class="pre">-R</span> 3:3</tt>). That's small, but the file is made of 885 lines of Python code. I'm lazy; I don't want to read so many lines. I will use <tt class="docutils literal">python <span class="pre">-m</span> test.bisect_cmd</tt> to identify which test method leaks, so I have less test code to read and reproducing the test is even faster.</p> <p>I run <tt class="docutils literal">python <span class="pre">-m</span> test.bisect_cmd</tt>:</p> <pre class="literal-block"> $ ./python -m test.bisect_cmd -R 3:3 test_int (...) [+] Iteration 17: run 1 tests/2 (...) test_int leaked [1, 1, 1] references, sum=3 (...) * test.test_int.PyLongModuleTests.test_pylong_misbehavior_error_path_from_str </pre> <p>I love watching this tool do my job; I don't have anything to do! :-)</p> <p>I confirm that the <tt class="docutils literal">test_pylong_misbehavior_error_path_from_str()</tt> test leaks:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str test_int leaked [1, 1, 1] references, sum=3 Total duration: 445 ms </pre> <p>The <tt class="docutils literal">test_pylong_misbehavior_error_path_from_str()</tt> method is only 17 lines of code; that's way better than 885 lines of code (52x less code to read). And reproducing the bug now only takes 445 ms instead of 4.8 seconds (10x faster).</p> <p>At this point, there is the brave method of looking into the C code: Python is made of 500 000 lines of C code. Good luck!
Or maybe there is another way?</p> </div> <div class="section" id="git-bisection"> <h2>Git bisection</h2> <p>Again, I'm lazy. I always begin with the &quot;divide and conquer&quot; method. A Git bisection is an efficient method for that.</p> <p>I start <tt class="docutils literal">git bisect</tt>:</p> <pre class="literal-block"> git bisect reset git bisect start --term-bad=leak --term-good=noleak git bisect leak # we just saw that current commit leaks </pre> <p>Defining &quot;good&quot; and &quot;bad&quot; terms helps me a lot to prevent mistakes: it's a nice Git bisect feature! In the past, I always picked the wrong one at some point, which messed up the whole bisection.</p> <p>Ok, now how can I know when the leak was introduced? Well, I like to move back in the past step by step: one day, two days, one week, one month, one year, etc.</p> <p>I pick a random commit merged yesterday:</p> <pre class="literal-block"> $ date Fri Nov 4 11:55:12 CET 2022 $ git log (...) commit 016c7d37b6acfe2203542a2655080c6402b3be1f Date: Thu Nov 3 23:21:01 2022 +0000 (...) commit 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead Date: Thu Nov 3 16:18:38 2022 -0700 (...) </pre> <p>I'm not lucky at my first bet: the code already leaked yesterday:</p> <pre class="literal-block"> $ git checkout 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead^C $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str test_int leaked [1, 1, 1] references, sum=3 </pre> <p>I repeat the process and pick a random commit from the day before:</p> <pre class="literal-block"> $ git log (...) commit f3007ac3702ea22c7dd0abf8692b1504ea3c9f63 Author: Victor Stinner &lt;vstinner&#64;python.org&gt; Date: Wed Nov 2 20:45:58 2022 +0100 (...) </pre> <p>To my great pleasure, I pick a commit made by myself.
Maybe I'm lucky and I'm the one who introduced the leak :-D</p> <pre class="literal-block"> $ git checkout f3007ac3702ea22c7dd0abf8692b1504ea3c9f63 $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str (...) Tests result: NO TESTS RAN </pre> <p>&quot;NO TESTS RAN&quot; means that the test doesn't exist. Oh wait, the test didn't exist 2 days ago? So the test itself is new? Well, no tests ran also means... &quot;no leak&quot;.</p> <p>I will make the assumption that &quot;NO TESTS RAN&quot; means &quot;no leak&quot; and see what's going on:</p> <pre class="literal-block"> $ git bisect noleak Bisecting: 13 revisions left to test after this (roughly 4 steps) $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str Tests result: NO TESTS RAN $ git bisect noleak Bisecting: 6 revisions left to test after this (roughly 3 steps) $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str Tests result: NO TESTS RAN $ git bisect noleak Bisecting: 3 revisions left to test after this (roughly 2 steps) $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str Tests result: NO TESTS RAN $ git bisect noleak Bisecting: 1 revision left to test after this (roughly 1 step) $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str test_int leaked [1, 1, 1] references, sum=3 $ git bisect leak Bisecting: 0 revisions left to test after this (roughly 0 steps) $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str test_int leaked [1, 1, 1] references, sum=3 vstinner&#64;mona$ git bisect leak 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead is the first leak commit commit 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead Author: Gregory P. Smith &lt;greg&#64;krypto.org&gt; Date: Thu Nov 3 16:18:38 2022 -0700 gh-90716: bugfixes and more tests for _pylong. 
(#99073) * Properly decref on _pylong import error. * Improve the error message on _pylong TypeError. * Fix the assertion error in pydebug builds to be a TypeError. * Tie the return value comments together. These are minor followups to issues not caught among the reviewers on https://github.com/python/cpython/pull/96673. Lib/test/test_int.py | 39 +++++++++++++++++++++++++++++++++++++++ Objects/longobject.c | 15 +++++++++++---- 2 files changed, 50 insertions(+), 4 deletions(-) </pre> <p>In total, it took 7 <tt class="docutils literal">git bisect</tt> steps to identify a single commit. That's quick! I also love this tool; I feel like it does my job for me!</p> <p>Sometimes, I mess up with Git bisection. Here, <a class="reference external" href="https://github.com/python/cpython/commit/4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead">the guilty commit</a> seems like a good candidate since it changes <tt class="docutils literal">Objects/longobject.c</tt>, which is C code, so it could plausibly introduce a leak. Moreover, this C file is the implementation of the Python <tt class="docutils literal">int</tt> type, so it is directly related to <tt class="docutils literal">test_int</tt> (the test suite of the <tt class="docutils literal">int</tt> type).</p> <p>Just in case, I manually test the leak before/after:</p> <pre class="literal-block"> # after $ git checkout 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str test_int leaked [1, 1, 1] references, sum=3 # before $ git checkout 4c4b5ce2e529a1279cd287e2d2d73ffcb6cf2ead^ $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str Tests result: NO TESTS RAN </pre> <p>Ok, there is no doubt anymore: the commit introduced the leak.
But since the commit also adds the leaking test, maybe the leak already existed, and it's just that nobody noticed it before.</p> </div> <div class="section" id="debug-the-leak"> <h2>Debug the leak</h2> <p>Since I identified the commit introducing the leak, I only have to review the code changes of this single commit. But to debug the code, I prefer to come back to the main branch. To prepare a fix, I will have to start from the main branch anyway.</p> <p>Go back to the main branch:</p> <pre class="literal-block"> $ git bisect reset $ git switch main </pre> <p>The second command is useless: I was already on the main branch. I made so many mistakes with Git in the past that I got into the habit of doing things very carefully. I don't mind doing things twice, just in case. It's cheaper than messing with the Git god! Trust me.</p> <p>Just in case, I double check that the leak is still there in the main branch:</p> <pre class="literal-block"> $ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str test_int leaked [1, 1, 1] references, sum=3 </pre> <p>Ok, we are good to start debugging. Let me open Lib/test/test_int.py and look for the test_pylong_misbehavior_error_path_from_str() method:</p> <pre class="literal-block"> &#64;support.cpython_only # tests implementation details of CPython.
&#64;unittest.skipUnless(_pylong, &quot;_pylong module required&quot;) &#64;mock.patch.object(_pylong, &quot;int_from_string&quot;) def test_pylong_misbehavior_error_path_from_str( self, mock_int_from_str): big_value = '7'*19_999 with support.adjust_int_max_str_digits(20_000): mock_int_from_str.return_value = b'not an int' with self.assertRaises(TypeError) as ctx: int(big_value) self.assertIn('_pylong.int_from_string did not', str(ctx.exception)) mock_int_from_str.side_effect = RuntimeError(&quot;test123&quot;) with self.assertRaises(RuntimeError): int(big_value) </pre> <p>Always divide and conquer: let me try to make the code as short as possible (7 lines); I also make the &quot;big_value&quot; smaller:</p> <pre class="literal-block"> &#64;mock.patch.object(_pylong, &quot;int_from_string&quot;) def test_pylong_misbehavior_error_path_from_str(self, mock_int_from_str): big_value = '7' * 9999 with support.adjust_int_max_str_digits(10_000): mock_int_from_str.return_value = b'not an int' with self.assertRaises(TypeError) as ctx: int(big_value) </pre> <p>Ok, so the test is about converting a long string (9999 decimal digits) to an integer using the new <tt class="docutils literal">_pylong</tt> module, which is implemented in pure Python (<tt class="docutils literal">Lib/_pylong.py</tt>) and called from C code (<tt class="docutils literal">Objects/longobject.c</tt>). Well, I followed recent developments, so I don't have to dig into the C code to know that. It helps!</p> <p>If I search for <tt class="docutils literal">_pylong</tt> in <tt class="docutils literal">Objects/longobject.c</tt>, I find this interesting function:</p> <pre class="literal-block"> /* asymptotically faster str-to-long conversion for base 10, using _pylong.py */ static int pylong_int_from_string(const char *start, const char *end, PyLongObject **res) { PyObject *mod = PyImport_ImportModule(&quot;_pylong&quot;); ... } </pre> <p>With a quick look, I don't see any obvious reference leak in this code.
I add <tt class="docutils literal">printf()</tt> to make sure that I'm looking at the right function:</p> <pre class="literal-block"> static int pylong_int_from_string(const char *start, const char *end, PyLongObject **res) { ... PyObject *s = PyUnicode_FromStringAndSize(start, end-start); if (s == NULL) { Py_DECREF(mod); goto error; } printf(&quot;pylong_int_from_string()\n&quot;); PyObject *result = PyObject_CallMethod(mod, &quot;int_from_string&quot;, &quot;O&quot;, s); ... } </pre> <p>I added the print before the int_from_string() call, since this function is overriden by the test.</p> <p>I build Python and run the test:</p> <pre class="literal-block"> $ make $ ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str (...) beginning 6 repetitions 123456 pylong_int_from_string() .pylong_int_from_string() .pylong_int_from_string() .pylong_int_from_string() .pylong_int_from_string() .pylong_int_from_string() (...) </pre> <p>Ok, I'm looking at the right place. The print happens when the test runs. So which code path is taken? Let me add print calls <em>after</em> the function call:</p> <pre class="literal-block"> static int pylong_int_from_string(const char *start, const char *end, PyLongObject **res) { ... PyObject *result = PyObject_CallMethod(mod, &quot;int_from_string&quot;, &quot;O&quot;, s); Py_DECREF(s); Py_DECREF(mod); if (result == NULL) { printf(&quot;pylong_int_from_string() error\n&quot;); // &lt;====== ADD goto error; } if (!PyLong_Check(result)) { printf(&quot;pylong_int_from_string() wrong type\n&quot;); // &lt;====== ADD PyErr_SetString(PyExc_TypeError, &quot;_pylong.int_from_string did not return an int&quot;); goto error; } printf(&quot;pylong_int_from_string() ok\n&quot;); // &lt;====== ADD ... } </pre> <p>Test output:</p> <pre class="literal-block"> ... pylong_int_from_string() wrong type .pylong_int_from_string() wrong type .pylong_int_from_string() wrong type ... 
</pre> <p>Aha, the bug should be around the <tt class="docutils literal">if (!PyLong_Check(result))</tt> code path. Oh wait... <tt class="docutils literal">result</tt> is a Python object, and in this code path, the function exits without returning <tt class="docutils literal">result</tt> to the caller and without removing the reference to <tt class="docutils literal">result</tt>. That's our leak!</p> </div> <div class="section" id="write-a-fix"> <h2>Write a fix</h2> <p>To write a fix, I start by reverting all local changes (remove debug traces, restore the original test code):</p> <pre class="literal-block">
$ git checkout .
</pre> <p>I write a fix:</p> <pre class="literal-block">
$ git diff
diff --git a/Objects/longobject.c b/Objects/longobject.c
index a872938990..652fdb7974 100644
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
&#64;&#64; -2376,6 +2376,7 &#64;&#64; pylong_int_from_string(const char *start, const char *end, PyLongObject **res)
         goto error;
     }
     if (!PyLong_Check(result)) {
+        Py_DECREF(result);
         PyErr_SetString(PyExc_TypeError,
                         &quot;_pylong.int_from_string did not return an int&quot;);
         goto error;
</pre> <p>I build and test my fix:</p> <pre class="literal-block">
$ make &amp;&amp; ./python -m test -R 3:3 test_int -m test_pylong_misbehavior_error_path_from_str
(...)
Tests result: SUCCESS
</pre> <p>Ok, the leak is fixed! So it was just a missing <tt class="docutils literal">Py_DECREF()</tt> in code recently added to Python. It's a common mistake. By the way, when I looked at the code the first time, I also missed this &quot;obvious&quot; leak.</p> <p>I prepare a PR:</p> <pre class="literal-block">
$ git switch -c int_str
$ git commit -a
# Commit message:
# gh-90716: Fix pylong_int_from_string() refleak
</pre> <p>Let me validate my work from the new clean commit:</p> <pre class="literal-block">
$ make &amp;&amp; ./python -m test -R 3:3 test_int
(...)
Tests result: SUCCESS
</pre> <p>I complete the commit message using <tt class="docutils literal">git commit <span class="pre">--amend</span></tt>:</p> <pre class="literal-block">
gh-90716: Fix pylong_int_from_string() refleak

Fix validated by:

$ ./python -m test -R 3:3 test_int
Tests result: SUCCESS
</pre> <p>I run <tt class="docutils literal">gh_pr.sh</tt> (my short shell script) to create a PR from the command line.</p> <p>I add the <tt class="docutils literal">skip news</tt> label on the PR: since this refleak is not part of any Python release, no user is impacted, and it's not worth documenting it. I don't think that the change is part of Python 3.12 alpha 1. Moreover, only very few users test alpha 1 releases.</p> <p>Here it is, my shiny PR fixing the leak! <a class="reference external" href="https://github.com/python/cpython/pull/99094">https://github.com/python/cpython/pull/99094</a></p> <p>Since Gregory worked on longobject.c recently, I CC him on my PR by adding the comment <tt class="docutils literal">cc &#64;gpshead</tt>.</p> <p>I don't plan to wait for a review: the change is just one line and I'm confident that it fixes the issue.</p> <p>To finish, I <a class="reference external" href="https://mail.python.org/archives/list/buildbot-status&#64;python.org/message/J3MC7FIPFN6GNQAWQQRHE4EDLE7J2MIQ/">reply by email to the buildbot-status failure email</a>.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>In total, it took me between one and two hours to reproduce, debug and fix this reference leak.</p> <p>Meanwhile, I also looked into other Python stuff (and I discussed with friends!) while the bisection was running, or during the Python build.
It's hard to estimate exactly how much time it takes me to fix a refleak.</p> <p>I consider that I'm efficient at fixing such leaks since I follow Python development closely: I was already aware of the ongoing <tt class="docutils literal">_pylong</tt> work. I have also fixed many refleaks in the past.</p> <p>By the way, I wrote the <tt class="docutils literal">python <span class="pre">-m</span> test.bisect_cmd</tt> tool exactly to accelerate my work on debugging reference leaks. I'm now also used to Git bisection.</p> <p>For me, <strong>the key to my whole methodology is to &quot;divide and conquer&quot;</strong>:</p> <ul class="simple"> <li>Reproduce the issue</li> <li>Get a reproducer</li> <li>Make the reproducer as fast and as short as possible</li> <li>Use Git bisection to identify the change which introduced the leak</li> <li>Add print calls to identify which code paths are taken in the code and the test</li> </ul> <p>Oh, by the way, while I was finishing this article, my PR got reviewed and I merged it: <a class="reference external" href="https://github.com/python/cpython/commit/387f72588d538bc56669f0f28cc41df854fc5b43">my commit fixing the leak</a>!</p> </div> Python C API: Add functions to access PyObject2021-10-05T14:00:00+02:002021-10-05T14:00:00+02:00Victor Stinnertag:vstinner.github.io,2021-10-05:/c-api-abstract-pyobject.html<a class="reference external image-reference" href="https://twitter.com/Kekeflipnote/status/1433139994516934663"> <img alt="A spider in my bedroom" src="https://vstinner.github.io/images/spider.png" /> </a> <p>The PyObject structure indirectly prevents optimizing CPython. We will see why and how I prepared the C API to make this structure opaque.
It took me 1 year and a half to add functions and to introduce <strong>incompatible C API changes</strong> (fear!).</p> <p>In February 2020, I started by adding …</p><a class="reference external image-reference" href="https://twitter.com/Kekeflipnote/status/1433139994516934663"> <img alt="A spider in my bedroom" src="https://vstinner.github.io/images/spider.png" /> </a> <p>The PyObject structure indirectly prevents optimizing CPython. We will see why and how I prepared the C API to make this structure opaque. It took me 1 year and a half to add functions and to introduce <strong>incompatible C API changes</strong> (fear!).</p> <p>In February 2020, I started by adding functions like <tt class="docutils literal">Py_SET_TYPE()</tt> to abstract accesses to the <tt class="docutils literal">PyObject</tt> structure. I modified C extensions of the standard library to use functions like <tt class="docutils literal">Py_TYPE()</tt> and <tt class="docutils literal">Py_SET_TYPE()</tt>.</p> <p>I converted the <tt class="docutils literal">Py_TYPE()</tt> macro to a static inline function, but my change was reverted twice.
I had to fix many C extensions and fix a test_exceptions crash on Windows to be able to finally merge my change in September 2021.</p> <p>Finally, we will see what can be done next to fully make the PyObject structure opaque.</p> <p>Thanks to <strong>Dong-hee Na</strong>, <strong>Hai Shi</strong> and <strong>Andy Lester</strong> who helped me to make these changes, and thanks to <strong>Miro Hrončok</strong> who reported C extensions broken by my incompatible C API changes.</p> <p>This article is a follow-up of the <a class="reference external" href="https://vstinner.github.io/c-api-opaque-structures.html">Make structures opaque in the Python C API</a> article.</p> <p><em>Drawing: &quot;A spider in my bedroom&quot; by Kéké</em></p> <div class="section" id="the-c-api-prevents-to-optimize-cpython"> <h2>The C API prevents optimizing CPython</h2> <p>The C API allows accessing structure members directly by dereferencing a <tt class="docutils literal">PyObject*</tt> pointer. Example: getting the reference count of an object directly:</p> <pre class="literal-block">
Py_ssize_t get_refcnt(PyObject *obj)
{
    return obj-&gt;ob_refcnt;
}
</pre> <p>This ability to access structure members directly prevents optimizing CPython.</p> <div class="section" id="mandatory-inefficient-boxing-unboxing"> <h3>Mandatory inefficient boxing/unboxing</h3> <p>The ability to dereference a <tt class="docutils literal">PyObject*</tt> pointer prevents optimizations which avoid inefficient boxing/unboxing, like tagged pointers or list strategies.</p> </div> <div class="section" id="no-tagged-pointer"> <h3>No tagged pointer</h3> <p>Tagged pointers require adding code to all functions which currently dereference object pointers.
The current C API prevents doing that in C extensions, since pointers can be dereferenced directly.</p> </div> <div class="section" id="no-list-strategies"> <h3>No list strategies</h3> <p>Since all Python object structures must start with a <tt class="docutils literal">PyObject ob_base;</tt> member, it is not possible to make other structures opaque before PyObject is made opaque. It prevents implementing PyPy list strategies to reduce the memory footprint, like storing an array of numbers directly as numbers, not as boxed numbers (<tt class="docutils literal">PyLongObject</tt> objects).</p> <p>Currently, the <tt class="docutils literal">PyListObject</tt> structure cannot be made opaque. If <tt class="docutils literal">PyListObject</tt> could be made opaque, it would be possible to store an array of numbers directly as numbers, and to box objects in <tt class="docutils literal">PyList_GetItem()</tt> on demand.</p> </div> <div class="section" id="no-moving-garbage-collector"> <h3>No moving garbage collector</h3> <p>Being able to dereference a <tt class="docutils literal">PyObject*</tt> pointer also prevents moving objects in memory. A moving garbage collector can compact memory to reduce fragmentation. Currently, it cannot be implemented in CPython.</p> </div> <div class="section" id="cannot-allocate-temporarily-objects-on-the-stack"> <h3>Cannot allocate temporary objects on the stack</h3> <p>In CPython, all objects must be allocated on the heap.
If an object is allocated on the stack, stored in a list, and the list is still accessible after the function completes, the stack memory is no longer valid, and so the list is corrupted at function exit.</p> <p>If objects were only referenced by opaque handles, as in the HPy project, it would be possible to copy the object from the stack to the heap when the object is added to the list.</p> </div> <div class="section" id="reference-counting-doesn-t-scale"> <h3>Reference counting doesn't scale</h3> <p>The <tt class="docutils literal">PyObject</tt> structure has a reference count (<tt class="docutils literal">ob_refcnt</tt> member), but reference counting is a performance bottleneck when the same objects are used from multiple threads running in parallel. A race quickly develops for the memory cache line which contains the <tt class="docutils literal">PyObject.ob_refcnt</tt> counter. This is especially true for the most commonly used Python objects, like the None and True singletons: all CPUs want to read or modify it in parallel.</p> <p>This problem killed the Gilectomy project which attempted to remove the GIL from CPython.</p> <p>A <a class="reference external" href="https://en.wikipedia.org/wiki/Tracing_garbage_collection">tracing garbage collector</a> doesn't need reference counting, but it cannot currently be implemented because of the <tt class="docutils literal">PyObject</tt> structure.</p> </div> </div> <div class="section" id="creation-of-the-issue-feb-2020"> <h2>Creation of the issue (Feb 2020)</h2> <p>In February 2020, I created <a class="reference external" href="https://bugs.python.org/issue39573">bpo-39573</a>: &quot;[C API] Make PyObject an opaque structure in the limited C API&quot;.
The issue is related to my work on my <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620 (Hide implementation details from the C API)</a>.</p> <p>My initial plan was to make the PyObject structure fully opaque in the C API.</p> </div> <div class="section" id="add-functions"> <h2>Add functions</h2> <p>In Python 3.8, the <tt class="docutils literal">Py_REFCNT()</tt> and <tt class="docutils literal">Py_TYPE()</tt> macros can be used to directly set an object's reference count or type:</p> <pre class="literal-block">
Py_REFCNT(obj) = new_refcnt;
Py_TYPE(obj) = new_type;
</pre> <p>Such syntax requires direct access to the <tt class="docutils literal">PyObject.ob_refcnt</tt> and <tt class="docutils literal">PyObject.ob_type</tt> members as l-values.</p> <p>In Python 3.9, I added the Py_SET_REFCNT() and Py_SET_TYPE() functions to add an abstraction to <tt class="docutils literal">PyObject</tt> members, and I added <tt class="docutils literal">Py_SET_SIZE()</tt> to add an abstraction to the <tt class="docutils literal">PyVarObject.ob_size</tt> member.</p> <p>In Python 3.9, I also added the <tt class="docutils literal">Py_IS_TYPE(obj, type)</tt> helper function to test an object type.
It is equivalent to <tt class="docutils literal">Py_TYPE(obj) == type</tt>.</p> </div> <div class="section" id="use-py-type-and-py-set-size-in-the-stdlib"> <h2>Use Py_TYPE() and Py_SET_SIZE() in the stdlib</h2> <p>I modified the standard library (C extensions) to no longer access <tt class="docutils literal">PyObject</tt> and <tt class="docutils literal">PyVarObject</tt> members directly:</p> <ul class="simple"> <li>Replace <tt class="docutils literal"><span class="pre">&quot;obj-&gt;ob_refcnt&quot;</span></tt> with <tt class="docutils literal">Py_REFCNT(obj)</tt></li> <li>Replace <tt class="docutils literal"><span class="pre">&quot;obj-&gt;ob_type&quot;</span></tt> with <tt class="docutils literal">Py_TYPE(obj)</tt></li> <li>Replace <tt class="docutils literal"><span class="pre">&quot;obj-&gt;ob_size&quot;</span></tt> with <tt class="docutils literal">Py_SIZE(obj)</tt></li> <li>Replace <tt class="docutils literal">&quot;Py_REFCNT(obj) = new_refcnt&quot;</tt> with <tt class="docutils literal">Py_SET_REFCNT(obj, new_refcnt)</tt></li> <li>Replace <tt class="docutils literal">&quot;Py_TYPE(obj) = new_type&quot;</tt> with <tt class="docutils literal">Py_SET_TYPE(obj, new_type)</tt></li> <li>Replace <tt class="docutils literal">&quot;Py_SIZE(obj) = new_size&quot;</tt> with <tt class="docutils literal">Py_SET_SIZE(obj, new_size)</tt></li> <li>Replace the <tt class="docutils literal">&quot;Py_TYPE(obj) == type&quot;</tt> test with <tt class="docutils literal">Py_IS_TYPE(obj, type)</tt></li> </ul> </div> <div class="section" id="enforce-py-set-type"> <h2>Enforce Py_SET_TYPE()</h2> <p>In Python 3.10, I converted the Py_REFCNT(), Py_TYPE() and Py_SIZE() macros to static inline functions, so <tt class="docutils literal">Py_TYPE(obj) = new_type</tt> becomes a compiler error.</p> <p>Static inline functions still access <tt class="docutils literal">PyObject</tt> and <tt class="docutils literal">PyVarObject</tt> members directly at the ABI level, and so don't solve the initial
goal: &quot;make the PyObject structure opaque&quot;. Not accessing members at the ABI level can have a negative impact on performance, so I prefer to address it later. I already got enough pushback with the other C API changes that I made :-)</p> </div> <div class="section" id="broken-c-extensions-first-revert"> <h2>Broken C extensions (first revert)</h2> <p>Converting the Py_TYPE() and Py_SIZE() macros to static inline functions broke 16 C extensions:</p> <ul class="simple"> <li><strong>Cython</strong></li> <li>PyPAM</li> <li>bitarray</li> <li>boost</li> <li>breezy</li> <li>duplicity</li> <li>gobject-introspection</li> <li>immutables</li> <li>mercurial</li> <li><strong>numpy</strong></li> <li>pybluez</li> <li>pycurl</li> <li>pygobject3</li> <li>pylibacl</li> <li>pyside2</li> <li>rdiff-backup</li> </ul> <p>In November 2020, during the Python 3.10 devcycle, I preferred to revert the Py_TYPE() and Py_SIZE() changes.</p> <p>I kept the Py_REFCNT() change since it only broke a single C extension (PySide2) and it was simple to update it to Py_SET_REFCNT().</p> </div> <div class="section" id="pythoncapi-compat"> <h2>pythoncapi_compat</h2> <p>I created the <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat</a> project to provide the following functions to Python 3.8 and older:</p> <ul class="simple"> <li><tt class="docutils literal">Py_SET_REFCNT()</tt></li> <li><tt class="docutils literal">Py_SET_TYPE()</tt></li> <li><tt class="docutils literal">Py_SET_SIZE()</tt></li> <li><tt class="docutils literal">Py_IS_TYPE()</tt></li> </ul> <p>I also wrote an upgrade_pythoncapi.py script to upgrade C extensions to use these functions, without losing support for Python 3.8 and older.</p> <p>Using the pythoncapi_compat project, I managed to update multiple C extensions to prepare them for Py_TYPE() becoming a static inline function.</p> </div> <div class="section" id="test-exceptions-crash-second-revert"> <h2>test_exceptions crash (second
revert)</h2> <p>In June 2021, during the Python 3.11 devcycle, I changed Py_TYPE() and Py_SIZE() again since <a class="reference external" href="https://bugs.python.org/issue39573#msg401378">most C extensions had been fixed in the meantime</a>.</p> <p>Problem: <tt class="docutils literal">test_recursion_in_except_handler()</tt> of <tt class="docutils literal">test_exceptions</tt> started to crash on a Python debug build on Windows: see <a class="reference external" href="https://bugs.python.org/issue44348">bpo-44348</a>.</p> <p>Since nobody understood the issue, it was decided to revert my change again to repair the buildbots.</p> </div> <div class="section" id="fix-baseexception-deallocator"> <h2>Fix BaseException deallocator</h2> <p>In September 2021, I looked at the test_exceptions crash. In a <strong>debug build</strong>, the MSC compiler <strong>doesn't inline</strong> calls to static inline functions. Because of that, converting the Py_TYPE() macro to a static inline function <strong>increases the stack memory usage</strong> on a Python debug build on Windows.</p> <p>I proposed to enable compiler optimizations when building Python in debug mode on Windows, to inline calls to static inline functions like Py_TYPE(). This idea was rejected, since the debug build must remain fully usable in a debugger.</p> <p>I looked again at the crash and found the root issue. test_recursion_in_except_handler() creates chains of exceptions. When an exception is deallocated, it calls the deallocator of another exception, etc.</p> <ul class="simple"> <li>The recurse_in_except() sub-test creates chains of 11 nested deallocator calls</li> <li>The recurse_in_body_and_except() sub-test creates a chain of <strong>8192 nested deallocator calls</strong></li> </ul> <p>I proposed a change to use the <strong>trashcan mechanism</strong>. It limits the call stack to 50 nested function calls. I checked with a benchmark that the performance overhead is acceptable.
My change fixed the test_exceptions crash!</p> </div> <div class="section" id="close-the-pyobject-issue"> <h2>Close the PyObject issue</h2> <p>Since most C extensions have been fixed and test_exceptions is fixed, I was able to change Py_TYPE() and Py_SIZE() for the third time. My final commit: <a class="reference external" href="https://github.com/python/cpython/commit/cb15afcccffc6c42cbfb7456ce8db89cd2f77512">Py_TYPE becomes a static inline function</a>.</p> <p>I changed the issue topic to restrict it to adding functions to access PyObject members. Previously, the goal was to make the PyObject structure opaque. It took 1 year and a half to make all these changes.</p> </div> <div class="section" id="what-s-next-to-make-pyobject-opaque"> <h2>What's next to make PyObject opaque?</h2> <p>The <tt class="docutils literal">PyObject</tt> structure is used to define the structures of all Python types, like <tt class="docutils literal">PyListObject</tt>. All structures start with <tt class="docutils literal">PyObject ob_base;</tt> and so the compiler must have access to the <tt class="docutils literal">PyObject</tt> structure.</p> <p>Moreover, the <tt class="docutils literal">PyType_FromSpec()</tt> and <tt class="docutils literal">PyType_Spec</tt> API indirectly uses <tt class="docutils literal">sizeof(PyObject)</tt> in the <tt class="docutils literal">PyType_Spec.basicsize</tt> member when defining a type.</p> <p>One option to make the <tt class="docutils literal">PyObject</tt> structure opaque would be to modify the <tt class="docutils literal">PyObject</tt> structure to make it empty, and move its members into a new private <tt class="docutils literal">_PyObject</tt> structure.
This <tt class="docutils literal">_PyObject</tt> structure would be allocated before the <tt class="docutils literal">PyObject*</tt> pointer, the same idea as the current <tt class="docutils literal">PyGC_Head</tt> header, which is also allocated before the <tt class="docutils literal">PyObject*</tt> pointer.</p> <p>These changes are more complex than I expected, so I prefer to open a new issue later to propose them. Also, the performance of these changes must be checked with benchmarks, to ensure that there is no performance overhead or that the overhead is acceptable.</p> </div> C API changes between Python 3.5 to 3.102021-10-04T15:00:00+02:002021-10-04T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2021-10-04:/c-api-python3_10-changes.html<img alt="Homer Simpson hiding" src="https://vstinner.github.io/images/homer_hiding.webp" /> <p>I have been trying to enhance and fix the Python C API for 5 years. My first goal was to shrink the C API without breaking third party C extensions. I hid many private functions from the public C API: I moved them to the &quot;internal C API&quot;. I also deprecated and …</p><img alt="Homer Simpson hiding" src="https://vstinner.github.io/images/homer_hiding.webp" /> <p>I have been trying to enhance and fix the Python C API for 5 years. My first goal was to shrink the C API without breaking third party C extensions. I hid many private functions from the public C API: I moved them to the &quot;internal C API&quot;. I also deprecated and removed many functions.</p> <p>Between Python 3.5 and 3.10, 80 symbols have been removed.
Python 3.10 is the first Python version exporting fewer symbols than its previous version!</p> <p>Since Python 3.8, the C API is organized as 3 parts:</p> <ol class="arabic simple"> <li><tt class="docutils literal">Include/</tt> directory: Limited API</li> <li><tt class="docutils literal">Include/cpython/</tt> directory: CPython implementation details</li> <li><tt class="docutils literal">Include/internal/</tt> directory: The internal API</li> </ol> <p>The devguide <a class="reference external" href="https://devguide.python.org/c-api/">Changing Python’s C API</a> documentation now gives guidelines for C API additions, like avoiding borrowed references.</p> <p>The limited C API got a few more functions, whereas broken and private functions have been removed. The Stable ABI is now explicitly defined and documented in the <a class="reference external" href="https://docs.python.org/dev/c-api/stable.html#stable">C API Stability</a> page.</p> <p>This article lists all C API changes, not only the ones made by me.</p> <div class="section" id="shrink-the-the-c-api"> <h2>Shrink the C API</h2> <p>Between Python 3.5 and 3.10, 80 symbols (functions or variables) have been removed, 3 structures have been removed, and 21 functions have been deprecated.
In the meantime, other symbols have been added to implement new Python features at each Python version.</p> <p>Python 3.10 is the first Python version exporting fewer symbols than its previous version.</p> <div class="section" id="python-3-6"> <h3>Python 3.6</h3> <p>Deprecate 4 functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyUnicode_AsDecodedObject()</tt></li> <li><tt class="docutils literal">PyUnicode_AsDecodedUnicode()</tt></li> <li><tt class="docutils literal">PyUnicode_AsEncodedObject()</tt></li> <li><tt class="docutils literal">PyUnicode_AsEncodedUnicode()</tt></li> </ul> </div> <div class="section" id="python-3-7"> <h3>Python 3.7</h3> <ul class="simple"> <li>Deprecate <tt class="docutils literal">PyOS_AfterFork()</tt></li> <li>Remove the <tt class="docutils literal">PyExc_RecursionErrorInst</tt> singleton (also removed in Python 3.6.4).</li> </ul> </div> <div class="section" id="python-3-8"> <h3>Python 3.8</h3> <p>Remove 3 functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyByteArray_Init()</tt></li> <li><tt class="docutils literal">PyByteArray_Fini()</tt></li> <li><tt class="docutils literal">PyEval_ReInitThreads()</tt></li> </ul> <p>Remove 1 structure:</p> <ul class="simple"> <li><tt class="docutils literal">PyInterpreterState</tt> (moved to the internal C API)</li> </ul> </div> <div class="section" id="python-3-9"> <h3>Python 3.9</h3> <p>Remove 32 symbols:</p> <ul class="simple"> <li><tt class="docutils literal">PyAsyncGen_ClearFreeLists()</tt></li> <li><tt class="docutils literal">PyCFunction_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyCmpWrapper_Type</tt></li> <li><tt class="docutils literal">PyContext_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyDict_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyFloat_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyFrame_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyFrame_ExtendStack()</tt></li> <li><tt
class="docutils literal">PyList_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyMethod_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyNoArgsFunction type</tt></li> <li><tt class="docutils literal">PyNullImporter_Type</tt></li> <li><tt class="docutils literal">PySet_ClearFreeList()</tt></li> <li><tt class="docutils literal">PySortWrapper_Type</tt></li> <li><tt class="docutils literal">PyTuple_ClearFreeList()</tt></li> <li><tt class="docutils literal">PyUnicode_ClearFreeList()</tt></li> <li><tt class="docutils literal">Py_UNICODE_MATCH()</tt></li> <li><tt class="docutils literal">_PyAIterWrapper_Type</tt></li> <li><tt class="docutils literal">_PyBytes_InsertThousandsGrouping()</tt></li> <li><tt class="docutils literal">_PyBytes_InsertThousandsGroupingLocale()</tt></li> <li><tt class="docutils literal">_PyDebug_PrintTotalRefs()</tt></li> <li><tt class="docutils literal">_PyFloat_Digits()</tt></li> <li><tt class="docutils literal">_PyFloat_DigitsInit()</tt></li> <li><tt class="docutils literal">_PyFloat_Repr()</tt></li> <li><tt class="docutils literal">_PyThreadState_GetFrame()</tt> (and <tt class="docutils literal">_PyRuntime.getframe</tt>)</li> <li><tt class="docutils literal">_PyUnicode_ClearStaticStrings()</tt></li> <li><tt class="docutils literal">_Py_AddToAllObjects()</tt></li> <li><tt class="docutils literal">_Py_InitializeFromArgs()</tt></li> <li><tt class="docutils literal">_Py_InitializeFromWideArgs()</tt></li> <li><tt class="docutils literal">_Py_PrintReferenceAddresses()</tt></li> <li><tt class="docutils literal">_Py_PrintReferences()</tt></li> <li><tt class="docutils literal">_Py_tracemalloc_config</tt></li> </ul> <p>Remove 1 structure:</p> <ul class="simple"> <li><tt class="docutils literal">PyGC_Head</tt> (moved to the internal C API)</li> </ul> <p>Deprecate 15 functions:</p> <ul class="simple"> <li><tt class="docutils literal">PyEval_CallFunction()</tt></li> <li><tt class="docutils literal">PyEval_CallMethod()</tt></li> 
<li><tt class="docutils literal">PyEval_CallObject()</tt></li> <li><tt class="docutils literal">PyEval_CallObjectWithKeywords()</tt></li> <li><tt class="docutils literal">PyNode_Compile()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseFileFlags()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseStringFlags()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseStringFlagsFilename()</tt></li> <li><tt class="docutils literal">PyUnicode_AsUnicode()</tt></li> <li><tt class="docutils literal">PyUnicode_AsUnicodeAndSize()</tt></li> <li><tt class="docutils literal">PyUnicode_FromUnicode()</tt></li> <li><tt class="docutils literal">PyUnicode_WSTR_LENGTH()</tt></li> <li><tt class="docutils literal">Py_UNICODE_COPY()</tt></li> <li><tt class="docutils literal">Py_UNICODE_FILL()</tt></li> <li><tt class="docutils literal">_PyUnicode_AsUnicode()</tt></li> </ul> </div> <div class="section" id="python-3-10"> <h3>Python 3.10</h3> <p>Remove 44 symbols:</p> <ul class="simple"> <li><tt class="docutils literal">PyAST_Compile()</tt></li> <li><tt class="docutils literal">PyAST_CompileEx()</tt></li> <li><tt class="docutils literal">PyAST_CompileObject()</tt></li> <li><tt class="docutils literal">PyAST_Validate()</tt></li> <li><tt class="docutils literal">PyArena_AddPyObject()</tt></li> <li><tt class="docutils literal">PyArena_Free()</tt></li> <li><tt class="docutils literal">PyArena_Malloc()</tt></li> <li><tt class="docutils literal">PyArena_New()</tt></li> <li><tt class="docutils literal">PyFuture_FromAST()</tt></li> <li><tt class="docutils literal">PyFuture_FromASTObject()</tt></li> <li><tt class="docutils literal">PyLong_FromUnicode()</tt></li> <li><tt class="docutils literal">PyNode_Compile()</tt></li> <li><tt class="docutils literal">PyOS_InitInterrupts()</tt></li> <li><tt class="docutils literal">PyObject_AsCharBuffer()</tt></li> <li><tt class="docutils literal">PyObject_AsReadBuffer()</tt></li> <li><tt class="docutils 
literal">PyObject_AsWriteBuffer()</tt></li> <li><tt class="docutils literal">PyObject_CheckReadBuffer()</tt></li> <li><tt class="docutils literal">PyParser_ASTFromFile()</tt></li> <li><tt class="docutils literal">PyParser_ASTFromFileObject()</tt></li> <li><tt class="docutils literal">PyParser_ASTFromFilename()</tt></li> <li><tt class="docutils literal">PyParser_ASTFromString()</tt></li> <li><tt class="docutils literal">PyParser_ASTFromStringObject()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseFileFlags()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseStringFlags()</tt></li> <li><tt class="docutils literal">PyParser_SimpleParseStringFlagsFilename()</tt></li> <li><tt class="docutils literal">PyST_GetScope()</tt></li> <li><tt class="docutils literal">PySymtable_Build()</tt></li> <li><tt class="docutils literal">PySymtable_BuildObject()</tt></li> <li><tt class="docutils literal">PySymtable_Free()</tt></li> <li><tt class="docutils literal">PyUnicode_AsUnicodeCopy()</tt></li> <li><tt class="docutils literal">PyUnicode_GetMax()</tt></li> <li><tt class="docutils literal">Py_ALLOW_RECURSION</tt></li> <li><tt class="docutils literal">Py_END_ALLOW_RECURSION</tt></li> <li><tt class="docutils literal">Py_SymtableString()</tt></li> <li><tt class="docutils literal">Py_SymtableStringObject()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strcat()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strchr()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strcmp()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strcpy()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strlen()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strncmp()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strncpy()</tt></li> <li><tt class="docutils literal">Py_UNICODE_strrchr()</tt></li> <li><tt class="docutils literal">_Py_CheckRecursionLimit</tt></li> </ul> <p>Remove 1 structure:</p> <ul class="simple"> <li><tt class="docutils 
_PyUnicode_Name_CAPI">
literal">_PyUnicode_Name_CAPI</tt></li> </ul> <p>Deprecate 1 function:</p> <ul class="simple"> <li><tt class="docutils literal">PyUnicode_InternImmortal()</tt></li> </ul> <p>Moreover, <tt class="docutils literal">PyUnicode_FromStringAndSize(NULL, size)</tt> and <tt class="docutils literal">PyUnicode_FromUnicode(NULL, size)</tt> have been deprecated.</p> </div> <div class="section" id="statistics"> <h3>Statistics</h3> <p>Public Python symbols exported with <tt class="docutils literal">PyAPI_FUNC()</tt> and <tt class="docutils literal">PyAPI_DATA()</tt>:</p> <table border="1" class="docutils"> <colgroup> <col width="39%" /> <col width="61%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Python</th> <th class="head">Symbols</th> </tr> </thead> <tbody valign="top"> <tr><td>2.7</td> <td>891</td> </tr> <tr><td>3.6</td> <td>1041 (+150)</td> </tr> <tr><td>3.7</td> <td>1068 (+27)</td> </tr> <tr><td>3.8</td> <td>1105 (+37)</td> </tr> <tr><td>3.9</td> <td>1115 (+10)</td> </tr> <tr><td>3.10</td> <td>1080 (-35)</td> </tr> </tbody> </table> <p>Command used to count public symbols:</p> <pre class="literal-block">
grep -E 'PyAPI_(FUNC|DATA)' Include/*.h Include/cpython/*.h|grep -v ' _Py'|wc -l
</pre> </div> </div> <div class="section" id="reorganize-header-files"> <h2>Reorganize header files</h2> <p>Since Python 3.8, the C API is organized as 3 parts:</p> <ol class="arabic simple"> <li><tt class="docutils literal">Include/</tt> directory: Limited API</li> <li><tt class="docutils literal">Include/cpython/</tt> directory: CPython implementation details</li> <li><tt class="docutils literal">Include/internal/</tt> directory: The internal API</li> </ol> <p>The intent is to help developers think about whether their additions must be part of the limited C API, the CPython C API or the internal C API.</p> <div class="section" id="python-3-7-1"> <h3>Python 3.7</h3> <p>Creation of the <tt class="docutils literal">Include/internal/</tt> directory.</p> </div> <div class="section"
id="python-3-8-1"> <h3>Python 3.8</h3> <p>Creation on the <tt class="docutils literal">Include/cpython/</tt> directory.</p> </div> <div class="section" id="python-3-10-1"> <h3>Python 3.10</h3> <p>Move 8 header files from <tt class="docutils literal">Include/</tt> to <tt class="docutils literal">Include/cpython/</tt>:</p> <ul class="simple"> <li><tt class="docutils literal">odictobject.h</tt></li> <li><tt class="docutils literal">parser_interface.h</tt></li> <li><tt class="docutils literal">picklebufobject.h</tt></li> <li><tt class="docutils literal">pyarena.h</tt></li> <li><tt class="docutils literal">pyctype.h</tt></li> <li><tt class="docutils literal">pydebug.h</tt></li> <li><tt class="docutils literal">pyfpe.h</tt></li> <li><tt class="docutils literal">pytime.h</tt></li> </ul> <p>Python 3.10 added a <a class="reference external" href="https://github.com/python/cpython/blob/master/Include/README.rst">Include/README.rst documentation</a> to explain this organization and give guidelines for adding new functions. For example, new functions in the public C API must not steal references nor return borrowed references. 
In the meantime, this documentation moved to the devguide: <a class="reference external" href="https://devguide.python.org/c-api/">Changing Python’s C API</a>.</p> </div> <div class="section" id="statistics-1"> <h3>Statistics</h3> <p>Number of C API lines per Python version:</p> <table border="1" class="docutils"> <colgroup> <col width="14%" /> <col width="27%" /> <col width="22%" /> <col width="24%" /> <col width="14%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Python</th> <th class="head">Limited API</th> <th class="head">CPython API</th> <th class="head">Internal API</th> <th class="head">Total</th> </tr> </thead> <tbody valign="top"> <tr><td>2.7</td> <td>12,686 (100%)</td> <td>0</td> <td>0</td> <td>12,686</td> </tr> <tr><td>3.6</td> <td>16,011 (100%)</td> <td>0</td> <td>0</td> <td>16,011</td> </tr> <tr><td>3.7</td> <td>16,517 (96%)</td> <td>0</td> <td>705 (4%)</td> <td>17,222</td> </tr> <tr><td>3.8</td> <td>13,160 (70%)</td> <td>3,417 (18%)</td> <td>2,230 (12%)</td> <td>18,807</td> </tr> <tr><td>3.9</td> <td>12,264 (62%)</td> <td>4,343 (22%)</td> <td>3,066 (16%)</td> <td>19,673</td> </tr> <tr><td>3.10</td> <td>10,305 (52%)</td> <td>4,513 (23%)</td> <td>5,092 (26%)</td> <td>19,910</td> </tr> </tbody> </table> <p>Commands:</p> <ul class="simple"> <li>Limited: <tt class="docutils literal">wc <span class="pre">-l</span> <span class="pre">Include/*.h</span></tt></li> <li>CPython: <tt class="docutils literal">wc <span class="pre">-l</span> <span class="pre">Include/cpython/*.h</span></tt></li> <li>Internal: <tt class="docutils literal">wc <span class="pre">-l</span> <span class="pre">Include/internal/*.h</span></tt></li> </ul> </div> </div> <div class="section" id="changes-in-the-limited-c-api"> <h2>Changes in the Limited C API</h2> <p>Between Python 3.8 and 3.10, 4 new functions have been added and 14 symbols (functions or variables) have been removed from the limited C API.</p> <p>The trashcan API was excluded from the limited C API since it
never worked. Its implementation accessed PyThreadState members directly, whereas this structure is opaque in the limited C API.</p> <p>On the other hand, the Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() functions have been added to the limited C API. In Python 3.8, they were defined as macros accessing PyThreadState members directly. In Python 3.9, they became opaque function calls and so are now compatible with the stable ABI.</p> <div class="section" id="python-3-9-1"> <h3>Python 3.9</h3> <p>Add 3 functions to the limited C API:</p> <ul class="simple"> <li><tt class="docutils literal">Py_EnterRecursiveCall()</tt></li> <li><tt class="docutils literal">Py_LeaveRecursiveCall()</tt></li> <li><tt class="docutils literal">PyFrame_GetLineNumber()</tt></li> </ul> <p>Remove 14 symbols from the limited C API:</p> <ul class="simple"> <li><tt class="docutils literal">PyFPE_START_PROTECT()</tt></li> <li><tt class="docutils literal">PyFPE_END_PROTECT()</tt></li> <li><tt class="docutils literal">PyThreadState_DeleteCurrent()</tt></li> <li><tt class="docutils literal">PyTrash_UNWIND_LEVEL</tt></li> <li><tt class="docutils literal">Py_TRASHCAN_BEGIN</tt></li> <li><tt class="docutils literal">Py_TRASHCAN_BEGIN_CONDITION</tt></li> <li><tt class="docutils literal">Py_TRASHCAN_END</tt></li> <li><tt class="docutils literal">Py_TRASHCAN_SAFE_BEGIN</tt></li> <li><tt class="docutils literal">Py_TRASHCAN_SAFE_END</tt></li> <li><tt class="docutils literal">_PyTraceMalloc_NewReference()</tt></li> <li><tt class="docutils literal">_Py_CheckRecursionLimit</tt></li> <li><tt class="docutils literal">_Py_GetRefTotal()</tt></li> <li><tt class="docutils literal">_Py_NewReference()</tt></li> <li><tt class="docutils literal">_Py_ForgetReference()</tt></li> </ul> </div> <div class="section" id="python-3-10-2"> <h3>Python 3.10</h3> <p>Add 1 function to the limited C API:</p> <ul class="simple"> <li><tt class="docutils literal">PyUnicode_AsUTF8AndSize()</tt></li> </ul> </div> </div> <div
class="section" id="pep-652-maintaining-the-stable-abi"> <h2>PEP 652: Maintaining the Stable ABI</h2> <p>Petr Viktorin wrote and implemented the <a class="reference external" href="https://www.python.org/dev/peps/pep-0652/">PEP 652: Maintaining the Stable ABI</a> in Python 3.10.</p> <p>The Stable ABI (Application Binary Interface) for extension modules or embedding Python is now explicitly defined. The <a class="reference external" href="https://docs.python.org/dev/c-api/stable.html#stable">C API Stability</a> documentation describes C API and ABI stability guarantees along with best practices for using the Stable ABI.</p> </div> Creation of the pythoncapi_compat project2021-03-30T20:00:00+02:002021-03-30T20:00:00+02:00Victor Stinnertag:vstinner.github.io,2021-03-30:/pythoncapi_compat.html<a class="reference external image-reference" href="https://twitter.com/Kekeflipnote/status/1378034391872638980"> <img alt="Strange Cat by Kéké" src="https://vstinner.github.io/images/strange_cat.jpg" /> </a> <p>In 2020, I created a new <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat project</a> to add Python 3.10 support to C extensions without losing support for old Python versions. It supports Python 2.7-3.10 and PyPy 2.7-3.7. The project is made of two parts:</p> <ul class="simple"> <li><tt class="docutils literal">pythoncapi_compat.h</tt>: Header file providing new C API …</li></ul><a class="reference external image-reference" href="https://twitter.com/Kekeflipnote/status/1378034391872638980"> <img alt="Strange Cat by Kéké" src="https://vstinner.github.io/images/strange_cat.jpg" /> </a> <p>In 2020, I created a new <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat project</a> to add Python 3.10 support to C extensions without losing support for old Python versions. It supports Python 2.7-3.10 and PyPy 2.7-3.7. 
The project is made of two parts:</p> <ul class="simple"> <li><tt class="docutils literal">pythoncapi_compat.h</tt>: Header file providing new C API functions to old Python versions, like <tt class="docutils literal">Py_SET_TYPE()</tt>.</li> <li><tt class="docutils literal">upgrade_pythoncapi.py</tt>: Script upgrading C extension modules using <tt class="docutils literal">pythoncapi_compat.h</tt>. For example, it replaces <tt class="docutils literal">Py_TYPE(obj) = type;</tt> with <tt class="docutils literal">Py_SET_TYPE(obj, type);</tt>.</li> </ul> <p>This article is about the creation of the header file and the upgrade script.</p> <p>Photo: Strange cats 🐾 by Kéké.</p> <div class="section" id="py-set-type-macro-for-python-3-8-and-older"> <h2>Py_SET_TYPE() macro for Python 3.8 and older</h2> <div class="section" id="py-type-macro-converted-to-a-static-inline-function"> <h3>Py_TYPE() macro converted to a static inline function</h3> <p>In May 2020 in the <a class="reference external" href="https://bugs.python.org/issue39573">bpo-39573 &quot;Make PyObject an opaque structure&quot;</a>, <a class="reference external" href="https://github.com/python/cpython/commit/ad3252bad905d41635bcbb4b76db30d570cf0087">Py_TYPE()</a> (change by Dong-hee Na), <a class="reference external" href="https://github.com/python/cpython/commit/fe2978b3b940fe2478335e3a2ca5ad22338cdf9c">Py_REFCNT() and Py_SIZE()</a> (change by me) macros were converted to static inline functions. This change broke 17 C extension modules (see my previous article <a class="reference external" href="https://vstinner.github.io/c-api-opaque-structures.html">Make structures opaque in the Python C API</a>).</p> <p>I prepared this change in Python 3.9 by adding Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE() functions, and by modifying Python to use these functions. 
I also <a class="reference external" href="https://github.com/python/cpython/commit/d905df766c367c350f20c46ccd99d4da19ed57d8">added Py_IS_TYPE() function</a> which tests the type of an object:</p> <pre class="literal-block"> static inline int _Py_IS_TYPE(PyObject *ob, PyTypeObject *type) { return ob-&gt;ob_type == type; } #define Py_IS_TYPE(ob, type) _Py_IS_TYPE(_PyObject_CAST(ob), type) </pre> <p>For example, <tt class="docutils literal">Py_TYPE(ob) == (tp)</tt> can be replaced with <tt class="docutils literal">Py_IS_TYPE(ob, tp)</tt>.</p> </div> <div class="section" id="cython-and-numpy-fixes"> <h3>Cython and numpy fixes</h3> <p>I fixed Cython by <a class="reference external" href="https://github.com/cython/cython/commit/d8e93b332fe7d15459433ea74cd29178c03186bd">adding __Pyx_SET_REFCNT() and __Pyx_SET_SIZE() macros</a>:</p> <pre class="literal-block"> #if PY_VERSION_HEX &gt;= 0x030900A4 #define __Pyx_SET_REFCNT(obj, refcnt) Py_SET_REFCNT(obj, refcnt) #define __Pyx_SET_SIZE(obj, size) Py_SET_SIZE(obj, size) #else #define __Pyx_SET_REFCNT(obj, refcnt) Py_REFCNT(obj) = (refcnt) #define __Pyx_SET_SIZE(obj, size) Py_SIZE(obj) = (size) #endif </pre> <p>The <a class="reference external" href="https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9eccaa">numpy fix</a>:</p> <pre class="literal-block"> #if PY_VERSION_HEX &lt; 0x030900a4 #define Py_SET_TYPE(obj, typ) (Py_TYPE(obj) = typ) #define Py_SET_SIZE(obj, size) (Py_SIZE(obj) = size) #endif </pre> <p><a class="reference external" href="https://github.com/numpy/numpy/commit/f1671076c80bd972421751f2d48186ee9ac808aa">The numpy fix was updated</a> to not have a return value by adding <tt class="docutils literal">&quot;, (void)0&quot;</tt>:</p> <pre class="literal-block"> #if PY_VERSION_HEX &lt; 0x030900a4 #define Py_SET_TYPE(obj, type) ((Py_TYPE(obj) = (type)), (void)0) #define Py_SET_SIZE(obj, size) ((Py_SIZE(obj) = (size)), (void)0) #endif </pre> <p>So the macros better mimicks the static inline 
functions' behavior.</p> </div> <div class="section" id="c-api-porting-guide"> <h3>C API Porting Guide</h3> <p>I copied the numpy macros <a class="reference external" href="https://github.com/python/cpython/commit/dc24b8a2ac32114313bae519db3ccc21fe45c982">to the C API section of the Python 3.10 porting guide (What's New in Python 3.10)</a>, which documents the migration to Py_SET_TYPE():</p> <blockquote> <p>Since <tt class="docutils literal">Py_TYPE()</tt> is changed to the inline static function, <tt class="docutils literal">Py_TYPE(obj) = new_type</tt> must be replaced with <tt class="docutils literal">Py_SET_TYPE(obj, new_type)</tt>: see <tt class="docutils literal">Py_SET_TYPE()</tt> (available since Python 3.9). For backward compatibility, this macro can be used:</p> <pre class="literal-block"> #if PY_VERSION_HEX &lt; 0x030900A4 # define Py_SET_TYPE(obj, type) ((Py_TYPE(obj) = (type)), (void)0) #endif </pre> </blockquote> </div> <div class="section" id="copy-paste-macros"> <h3>Copy/paste macros</h3> <p>Up to 3 macros must be copied/pasted for backward compatibility in each project:</p> <pre class="literal-block"> #if PY_VERSION_HEX &lt; 0x030900A4 # define Py_SET_TYPE(obj, type) ((Py_TYPE(obj) = (type)), (void)0) #endif #if PY_VERSION_HEX &lt; 0x030900A4 # define Py_SET_REFCNT(obj, refcnt) ((Py_REFCNT(obj) = (refcnt)), (void)0) #endif #if PY_VERSION_HEX &lt; 0x030900A4 # define Py_SET_SIZE(obj, size) ((Py_SIZE(obj) = (size)), (void)0) #endif </pre> <p>These macros started to be copied into multiple projects. 
Examples:</p> <ul class="simple"> <li><a class="reference external" href="https://bazaar.launchpad.net/~brz/brz/3.1/revision/7647">breezy</a></li> <li><a class="reference external" href="https://github.com/numpy/numpy/commit/f1671076c80bd972421751f2d48186ee9ac808aa">numpy</a></li> <li><a class="reference external" href="https://github.com/pycurl/pycurl/commit/e633f9a1ac4df5e249e78c218d5fbbd848219042">pycurl</a></li> </ul> <p>There might be a better way than copying/pasting this compatibility layer into each project, adding macros one by one...</p> </div> </div> <div class="section" id="creation-of-the-pythoncapi-compat-h-header-file"> <h2>Creation of the pythoncapi_compat.h header file</h2> <p>While the code for Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE() macros is short, I also wanted to use the seven new Python 3.9 getter functions on Python 3.8 and older:</p> <ul class="simple"> <li>Py_IS_TYPE()</li> <li>PyFrame_GetBack()</li> <li>PyFrame_GetCode()</li> <li>PyInterpreterState_Get()</li> <li>PyThreadState_GetFrame()</li> <li>PyThreadState_GetID()</li> <li>PyThreadState_GetInterpreter()</li> </ul> <p>In June 2020, I created <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">the pythoncapi_compat project</a> with a <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat/blob/main/pythoncapi_compat.h">pythoncapi_compat.h header file</a> which defines these functions as static inline functions. An <tt class="docutils literal">&quot;#if PY_VERSION_HEX&quot;</tt> guard prevents defining a function if it's already provided by <tt class="docutils literal">Python.h</tt>. 
Example of the current implementation of PyThreadState_GetInterpreter() for Python 3.8 and older:</p> <pre class="literal-block"> // bpo-39947 added PyThreadState_GetInterpreter() to Python 3.9.0a5 #if PY_VERSION_HEX &lt; 0x030900A5 static inline PyInterpreterState * PyThreadState_GetInterpreter(PyThreadState *tstate) { assert(tstate != NULL); return tstate-&gt;interp; } #endif </pre> <p>I wrote tests for each function using a C extension. The project initially supported Python 3.6 to Python 3.10. The test runner also checks for reference leaks.</p> </div> <div class="section" id="mercurial-and-python-2-7"> <h2>Mercurial and Python 2.7</h2> <p>The Mercurial project has multiple C extensions, was broken on Python 3.10 by the Py_TYPE() change, and is one of the last projects still requiring Python 2.7 in 2021. It's a good candidate to check if pythoncapi_compat.h is useful.</p> <p><a class="reference external" href="https://bz.mercurial-scm.org/show_bug.cgi?id=6451">I proposed a patch</a>, then <a class="reference external" href="https://foss.heptapod.net/octobus/mercurial-devel/-/merge_requests/61">converted it to a merge request</a>. It got accepted in the &quot;next&quot; branch, but compatibility with Visual Studio 2008 had to be fixed for Python 2.7 on Windows. I fixed pythoncapi_compat.h by defining <tt class="docutils literal">inline</tt> as <tt class="docutils literal">__inline</tt>:</p> <pre class="literal-block"> // Compatibility with Visual Studio 2013 and older which don't support // the inline keyword in C (only in C++): use __inline instead. #if (defined(_MSC_VER) &amp;&amp; _MSC_VER &lt; 1900 \ &amp;&amp; !defined(__cplusplus) &amp;&amp; !defined(inline)) # define inline __inline # define PYTHONCAPI_COMPAT_MSC_INLINE // These two macros are undefined at the end of this file #endif (...) 
#ifdef PYTHONCAPI_COMPAT_MSC_INLINE # undef inline # undef PYTHONCAPI_COMPAT_MSC_INLINE #endif </pre> <p>I chose to continue writing <tt class="docutils literal">static inline</tt>, so pythoncapi_compat.h remains close to Python header files. I also extended the pythoncapi_compat test suite to test Python 2.7.</p> </div> <div class="section" id="pybind11-and-pypy"> <h2>pybind11 and PyPy</h2> <p>More recently, I added PyPy 2.7, 3.6 and 3.7 support for the pybind11 project, since their CI tests PyPy. The fix is to no longer define the following functions on PyPy:</p> <ul class="simple"> <li>PyFrame_GetBack(), _PyFrame_GetBackBorrow()</li> <li>PyThreadState_GetFrame(), _PyThreadState_GetFrameBorrow()</li> <li>PyThreadState_GetID()</li> <li>PyObject_GC_IsTracked()</li> <li>PyObject_GC_IsFinalized()</li> </ul> </div> <div class="section" id="creation-of-the-upgrade-pythoncapi-py-script"> <h2>Creation of the upgrade_pythoncapi.py script</h2> <div class="section" id="upgrade-pythoncapi-py"> <h3>upgrade_pythoncapi.py</h3> <p>In November 2020, I created a new <tt class="docutils literal">upgrade_pythoncapi.py</tt> script to replace <tt class="docutils literal">&quot;Py_TYPE(obj) = type;&quot;</tt> with <tt class="docutils literal">&quot;Py_SET_TYPE(obj, <span class="pre">type);&quot;</span></tt>. The script is based on my <a class="reference external" href="https://github.com/vstinner/sixer">old sixer.py project</a> which adds Python 3 support to a Python project without losing Python 2 support. 
The <tt class="docutils literal">upgrade_pythoncapi.py</tt> script uses regular expressions to replace one pattern with another.</p> <p>Similar to <tt class="docutils literal">sixer</tt> which adds <tt class="docutils literal">import six</tt> to support Python 2 and Python 3 in a single code base, <tt class="docutils literal">upgrade_pythoncapi.py</tt> adds <tt class="docutils literal">#include &quot;pythoncapi_compat.h&quot;</tt> to support old and new versions of the Python C API in a single code base.</p> <p>I first created a new GitHub project for upgrade_pythoncapi.py, but since it was too tightly coupled to the pythoncapi_compat.h header file, I moved the script to the pythoncapi_compat project.</p> </div> <div class="section" id="tests"> <h3>Tests</h3> <p>I added more and more &quot;operations&quot; to update C extensions. For me, <strong>the most important part is the test suite</strong> to ensure that the script doesn't introduce bugs. It contains code which must not be replaced. For example, it ensures that <tt class="docutils literal"><span class="pre">frame-&gt;f_code</span> = code</tt> is not replaced with <tt class="docutils literal">_PyFrame_GetCodeBorrow(frame) = code</tt> by mistake.</p> </div> <div class="section" id="borrowed-references"> <h3>Borrowed references</h3> <p>Code accessing <tt class="docutils literal"><span class="pre">frame-&gt;f_code</span></tt> directly must use <tt class="docutils literal">PyFrame_GetCode()</tt> but this function returns a strong reference, whereas <tt class="docutils literal"><span class="pre">frame-&gt;f_code</span></tt> gives a borrowed reference. I added &quot;Borrow&quot; variants of the functions to <tt class="docutils literal">pythoncapi_compat.h</tt> for <tt class="docutils literal">upgrade_pythoncapi.py</tt>. 
For example, <tt class="docutils literal"><span class="pre">frame-&gt;f_code</span></tt> is replaced with <tt class="docutils literal">_PyFrame_GetCodeBorrow()</tt> which is defined as:</p> <pre class="literal-block"> static inline PyCodeObject* _PyFrame_GetCodeBorrow(PyFrameObject *frame) { return (PyCodeObject *)_Py_StealRef(PyFrame_GetCode(frame)); } </pre> <p>The <tt class="docutils literal">_Py_StealRef(obj)</tt> function converts a strong reference to a borrowed reference (simplified code):</p> <pre class="literal-block"> static inline PyObject* _Py_StealRef(PyObject *obj) { Py_DECREF(obj); return obj; } </pre> <p>It is the opposite of <tt class="docutils literal">Py_NewRef()</tt>. It is similar to <tt class="docutils literal">Py_DECREF(obj)</tt> but it can be used as an expression: it returns <em>obj</em>. pythoncapi_compat.h defines private <tt class="docutils literal">_Py_StealRef()</tt> and <tt class="docutils literal">_Py_XStealRef()</tt> static inline functions. First I proposed to add them to Python, but I abandoned the idea (see <a class="reference external" href="https://bugs.python.org/issue42522">bpo-42522</a>).</p> <p>Thanks to the &quot;Borrow&quot; suffix in function names, it becomes easier to discover the usage of borrowed references. Using a borrowed reference is unsafe if it is possible that the object is destroyed before the last usage of borrowed reference. In case of doubt, it's better to use a strong reference. 
For example, <tt class="docutils literal">_PyFrame_GetCodeBorrow()</tt> can be replaced with <tt class="docutils literal">PyFrame_GetCode()</tt>, but it requires to explicitly delete the created strong reference with <tt class="docutils literal">Py_DECREF()</tt>.</p> </div> </div> <div class="section" id="practical-solution-for-incompatible-c-api-changes"> <h2>Practical solution for incompatible C API changes</h2> <p>So far, I succeeded to convince 4 projects to use pythoncapi_compat.h: bitarray, immutables, Mercurial and python-zstandard.</p> <p>In my opinion, pythoncapi_compat.h is the right approach to introduce incompatible C API changes: provide a practical solution to support old and new Python versions in a single code base.</p> <p>The next steps is to get it adopted more widely and get it endorsed by the Python project, maybe by moving it under the PSF organization on GitHub.</p> </div> Make structures opaque in the Python C API2021-03-26T12:00:00+01:002021-03-26T12:00:00+01:00Victor Stinnertag:vstinner.github.io,2021-03-26:/c-api-opaque-structures.html<a class="reference external image-reference" href="https://fr.wikipedia.org/wiki/Incendie_du_centre_de_donn%C3%A9es_d%27OVHcloud_%C3%A0_Strasbourg"> <img alt="OVHcloud datacenter fire in Strasbourg" src="https://vstinner.github.io/images/incendie-ovh.jpg" /> </a> <p>This article is about changes that I made, with the help other developers, in the Python C API in Python 3.8, 3.9 and 3.10 to avoid accessing structures members: prepare the C API to <a class="reference external" href="https://en.wikipedia.org/wiki/Opaque_data_type">make structures opaque</a>. 
These changes are related to my <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620 &quot;Hide implementation …</a></p><a class="reference external image-reference" href="https://fr.wikipedia.org/wiki/Incendie_du_centre_de_donn%C3%A9es_d%27OVHcloud_%C3%A0_Strasbourg"> <img alt="OVHcloud datacenter fire in Strasbourg" src="https://vstinner.github.io/images/incendie-ovh.jpg" /> </a> <p>This article is about changes that I made, with the help other developers, in the Python C API in Python 3.8, 3.9 and 3.10 to avoid accessing structures members: prepare the C API to <a class="reference external" href="https://en.wikipedia.org/wiki/Opaque_data_type">make structures opaque</a>. These changes are related to my <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620 &quot;Hide implementation details from the C API&quot;</a>.</p> <p>One change had <strong>negative impact on performance</strong> and had to be reverted. Making Python slower just to make structures opaque would first require to get the PEP 620 accepted.</p> <p>While compatible changes merged in Python 3.8 and Python 3.9 went fine, one Python 3.10 <strong>incompatible change caused more troubles</strong> and had to be reverted.</p> <p>Photo: OVHcloud data center fire in Strasbourg.</p> <div class="section" id="rationale"> <h2>Rationale</h2> <p>The C API currently exposes most object structures, C extensions indirectly access structures members through the API, but can also access them directly. It causes different issues:</p> <ul class="simple"> <li>Modifying a structure can break an unknown number of C extensions. To prevent any risk, CPython core developers avoid modifying structures. Once most structures will be opaque, it will be possible to experiment <strong>optimizations</strong> which require deep structures changes without breaking C extensions. 
The irony is that we first have to break backward compatibility and C extensions for that.</li> <li>Any structure change breaks the ABI. The <strong>stable ABI</strong> solved this issue by not exposing structures in its limited C API. The idea is to bend the default C API towards the limited C API to provide a stable ABI for everyone in the long term.</li> </ul> </div> <div class="section" id="issues"> <h2>Issues</h2> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue39573">PyObject: bpo-39573</a></li> <li><a class="reference external" href="https://bugs.python.org/issue40170">PyTypeObject: bpo-40170</a></li> <li><a class="reference external" href="https://bugs.python.org/issue39947">PyThreadState: bpo-39947</a></li> <li><a class="reference external" href="https://bugs.python.org/issue40421">PyFrameObject: bpo-40421</a></li> </ul> </div> <div class="section" id="opaque-structures"> <h2>Opaque structures</h2> <ul class="simple"> <li>Python 3.8 made the PyInterpreterState structure opaque.</li> <li>Python 3.9 made the PyGC_Head structure opaque.</li> </ul> </div> <div class="section" id="add-getter-functions-to-python-3-9"> <h2>Add getter functions to Python 3.9</h2> <ul class="simple"> <li>PyObject, PyVarObject:<ul> <li>Py_SET_REFCNT()</li> <li>Py_SET_TYPE()</li> <li>Py_SET_SIZE()</li> <li>Py_IS_TYPE()</li> </ul> </li> <li>PyFrameObject:<ul> <li>PyFrame_GetCode()</li> <li>PyFrame_GetBack()</li> </ul> </li> <li>PyThreadState:<ul> <li>PyThreadState_GetInterpreter()</li> <li>PyThreadState_GetFrame()</li> <li>PyThreadState_GetID()</li> </ul> </li> <li>PyInterpreterState:<ul> <li>PyInterpreterState_Get()</li> </ul> </li> </ul> <p>PyInterpreterState_Get() can be used to replace <tt class="docutils literal"><span class="pre">PyThreadState_Get()-&gt;interp</span></tt> and <tt class="docutils literal"><span class="pre">PyThreadState_GetInterpreter(PyThreadState_Get())</span></tt>.</p> </div> <div class="section" 
id="convert-macros-to-static-inline-functions-in-python-3-8"> <h2>Convert macros to static inline functions in Python 3.8</h2> <div class="section" id="macro-pitfalls"> <h3>Macro pitfalls</h3> <p>Macros are convenient but have <a class="reference external" href="https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html">multiple pitfalls</a>. Some macros can be abused in surprising ways. For example, the following code is valid with Python 3.9:</p> <pre class="literal-block"> if (obj == NULL || PyList_SET_ITEM (l, i, obj) &lt; 0) { ... } </pre> <p>In Python 3.9, PyList_SET_ITEM() returns <em>obj</em> in this case, <em>obj</em> is a pointer, and so the test checks if a pointer is negative which makes no sense (but is accepted by C compilers by default). This code is likely a confusion with PyList_SetItem() which returns a int, negative in case of an error.</p> <p>Zackery Spytz and me modified <a class="reference external" href="https://github.com/python/cpython/commit/556d97f473fa538cef780f84bd29239ecf57d9c5">PyList_SET_ITEM()</a> and <a class="reference external" href="https://github.com/python/cpython/commit/0ef96c2b2a291c9d2d9c0ba42bbc1900a21e65f3">PyCell_SET()</a> macros in Python 3.10 to return void.</p> <p>This change broke alsa-python: I proposed a <a class="reference external" href="https://github.com/alsa-project/alsa-python/commit/5ea2f8709b4d091700750661231f8a3ddce0fc7c">fix which was merged</a>.</p> <p>One nice side effect of converting macros to static inline functions is that debuggers and profilers are able to retrieve the name of the function.</p> </div> <div class="section" id="converted-macros"> <h3>Converted macros</h3> <ul class="simple"> <li>Py_INCREF(), Py_XINCREF()</li> <li>Py_DECREF(), Py_XDECREF()</li> <li>PyObject_INIT(), PyObject_INIT_VAR()</li> <li>_PyObject_GC_TRACK(), _PyObject_GC_UNTRACK(), _Py_Dealloc()</li> </ul> </div> <div class="section" id="performance"> <h3>Performance</h3> <p>Since <tt class="docutils literal">Py_INCREF()</tt> is 
critical for general Python performance, the impact of the change was analyzed in depth before <a class="reference external" href="https://github.com/python/cpython/commit/2aaf0c12041bcaadd7f2cc5a54450eefd7a6ff12">being merged</a> in <a class="reference external" href="https://bugs.python.org/issue35059">bpo-35059</a>. The usage of <tt class="docutils literal"><span class="pre">__attribute__((always_inline))</span></tt> and <tt class="docutils literal">__forceinline</tt> to force inlining was rejected.</p> </div> <div class="section" id="cast-to-pyobject"> <h3>Cast to PyObject*</h3> <p>Old Py_INCREF() implementation in Python 3.7:</p> <pre class="literal-block"> #define Py_INCREF(op) ( \ _Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA \ ((PyObject *)(op))-&gt;ob_refcnt++) </pre> <p>where <tt class="docutils literal">_Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA</tt> becomes <tt class="docutils literal"><span class="pre">_Py_RefTotal++,</span></tt> if the <tt class="docutils literal">Py_REF_DEBUG</tt> macro is defined, or nothing otherwise. 
Current Py_INCREF() implementation in Python 3.10:</p> <pre class="literal-block"> static inline void _Py_INCREF(PyObject *op) { #ifdef Py_REF_DEBUG _Py_RefTotal++; #endif op-&gt;ob_refcnt++; } #define Py_INCREF(op) _Py_INCREF(_PyObject_CAST(op)) </pre> <p>Most static inline functions go through a macro which casts their argument to <tt class="docutils literal">PyObject*</tt>:</p> <pre class="literal-block"> #define _PyObject_CAST(op) ((PyObject*)(op)) </pre> </div> </div> <div class="section" id="convert-macros-to-regular-functions-in-python-3-9"> <h2>Convert macros to regular functions in Python 3.9</h2> <div class="section" id="converted-macros-1"> <h3>Converted macros</h3> <ul class="simple"> <li>PyIndex_Check()</li> <li>PyObject_CheckBuffer()</li> <li>PyObject_GET_WEAKREFS_LISTPTR()</li> <li>PyObject_IS_GC()</li> <li>PyObject_NEW(): alias to PyObject_New()</li> <li>PyObject_NEW_VAR(): alias to PyObjectVar_New()</li> </ul> </div> <div class="section" id="performance-1"> <h3>Performance</h3> <p>PyType_HasFeature() was modified to always call the PyType_GetFlags() function, rather than accessing <tt class="docutils literal">PyTypeObject.tp_flags</tt> directly. The problem is that on macOS, Python is built without LTO: the PyType_GetFlags() call is not inlined, making functions like tuplegetter_descr_get() <strong>slower</strong>: see <a class="reference external" href="https://bugs.python.org/issue39542#msg372962">bpo-39542</a>. I <strong>reverted the PyType_HasFeature() change</strong> until the PEP 620 is accepted. 
macOS does not use LTO to keep support for macOS 10.6 (Snow Leopard): see <a class="reference external" href="https://bugs.python.org/issue41181">bpo-41181</a>.</p> </div> <div class="section" id="fast-static-inline-functions"> <h3>Fast static inline functions</h3> <p>To keep the best performance on Python built without LTO, fast private variants were added as static inline functions to the internal C API:</p> <ul class="simple"> <li>_PyIndex_Check()</li> <li>_PyObject_IS_GC()</li> <li>_PyType_HasFeature()</li> <li>_PyType_IS_GC()</li> </ul> <p>For example, PyObject_IS_GC() is defined as a function, whereas _PyObject_IS_GC() is defined as an internal static inline function. Header file:</p> <pre class="literal-block"> /* Test if an object implements the garbage collector protocol */ PyAPI_FUNC(int) PyObject_IS_GC(PyObject *obj); // Fast inlined version of PyObject_IS_GC() static inline int _PyObject_IS_GC(PyObject *obj) { return (PyType_IS_GC(Py_TYPE(obj)) &amp;&amp; (Py_TYPE(obj)-&gt;tp_is_gc == NULL || Py_TYPE(obj)-&gt;tp_is_gc(obj))); } </pre> <p>C code:</p> <pre class="literal-block"> int PyObject_IS_GC(PyObject *obj) { return _PyObject_IS_GC(obj); } </pre> </div> </div> <div class="section" id="python-3-10-incompatible-c-api-change"> <h2>Python 3.10 incompatible C API change</h2> <p>The <tt class="docutils literal">Py_REFCNT()</tt> macro was converted to a static inline function: <tt class="docutils literal">Py_REFCNT(obj) = refcnt;</tt> now fails with a compiler error. 
It must be replaced with <tt class="docutils literal">Py_SET_REFCNT(obj, refcnt)</tt>: Py_SET_REFCNT() was added to Python 3.9.</p> </div> <div class="section" id="the-complex-case-of-py-type-and-py-size-macros"> <h2>The complex case of Py_TYPE() and Py_SIZE() macros</h2> <div class="section" id="macros-converted-and-then-reverted"> <h3>Macros converted and then reverted</h3> <p>The <tt class="docutils literal">Py_TYPE()</tt> and <tt class="docutils literal">Py_SIZE()</tt> macros were also converted to static inline functions in Python 3.10, but the change <a class="reference external" href="https://bugs.python.org/issue39573#msg370303">broke 17 C extensions</a>.</p> <p>Since the change broke too many C extensions, I reverted the change: I <a class="reference external" href="https://github.com/python/cpython/commit/0e2ac21dd4960574e89561243763eabba685296a">converted Py_TYPE() and Py_SIZE() back to macros</a> to have more time to fix C extensions.</p> </div> <div class="section" id="i-fixed-6-extensions"> <h3>I fixed 6 extensions</h3> <ul class="simple"> <li>Cython: <a class="reference external" href="https://github.com/cython/cython/commit/d8e93b332fe7d15459433ea74cd29178c03186bd">my fix adding __Pyx_SET_SIZE() and __Pyx_SET_REFCNT()</a></li> <li>immutables: <a class="reference external" href="https://github.com/MagicStack/immutables/commit/45105ecd8b56a4d88dbcb380fcb8ff4b9cc7b19c">my fix adding pythoncapi_compat.h for Py_SET_SIZE()</a></li> <li>breezy: <a class="reference external" href="https://bazaar.launchpad.net/~brz/brz/3.1/revision/7647">my fix adding Py_SET_REFCNT() macro</a></li> <li>bitarray: <a class="reference external" href="https://github.com/ilanschnell/bitarray/commit/a0cca9f2986ec796df74ca8f42aff56c4c7103ba">my fix adding pythoncapi_compat.h</a></li> <li>python-zstandard: <a class="reference external" href="https://github.com/indygreg/python-zstandard/commit/e5a3baf61b65f3075f250f504ddad9f8612bfedf">my fix adding pythoncapi_compat.h</a> 
followed by <a class="reference external" href="https://github.com/indygreg/python-zstandard/commit/477776e6019478ca1c0b5777b073afbec70975f5">a pythoncapi_compat.h update for Python 2.7</a></li> <li>mercurial: <a class="reference external" href="https://www.mercurial-scm.org/repo/hg/rev/e92ca942ddca">my fix adding pythoncapi_compat.h</a> followed by a <a class="reference external" href="https://www.mercurial-scm.org/repo/hg/rev/38b9a63d3a13">fix for Python 2.7</a> (then <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat/commit/3e0bde93954ea8df328d36900c7060a3f3433eb0">fixed into upstream pythoncapi_compat.h</a>)</li> </ul> </div> <div class="section" id="extensions-fixed-by-others"> <h3>Extensions fixed by others</h3> <ul class="simple"> <li>numpy: <a class="reference external" href="https://github.com/numpy/numpy/commit/a96b18e3d4d11be31a321999cda4b795ea9eccaa">fix defining Py_SET_TYPE() and Py_SET_SIZE()</a>, followed by a <a class="reference external" href="https://github.com/numpy/numpy/commit/f1671076c80bd972421751f2d48186ee9ac808aa">cleanup commit</a></li> <li>pycurl: <a class="reference external" href="https://github.com/pycurl/pycurl/commit/e633f9a1ac4df5e249e78c218d5fbbd848219042">fix defining Py_SET_TYPE()</a></li> <li>boost: <a class="reference external" href="https://github.com/boostorg/python/commit/500194edb7833d0627ce7a2595fec49d0aae2484#diff-b06ac66c98951b48056826c904be75263cdf56ec9b79d3274ea493e7d27cbac4">fix adding Py_SET_TYPE() and Py_SET_SIZE() macros</a></li> <li>duplicity: <a class="reference external" href="https://git.launchpad.net/duplicity/commit/?id=9c63dcb83e922e0afac206188203891e203b4e66">fix 1</a>, <a class="reference external" href="https://git.launchpad.net/duplicity/commit/?id=bbaae91b5ac6ef7e295968e508522884609fbf84">fix 2</a></li> <li>pylibacl: <a class="reference external" href="https://github.com/iustin/pylibacl/commit/26712b8fd92f1146102248cac1c92cb344620eff">fixed</a></li> 
<li>gobject-introspection: <a class="reference external" href="https://gitlab.gnome.org/GNOME/gobject-introspection/-/commit/c4d7d21a2ad838077c6310532fdf7505321f0ae7">fix adding Py_SET_TYPE() macro</a></li> </ul> </div> <div class="section" id="extensions-still-not-fixed"> <h3>Extensions still not fixed</h3> <ul class="simple"> <li>pyside2:<ul> <li>My patch is not merged upstream yet</li> <li><a class="reference external" href="https://bugreports.qt.io/browse/PYSIDE-1436">https://bugreports.qt.io/browse/PYSIDE-1436</a></li> <li><a class="reference external" href="https://src.fedoraproject.org/rpms/python-pyside2/pull-request/7">https://src.fedoraproject.org/rpms/python-pyside2/pull-request/7</a></li> <li><a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1898974">https://bugzilla.redhat.com/show_bug.cgi?id=1898974</a></li> <li><a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1902618">https://bugzilla.redhat.com/show_bug.cgi?id=1902618</a></li> </ul> </li> <li>pybluez: <a class="reference external" href="https://github.com/pybluez/pybluez/pull/371">closed PR (not merged)</a></li> <li>PyPAM</li> <li>pygobject3</li> <li>rdiff-backup</li> </ul> </div> </div> <div class="section" id="what-s-next"> <h2>What's Next?</h2> <ul class="simple"> <li>Convert again Py_TYPE() and Py_SIZE() macros to static inline functions.</li> <li>Add &quot;%T&quot; formatter for <tt class="docutils literal"><span class="pre">Py_TYPE(obj)-&gt;tp_name</span></tt>: see <a class="reference external" href="https://bugs.python.org/issue34595">rejected bpo-34595</a>.</li> <li>Modify Cython to use getter functions.</li> <li>Attempt to make some structures opaque, like PyThreadState.</li> </ul> </div> Isolate Python Subinterpreters2020-12-27T22:00:00+01:002020-12-27T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-12-27:/isolate-subinterpreters.html<img alt="Christmas gift." 
src="https://vstinner.github.io/images/christmas-gift.jpg" /> <p>This article is about the work done in Python in 2019 and 2020 to better isolate subinterpreters. Static types are converted to heap types, extension modules are converted to use the new multiphase initialization API (PEP 489), caches, states, singletons and free lists are made per-interpreter, many bugs have been …</p><img alt="Christmas gift." src="https://vstinner.github.io/images/christmas-gift.jpg" /> <p>This article is about the work done in Python in 2019 and 2020 to better isolate subinterpreters. Static types are converted to heap types, extension modules are converted to use the new multiphase initialization API (PEP 489), caches, states, singletons and free lists are made per-interpreter, many bugs have been fixed, etc.</p> <p>Running multiple interpreters in parallel with one &quot;GIL&quot; per interpreter cannot be done yet, but a lot of complex technical challenges have been solved.</p> <div class="section" id="why-isolating-subinterpreters"> <h2>Why isolating subinterpreters?</h2> <p>The final goal is to be able to run multiple interpreters in parallel in the same process, for example one interpreter per CPU, each interpreter running in its own thread. The principle is the same as with the multiprocessing module and has the same limitations: no Python object can be shared directly between two interpreters. Later, we can imagine helpers to share mutable Python objects using proxies which would prevent race conditions.</p> <p>The work on subinterpreters requires modifying many functions and extension modules. 
It benefits Python in different ways.</p> <p>Converting static types to heap types and converting extension modules to the multiphase initialization API (PEP 489) makes extension modules implemented in C behave more like modules implemented in Python, which is good for the <a class="reference external" href="https://www.python.org/dev/peps/pep-0399/">PEP 399 -- Pure Python/C Accelerator Module Compatibility Requirements</a>. So <strong>this work also helps Python implementations other than CPython, like PyPy</strong>.</p> <p>These changes also destroy more Python objects and release more memory at Python exit, which matters <strong>when Python is embedded in an application</strong>. Python should be &quot;stateless&quot; and, in particular, release all memory at exit. This work slowly fixes <a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741: Py_Finalize() doesn't clear all Python objects at exit</a>. Python leaks fewer and fewer objects at exit.</p> </div> <div class="section" id="proof-of-concept-in-may-2020"> <h2>Proof-of-concept in May 2020</h2> <p>In May 2020, I wrote a proof-of-concept to prove the feasibility of the project and to prove that it is faster than sequential execution: <a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/S5GZZCEREZLA2PEMTVFBCDM52H4JSENR/#RIK75U3ROEHWZL4VENQSQECB4F4GDELV">PoC: Subinterpreters 4x faster than sequential execution or threads on CPU-bound workaround</a>. 
Benchmark on 4 CPUs:</p> <ul class="simple"> <li>Sequential: 1.99 sec +- 0.01 sec</li> <li>Threads: 3.15 sec +- 0.97 sec (1.5x <strong>slower</strong>)</li> <li>Multiprocessing: 560 ms +- 12 ms (3.6x <strong>faster</strong>)</li> <li>Subinterpreters: 583 ms +- 7 ms (3.4x <strong>faster</strong>)</li> </ul> <p>The performance of subinterpreters is basically the same as multiprocessing on this benchmark, which is promising.</p> </div> <div class="section" id="experimental-isolated-subintepreters"> <h2>Experimental isolated subinterpreters</h2> <p>To write this PoC, I added a <tt class="docutils literal"><span class="pre">--with-experimental-isolated-subinterpreters</span></tt> option to <tt class="docutils literal">./configure</tt> in <a class="reference external" href="https://bugs.python.org/issue40514">bpo-40514</a>, which defines the <tt class="docutils literal">EXPERIMENTAL_ISOLATED_SUBINTERPRETERS</tt> macro. Effects of this special build:</p> <ul class="simple"> <li>Make the GIL per-interpreter.</li> <li><tt class="docutils literal">_xxsubinterpreters.run_string()</tt> releases the GIL when running the subinterpreter.</li> <li>Add thread local storage for the Python thread state (&quot;tstate&quot;).</li> <li>Disable the garbage collector in subinterpreters.</li> <li>Disable the type attribute lookup cache.</li> <li>Disable free lists: frame, list, tuple, type attribute lookup cache.</li> <li>Disable singletons: latin1 characters.</li> <li>Disable interned strings.</li> <li>Disable the fast pymalloc memory allocator (force the libc malloc memory allocator).</li> </ul> <p>These features are disabled because their implementation is currently not compatible with multiple interpreters running in parallel.</p> <p>This special build is designed to be temporary. It should ease the development of isolated subinterpreters. 
It will be removed once subinterpreters are fully isolated (once each interpreter has its own GIL).</p> </div> <div class="section" id="convert-static-types-to-heap-types"> <h2>Convert static types to heap types</h2> <p>Types declared in Python (<tt class="docutils literal">class MyType: ...</tt>) are always &quot;heap types&quot;: types dynamically allocated in heap memory. Historically, all types declared in C were declared as &quot;static types&quot;: defined statically at build time.</p> <p>In C, static types are referenced directly using the <tt class="docutils literal">&amp;</tt> operator to get their address; they are not copied. For example, the Python <tt class="docutils literal">str</tt> type is referenced as <tt class="docutils literal">&amp;PyUnicode_Type</tt> in C.</p> <p>Types are also regular objects (<tt class="docutils literal">PyTypeObject</tt> inherits from <tt class="docutils literal">PyObject</tt>) and have a reference count, but the <tt class="docutils literal">PyObject.ob_refcnt</tt> member is not atomic and so must not be modified in parallel. Problem: all interpreters share the same static types. Static types have other problems:</p> <ul class="simple"> <li>A type <tt class="docutils literal">__mro__</tt> tuple (<tt class="docutils literal">PyTypeObject.tp_mro</tt> member) has the same non-atomic reference count problem.</li> <li>When a subtype is created, it is stored in the <tt class="docutils literal">PyTypeObject.tp_subclasses</tt> dictionary member (accessible in Python with the <tt class="docutils literal">__subclasses__()</tt> method), whereas Python dictionaries are not thread-safe.</li> <li>Static types behave differently from regular Python types. For example, it is usually not possible to add an arbitrary attribute or override an attribute. 
It goes against the <a class="reference external" href="https://www.python.org/dev/peps/pep-0399/">PEP 399 -- Pure Python/C Accelerator Module Compatibility Requirements</a> principles.</li> <li>etc.</li> </ul> <p>Right now, <strong>43% (89 of 206)</strong> of types are declared as heap types. For comparison, in Python 3.8, only 9% (15 of 172) of types were declared as heap types: <strong>74 types</strong> have been converted in the meantime.</p> <p>TODO: convert the remaining 117 static types: see <a class="reference external" href="https://bugs.python.org/issue40077">bpo-40077</a>.</p> </div> <div class="section" id="multiphase-initialization-api"> <h2>Multiphase initialization API</h2> <p>Historically, extension modules are declared with the <tt class="docutils literal">PyModule_Create()</tt> function. Usually, such an extension can be instantiated exactly once. It is stored in an internal <tt class="docutils literal">PyInterpreterState.modules_by_index</tt> list; a unique index is assigned to the module and stored in <tt class="docutils literal">PyModuleDef.m_base.m_index</tt>. Such extensions usually use static global variables.</p> <p>Such a &quot;static&quot; extension has multiple issues:</p> <ul class="simple"> <li>The extension cannot be unloaded: its memory is not released at Python exit. It is an issue when Python is embedded in an application.</li> <li>The extension behaves differently from modules defined in Python. When an extension is reimported, its namespace (<tt class="docutils literal">module.__dict__</tt>) is duplicated, but mutable objects and static global variables are still shared. 
It goes against the <a class="reference external" href="https://www.python.org/dev/peps/pep-0399/">PEP 399 -- Pure Python/C Accelerator Module Compatibility Requirements</a> principles.</li> <li>etc.</li> </ul> <p>In 2013, <strong>Petr Viktorin</strong>, <strong>Stefan Behnel</strong> and <strong>Nick Coghlan</strong> wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0489/">PEP 489 -- Multi-phase extension module initialization</a>, which was approved and implemented in Python 3.5. For example, the <tt class="docutils literal">_abc</tt> module initialization function is now just a call to the new <tt class="docutils literal">PyModuleDef_Init()</tt> function:</p> <pre class="literal-block"> PyMODINIT_FUNC PyInit__abc(void) { return PyModuleDef_Init(&amp;_abcmodule); } </pre> <p>An extension module can have a module state if <tt class="docutils literal">PyModuleDef.m_size</tt> is greater than zero. Example:</p> <pre class="literal-block"> typedef struct { PyTypeObject *_abc_data_type; unsigned long long abc_invalidation_counter; } _abcmodule_state; static struct PyModuleDef _abcmodule = { ... .m_size = sizeof(_abcmodule_state), // &lt;=== HERE === }; </pre> <p>The <tt class="docutils literal">PyModule_GetState()</tt> function can be used to retrieve the module state. Example:</p> <pre class="literal-block"> static inline _abcmodule_state* get_abc_state(PyObject *module) { void *state = PyModule_GetState(module); assert(state != NULL); return (_abcmodule_state *)state; } static PyObject * _abc__abc_init(PyObject *module, PyObject *self) { _abcmodule_state *state = get_abc_state(module); ... data = abc_data_new(state-&gt;_abc_data_type, NULL, NULL); ... } </pre> <p>Right now, <strong>77% (102 of 132)</strong> of extension modules use the new multiphase initialization API (PEP 489). 
For comparison, in Python 3.8, only 23% (27 of 118) of extensions used the new multiphase initialization API: <strong>75 extensions</strong> have been converted in the meantime.</p> <p>TODO: convert the remaining 30 extension modules (<a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741</a>).</p> </div> <div class="section" id="module-states"> <h2>Module states</h2> <p>Some modules have a state which should be stored in the interpreter to share it between multiple instances of the module, and also to give access to the state in functions of the public C API (ex: <tt class="docutils literal">PyAST_Check()</tt>).</p> <p>States made per-interpreter:</p> <ul class="simple"> <li>2019-05-10: <strong>warnings</strong> (<a class="reference external" href="https://bugs.python.org/issue36737">bpo-36737</a>, <a class="reference external" href="https://github.com/python/cpython/commit/86ea58149c3e83f402cecd17e6a536865fb06ce1">commit</a> by <strong>Eric Snow</strong>)</li> <li>2019-11-07: <strong>parser</strong> (<a class="reference external" href="https://bugs.python.org/issue36876">bpo-36876</a>, <a class="reference external" href="https://github.com/python/cpython/commit/9def81aa52adc3cc89554156e40742cf17312825">commit</a> by <strong>Vinay Sajip</strong>)</li> <li>2019-11-20: <strong>gc</strong> (<a class="reference external" href="https://bugs.python.org/issue36854">bpo-36854</a>, <a class="reference external" href="https://github.com/python/cpython/commit/7247407c35330f3f6292f1d40606b7ba6afd5700">commit</a> by me)</li> <li>2020-11-02: <strong>ast</strong> (<a class="reference external" href="https://bugs.python.org/issue41796">bpo-41796</a>, <a class="reference external" href="https://github.com/python/cpython/commit/5cf4782a2630629d0978bf4cf6b6340365f449b2">commit</a> by me)</li> <li>2020-12-15: <strong>atexit</strong> (<a class="reference external" href="https://bugs.python.org/issue42639">bpo-42639</a>, <a class="reference external" 
href="https://github.com/python/cpython/commit/b8fa135908d294b350cdad04e2f512327a538dee">commit</a> by me)</li> </ul> </div> <div class="section" id="singletons"> <h2>Singletons</h2> <p>Singletons must not be shared between interpreters.</p> <p>Singletons made per-interpreter.</p> <p><a class="reference external" href="https://bugs.python.org/issue38858">bpo-38858</a>:</p> <ul class="simple"> <li>2019-12-17: small <strong>integer</strong>, the [-5; 256] range (<a class="reference external" href="https://github.com/python/cpython/commit/630c8df5cf126594f8c1c4579c1888ca80a29d59">commit</a> by me)</li> </ul> <p><a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>:</p> <ul class="simple"> <li>2020-06-04: empty <strong>tuple</strong> singleton (<a class="reference external" href="https://github.com/python/cpython/commit/69ac6e58fd98de339c013fe64cd1cf763e4f9bca">commit</a> by me)</li> <li>2020-06-23: empty <strong>bytes</strong> string singleton and single byte character (<tt class="docutils literal"><span class="pre">b'\x00'</span></tt> to <tt class="docutils literal"><span class="pre">b'\xFF'</span></tt>) singletons (<a class="reference external" href="https://github.com/python/cpython/commit/c41eed1a874e2f22bde45c3c89418414b7a37f46">commit</a> by me)</li> <li>2020-06-23: empty <strong>Unicode</strong> string singleton (<a class="reference external" href="https://github.com/python/cpython/commit/f363d0a6e9cfa50677a6de203735fbc0d06c2f49">commit</a> by me)</li> <li>2020-06-23: empty <strong>frozenset</strong> singleton (<a class="reference external" href="https://github.com/python/cpython/commit/261cfedf7657a515e04428bba58eba2a9bb88208">commit</a> by me); later removed.</li> <li>2020-06-24: single <strong>Unicode</strong> character (U+0000-U+00FF range) (<a class="reference external" href="https://github.com/python/cpython/commit/2f9ada96e0d420fed0d09a032b37197f08ef167a">commit</a> by me)</li> </ul> <p>I also micro-optimized the code: 
most singletons are now always created at startup, it's no longer needed to check if it is created at each function call. Moreover, an assertion now ensures that singletons are no longer used after they are deleted.</p> </div> <div class="section" id="free-lists"> <h2>Free lists</h2> <p>A free list is a micro-optimization on memory allocations. The memory of recently destroyed objects is not freed to be able to reuse it for new objects. Free lists must not be shared between interpreters.</p> <p>Free lists made per-interpreter (<a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>):</p> <ul class="simple"> <li>2020-06-04: <strong>slice</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/7daba6f221e713f7f60c613b246459b07d179f91">commit</a> by me)</li> <li>2020-06-04: <strong>tuple</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/69ac6e58fd98de339c013fe64cd1cf763e4f9bca">commit</a> by me)</li> <li>2020-06-04: <strong>float</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/2ba59370c3dda2ac229c14510e53a05074b133d1">commit</a> by me)</li> <li>2020-06-04: <strong>frame</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/3744ed2c9c0b3905947602fc375de49533790cb9">commit</a> by me)</li> <li>2020-06-05: <strong>async generator</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/78a02c2568714562e23e885b6dc5730601f35226">commit</a> by me)</li> <li>2020-06-05: <strong>context</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/e005ead49b1ee2b1507ceea94e6f89c28ecf1f81">commit</a> by me)</li> <li>2020-06-05: <strong>list</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/88ec9190105c9b03f49aaef601ce02b242a75273">commit</a> by me)</li> <li>2020-06-23: <strong>dict</strong> (<a class="reference external" 
href="https://github.com/python/cpython/commit/b4e85cadfbc2b1b24ec5f3159e351dbacedaa5e0">commit</a> by me)</li> <li>2020-06-23: <strong>MemoryError</strong> (<a class="reference external" href="https://github.com/python/cpython/commit/281cce1106568ef9fec17e3c72d289416fac02a5">commit</a> by me)</li> </ul> </div> <div class="section" id="caches"> <h2>Caches</h2> <p>Caches made per interpreter:</p> <ul class="simple"> <li>2020-06-04: <strong>slice</strong> cache (<a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>, <a class="reference external" href="https://github.com/python/cpython/commit/7daba6f221e713f7f60c613b246459b07d179f91">commit</a> by me)</li> <li>2020-12-26: <strong>type</strong> attribute lookup cache (<a class="reference external" href="https://bugs.python.org/issue42745">bpo-42745</a>, <a class="reference external" href="https://github.com/python/cpython/commit/41010184880151d6ae02a226dbacc796e5c90d11">commit</a> by me)</li> </ul> </div> <div class="section" id="interned-strings-and-identifiers"> <h2>Interned strings and identifiers</h2> <ul class="simple"> <li>2020-12-25: Per-interpreter identifiers: <tt class="docutils literal">_PyUnicode_FromId()</tt> (<a class="reference external" href="https://bugs.python.org/issue39465">bpo-39465</a>, <a class="reference external" href="https://github.com/python/cpython/commit/ba3d67c2fb04a7842741b1b6da5d67f22c579f33">commit</a> by me)</li> <li>2020-12-26: Per-interpreter interned strings: <tt class="docutils literal">PyUnicode_InternInPlace()</tt> (<a class="reference external" href="https://bugs.python.org/issue40521">bpo-40521</a>, <a class="reference external" href="https://github.com/python/cpython/commit/ea251806b8dffff11b30d2182af1e589caf88acf">commit</a> by me)</li> </ul> <p>For <tt class="docutils literal">_PyUnicode_FromId()</tt>, I added the <tt class="docutils literal">pycore_atomic_funcs.h</tt> header file (<a class="reference external" 
href="https://github.com/python/cpython/commit/52a327c1cbb86c7f2f5c460645889b23615261bf">commit</a>) which adds functions for atomic memory accesses (to variables of type <tt class="docutils literal">Py_ssize_t</tt>). It uses <tt class="docutils literal">__atomic_load_n()</tt> and <tt class="docutils literal">__atomic_store_n()</tt> on GCC and clang, or <tt class="docutils literal">_InterlockedCompareExchange64()</tt> and <tt class="docutils literal">_InterlockedExchange64()</tt> on MSVC (Windows).</p> <p>First, I tried to use the <tt class="docutils literal">_Py_hashtable</tt> type: <a class="reference external" href="https://github.com/python/cpython/pull/20048">PR 20048</a>. Using <tt class="docutils literal">_Py_hashtable</tt>, <tt class="docutils literal">_PyUnicode_FromId()</tt> took 15.5 ns +- 0.1 ns. I optimized <tt class="docutils literal">_Py_hashtable</tt>: <tt class="docutils literal">_PyUnicode_FromId()</tt> took 6.65 ns +- 0.09 ns. But it was still slower than the reference code: 2.38 ns +- 0.00 ns.</p> <p>The merged implementation uses an array. A unique index is assigned to each identifier: its index in this array. The array is made larger on demand. 
The final change adds 1 ns per function call:</p> <pre class="literal-block"> [ref] 2.42 ns +- 0.00 ns -&gt; [atomic] 3.39 ns +- 0.00 ns: 1.40x slower </pre> </div> <div class="section" id="misc"> <h2>Misc</h2> <ul class="simple"> <li>2020-03-19: Per-interpreter pending calls (<a class="reference external" href="https://bugs.python.org/issue39984">bpo-39984</a>, <a class="reference external" href="https://github.com/python/cpython/commit/50e6e991781db761c496561a995541ca8d83ff87">commit</a> by me).</li> </ul> </div> <div class="section" id="bugfixes"> <h2>Bugfixes</h2> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/gil-bugfixes-daemon-threads-python39.html">GIL bugfixes for daemon threads in Python 3.9</a></li> <li>Fix many <a class="reference external" href="https://vstinner.github.io/subinterpreter-leaks.html">leaks discovered by subinterpreters</a></li> <li>Fix pickling heap types implemented in C with protocols 0 and 1 (<a class="reference external" href="https://bugs.python.org/issue41052">bpo-41052</a>)</li> </ul> </div> <div class="section" id="pep-630-isolating-extension-modules"> <h2>PEP 630: Isolating Extension Modules</h2> <p>In August 2020, <strong>Petr Viktorin</strong> wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0630/">PEP 630 -- Isolating Extension Modules</a> which gives practical advice on how to update an extension module to make it stateless using previous PEPs (heap types, multi-phase init, etc.). Once a module is stateless, it becomes safe to use it in subinterpreters running in parallel.</p> </div> <div class="section" id="thanks"> <h2>Thanks</h2> <p>The work on subinterpreters, multiphase init and heap types has been a collaborative effort ongoing for 2 years. 
I would like to thank the following developers for helping on this large task:</p> <ul class="simple"> <li><strong>Christian Heimes</strong></li> <li><strong>Dong-hee Na</strong></li> <li><strong>Eric Snow</strong></li> <li><strong>Erlend Egeberg Aasland</strong></li> <li><strong>Hai Shi</strong></li> <li><strong>Mohamed Koubaa</strong></li> <li><strong>Nick Coghlan</strong></li> <li><strong>Paulo Henrique Silva</strong></li> <li><strong>Petr Viktorin</strong></li> <li><strong>Vinay Sajip</strong></li> </ul> <p>Note: Since the work is scattered in many issues and pull requests, it's hard to track who helped: sorry if I forgot someone! (Please contact me and I will complete the list.)</p> </div> <div class="section" id="what-s-next"> <h2>What's Next?</h2> <p>There are still multiple interesting technical challenges:</p> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue39511">bpo-39511: Per-interpreter singletons (None, True, False, etc.)</a></li> <li><a class="reference external" href="https://bugs.python.org/issue40601">bpo-40601: Hide static types from the C API</a></li> <li>Make pymalloc allocator compatible with subinterpreters.</li> <li>Make the GIL per interpreter. Maybe even give the choice to share or not the GIL when a subinterpreter is created.</li> <li>Make the <tt class="docutils literal">_PyArg_Parser</tt> (<tt class="docutils literal">parser_init()</tt>) function compatible with subinterpreters. 
Maybe use a per-interpreter array, a solution similar to <tt class="docutils literal">_PyUnicode_FromId()</tt>.</li> <li><a class="reference external" href="https://bugs.python.org/issue15751">bpo-15751: Make the PyGILState API compatible with subinterpreters</a> (issue created in 2012!)</li> <li><a class="reference external" href="https://bugs.python.org/issue40522">bpo-40522: Get the current Python interpreter state from Thread Local Storage (autoTSSkey)</a></li> </ul> <p>Also, there are still many static types to convert to heap types (<a class="reference external" href="https://bugs.python.org/issue40077">bpo-40077</a>) and many extension modules to convert to the multiphase initialization API (<a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741</a>).</p> <p>I'm tracking the work in my <a class="reference external" href="https://pythondev.readthedocs.io/subinterpreters.html">Python Subinterpreters</a> page and in the <a class="reference external" href="https://bugs.python.org/issue40512">bpo-40512: Meta issue: per-interpreter GIL</a>.</p> </div> Hide implementation details from the Python C API2020-12-25T22:00:00+01:002020-12-25T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-12-25:/hide-implementation-details-python-c-api.html<img alt="My cat attacking the Python C API" src="https://vstinner.github.io/images/pepsie.jpg" /> <p>This article is the history of Python C API discussions over the last 4 years, and the creation of C API projects: <a class="reference external" href="https://pythoncapi.readthedocs.io/">pythoncapi website</a>, <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat.h header file</a> and <a class="reference external" href="https://hpy.readthedocs.io/">HPy (new clean C API)</a>. 
More and more people are aware of issues caused by the C API and are working …</p><img alt="My cat attacking the Python C API" src="https://vstinner.github.io/images/pepsie.jpg" /> <p>This article is the history of Python C API discussions over the last 4 years, and the creation of C API projects: <a class="reference external" href="https://pythoncapi.readthedocs.io/">pythoncapi website</a>, <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat.h header file</a> and <a class="reference external" href="https://hpy.readthedocs.io/">HPy (new clean C API)</a>. More and more people are aware of issues caused by the C API and are working on solutions.</p> <p>It took me a lot of iterations to find the right approach to evolve the C API without breaking too many third-party extension modules. My first ideas were based on two APIs with an opt-in option somehow. In the end, I decided to fix the default API directly and helped maintainers of extension modules to update their projects for incompatible C API changes.</p> <p>I wrote a <tt class="docutils literal">pythoncapi_compat.h</tt> header file which adds C API functions of newer Python to old Python versions, down to Python 2.7. I also wrote an <tt class="docutils literal">upgrade_pythoncapi.py</tt> script to add Python 3.10 support to an extension module without losing Python 2.7 support: the tool adds <tt class="docutils literal">#include &quot;pythoncapi_compat.h&quot;</tt>. For example, it replaces <tt class="docutils literal">Py_TYPE(obj) = type</tt> with <tt class="docutils literal">Py_SET_TYPE(obj, type)</tt>.</p> <p>The photo: my cat attacking the Python C API.</p> <div class="section" id="year-2016"> <h2>Year 2016</h2> <p>Between 2016 and 2017, Larry Hastings worked on removing the GIL in a CPython fork called &quot;The Gilectomy&quot;. 
He pushed the first commit in April 2016: <a class="reference external" href="https://github.com/larryhastings/gilectomy/commit/4a1a4ff49e34b9705608cad968f467af161dcf02">Removed the GIL. Don't merge this!</a> (&quot;Few programs work now&quot;). At EuroPython 2016, he gave the talk <a class="reference external" href="https://www.youtube.com/watch?v=fgWUwQVoLHo">Larry Hastings - The Gilectomy</a> where he explained that the current parallelism bottleneck is CPython's reference counting, which doesn't scale with the number of threads.</p> <p>It was just another hint telling me that &quot;something&quot; should be done to make the C API more abstract and move away from implementation details like reference counting. PyPy has also had performance issues with the C API for many years.</p> </div> <div class="section" id="year-2017"> <h2>Year 2017</h2> <div class="section" id="may"> <h3>May</h3> <p>In 2017, I talked with Eric Snow, who was working on subinterpreters. He had to modify public structures, especially the <tt class="docutils literal">PyInterpreterState</tt> structure. He created the <tt class="docutils literal">Include/internal/</tt> subdirectory to create a new &quot;internal C API&quot; which should not be exported. 
(Later, he moved the <tt class="docutils literal">PyInterpreterState</tt> structure to the internal C API in Python 3.8.)</p> <p>I started to discuss C API changes during the Python Language Summit (PyCon US 2017): <a class="reference external" href="https://github.com/vstinner/conf/raw/master/2017-PyconUS/summit.pdf">&quot;Python performance&quot; slides (PDF)</a>:</p> <ul class="simple"> <li>Split Include in sub-directories</li> <li>Move towards a stable ABI by default</li> </ul> <p>See also the LWN article: <a class="reference external" href="https://lwn.net/Articles/723752/#723949">Keeping Python competitive</a> by Jake Edge.</p> </div> <div class="section" id="july-first-pep-draft"> <h3>July: first PEP draft</h3> <p>I proposed the first PEP draft to python-ideas: <a class="reference external" href="https://mail.python.org/archives/list/python-ideas&#64;python.org/thread/6XATDGWK4VBUQPRHCRLKQECTJIPBVNJQ/">PEP: Hide implementation details in the C API</a>.</p> <p>The idea is to add an opt-in option to distutils to build an extension module with a new C API, remove implementation details from the new C API, and maybe later switch to the new C API by default.</p> </div> <div class="section" id="september"> <h3>September</h3> <p>I discussed my C API change ideas at the CPython core dev sprint (at Instagram, California). The ideas were liked by most (if not all) core developers, who were fine with a minor performance slowdown (caused by replacing macros with function calls). I wrote the <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">A New C API for CPython</a> blog post about these discussions.</p> </div> <div class="section" id="november"> <h3>November</h3> <p>I proposed <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150607.html">Make the stable API-ABI usable</a> on the python-dev list. 
The idea is to add <tt class="docutils literal">PyTuple_GET_ITEM()</tt> (for example) to the limited C API but declared as a function call. Later, if enough extension modules are compatible with the extended limited C API, make it the default.</p> </div> </div> <div class="section" id="year-2018"> <h2>Year 2018</h2> <p>In July, I created the <a class="reference external" href="https://pythoncapi.readthedocs.io/">pythoncapi website</a> to collect issues of the current C API, list things to avoid in new functions like borrowed references, and start to design a new better C API.</p> <p>In September, Antonio Cuni wrote <a class="reference external" href="https://morepypy.blogspot.com/2018/09/inside-cpyext-why-emulating-cpython-c.html">Inside cpyext: Why emulating CPython C API is so Hard</a> article.</p> </div> <div class="section" id="year-2019"> <h2>Year 2019</h2> <p>In February, I sent <a class="reference external" href="https://mail.python.org/archives/list/capi-sig&#64;python.org/thread/WS6ATJWRUQZESGGYP3CCSVPF7OMPMNM6/">Update on CPython header files reorganization</a> to the capi-sig list.</p> <ul class="simple"> <li><tt class="docutils literal">Include/</tt>: limited C API</li> <li><tt class="docutils literal">Include/cpython/</tt>: CPython C API</li> <li><tt class="docutils literal">Include/internal/</tt>: CPython internal C API</li> </ul> <p>In March, I modified the Python debug build to make its ABI compatible with the release build ABI: <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build">What’s New In Python 3.8: Debug build uses the same ABI as release build</a>.</p> <p>In May, I gave a lightning talk <a class="reference external" href="https://github.com/vstinner/conf/blob/master/2019-Pycon/status_stable_api_abi.pdf">Status of the stable API and ABI in Python 3.8</a>, at the Language Summit (during Pycon US 2019):</p> <ul class="simple"> <li>Convert macros to static inline 
functions</li> <li>Install the internal C API</li> <li>Debug build now ABI compatible with the release build ABI</li> <li>Getting rid of global variables</li> </ul> <p>By the way, see my <a class="reference external" href="https://vstinner.github.io/split-include-directory-python38.html">Split Include/ directory in Python 3.8</a> article: I converted many macros in Python 3.8.</p> <p>In July, the <a class="reference external" href="https://hpy.readthedocs.io/">HPy project</a> was created during EuroPython at Basel. There was an informal meeting which included core developers of PyPy (Antonio, Armin and Ronan), CPython (Victor Stinner and Mark Shannon) and Cython (Stefan Behnel).</p> <p>In December, Antonio, Armin and Ronan had a small internal sprint to kick-off the development of HPy: <a class="reference external" href="https://morepypy.blogspot.com/2019/12/hpy-kick-off-sprint-report.html">HPy kick-off sprint report</a></p> </div> <div class="section" id="year-2020"> <h2>Year 2020</h2> <div class="section" id="april"> <h3>April</h3> <p>I proposed <a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/#TKHNENOXP6H34E73XGFOL2KKXSM4Z6T2">PEP: Modify the C API to hide implementation details</a> on the python-dev list. The main idea is to provide a new optimized Python runtime which is backward incompatible on purpose, and continue to ship the regular runtime which is fully backward compatible.</p> </div> <div class="section" id="june"> <h3>June</h3> <p>I wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0620/">PEP 620 -- Hide implementation details from the C API</a> and <a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/">proposed the PEP to python-dev</a>. This PEP is my 3rd attempt to fix the C API: I rewrote it from scratch. 
Python now distributes a new <tt class="docutils literal">pythoncapi_compat.h</tt> header and a process is defined to reduce the number of broken C extensions when introducing C API incompatible changes listed in this PEP.</p> <p>I created the <a class="reference external" href="https://github.com/pythoncapi/pythoncapi_compat">pythoncapi_compat project</a>: a header file providing new C API functions to old Python versions using static inline functions.</p> </div> <div class="section" id="december"> <h3>December</h3> <p>I wrote a new <tt class="docutils literal">upgrade_pythoncapi.py</tt> script to add Python 3.10 support to an extension module without losing Python 2.7 support. I sent <a class="reference external" href="https://mail.python.org/archives/list/capi-sig&#64;python.org/thread/LFLXFMKMZ77UCDUFD5EQCONSAFFWJWOZ/">New script: add Python 3.10 support to your C extensions without losing Python 3.6 support</a> to the capi-sig list.</p> <p>The pythoncapi_compat project got its first users (bitarray, immutables, python-zstandard)! It proves that the project is useful and needed.</p> <p>I collaborated with the HPy project to create a manifesto explaining how the C API prevents optimizing CPython and makes the CPython C API inefficient on PyPy. 
It is still a draft.</p> </div> </div> Leaks discovered by subinterpreters2020-12-23T14:00:00+01:002020-12-23T14:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-12-23:/subinterpreter-leaks.html<p>This article is about old reference leaks discovered or caused by the work on isolating subinterpreters: leaks in 6 different modules (gc, _weakref, _abc, _signal, _ast and _thread).</p> <img alt="_thread GC bug" src="https://vstinner.github.io/images/thread_gc_bug.jpg" /> <div class="section" id="refleaks-buildbot-failures"> <h2>Refleaks buildbot failures</h2> <p>With my work on isolating subinterpreters, old bugs about Python objects leaked at Python exit are suddenly becoming blocker …</p></div><p>This article is about old reference leaks discovered or caused by the work on isolating subinterpreters: leaks in 6 different modules (gc, _weakref, _abc, _signal, _ast and _thread).</p> <img alt="_thread GC bug" src="https://vstinner.github.io/images/thread_gc_bug.jpg" /> <div class="section" id="refleaks-buildbot-failures"> <h2>Refleaks buildbot failures</h2> <p>With my work on isolating subinterpreters, old bugs about Python objects leaked at Python exit are suddenly becoming blocker issues on buildbots.</p> <p>When subinterpreters still share Python objects with the main interpreter, it is ok-ish to leak these objects at Python exit. Right now (current master branch), there are still more than 18 000 Python objects which are not destroyed at Python exit:</p> <pre class="literal-block"> $ ./python -X showrefcount -c pass [18411 refs, 6097 blocks] </pre> <p>This issue is being solved in <a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741: Py_Finalize() doesn't clear all Python objects at exit</a>, which was opened almost 14 years ago (2007).</p> <p>When subinterpreters are better isolated, objects are no longer shared, and suddenly these leaks make subinterpreter tests fail on Refleak buildbots. 
For example, when an extension module is converted to the multiphase initialization API (PEP 489) or when static types are converted to heap types, these issues pop up.</p> <p>It is a blocker issue for me, since I care about having only &quot;green&quot; buildbots (no test failures); otherwise, more serious regressions can easily be missed.</p> </div> <div class="section" id="per-interpreter-gc-state"> <h2>Per-interpreter GC state</h2> <p>In November 2019, I made the state of the GC module per-interpreter in <a class="reference external" href="https://bugs.python.org/issue36854">bpo-36854</a> (<a class="reference external" href="https://github.com/python/cpython/commit/7247407c35330f3f6292f1d40606b7ba6afd5700">commit</a>) and test_atexit started to leak:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_atexit -m test.test_atexit.SubinterpreterTest.test_callbacks_leak test_atexit leaked [3988, 3986, 3988] references, sum=11962 </pre> <p>I fixed the usage of the <tt class="docutils literal">PyModule_AddObject()</tt> function in the <tt class="docutils literal">_testcapi</tt> module (<a class="reference external" href="https://github.com/python/cpython/commit/310e2d25170a88ef03f6fd31efcc899fe062da2c">commit</a>).</p> <p>I also pushed a <strong>workaround</strong> in <tt class="docutils literal">finalize_interp_clear()</tt>:</p> <pre class="literal-block"> + /* bpo-36854: Explicitly clear the codec registry + and trigger a GC collection */ + PyInterpreterState *interp = tstate-&gt;interp; + Py_CLEAR(interp-&gt;codec_search_path); + Py_CLEAR(interp-&gt;codec_search_cache); + Py_CLEAR(interp-&gt;codec_error_registry); + _PyGC_CollectNoFail(); </pre> <p>I dislike having to push a &quot;temporary&quot; workaround, but the Python finalization is really complex and fragile. 
Fixing the root issues would require too much work, while I wanted to repair the Refleak buildbots as soon as possible.</p> <p>In December 2019, the workaround was partially removed (<a class="reference external" href="https://github.com/python/cpython/commit/ac0e1c2694bc199dbd073312145e3c09bee52cc4">commit</a>):</p> <pre class="literal-block"> - Py_CLEAR(interp-&gt;codec_search_path); - Py_CLEAR(interp-&gt;codec_search_cache); - Py_CLEAR(interp-&gt;codec_error_registry); </pre> <p>The year after (December 2020), the last GC collection was moved into <tt class="docutils literal">PyInterpreterState_Clear()</tt>, before finalizing the GC (<a class="reference external" href="https://github.com/python/cpython/commit/eba5bf2f5672bf4861c626937597b85ac0c242b9">commit</a>).</p> </div> <div class="section" id="port-weakref-to-multiphase-init"> <h2>Port _weakref to multiphase init</h2> <p>In March 2020, the <tt class="docutils literal">_weakref</tt> module was ported to the multiphase initialization API (PEP 489) in <a class="reference external" href="https://bugs.python.org/issue40050">bpo-40050</a> and test_importlib started to leak:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_importlib test_importlib leaked [6303, 6299, 6303] references, sum=18905 </pre> <p>The analysis was quite long and complicated. importlib imported some extension modules twice, and it has to inject frozen modules to &quot;bootstrap&quot; the code.</p> <p>In the end, I fixed the issue by removing the now unused <tt class="docutils literal">_weakref</tt> import in <tt class="docutils literal">importlib._bootstrap_external</tt> (<a class="reference external" href="https://github.com/python/cpython/commit/83d46e0622d2efdf5f3bf8bf8904d0dcb55fc322">commit</a>). 
The fix also avoids importing an extension module twice.</p> </div> <div class="section" id="convert-abc-static-types-to-heap-types"> <h2>Convert _abc static types to heap types</h2> <p>In April 2020, the static types of the <tt class="docutils literal">_abc</tt> extension module were converted to heap types in <a class="reference external" href="https://bugs.python.org/issue40077">bpo-40077</a> (<a class="reference external" href="https://github.com/python/cpython/commit/53e4c91725083975598350877e2ed8e2d0194114">commit</a>) and test_threading started to leak:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_threading test_threading leaked [19, 19, 19] references, sum=57 </pre> <p>I created <a class="reference external" href="https://bugs.python.org/issue40149">bpo-40149</a> to track the leak.</p> <div class="section" id="objects-hold-a-reference-to-heap-types"> <h3>Objects hold a reference to heap types</h3> <p>In March 2019, the <tt class="docutils literal">PyObject_Init()</tt> function was modified in <a class="reference external" href="https://bugs.python.org/issue35810">bpo-35810</a> to keep a strong reference (<tt class="docutils literal">INCREF</tt>) to the type if the type is a heap type (<a class="reference external" href="https://github.com/python/cpython/commit/364f0b0f19cc3f0d5e63f571ec9163cf41c62958">commit</a>):</p> <pre class="literal-block"> + if (PyType_GetFlags(tp) &amp; Py_TPFLAGS_HEAPTYPE) { + Py_INCREF(tp); + } </pre> <p>I opened <a class="reference external" href="https://bugs.python.org/issue40217">bpo-40217: The garbage collector doesn't take in account that objects of heap allocated types hold a strong reference to their type</a> to discuss the regression (the test_threading leak).</p> </div> <div class="section" id="first-workaround-not-merged-force-a-second-garbage-collection"> <h3>First workaround (not merged): force a second garbage collection</h3> <p>While analysing test_threading regression leak, I identified a first 
workaround: add a second <tt class="docutils literal">_PyGC_CollectNoFail()</tt> call in <tt class="docutils literal">finalize_interp_clear()</tt>.</p> <p>It was only a workaround which helped to understand the issue, it was not merged.</p> </div> <div class="section" id="first-fix-merged-abc-data-traverse"> <h3>First fix (merged): abc_data_traverse()</h3> <p>I merged a first fix: add a traverse function to the <tt class="docutils literal">_abc._abc_data</tt> type (<a class="reference external" href="https://github.com/python/cpython/commit/9cc3ebd7e04cb645ac7b2f372eaafa7464e16b9c">commit</a>):</p> <pre class="literal-block"> +static int +abc_data_traverse(_abc_data *self, visitproc visit, void *arg) +{ + Py_VISIT(self-&gt;_abc_registry); + Py_VISIT(self-&gt;_abc_cache); + Py_VISIT(self-&gt;_abc_negative_cache); + return 0; +} </pre> </div> <div class="section" id="second-workaround-not-merged-visit-the-type-in-abc-data-traverse"> <h3>Second workaround (not merged): visit the type in abc_data_traverse()</h3> <p>A second workaround was identified: add <tt class="docutils literal"><span class="pre">Py_VISIT(Py_TYPE(self));</span></tt> to the new <tt class="docutils literal">abc_data_traverse()</tt> function.</p> <p>Again, it was only a workaround which helped to understand the issue, but it was not merged.</p> </div> <div class="section" id="second-fix-merged-call-py-visit-py-type-self-automatically"> <h3>Second fix (merged): call Py_VISIT(Py_TYPE(self)) automatically</h3> <p>20 days after I opened <a class="reference external" href="https://bugs.python.org/issue40217">bpo-40217</a>, <strong>Pablo Galindo</strong> modified <tt class="docutils literal">PyType_FromSpec()</tt> to add a wrapper around the traverse function of heap types to ensure that <tt class="docutils literal">Py_VISIT(Py_TYPE(self))</tt> is always called (<a class="reference external" href="https://github.com/python/cpython/commit/0169d3003be3d072751dd14a5c84748ab63a249f">commit</a>).</p> </div> <div 
class="section" id="last-fix-merged-fix-every-traverse-function"> <h3>Last fix (merged): fix every traverse function</h3> <p>In May 2020, <strong>Pablo Galindo</strong> changed his mind. He reverted his <tt class="docutils literal">PyType_FromSpec()</tt> change and instead fixed the traverse functions of heap types (<a class="reference external" href="https://github.com/python/cpython/commit/1cf15af9a6f28750f37b08c028ada31d38e818dd">commit</a>).</p> <p>In the end, <tt class="docutils literal">abc_data_traverse()</tt> calls <tt class="docutils literal">Py_VISIT(Py_TYPE(self))</tt>. The second &quot;workaround&quot; was the correct fix!</p> </div> </div> <div class="section" id="convert-signal-to-multiphase-init"> <h2>Convert _signal to multiphase init</h2> <p>In September 2020, <strong>Mohamed Koubaa</strong> ported the <tt class="docutils literal">_signal</tt> module to the multiphase initialization API (PEP 489) in <a class="reference external" href="https://bugs.python.org/issue1635741">bpo-1635741</a> (<a class="reference external" href="https://github.com/python/cpython/commit/71d1bd9569c8a497e279f2fea6fe47cd70a87ea3">commit 71d1bd95</a>) and test_interpreters started to leak:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_interpreters test_interpreters leaked [237, 237, 237] references, sum=711 </pre> <p>I created <a class="reference external" href="https://bugs.python.org/issue41713">bpo-41713</a> to track the regression. 
Since I failed to find a simple fix, I started by reverting the change which caused Refleak buildbots to fail (<a class="reference external" href="https://github.com/python/cpython/commit/4b8032e5a4994a7902076efa72fca1e2c85d8b7f">commit</a>).</p> <p>I had to refactor the <tt class="docutils literal">_signal</tt> extension module code with multiple commits to fix all bugs.</p> <p>The first fix was to remove the <tt class="docutils literal">IntHandler</tt> variable: there was no need to keep it alive; it was only needed once in <tt class="docutils literal">signal_module_exec()</tt>.</p> <p>The second fix was to close the Windows event at exit:</p> <pre class="literal-block"> + #ifdef MS_WINDOWS + if (sigint_event != NULL) { + CloseHandle(sigint_event); + sigint_event = NULL; + } + #endif </pre> <p>The last fix, the most important one, was to clear the strong reference to old Python signal handlers when <tt class="docutils literal">signal_module_exec()</tt> is called more than once:</p> <pre class="literal-block"> // If signal_module_exec() is called more than once, we must // clear the strong reference to the previous function. 
Py_XSETREF(Handlers[signum].func, Py_NewRef(func)); </pre> <p>The <tt class="docutils literal">_signal</tt> module is not well isolated for subinterpreters yet, but at least it no longer leaks.</p> </div> <div class="section" id="per-interpreter-ast-state"> <h2>Per-interpreter _ast state</h2> <p>In September 2019, the <tt class="docutils literal">_ast</tt> extension module was converted to PEP 384 (stable ABI) in <a class="reference external" href="https://bugs.python.org/issue38113">bpo-38113</a> (<a class="reference external" href="https://github.com/python/cpython/commit/ac46eb4ad6662cf6d771b20d8963658b2186c48c">commit</a>): the AST state moved into a module state.</p> <p>This change caused 3 different bugs including crashes (<a class="reference external" href="https://bugs.python.org/issue41194">bpo-41194</a>, <a class="reference external" href="https://bugs.python.org/issue41261">bpo-41261</a>, <a class="reference external" href="https://bugs.python.org/issue41631">bpo-41631</a>). The issue is complex since there are public C APIs which require access to AST types, whereas it became possible to have multiple <tt class="docutils literal">_ast</tt> extension module instances.</p> <p>In July 2020, I fixed the root issue in <a class="reference external" href="https://bugs.python.org/issue41194">bpo-41194</a> by replacing the module state with a global state (<a class="reference external" href="https://github.com/python/cpython/commit/91e1bc18bd467a13bceb62e16fbc435b33381c82">commit</a>):</p> <pre class="literal-block"> static astmodulestate global_ast_state; </pre> <p>A global state is bad for subinterpreters. 
In November 2020, I made the AST state per-interpreter in <a class="reference external" href="https://bugs.python.org/issue41796">bpo-41796</a> (<a class="reference external" href="https://github.com/python/cpython/commit/5cf4782a2630629d0978bf4cf6b6340365f449b2">commit</a>) and test_ast started to leak:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_ast test_ast leaked [23640, 23636, 23640] references, sum=70916 </pre> <p>The fix is to call <tt class="docutils literal">_PyAST_Fini()</tt> earlier (<a class="reference external" href="https://github.com/python/cpython/commit/fd957c124c44441d9c5eaf61f7af8cf266bafcb1">commit</a>).</p> <p>Python types contain a reference to themselves in their <tt class="docutils literal">PyTypeObject.tp_mro</tt> member (the MRO tuple: Method Resolution Order). <tt class="docutils literal">_PyAST_Fini()</tt> must be called before the last GC collection to destroy AST types.</p> <p><tt class="docutils literal">_PyInterpreterState_Clear()</tt> now calls <tt class="docutils literal">_PyAST_Fini()</tt>. It now also calls <tt class="docutils literal">_PyWarnings_Fini()</tt> on subinterpreters, not only on the main interpreter.</p> </div> <div class="section" id="thread-lock-traverse"> <h2>_thread lock traverse</h2> <p>In December 2020, while I tried to port the <tt class="docutils literal">_thread</tt> extension module to the multiphase initialization API (PEP 489), test_threading started to leak:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_threading test_threading leaked [56, 56, 56] references, sum=168 </pre> <p>As usual, the workaround was to force a second GC collection in <tt class="docutils literal">interpreter_clear()</tt>:</p> <pre class="literal-block"> /* Last garbage collection on this interpreter */ _PyGC_CollectNoFail(tstate); + _PyGC_CollectNoFail(tstate); _PyGC_Fini(tstate); </pre> <p>It took me two days to fully understand the problem. 
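</p> <p>The <tt class="docutils literal">tp_mro</tt> self-reference described in the previous section can be observed from pure Python, which also shows why such cycles involving types need correct traverse functions. A minimal sketch (the <tt class="docutils literal">Example</tt> class is made up for illustration):</p> <pre class="literal-block">
import gc

class Example:
    pass

# Every type references itself through its MRO tuple (tp_mro in C),
# so a type always takes part in a reference cycle:
assert Example in Example.__mro__

# gc.get_referents() lists the objects reported by an object's
# traverse function: the MRO tuple refers back to the type itself.
assert Example in gc.get_referents(Example.__mro__)
</pre> <p>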
I drew reference cycles on paper to help me understand the problem:</p> <img alt="_thread GC bug" src="https://vstinner.github.io/images/thread_gc_bug.jpg" /> <p>There are two cycles:</p> <ul class="simple"> <li>Cycle 1:<ul> <li>at fork function</li> <li>-&gt; __main__ module dict</li> <li>-&gt; at fork function</li> </ul> </li> <li>Cycle 2:<ul> <li>_thread lock type</li> <li>-&gt; lock type methods</li> <li>-&gt; _thread module dict</li> <li>-&gt; _thread local type</li> <li>-&gt; _thread module</li> <li>-&gt; _thread module state</li> <li>-&gt; _thread lock type</li> </ul> </li> </ul> <p>Moreover, there is a link between these two reference cycles: an instance of the lock type.</p> <p>I fixed the issue by adding a traverse function to the lock type and adding the <tt class="docutils literal">Py_TPFLAGS_HAVE_GC</tt> flag to the type (<a class="reference external" href="https://github.com/python/cpython/commit/6104013838e181e3c698cb07316f449a0c31ea96">commit</a>):</p> <pre class="literal-block"> +static int +lock_traverse(lockobject *self, visitproc visit, void *arg) +{ + Py_VISIT(Py_TYPE(self)); + return 0; +} </pre> </div> <div class="section" id="notes-on-weird-gc-bugs"> <h2>Notes on weird GC bugs</h2> <ul class="simple"> <li><tt class="docutils literal">gc.get_referents()</tt> and <tt class="docutils literal">gc.get_referrers()</tt> can be used to check traverse functions.</li> <li><tt class="docutils literal">gc.is_tracked()</tt> can be used to check if the GC tracks an object.</li> <li>Using the <tt class="docutils literal">gdb</tt> debugger on <tt class="docutils literal">gc_collect_main()</tt> helps to see which objects are collected. 
See for example the <tt class="docutils literal">finalize_garbage()</tt> function, which calls finalizers on unreachable objects.</li> <li>The solution is usually a missing traverse function or a missing <tt class="docutils literal">Py_VISIT()</tt> in an existing traverse function.</li> <li>GC bugs are hard to debug :-)</li> </ul> <p>Thanks <strong>Pablo Galindo</strong> for helping me debug all these tricky GC bugs!</p> <p>Thanks to everybody who is helping to better isolate subinterpreters by converting extension modules to the multiphase initialization API (PEP 489) and by converting dozens of static types to heap types. We made huge progress over the last months!</p> </div> GIL bugfixes for daemon threads in Python 3.92020-04-04T22:00:00+02:002020-04-04T22:00:00+02:00Victor Stinnertag:vstinner.github.io,2020-04-04:/gil-bugfixes-daemon-threads-python39.html<a class="reference external image-reference" href="https://twitter.com/Bouletcorp/status/1241018332112998401"> <img alt="`#CoronaMaison by Boulet" src="https://vstinner.github.io/images/coronamaison_boulet.jpg" /> </a> <p>My previous article <a class="reference external" href="https://vstinner.github.io/daemon-threads-python-finalization-python32.html">Daemon threads and the Python finalization in Python 3.2 and 3.3</a> introduces issues caused by daemon threads in the Python finalization and past changes to make them work.</p> <p>This article is about bugfixes of the infamous GIL (Global Interpreter Lock) in Python 3.9, between …</p><a class="reference external image-reference" href="https://twitter.com/Bouletcorp/status/1241018332112998401"> <img alt="`#CoronaMaison by Boulet" src="https://vstinner.github.io/images/coronamaison_boulet.jpg" /> </a> <p>My previous article <a class="reference external" href="https://vstinner.github.io/daemon-threads-python-finalization-python32.html">Daemon threads and the Python finalization in Python 3.2 and 3.3</a> introduces issues caused by daemon threads in the Python 
finalization and past changes to make them work.</p> <p>This article is about bugfixes of the infamous GIL (Global Interpreter Lock) in Python 3.9, between March 2019 and March 2020, for daemon threads during Python finalization. Some bugs were old: up to 6 years old. Some bugs were triggered by the on-going work on isolating subinterpreters in Python 3.9.</p> <p>Drawing: <a class="reference external" href="https://twitter.com/Bouletcorp/status/1241018332112998401">#CoronaMaison by Boulet</a>.</p> <div class="section" id="fix-1-exit-pyeval-acquirethread-if-finalizing"> <h2>Fix 1: Exit PyEval_AcquireThread() if finalizing</h2> <p>In March 2019, <strong>Remy Noel</strong> created <a class="reference external" href="https://bugs.python.org/issue36469">bpo-36469</a>: a multithreaded Python application using 20 daemon threads hangs randomly at exit on Python 3.5:</p> <blockquote> The bug happens about once every two weeks on a script that is fired more than 10K times a day.</blockquote> <p><strong>Eric Snow</strong> analyzed the bug and understood that it was related to daemon threads and Python finalization. He identified that the <tt class="docutils literal">PyEval_AcquireLock()</tt> and <tt class="docutils literal">PyEval_AcquireThread()</tt> functions take the GIL but don't exit the thread if Python is finalizing.</p> <p>When Python is finalizing and a daemon thread takes the GIL, Python can hang randomly.</p> <p>Eric created <a class="reference external" href="https://bugs.python.org/issue36475">bpo-36475</a> to propose modifying <tt class="docutils literal">PyEval_AcquireLock()</tt> and <tt class="docutils literal">PyEval_AcquireThread()</tt> to also exit the thread in this case. 
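</p> <p>Seen from Python code, the behavior at stake is that a daemon thread never resumes once finalization has started: it is simply terminated. A minimal demonstration of daemon threads dying at interpreter exit (the script and its <tt class="docutils literal">worker()</tt> function are made up for illustration):</p> <pre class="literal-block">
import subprocess
import sys

code = r"""
import threading, time

def worker():
    time.sleep(60)          # still sleeping when the interpreter finalizes
    print("never printed")  # the daemon thread is terminated before this runs

threading.Thread(target=worker, daemon=True).start()
print("main exits")
"""

# Run the script in a child interpreter: it exits immediately,
# without waiting 60 seconds for the daemon thread.
proc = subprocess.run([sys.executable, "-c", code],
                      capture_output=True, text=True, timeout=30)
print(proc.stdout.strip())
</pre> <p>Only <tt class="docutils literal">main exits</tt> is printed: the daemon thread is abandoned during finalization, which is exactly the situation where it must not be allowed to re-take the GIL. 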
In April 2019, <strong>Joannah Nanjekye</strong> fixed the issue with <a class="reference external" href="https://github.com/python/cpython/commit/f781d202a2382731b43bade845a58d28a02e9ea1">commit f781d202</a>:</p> <pre class="literal-block"> bpo-36475: Finalize PyEval_AcquireLock() and PyEval_AcquireThread() properly (GH-12667) PyEval_AcquireLock() and PyEval_AcquireThread() now terminate the current thread if called while the interpreter is finalizing, making them consistent with PyEval_RestoreThread(), Py_END_ALLOW_THREADS, and PyGILState_Ensure(). </pre> <p>The fix adds an <tt class="docutils literal">exit_thread_if_finalizing()</tt> function which exits the thread if Python is finalizing. This function is called after each <tt class="docutils literal">take_gil()</tt> call.</p> <p>The fix is very similar to the <tt class="docutils literal">PyEval_RestoreThread()</tt> fix made in 2013 (<a class="reference external" href="https://github.com/python/cpython/commit/0d5e52d3469a310001afe50689f77ddba6d554d1">commit 0d5e52d3</a>) to fix <a class="reference external" href="https://bugs.python.org/issue1856#msg60014">bpo-1856</a> (Python crash involving daemon threads during Python exit).</p> </div> <div class="section" id="fix-2-pyeval-restorethread-on-freed-tstate"> <h2>Fix 2: PyEval_RestoreThread() on freed tstate</h2> <div class="section" id="concurrent-futures-crash-on-freebsd"> <h3>concurrent.futures crash on FreeBSD</h3> <p>In December 2019, I reported <a class="reference external" href="https://bugs.python.org/issue39088">bpo-39088</a>: test_concurrent_futures <strong>crashed randomly</strong> with a coredump on the AMD64 FreeBSD Shared 3.x buildbot. 
In March 2020, I succeeded in reproducing the bug on FreeBSD and was able to debug the coredump in gdb:</p> <pre class="literal-block"> (gdb) frame #0 0x00000000003b518c in PyEval_RestoreThread (tstate=0x801f23790) at Python/ceval.c:387 387 _PyRuntimeState *runtime = tstate-&gt;interp-&gt;runtime; (gdb) p tstate-&gt;interp $3 = (PyInterpreterState *) 0xdddddddddddddddd </pre> <p>The Python thread state (<tt class="docutils literal">tstate</tt>) was freed. In debug mode, the &quot;free()&quot; function of the Python memory allocator fills the freed memory block with the <tt class="docutils literal">0xDD</tt> byte pattern (<tt class="docutils literal">D</tt> stands for dead byte) to detect usage of freed memory.</p> <p>The problem is that the Python finalization has already freed the memory of all PyThreadState structures when <tt class="docutils literal">PyEval_RestoreThread(tstate)</tt> is called by a daemon thread. <tt class="docutils literal">PyEval_RestoreThread()</tt> dereferences <tt class="docutils literal">tstate</tt>:</p> <pre class="literal-block"> _PyRuntimeState *runtime = tstate-&gt;interp-&gt;runtime; </pre> <p>This bug is a regression caused by my change: <a class="reference external" href="https://github.com/python/cpython/commit/01b1cc12e7c6a3d6a3d27ba7c731687d57aae92a">Add PyInterpreterState.runtime field</a> of <a class="reference external" href="https://bugs.python.org/issue36710">bpo-36710</a>. I replaced:</p> <pre class="literal-block"> void PyEval_RestoreThread(PyThreadState *tstate) { _PyRuntimeState *runtime = &amp;_PyRuntime; ... } </pre> <p>with:</p> <pre class="literal-block"> void PyEval_RestoreThread(PyThreadState *tstate) { _PyRuntimeState *runtime = tstate-&gt;interp-&gt;runtime; ... } </pre> </div> <div class="section" id="fix-pyeval-restorethread-for-daemon-threads"> <h3>Fix PyEval_RestoreThread() for daemon threads</h3> <p>I created <a class="reference external" href="https://bugs.python.org/issue39877">bpo-39877</a> to investigate this bug. 
I managed to reproduce the crash on Linux with a script spawning daemon threads which sleep randomly between 0.0 and 1.0 second, and by adding a <tt class="docutils literal">sleep(1);</tt> call at <tt class="docutils literal">Py_RunMain()</tt> exit.</p> <p>I wrote a <tt class="docutils literal">PyEval_RestoreThread()</tt> fix which accesses <tt class="docutils literal">_PyRuntimeState.finalizing</tt> without holding the GIL.</p> <p><strong>Antoine Pitrou</strong> asked me to convert <tt class="docutils literal">_PyRuntimeState.finalizing</tt> to an atomic variable to avoid inconsistencies in case of parallel accesses. On March 7, 2020, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/7b3c252dc7f44d4bdc4c7c82d225ebd09c78f520">commit 7b3c252d</a>:</p> <pre class="literal-block"> bpo-39877: _PyRuntimeState.finalizing becomes atomic (GH-18816) Convert _PyRuntimeState.finalizing field to an atomic variable: * Rename it to _finalizing * Change its type to _Py_atomic_address * Add _PyRuntimeState_GetFinalizing() and _PyRuntimeState_SetFinalizing() functions * Remove _Py_CURRENTLY_FINALIZING() function: replace it with testing directly _PyRuntimeState_GetFinalizing() value Convert _PyRuntimeState_GetThreadState() to static inline function. </pre> <p>The day after, I pushed my fix, <a class="reference external" href="https://github.com/python/cpython/commit/eb4e2ae2b8486e8ee4249218b95d94a9f0cc513e">commit eb4e2ae2</a>:</p> <pre class="literal-block"> bpo-39877: Fix PyEval_RestoreThread() for daemon threads (GH-18811) * exit_thread_if_finalizing() does now access directly _PyRuntime variable, rather than using tstate-&gt;interp-&gt;runtime since tstate can be a dangling pointer after Py_Finalize() has been called. * exit_thread_if_finalizing() is now called *before* calling take_gil(). _PyRuntime.finalizing is an atomic variable, we don't need to hold the GIL to access it.
</pre> <p><tt class="docutils literal">exit_thread_if_finalizing()</tt> is now called <strong>before</strong> <tt class="docutils literal">take_gil()</tt> to ensure that <tt class="docutils literal">take_gil()</tt> cannot be called with an invalid Python thread state (<tt class="docutils literal">tstate</tt>).</p> <p>I commented <em>naively</em>:</p> <blockquote> Ok, it should now be fixed.</blockquote> </div> </div> <div class="section" id="clear-python-thread-states-earlier-my-first-failed-attempt-in-2013"> <h2>Clear Python thread states earlier: my first failed attempt in 2013</h2> <p>In 2013, I opened <a class="reference external" href="https://bugs.python.org/issue19466">bpo-19466</a> to clear the Python thread state of threads earlier during the Python finalization. My intent was to display <tt class="docutils literal">ResourceWarning</tt> warnings of daemon threads as well. In November 2013, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/45956b9a33af634a2919ade64c1dd223ab2d5235">commit 45956b9a</a>:</p> <pre class="literal-block"> Close #19466: Clear the frames of daemon threads earlier during the Python shutdown to call objects destructors. So &quot;unclosed file&quot; resource warnings are now correctly emitted for daemon threads. </pre> <p>Later, I discovered a crash in the garbage collector while trying to reproduce a race condition in asyncio: I created <a class="reference external" href="https://bugs.python.org/issue20526">bpo-20526</a>. Sadly, this bug was triggered by my previous change. I decided that it was safer to revert my change.</p> <p>By the way, when I looked again at <a class="reference external" href="https://bugs.python.org/issue20526">bpo-20526</a>, I was able to reproduce the garbage collector bug again, likely because of recent changes.
With the help of <strong>Pablo Galindo Salgado</strong>, we <a class="reference external" href="https://bugs.python.org/issue20526#msg364851">understood the root issue</a>. On March 24, 2020, I pushed a fix (<a class="reference external" href="https://github.com/python/cpython/commit/5804f878e779712e803be927ca8a6df389d82cdf">commit</a>) to finally fix this 6-year-old bug! The fix removes the following line from <tt class="docutils literal">PyThreadState_Clear()</tt>:</p> <pre class="literal-block"> Py_CLEAR(tstate-&gt;frame); </pre> </div> <div class="section" id="fix-3-exit-also-take-gil-at-exit-point-if-finalizing"> <h2>Fix 3: Exit also take_gil() at exit point if finalizing</h2> <p>After fixing <tt class="docutils literal">PyEval_RestoreThread()</tt>, I decided to try again to fix <a class="reference external" href="https://bugs.python.org/issue19466">bpo-19466</a> (clear Python thread states earlier). Sadly, I discovered that my <tt class="docutils literal">PyEval_RestoreThread()</tt> fix <strong>introduced a race condition</strong>!</p> <p>While the main thread finalizes Python, daemon threads can be waiting for the GIL: they block in <tt class="docutils literal">take_gil()</tt>. When the main thread releases the GIL during finalization, a daemon thread takes the GIL instead of exiting.
Daemon threads only check if they must exit <strong>before</strong> trying to take the GIL.</p> <p>The solution is to call <tt class="docutils literal">exit_thread_if_finalizing()</tt> twice in <tt class="docutils literal">take_gil()</tt>: before <strong>and</strong> after taking the GIL.</p> <p>In March 2020, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/9229eeee105f19705f72e553cf066751ac47c7b7">commit 9229eeee</a>:</p> <pre class="literal-block"> bpo-39877: take_gil() checks tstate_must_exit() twice (GH-18890) take_gil() now also checks tstate_must_exit() after acquiring the GIL: exit the thread if Py_Finalize() has been called. </pre> <p>I commented:</p> <blockquote> <p>I ran multiple times <tt class="docutils literal">daemon_threads_exit.py</tt> with <tt class="docutils literal">slow_exit.patch</tt>: no crash.</p> <p>I also ran multiple times <tt class="docutils literal">stress.py</tt> + <tt class="docutils literal">sleep_at_exit.patch</tt> of bpo-37135: no crash.</p> <p>And I tested <tt class="docutils literal">asyncio_gc.py</tt> of bpo-19466: no crash either.</p> <p><strong>Python finalization now looks reliable.</strong> I'm not sure if it's &quot;more&quot; reliable than previously, but at least, I cannot get a crash anymore, even after bpo-19466 has been fixed (clear Python thread states of daemon threads earlier).</p> </blockquote> <p>Fun fact: in June 2019, <strong>Eric Snow</strong> introduced a very similar bug in <a class="reference external" href="https://bugs.python.org/issue36818">bpo-36818</a> with <a class="reference external" href="https://github.com/python/cpython/commit/396e0a8d9dc65453cb9d53500d0a620602656cfe">commit 396e0a8d</a>: test_multiprocessing_spawn segfault on FreeBSD (<a class="reference external" href="https://bugs.python.org/issue37135">bpo-37135</a>). I reverted his change to fix the issue. At that time, I didn't have the bandwidth to investigate the root cause.
I just reverted Eric's change.</p> </div> <div class="section" id="fix-4-exit-take-gil-while-waiting-for-the-gil-if-finalizing"> <h2>Fix 4: Exit take_gil() while waiting for the GIL if finalizing</h2> <p>While I was working on moving pending calls from <tt class="docutils literal">_PyRuntime</tt> to <tt class="docutils literal">PyInterpreterState</tt>, <a class="reference external" href="https://bugs.python.org/issue39984">bpo-39984</a>, I hit another bug.</p> <p>On March 18, 2020, I pushed a <tt class="docutils literal">take_gil()</tt> fix to avoid accessing <tt class="docutils literal">tstate</tt> if Python is finalizing, <a class="reference external" href="https://github.com/python/cpython/commit/29356e03d4f8800b04f799efe7a10e3ce8b16f61">commit 29356e03</a>:</p> <pre class="literal-block"> bpo-39877: Fix take_gil() for daemon threads (GH-19054) bpo-39877, bpo-39984: If the thread must exit, don't access tstate to prevent a potential crash: tstate memory has been freed. </pre> <p>And while working on the inefficient signal handling in multithreaded applications (<a class="reference external" href="https://bugs.python.org/issue40010">bpo-40010</a>), I discovered that the previous fix was not enough!</p> <p>On March 19, 2020, I pushed a <tt class="docutils literal">take_gil()</tt> fix to exit while <tt class="docutils literal">take_gil()</tt> is waiting for the GIL if Python is finalizing, <a class="reference external" href="https://github.com/python/cpython/commit/a36adfa6bbf5e612a4d4639124502135690899b8">commit a36adfa6</a>:</p> <pre class="literal-block"> bpo-39877: 4th take_gil() fix for daemon threads (GH-19080) bpo-39877, bpo-40010: Add a third tstate_must_exit() check in take_gil() to prevent using tstate which has been freed.
</pre> <p>I can only hope that this fix is the last one to fix all corner cases with daemon threads in <tt class="docutils literal">take_gil()</tt> (<a class="reference external" href="https://bugs.python.org/issue39877">bpo-39877</a>)!</p> </div> <div class="section" id="summary-of-gil-bugfixes"> <h2>Summary of GIL bugfixes</h2> <p>The GIL got 5 main bugfixes for daemon threads and Python finalization:</p> <ul class="simple"> <li>May 2011, <strong>Antoine Pitrou</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/0d5e52d3469a310001afe50689f77ddba6d554d1">commit 0d5e52d3</a>: <tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>after</strong> taking the GIL (1 check)</li> <li>April 2019, <strong>Joannah Nanjekye</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/f781d202a2382731b43bade845a58d28a02e9ea1">commit f781d202</a>: PyEval_AcquireLock() and PyEval_AcquireThread() also exit if Python is finalizing</li> <li>March 8, 2020, <strong>Victor Stinner</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/eb4e2ae2b8486e8ee4249218b95d94a9f0cc513e">commit eb4e2ae2</a>: <tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>before</strong> taking the GIL (1 check)</li> <li>March 9, 2020, <strong>Victor Stinner</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/9229eeee105f19705f72e553cf066751ac47c7b7">commit 9229eeee</a>: <tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>before and after</strong> taking the GIL (2 checks)</li> <li>March 19, 2020, <strong>Victor Stinner</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/a36adfa6bbf5e612a4d4639124502135690899b8">commit a36adfa6</a>: <tt class="docutils literal">take_gil()</tt> exits if finalizing <strong>before, while, and after</strong> taking the GIL (3 checks)</li> </ul> </div> 
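The class of crashes fixed above can be exercised from pure Python with the stress pattern described earlier: spawn daemon threads which sleep randomly, then let the interpreter exit while some of them are still running. A minimal sketch (the thread count and sleep bounds are arbitrary, not the exact values of the original reproducer):

```python
import random
import threading
import time

def worker():
    # Sleep for a random duration: some daemon threads are still
    # sleeping, or waiting for the GIL, when the main thread starts
    # the Python finalization.
    time.sleep(random.uniform(0.0, 0.2))

threads = [threading.Thread(target=worker, daemon=True) for _ in range(8)]
for thread in threads:
    thread.start()

# The main thread now exits without joining the daemon threads: before
# the take_gil() fixes above, a daemon thread waking up during the
# finalization could use a freed Python thread state and crash.
```

With the tstate_must_exit() checks in place, such daemon threads exit silently inside take_gil() instead of crashing.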
Threading shutdown race condition2020-04-03T20:00:00+02:002020-04-03T20:00:00+02:00Victor Stinnertag:vstinner.github.io,2020-04-03:/threading-shutdown-race-condition.html<p>This article is about a race condition in threading shutdown that I fixed in Python 3.9 in June 2019. I also forbid spawning daemon threads in subinterpreters to fix another related bug.</p> <a class="reference external image-reference" href="https://twitter.com/neeljulien/status/1240292383369150464"> <img alt="#CoronaMaison by Julien Neel" src="https://vstinner.github.io/images/coronamaison_jneel.jpg" /> </a> <p>Drawing: <a class="reference external" href="https://twitter.com/neeljulien/status/1240292383369150464">#CoronaMaison by Julien Neel</a>.</p> <div class="section" id="race-condition-in-threading-shutdown"> <h2>Race condition in threading shutdown</h2> <div class="section" id="random-test-failure-noticed-on-freebsd-buildbot"> <h3>Random test failure noticed on FreeBSD buildbot …</h3></div></div><p>This article is about a race condition in threading shutdown that I fixed in Python 3.9 in June 2019.
I also forbid spawning daemon threads in subinterpreters to fix another related bug.</p> <a class="reference external image-reference" href="https://twitter.com/neeljulien/status/1240292383369150464"> <img alt="#CoronaMaison by Julien Neel" src="https://vstinner.github.io/images/coronamaison_jneel.jpg" /> </a> <p>Drawing: <a class="reference external" href="https://twitter.com/neeljulien/status/1240292383369150464">#CoronaMaison by Julien Neel</a>.</p> <div class="section" id="race-condition-in-threading-shutdown"> <h2>Race condition in threading shutdown</h2> <div class="section" id="random-test-failure-noticed-on-freebsd-buildbot"> <h3>Random test failure noticed on FreeBSD buildbot</h3> <p>In March 2019, I noticed that <tt class="docutils literal">test_threading.test_threads_join_2()</tt> was killed by SIGABRT on the FreeBSD CURRENT buildbot, <a class="reference external" href="https://bugs.python.org/issue36402">bpo-36402</a>:</p> <pre class="literal-block"> Fatal Python error: Py_EndInterpreter: not the last thread </pre> <p>The <tt class="docutils literal">test_threads_join_2()</tt> test <strong>failed randomly</strong> on buildbots when tests were <strong>run in parallel</strong>, but test_threading <strong>passed</strong> when it was <strong>re-run sequentially</strong>. Such a failure was silently ignored, since the build was seen overall as a success.</p> <p>The test <tt class="docutils literal">test_threading.test_threads_join_2()</tt> was added in 2013 by <a class="reference external" href="https://github.com/python/cpython/commit/7b4769937fb612d576b6829c3b834f3dd31752f1">commit 7b476993</a>.</p> <p>In 2016, I already reported the same test failure: <a class="reference external" href="https://bugs.python.org/issue27791">bpo-27791</a> (same test, also on FreeBSD). And Christian Heimes reported a similar issue: <a class="reference external" href="https://bugs.python.org/issue28084">bpo-28084</a>.
I simply closed these issues because I only saw the failure once in 4 months and <strong>I didn't have access to FreeBSD to attempt to reproduce the crash</strong>.</p> </div> <div class="section" id="reproduce-the-race-condition"> <h3>Reproduce the race condition</h3> <p>In 2019, I had a FreeBSD VM to attempt to reproduce the bug locally.</p> <p>In June 2019, I found a reliable way to reproduce the bug by <a class="reference external" href="https://github.com/python/cpython/pull/13889/files">adding random sleeps to the test</a>. With this patch, I was also able to reproduce the bug on Linux. <strong>I am way more comfortable debugging an issue on Linux</strong> with my favorite debugging tools!</p> <p>I identified a race condition in the Python finalization. I also understood that the bug was not specific to subinterpreters:</p> <blockquote> The test shows the bug using subinterpreters (Py_EndInterpreter), but <strong>the bug also exists in Py_Finalize()</strong> which has the same race condition.</blockquote> <p>I wrote a patch for <tt class="docutils literal">Py_Finalize()</tt> to help me reproduce the bug without subinterpreters:</p> <pre class="literal-block"> + if (tstate != interp-&gt;tstate_head || tstate-&gt;next != NULL) { + Py_FatalError(&quot;Py_EndInterpreter: not the last thread&quot;); + } </pre> </div> <div class="section" id="threading-shutdown-race-condition-1"> <h3>threading._shutdown() race condition</h3> <p><tt class="docutils literal">threading._shutdown()</tt> uses <tt class="docutils literal">threading.enumerate()</tt> which iterates on the <tt class="docutils literal">threading._active</tt> dictionary.</p> <p><tt class="docutils literal">threading.Thread</tt> registers itself into <tt class="docutils literal">threading._active</tt> when the thread starts.
It unregisters itself from <tt class="docutils literal">threading._active</tt> when it completes.</p> <p>The bug occurs when the thread is unregistered while the underlying native thread is still running and <strong>the Python thread state is not deleted yet</strong>.</p> <p><tt class="docutils literal">_thread._set_sentinel()</tt> creates a lock and registers a <tt class="docutils literal"><span class="pre">tstate-&gt;on_delete</span></tt> callback to release this lock. It's called by <tt class="docutils literal">threading.Thread</tt> when the thread starts to set <tt class="docutils literal">threading.Thread._tstate_lock</tt>. This lock is used by the <tt class="docutils literal">threading.Thread.join()</tt> method to wait until the thread completes.</p> <p><tt class="docutils literal">_thread.start_new_thread()</tt> calls the C function <tt class="docutils literal">t_bootstrap()</tt> which ends with:</p> <pre class="literal-block"> tstate-&gt;interp-&gt;num_threads--; PyThreadState_Clear(tstate); PyThreadState_DeleteCurrent(); PyThread_exit_thread(); </pre> <p>When the native thread completes, <tt class="docutils literal">_PyThreadState_DeleteCurrent()</tt> is called: it calls the <tt class="docutils literal"><span class="pre">tstate-&gt;on_delete()</span></tt> callback which releases the <tt class="docutils literal">threading.Thread._tstate_lock</tt> lock.</p> <p>The root issue is that:</p> <ul class="simple"> <li><tt class="docutils literal">threading._shutdown()</tt> relies on the <tt class="docutils literal">threading._active</tt> dictionary</li> <li><tt class="docutils literal">Py_EndInterpreter()</tt> relies on the linked list of Python thread states of the interpreter (<tt class="docutils literal"><span class="pre">interp-&gt;tstate_head</span></tt>).</li> </ul> <p>The lock on Python thread states (<tt class="docutils literal">threading.Thread._tstate_lock</tt>) and the <tt class="docutils literal">PyThreadState.on_delete</tt> callback were added in 2013 by
<strong>Antoine Pitrou</strong> to Python 3.4, <a class="reference external" href="https://github.com/python/cpython/commit/7b4769937fb612d576b6829c3b834f3dd31752f1">commit 7b476993</a> of <a class="reference external" href="https://bugs.python.org/issue18808">bpo-18808</a>:</p> <pre class="literal-block"> Issue #18808: Thread.join() now waits for the underlying thread state to be destroyed before returning. This prevents unpredictable aborts in Py_EndInterpreter() when some non-daemon threads are still running. </pre> </div> <div class="section" id="fix-threading-shutdown"> <h3>Fix threading._shutdown()</h3> <p>Finally in June 2019, I fixed the race condition in <tt class="docutils literal">threading._shutdown()</tt> with <a class="reference external" href="https://github.com/python/cpython/commit/468e5fec8a2f534f1685d59da3ca4fad425c38dd">commit 468e5fec</a>:</p> <pre class="literal-block"> bpo-36402: Fix threading._shutdown() race condition (GH-13948) Fix a race condition at Python shutdown when waiting for threads. Wait until the Python thread state of all non-daemon threads get deleted (join all non-daemon threads), rather than just wait until Python threads complete. </pre> <p>The fix is to modify <tt class="docutils literal">threading._shutdown()</tt> to wait until the Python thread state of all non-daemon threads is deleted, rather than calling the <tt class="docutils literal">join()</tt> method of all non-daemon threads. The <tt class="docutils literal">join()</tt> method does not ensure that the Python thread state is deleted.</p> <p>The Python finalization calls <tt class="docutils literal">threading._shutdown()</tt> to wait until all threads complete. Only non-daemon threads are awaited: daemon threads can continue to run after <tt class="docutils literal">threading._shutdown()</tt>.</p> <p><tt class="docutils literal">Py_EndInterpreter()</tt> requires that the Python thread states of all threads have been deleted.
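The user-visible half of this contract can be observed with the public threading API: a running thread is listed by threading.enumerate() and disappears from it once join() returns. A small sketch (the worker and event names are made up for illustration):

```python
import threading

release = threading.Event()

def worker():
    # Block until the main thread allows the worker to finish.
    release.wait()

thread = threading.Thread(target=worker)  # regular (non-daemon) thread
thread.start()
assert thread in threading.enumerate()  # registered while running

release.set()
thread.join()  # waits until the thread completes
assert thread not in threading.enumerate()  # unregistered on completion
```

Note that "gone from enumerate()" alone did not imply "C thread state deleted": that gap is exactly what the threading._shutdown() fix closes.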
<strong>What about daemon threads?</strong> More about that in the next section ;-)</p> <p>Note: This change introduced a regression (memory leak) which is not fixed yet: <a class="reference external" href="https://bugs.python.org/issue37788">bpo-37788</a>.</p> </div> </div> <div class="section" id="forbid-daemon-threads-in-subinterpreters"> <h2>Forbid daemon threads in subinterpreters</h2> <p>In June 2019, while fixing the threading shutdown, I found a reliable way to trigger a bug with daemon threads when a subinterpreter is finalized:</p> <pre class="literal-block"> Fatal Python error: Py_EndInterpreter: not the last thread </pre> <p>By design, daemon threads can run after a Python interpreter is finalized, whereas <tt class="docutils literal">Py_EndInterpreter()</tt> requires that all threads have completed.</p> <p>I reported <a class="reference external" href="https://bugs.python.org/issue37266">bpo-37266</a> to propose to forbid the creation of daemon threads in subinterpreters. I fixed the issue with <a class="reference external" href="https://github.com/python/cpython/commit/066e5b1a917ec2134e8997d2cadd815724314252">commit 066e5b1a</a>:</p> <pre class="literal-block"> bpo-37266: Daemon threads are now denied in subinterpreters (GH-14049) In a subinterpreter, spawning a daemon thread now raises an exception. Daemon threads were never supported in subinterpreters. Previously, the subinterpreter finalization crashed with a Python fatal error if a daemon thread was still running.
</pre> <p>The change adds this check to <tt class="docutils literal">Thread.start()</tt>:</p> <pre class="literal-block"> if self.daemon and not _is_main_interpreter(): raise RuntimeError(&quot;daemon thread are not supported &quot; &quot;in subinterpreters&quot;) </pre> <p>I commented:</p> <blockquote> <strong>Daemon threads must die.</strong> That's a first step towards their death!</blockquote> <p><strong>Antoine Pitrou</strong> created <a class="reference external" href="https://bugs.python.org/issue39812">bpo-39812: Avoid daemon threads in concurrent.futures</a> as a follow-up.</p> <p>In February 2020, when rebuilding Fedora Rawhide with Python 3.9, <strong>Miro Hrončok</strong> of my team noticed that my change <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1792062">broke the python-jep project</a>. I <a class="reference external" href="https://github.com/ninia/jep/issues/229">reported the bug upstream</a>. It has been fixed by using regular threads, rather than daemon threads: <a class="reference external" href="https://github.com/ninia/jep/commit/a31d461c6cacc96de68d68320eaa83e19a45d0cc">commit</a>.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>A random failure on a FreeBSD buildbot was hiding a severe race condition in the threading shutdown. 
The bug had existed since 2013, but was silently ignored since the test passed when re-run.</p> <p>The race condition was that the threading shutdown didn't ensure that the Python thread state of all non-daemon threads is deleted, whereas <tt class="docutils literal">Py_EndInterpreter()</tt> requires it.</p> <p>I fixed the threading shutdown by waiting until the Python thread state of all non-daemon threads is deleted.</p> <p>I also modified <tt class="docutils literal">Thread.start()</tt> to forbid spawning daemon threads in Python subinterpreters to fix a related issue.</p> </div> Daemon threads and the Python finalization in Python 3.2 and 3.32020-03-26T22:00:00+01:002020-03-26T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-03-26:/daemon-threads-python-finalization-python32.html<a class="reference external image-reference" href="https://twitter.com/LuppiChan/status/1240346448606171136"> <img alt="#CoronaMaison by Luppi" src="https://vstinner.github.io/images/coronamaison_luppi.jpg" /> </a> <p>At exit, the Python finalization calls Python object finalizers (the <tt class="docutils literal">__del__()</tt> method) and deallocates memory. Daemon threads are a special kind of thread: they continue to run during and after the Python finalization. They cause race conditions and tricky bugs in the Python finalization.</p> <p>This article covers bugs …</p><a class="reference external image-reference" href="https://twitter.com/LuppiChan/status/1240346448606171136"> <img alt="#CoronaMaison by Luppi" src="https://vstinner.github.io/images/coronamaison_luppi.jpg" /> </a> <p>At exit, the Python finalization calls Python object finalizers (the <tt class="docutils literal">__del__()</tt> method) and deallocates memory. Daemon threads are a special kind of thread: they continue to run during and after the Python finalization.
They cause race conditions and tricky bugs in the Python finalization.</p> <p>This article covers bugs fixed in the Python finalization in Python 3.2 and Python 3.3 (2009 to 2011), and a backport in Python 2.7.8 (2014).</p> <p>Drawing: <a class="reference external" href="https://twitter.com/LuppiChan/status/1240346448606171136">#CoronaMaison by Luppi</a>.</p> <div class="section" id="daemon-threads"> <h2>Daemon threads</h2> <p>Python has a special kind of thread: &quot;daemon&quot; threads. The difference with regular threads is that Python doesn't wait until daemon threads complete at exit, whereas it waits until all regular (&quot;non-daemon&quot;) threads complete. Example:</p> <pre class="literal-block"> import threading, time thread = threading.Thread(target=time.sleep, args=(5.0,), daemon=False) thread.start() </pre> <p>This Python program spawns a regular thread which sleeps for 5 seconds. Python takes 5 seconds to exit:</p> <pre class="literal-block"> $ time python3 sleep.py real 0m5,047s </pre> <p>If <tt class="docutils literal">daemon=False</tt> is replaced with <tt class="docutils literal">daemon=True</tt> to spawn a daemon thread instead, Python exits immediately (57 ms):</p> <pre class="literal-block"> $ time python3 sleep.py real 0m0,057s </pre> <p>Note: The <tt class="docutils literal">Thread.join()</tt> method can be called explicitly to wait until a daemon thread completes.</p> </div> <div class="section" id="don-t-destroy-the-gil-at-exit"> <h2>Don't destroy the GIL at exit</h2> <p>In November 2009, <strong>Antoine Pitrou</strong> implemented a new GIL (Global Interpreter Lock) in Python 3.2: <a class="reference external" href="https://github.com/python/cpython/commit/074e5ed974be65fbcfe75a4c0529dbc53f13446f">commit 074e5ed9</a>.</p> <p>In September 2010, he found a crash with daemon threads while stressing <tt class="docutils literal">test_threading</tt>: <a class="reference external" href="https://bugs.python.org/issue9901">bpo-9901: GIL
destruction can fail</a>. <tt class="docutils literal">test_finalize_with_trace()</tt> failed with:</p> <pre class="literal-block"> Fatal Python error: pthread_mutex_destroy(gil_mutex) failed </pre> <p>He pushed a fix for this crash in Python 3.2, <a class="reference external" href="https://github.com/python/cpython/commit/b0b384b7c0333bf1183cd6f90c0a3f9edaadd6b9">commit b0b384b7</a>:</p> <pre class="literal-block"> Issue #9901: Destroying the GIL in Py_Finalize() can fail if some other threads are still running. Instead, reinitialize the GIL on a second call to Py_Initialize(). </pre> <p>The Python GIL internally uses a lock. If the lock is destroyed while a daemon thread is waiting for it, the thread can crash. The fix is to <strong>no longer destroy the GIL at exit</strong>.</p> </div> <div class="section" id="exit-the-thread-in-pyeval-restorethread"> <h2>Exit the thread in PyEval_RestoreThread()</h2> <p>The Python finalization clears and deallocates the &quot;Python thread state&quot; of all threads (in <tt class="docutils literal">PyInterpreterState_Delete()</tt>) which calls Python object finalizers of these threads. Calling a finalizer can drop the GIL to call a system call. For example, closing a file drops the GIL. When the GIL is dropped, a daemon thread is woken up to take the GIL. Since the Python thread state was just deallocated, the daemon thread crashes.</p> <p>This bug is a race condition. It depends on the order in which threads are executed, objects are finalized, memory is deallocated, etc.</p> <p>The crash was first reported in April 2005: <a class="reference external" href="https://bugs.python.org/issue1193099">bpo-1193099: Embedded python thread crashes</a>. In January 2008, <strong>Gregory P. Smith</strong> reported <a class="reference external" href="https://bugs.python.org/issue1856#msg60014">bpo-1856: shutdown (exit) can hang or segfault with daemon threads running</a>.
He wrote a short Python program reproducing the bug: spawn 40 daemon threads which do some I/O operations and sleep randomly between 0 ms and 5 ms in a loop.</p> <p><strong>Adam Olsen</strong> <a class="reference external" href="https://bugs.python.org/issue1856#msg60059">proposed a solution</a> (with a patch):</p> <blockquote> I think <strong>non-main threads should kill themselves off</strong> if they grab the interpreter lock and the interpreter is tearing down. They're about to get killed off anyway, when the process exits.</blockquote> <p>In May 2011, <strong>Antoine Pitrou</strong> pushed a fix to Python 3.3 (6 years after the first bug report) which implements this solution, <a class="reference external" href="https://github.com/python/cpython/commit/0d5e52d3469a310001afe50689f77ddba6d554d1">commit 0d5e52d3</a>:</p> <pre class="literal-block"> Issue #1856: Avoid crashes and lockups when daemon threads run while the interpreter is shutting down; instead, these threads are now killed when they try to take the GIL. 
</pre> </div> <div class="section" id="pyeval-restorethread-fix-explanation"> <h2>PyEval_RestoreThread() fix explanation</h2> <p>The fix adds a new <tt class="docutils literal">_Py_Finalizing</tt> variable which is set by <tt class="docutils literal">Py_Finalize()</tt> to the (Python thread state of the) thread which runs the finalization.</p> <p>Simplified patch of the <tt class="docutils literal">PyEval_RestoreThread()</tt> fix:</p> <pre class="literal-block"> &#64;&#64; -440,6 +440,12 &#64;&#64; PyEval_RestoreThread() take_gil(tstate); + if (_Py_Finalizing &amp;&amp; tstate != _Py_Finalizing) { + drop_gil(tstate); + PyThread_exit_thread(); + } </pre> <p>If Python is finalizing (<tt class="docutils literal">_Py_Finalizing</tt> is not NULL) and <tt class="docutils literal">PyEval_RestoreThread()</tt> is called by a thread which is not the thread running the finalization, the thread exits immediately (it calls <tt class="docutils literal">PyThread_exit_thread()</tt>).</p> <p><tt class="docutils literal">PyEval_RestoreThread()</tt> is called when a thread takes the GIL. Typical example of code which drops the GIL to call a system call (close a file descriptor, <tt class="docutils literal">io.FileIO()</tt> finalizer) and then takes the GIL again:</p> <pre class="literal-block"> Py_BEGIN_ALLOW_THREADS close(fd); Py_END_ALLOW_THREADS </pre> <p>The <tt class="docutils literal">Py_BEGIN_ALLOW_THREADS</tt> macro calls <tt class="docutils literal">PyEval_SaveThread()</tt> to drop the GIL, and the <tt class="docutils literal">Py_END_ALLOW_THREADS</tt> macro calls <tt class="docutils literal">PyEval_RestoreThread()</tt> to take the GIL.
Pseudo-code:</p> <pre class="literal-block"> PyEval_SaveThread(); // drop the GIL close(fd); PyEval_RestoreThread(); // take the GIL </pre> <p>With Antoine's fix, if Python is finalizing, a thread now exits immediately when calling <tt class="docutils literal">PyEval_RestoreThread()</tt>.</p> </div> <div class="section" id="revert-take-gil-backport-to-2-7"> <h2>Revert take_gil() backport to 2.7</h2> <p>In June 2014, <strong>Benjamin Peterson</strong> (Python 2.7 release manager) backported Antoine's change to Python 2.7: the fix was included in 2.7.8.</p> <p>Problem: the Ceph project <a class="reference external" href="https://tracker.ceph.com/issues/8797">started to crash with Python 2.7.8</a>.</p> <p>In November 2014, the change was reverted in Python 2.7.9: see the <a class="reference external" href="https://bugs.python.org/issue21963">bpo-21963 discussion</a> for the rationale.</p> <p>In 2014, I already wrote:</p> <blockquote> Anyway, <strong>daemon threads are evil</strong> :-( Expecting them to exit cleanly automatically is not good. Last time I tried to improve code to cleanup Python at exit in Python 3.4, I also had a regression (just before the release of Python 3.4.0): see the <a class="reference external" href="https://bugs.python.org/issue21788">issue #21788</a>.</blockquote> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>Daemon threads caused crashes in the Python finalization, first noticed in 2005.</p> <p>Python 3.2 (released in February 2011) got a new GIL and also a bugfix for daemon threads. Python 3.3 (released in September 2012) also got a bugfix for daemon threads. The Python finalization became more reliable.</p> <p>Changing Python finalization is risky.
A backport of a bugfix into Python 2.7.8 caused a regression which required reverting the bugfix in Python 2.7.9.</p> </div> Python 3.7 Development Mode2020-01-16T12:00:00+01:002020-01-16T12:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-01-16:/python37-dev-mode.html<a class="reference external image-reference" href="https://twitter.com/guinoir/status/1217146968029331456"> <img alt="Ready to race" src="https://vstinner.github.io/images/ready_to_race.jpg" /> </a> <p>This article describes the discussion on the design of the <a class="reference external" href="https://docs.python.org/dev/using/cmdline.html#id5">development mode (-X dev)</a> that I <strong>added to Python 3.7</strong> and how it has been implemented.</p> <p>The development mode enables runtime checks which are too expensive to be enabled by default. It can be enabled by <tt class="docutils literal">python3 <span class="pre">-X</span> dev …</tt></p><a class="reference external image-reference" href="https://twitter.com/guinoir/status/1217146968029331456"> <img alt="Ready to race" src="https://vstinner.github.io/images/ready_to_race.jpg" /> </a> <p>This article describes the discussion on the design of the <a class="reference external" href="https://docs.python.org/dev/using/cmdline.html#id5">development mode (-X dev)</a> that I <strong>added to Python 3.7</strong> and how it has been implemented.</p> <p>The development mode enables runtime checks which are too expensive to be enabled by default. It can be enabled by the <tt class="docutils literal">python3 <span class="pre">-X</span> dev</tt> command line option or by the <tt class="docutils literal">PYTHONDEVMODE=1</tt> environment variable.
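As a quick sketch of mine (not from the article), both switches can be observed through sys.flags.dev_mode:

```python
import os
import subprocess
import sys

# Report sys.flags.dev_mode from a child interpreter, without and with -X dev.
code = "import sys; print(sys.flags.dev_mode)"
env = {**os.environ, "PYTHONDEVMODE": ""}  # neutralize any inherited setting
off = subprocess.run([sys.executable, "-c", code],
                     capture_output=True, text=True, env=env).stdout.strip()
on = subprocess.run([sys.executable, "-X", "dev", "-c", code],
                    capture_output=True, text=True, env=env).stdout.strip()
print(off, on)  # → False True
```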
It helps developers spot bugs in their code and prepare for future Python changes.</p> <p>Drawing: <em>Ready to race, by Guillaume Singelin.</em></p> <div class="section" id="email-sent-to-python-ideas"> <h2>Email sent to python-ideas</h2> <p>In March 2016, I proposed <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2016-March/039314.html">Add a developer mode to Python: -X dev command line option</a> on the python-ideas list:</p> <blockquote> <p>When I develop on CPython, I'm always building Python in debug mode using <tt class="docutils literal">./configure <span class="pre">--with-pydebug</span></tt>. This mode enables a <strong>lot</strong> of extra checks which helps me to detect bugs earlier. The debug mode makes Python much slower and so is not enabled by default.</p> <p>I propose to add a &quot;development mode&quot; to Python, to get a few checks to detect bugs earlier: a new <tt class="docutils literal"><span class="pre">-X</span> dev</tt> command line option.
Example:</p> <pre class="literal-block"> python3.6 -X dev script.py </pre> <p>I propose to enable:</p> <ul class="simple"> <li>Show <tt class="docutils literal">DeprecationWarning</tt> and <tt class="docutils literal">ResourceWarning</tt> warnings: <tt class="docutils literal">python <span class="pre">-Wd</span></tt></li> <li>Show <tt class="docutils literal">BytesWarning</tt> warning: <tt class="docutils literal">python <span class="pre">-b</span></tt></li> <li>Enable Python assertions (<tt class="docutils literal">assert</tt>) and set <tt class="docutils literal">__debug__</tt> to True: remove (or just ignore) <tt class="docutils literal"><span class="pre">-O</span></tt> or <tt class="docutils literal"><span class="pre">-OO</span></tt> command line arguments</li> <li>faulthandler to get a Python traceback on segfault and fatal errors: <tt class="docutils literal">python <span class="pre">-X</span> faulthandler</tt></li> <li>Debug hooks on Python memory allocators: <tt class="docutils literal">PYTHONMALLOC=debug</tt></li> </ul> </blockquote> <p>I wrote an implementation of this development mode using <tt class="docutils literal">exec()</tt>. <strong>Ronald Oussoren</strong> <a class="reference external" href="https://bugs.python.org/issue26670#msg262659">commented on my patch</a>:</p> <blockquote> Why does this patch execv() the interpreter to set options? I'd expect it to be possible to get the same result by updating the argument parsing code in Py_Main.</blockquote> <p>More on that later :-) <strong>Marc-Andre Lemburg</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2016-March/039325.html">didn't buy the idea</a>:</p> <blockquote> <strong>I'm not sure whether this would make things easier for the majority of developers</strong>, e.g.
someone not writing C extensions would likely not be interested in debugging memory allocations or segfaults, someone spending more time on numerics wouldn't bother with bytes warnings, etc.</blockquote> <p><strong>Ethan Furman</strong> shared this opinion, so I gave up at this point and closed my issue and my PR.</p> </div> <div class="section" id="async-keyword-deprecationwarning-and-pep-565"> <h2>async keyword, DeprecationWarning and PEP 565</h2> <p>On November 1, 2017, Ned Deily, the Python 3.7 release manager, sent an email to python-dev: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150061.html">Reminder: 12 weeks to 3.7 feature code cutoff</a>.</p> <p>A discussion started on <tt class="docutils literal">async</tt> and <tt class="docutils literal">await</tt> becoming keywords and how this incompatible change was conducted. Read the LWN article <a class="reference external" href="https://lwn.net/Articles/740804/">Who should see Python deprecation warnings?</a> (December 2017) by Jonathan Corbet for the whole story:</p> <blockquote> In early November, one sub-thread of a big discussion on preparing for the Python 3.7 release focused on the await and async identifiers. They will become keywords in 3.7, meaning that any code using those names for any other purpose will break. Nick Coghlan observed that <strong>Python 3.6 does not warn</strong> about the use of those names, calling it &quot;a fairly major oversight/bug&quot;. <strong>In truth, though, Python 3.6 does emit warnings in that case — but users rarely see them.</strong></blockquote> <p>The question is who should see <tt class="docutils literal">DeprecationWarning</tt>. Long ago, it was decided to hide them by default so as not to bother users.
Users are not able to fix them, and so it is only a source of annoyance.</p> <p>If the warning is displayed by default, developers can be annoyed by warnings coming from code that they cannot easily fix, like third-party dependencies.</p> <p>On November 12, 2017, Nick Coghlan proposed <a class="reference external" href="https://www.python.org/dev/peps/pep-0565/">PEP 565: Show DeprecationWarning in __main__</a> as a compromise:</p> <blockquote> This change will mean that code entered at the interactive prompt and code in single file scripts will revert to reporting these warnings by default, while they will <strong>continue to be silenced by default for packaged code</strong> distributed as part of an importable module.</blockquote> <p>The PEP was approved and implemented in Python 3.7. For example, <tt class="docutils literal">DeprecationWarning</tt> is now displayed by default when running a script and in the REPL:</p> <pre class="literal-block"> $ cat example.py import imp $ python3 example.py example.py:1: DeprecationWarning: the imp module is deprecated ... import imp $ python3 Python 3.7.6 (default, Dec 19 2019, 22:52:49) &gt;&gt;&gt; import imp __main__:1: DeprecationWarning: the imp module is deprecated ... </pre> </div> <div class="section" id="development-mode-proposed-on-python-dev"> <h2>Development mode proposed on python-dev</h2> <p>I was not convinced that only displaying warnings in the <tt class="docutils literal">__main__</tt> module is enough to help developers fix issues in their code.
A project is way larger than just this module.</p> <p>I came back with my idea, now on the python-dev list: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150514.html">Add a developer mode to Python: -X dev command line option</a>.</p> <p>This mode shows <tt class="docutils literal">DeprecationWarning</tt> and <tt class="docutils literal">ResourceWarning</tt> in all modules, not only in the <tt class="docutils literal">__main__</tt> module. In my opinion, having an opt-in mode for developers is the best option. Python should not spam users with warnings which target developers.</p> <p><strong>In the context of Python 3.7 incompatible changes, the feedback was way better this time.</strong></p> </div> <div class="section" id="issues-with-the-python-initialization"> <h2>Issues with the Python initialization</h2> <p>When I proposed the idea, my plan was to call exec() to replace the current process with a new process. But when I tried to implement it, it was trickier than expected. My first blocker issue was to remove the <tt class="docutils literal"><span class="pre">-O</span></tt> option from the command line. I hate having to parse the command line: it is very fragile and it's too easy to make mistakes.</p> <p>So I tried to write a clean implementation: configure Python properly in &quot;development mode&quot;. The first blocker issue was to implement <tt class="docutils literal">PYTHONMALLOC=debug</tt>. The C code to read and apply the Python configuration used Python objects before the Python initialization even started. For example, <tt class="docutils literal"><span class="pre">-W</span></tt> and <tt class="docutils literal"><span class="pre">-X</span></tt> options were stored as Python lists. It means that the Python memory allocator was used before Python would be able to parse the <tt class="docutils literal">PYTHONMALLOC</tt> environment variable.</p> <p>Moreover, the Python configuration is quite complex.
Many options are inter-dependent. For example, the <tt class="docutils literal"><span class="pre">-E</span></tt> command line option ignores environment variables with a name starting with <tt class="docutils literal">PYTHON</tt>: like <tt class="docutils literal">PYTHONMALLOC</tt>! Python has to parse the command line before being able to handle <tt class="docutils literal">PYTHONMALLOC</tt>.</p> <p>Python lists depend on the memory allocator, which depends on the <tt class="docutils literal">PYTHONMALLOC</tt> environment variable, which depends on the <tt class="docutils literal"><span class="pre">-E</span></tt> command line option, which depends on Python lists...</p> <p>In short, <strong>it wasn't possible to write a clean implementation of the development mode without refactoring the Python initialization code</strong>.</p> </div> <div class="section" id="refactoring-main-c"> <h2>Refactoring main.c</h2> <p>For all these reasons, I refactored the Python initialization code in <tt class="docutils literal">main.c</tt> in <a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a>, with two <strong>large</strong> changes:</p> <ul class="simple"> <li><a class="reference external" href="https://github.com/python/cpython/commit/f7e5b56c37eb859e225e886c79c5d742c567ee95">commit f7e5b56c</a>: bpo-32030: Split Py_Main() into subfunctions</li> <li><a class="reference external" href="https://github.com/python/cpython/commit/a7368ac6360246b1ef7f8f152963c2362d272183">commit a7368ac6</a>: bpo-32030: Enhance Py_Main()</li> </ul> </div> <div class="section" id="add-x-dev-option"> <h2>Add -X dev option</h2> <p>Since I got enough approval from my peers (core developers), I pushed <a class="reference external" href="https://github.com/python/cpython/commit/ccb0442a338066bf40fe417455e5a374e5238afb">commit ccb0442a</a> of <a class="reference external" href="https://bugs.python.org/issue32043">bpo-32043</a> to add the <tt class="docutils literal"><span
class="pre">-X</span> dev</tt> command line option. Thanks to the previous refactoring, the implementation is less intrusive.</p> <p>Effects of the development mode:</p> <ul class="simple"> <li>Add the <tt class="docutils literal">default</tt> warnings option. For example, display <tt class="docutils literal">DeprecationWarning</tt> and <tt class="docutils literal">ResourceWarning</tt> warnings.</li> <li>Install <a class="reference external" href="https://docs.python.org/dev/c-api/memory.html#c.PyMem_SetupDebugHooks">debug hooks on memory allocators</a> as if <tt class="docutils literal">PYTHONMALLOC</tt> were set to <tt class="docutils literal">debug</tt>.</li> <li>Enable my <a class="reference external" href="https://docs.python.org/dev/library/faulthandler.html">faulthandler</a> module to dump the Python traceback on a crash.</li> </ul> </div> <div class="section" id="add-pythondevmode-environment-variable"> <h2>Add PYTHONDEVMODE environment variable</h2> <p>In a PR review, Antoine Pitrou <a class="reference external" href="https://github.com/python/cpython/pull/4478#pullrequestreview-77874230">proposed</a>:</p> <blockquote> Speaking of which, perhaps it would be nice to set those environment variables so that child processes launched using subprocess inherit them?</blockquote> <p>I created <a class="reference external" href="https://bugs.python.org/issue32101">bpo-32101</a> to add the <tt class="docutils literal">PYTHONDEVMODE</tt> environment variable: <a class="reference external" href="https://github.com/python/cpython/commit/5e3806f8cfd84722fc55d4299dc018ad9b0f8401">commit 5e3806f8</a>.</p> <p>Setting <tt class="docutils literal">PYTHONDEVMODE=1</tt> makes it possible to also enable the development mode in Python child processes, without having to touch their command line.</p> </div> <div class="section" id="enable-asyncio-debug-mode"> <h2>Enable asyncio debug mode</h2> <p>I created <a class="reference external" href="https://bugs.python.org/issue32047">bpo-32047: asyncio:
enable debug mode when -X dev is used</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-November/150572.html">asked in the -X dev thread on python-dev</a>:</p> <blockquote> What do you think? Is it ok to include asyncio in the global &quot;developer mode&quot;?</blockquote> <p>Antoine Pitrou didn't like the idea because asyncio debug mode was &quot;quite expensive&quot;, but Yury Selivanov (one of the asyncio maintainers) and Barry Warsaw liked the idea, so I merged my PR: <a class="reference external" href="https://github.com/python/cpython/commit/44862df2eeec62adea20672b0fe2a5d3e160569e">commit 44862df2</a>.</p> <p>Antoine Pitrou created <a class="reference external" href="https://bugs.python.org/issue31970">bpo-31970: asyncio debug mode is very slow</a>. Fortunately, he found a way to make asyncio debug mode more efficient by truncating tracebacks to 10 frames (<a class="reference external" href="https://github.com/python/cpython/commit/921e9432a1461bbf312c9c6dcc2b916be6c05fa0">commit 921e9432</a>).</p> </div> <div class="section" id="fix-warnings-filters"> <h2>Fix warnings filters</h2> <p>While checking warnings filters, I noticed that the development mode was hiding some ResourceWarning warnings. I completed the documentation and fixed warnings filters in <a class="reference external" href="https://bugs.python.org/issue32089">bpo-32089</a>.</p> </div> <div class="section" id="python-3-8-logs-close-exception"> <h2>Python 3.8 logs close() exception</h2> <p>By default, Python silently ignores the <tt class="docutils literal">EBADF</tt> error (bad file descriptor), which can lead to a <strong>severe crash</strong>, <a class="reference external" href="https://bugs.python.org/issue18748">bpo-18748</a> (simplified gdb traceback):</p> <pre class="literal-block"> Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb7b0eb70 (LWP 17152)] 0xb7fe1424 in __kernel_vsyscall () (gdb) bt #0 0xb7fe1424 in __kernel_vsyscall () #1 0xb7e4e941 in *__GI_raise (sig=6) #2 0xb7e51d72 in *__GI_abort () #3 0xb7e8ae15 in __libc_message (do_abort=1, fmt=0xb7f606f5 &quot;%s&quot;) #4 0xb7e8af44 in *__GI___libc_fatal (message=0xb7fc75ec &quot;libgcc_s.so.1 must be installed for pthread_cancel to work\n&quot;) #5 0xb7fc4ffa in pthread_cancel_init () #6 0xb7fc509d in _Unwind_ForcedUnwind (...) #7 0xb7fc2b98 in *__GI___pthread_unwind (buf=&lt;optimized out&gt;) #8 0xb7fbcce0 in __do_cancel () at pthreadP.h:265 #9 __pthread_exit (value=0x0) at pthread_exit.c:30 ... </pre> <p>Notice the <tt class="docutils literal">&quot;libgcc_s.so.1 must be installed for pthread_cancel to work&quot;</tt> error message: glibc dynamically loads the <tt class="docutils literal">libgcc_s.so.1</tt> library when a thread completes, but another thread closed its file descriptor!</p> <p>The worst is that <strong>the crash is not deterministic</strong>: it's a <strong>race condition</strong> which requires many attempts to reproduce, even with an example designed to trigger the crash!</p> <p>Since the <tt class="docutils literal">EBADF</tt> error is silently ignored, it is hard to notice or to debug such an issue. I modified the development mode in Python 3.8 to <strong>log close() exceptions in the io.IOBase destructor</strong>.</p> <p>Always logging the <tt class="docutils literal">close()</tt> exception was not accepted.
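The effect can be reproduced with a short sketch of mine (not from the article): close the file descriptor behind a FileIO object's back, so that the finalizer's close() fails with EBADF; only the development mode reports the error:

```python
import os
import subprocess
import sys

# The finalizer's close() fails with EBADF because the fd was already closed.
code = (
    "import os\n"
    "f = open(os.devnull, 'rb', buffering=0)\n"
    "os.close(f.fileno())\n"  # close the fd behind FileIO's back
    "del f\n"                 # the finalizer calls close() -> EBADF
)
env = {**os.environ, "PYTHONDEVMODE": ""}  # neutralize any inherited setting
quiet = subprocess.run([sys.executable, "-c", code],
                       capture_output=True, text=True, env=env)
dev = subprocess.run([sys.executable, "-X", "dev", "-c", code],
                     capture_output=True, text=True, env=env)
print("default mode reports it:", "Bad file descriptor" in quiet.stderr)
print("dev mode reports it:", "Bad file descriptor" in dev.stderr)
```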
So having an opt-in development mode is a good practical compromise!</p> </div> <div class="section" id="python-3-9-checks-encoding-and-errors"> <h2>Python 3.9 checks encoding and errors</h2> <p>In June 2019, my colleague <strong>Miro Hrončok</strong> reported <a class="reference external" href="https://bugs.python.org/issue37388">bpo-37388</a>:</p> <blockquote> <p>I was just bit by specifying an nonexisitng error handler for bytes.decode() without noticing.</p> <p>Consider this code:</p> <pre class="literal-block"> &gt;&gt;&gt; 'a'.encode('cp1250').decode('utf-8', errors='Boom, Shaka Laka, Boom!') 'a' </pre> </blockquote> <p>I modified the development mode in Python 3.9 to also check the <em>encoding</em> and <em>errors</em> arguments on string encoding and decoding operations, like <tt class="docutils literal">bytes.decode()</tt> or <tt class="docutils literal">str.encode()</tt>.</p> <p>By default, for best performance, the <em>errors</em> argument is only checked at the first encoding/decoding error and the <em>encoding</em> argument is sometimes ignored for empty strings.</p> <p>Having an opt-in development mode makes it possible to enable additional debug checks at runtime, without having to care too much about the performance overhead.</p> <p>Note: I love the choice of the example, &quot;Boom, Shaka Laka, Boom!&quot; from the game Gruntz :-D</p> </div> <div class="section" id="development-mode-example"> <h2>Development Mode Example</h2> <p>Even in the <tt class="docutils literal">__main__</tt> module with PEP 565, <tt class="docutils literal">ResourceWarning</tt> is still not displayed by default (PEP 565 only shows <tt class="docutils literal">DeprecationWarning</tt>):</p> <pre class="literal-block"> $ python3 -c 'print(len(open(&quot;README.rst&quot;).readlines()))' 39 </pre> <p>The development mode shows the warning:</p> <pre class="literal-block"> $ python3 -X dev -c 'print(len(open(&quot;README.rst&quot;).readlines()))' -c:1: ResourceWarning: unclosed file
&lt;_io.TextIOWrapper name='README.rst' mode='r' encoding='UTF-8'&gt; ResourceWarning: Enable tracemalloc to get the object allocation traceback 39 </pre> <p>Not closing a resource explicitly can leave a resource open for way longer than expected. It can cause severe issues at Python exit. It is bad in CPython, but it is even worse in PyPy. <strong>Closing resources explicitly makes an application more deterministic and more reliable.</strong></p> <p>If one of the development mode effects causes an issue, it is still possible to override most options. For example, the <tt class="docutils literal">PYTHONMALLOC=default python3 <span class="pre">-X</span> dev ...</tt> command enables the development mode without installing debug hooks on memory allocators.</p> </div> Pass the Python thread state explicitly2020-01-08T15:00:00+01:002020-01-08T15:00:00+01:00Victor Stinnertag:vstinner.github.io,2020-01-08:/cpython-pass-tstate.html<img alt="Python C API" src="https://vstinner.github.io/images/capi.jpg" /> <div class="section" id="keeping-python-competitive"> <h2>Keeping Python competitive</h2> <p>I have been trying to find ways to make Python more efficient for many years; see for example my discussion at the Language Summit during Pycon US 2017: <a class="reference external" href="https://lwn.net/Articles/723949/">Keeping Python competitive</a> (LWN article); <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2017-PyconUS/summit.pdf">slides</a>.
At EuroPython 2019 (Basel), I gave the keynote &quot;Python Performance: Past, Present and Future&quot;: <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2019-EuroPython/python_performance.pdf">slides …</a></p></div><img alt="Python C API" src="https://vstinner.github.io/images/capi.jpg" /> <div class="section" id="keeping-python-competitive"> <h2>Keeping Python competitive</h2> <p>I have been trying to find ways to make Python more efficient for many years; see for example my discussion at the Language Summit during Pycon US 2017: <a class="reference external" href="https://lwn.net/Articles/723949/">Keeping Python competitive</a> (LWN article); <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2017-PyconUS/summit.pdf">slides</a>. At EuroPython 2019 (Basel), I gave the keynote &quot;Python Performance: Past, Present and Future&quot;: <a class="reference external" href="https://github.com/vstinner/talks/blob/master/2019-EuroPython/python_performance.pdf">slides</a> and <a class="reference external" href="https://www.youtube.com/watch?v=T6vC_LOHBJ4&amp;feature=youtu.be&amp;t=1875">video</a>.
I gave my vision of Python performance and listed 3 projects to speed up Python that I consider realistic:</p> <ul class="simple"> <li>subinterpreters: see Eric Snow's <a class="reference external" href="https://github.com/ericsnowcurrently/multi-core-python/">multi-core-python</a> project</li> <li>better C API: see <a class="reference external" href="https://github.com/pyhandle/hpy">HPy (new C API)</a> and <a class="reference external" href="https://pythoncapi.readthedocs.io/">pythoncapi.readthedocs.io</a></li> <li>tracing garbage collector for CPython</li> </ul> <p>This article is about <strong>subinterpreters</strong>.</p> </div> <div class="section" id="subinterpreters"> <h2>Subinterpreters</h2> <p>Eric Snow has been working on subinterpreters since 2015; see his first blog post published in September 2016: <a class="reference external" href="http://ericsnowcurrently.blogspot.com/2016/09/solving-mutli-core-python.html">Solving Multi-Core Python</a>. See Eric Snow's <a class="reference external" href="https://github.com/ericsnowcurrently/multi-core-python/wiki">multi-core-python project wiki</a> for the whole history.</p> <p>In September 2017, he wrote a concrete proposal: <a class="reference external" href="https://www.python.org/dev/peps/pep-0554/">PEP 554: Multiple Interpreters in the Stdlib</a>.</p> <p>Eric mentions the <a class="reference external" href="https://www.python.org/dev/peps/pep-0432/">PEP 432: Simplifying the CPython startup sequence</a> as one blocker issue. I fixed this issue (at least for the subinterpreters case) with my <a class="reference external" href="https://www.python.org/dev/peps/pep-0587/">PEP 587: Python Initialization Configuration</a> that I implemented in Python 3.8.</p> <p>Sadly, implementing subinterpreters in the 30-year-old CPython project is hard since a lot of code has to be updated.
CPython is made of no less than <strong>603K lines of C code</strong> (and 815K lines of Python code)!</p> <p>In May 2018, at the CPython sprint during Pycon US, I discussed subinterpreters with Eric Snow and Nick Coghlan. I drew an overview of Python internals and the different &quot;states&quot; on a whiteboard:</p> <img alt="Python states" src="https://vstinner.github.io/images/subinterpreters2.jpg" /> <p>Python and Python subinterpreter lifecycles (creation and finalization):</p> <img alt="Python subinterpreter lifecycle" src="https://vstinner.github.io/images/subinterpreters1.jpg" /> <p>As a follow-up of this meeting, I wrote down the current state and what should be done: <a class="reference external" href="https://pythoncapi.readthedocs.io/runtime.html">Reorganize Python “runtime”</a>.</p> </div> <div class="section" id="getting-the-current-python-thread-state"> <h2>Getting the current Python thread state</h2> <p>In the current master branch of Python, getting the current Python thread state is done using these two macros:</p> <pre class="literal-block"> #define _PyRuntimeState_GetThreadState(runtime) \ ((PyThreadState*)_Py_atomic_load_relaxed(&amp;(runtime)-&gt;gilstate.tstate_current)) #define _PyThreadState_GET() _PyRuntimeState_GetThreadState(&amp;_PyRuntime) </pre> <p>These macros depend on the global <tt class="docutils literal">_PyRuntime</tt> variable: an instance of the <tt class="docutils literal">_PyRuntimeState</tt> structure. There is exactly one instance of <tt class="docutils literal">_PyRuntimeState</tt>: data shared by all interpreters on purpose (more info about <tt class="docutils literal">_PyRuntimeState</tt> below).</p> <p><tt class="docutils literal">_Py_atomic_load_relaxed()</tt> uses an atomic operation which may become a performance issue if Python is modified to get the Python thread state in more places.
I tried to check if it uses a slow atomic read instruction, but it seems like only a write uses an explicit memory fence operation: read seems to be &quot;free&quot; (it's a regular efficient <tt class="docutils literal">MOV</tt> instruction). I only checked the x86-64 machine code; it may be different on other architectures.</p> </div> <div class="section" id="gil-state"> <h2>GIL state</h2> <p>Currently, the <tt class="docutils literal">_PyRuntimeState</tt> structure has a <tt class="docutils literal">gilstate</tt> field which is shared between all subinterpreters. The long-term goal of PEP 554 (subinterpreters) is to <strong>have one GIL per subinterpreter</strong> to <strong>execute multiple interpreters in parallel</strong>. Currently, only one interpreter can be executed at the same time: there is no parallelism, except when a thread releases the GIL, which is not the common case.</p> <p>It's tracked by these two issues:</p> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue10915">Make the PyGILState API compatible with multiple interpreters</a></li> <li><a class="reference external" href="https://bugs.python.org/issue15751">Support subinterpreters in the GIL state API</a></li> </ul> <p>I expect that fixing this issue may require adding a lock somewhere, which <strong>can hurt performance</strong>, depending on how the GIL state is accessed.</p> </div> <div class="section" id="passing-a-state-to-internal-function-calls"> <h2>Passing a state to internal function calls</h2> <p>To avoid any risk of a performance penalty with incoming Python internal changes for subinterpreters, but also to make things more explicit, I proposed to <strong>pass explicitly &quot;a state&quot; to internal C function calls</strong>.</p> <p>First, it wasn't obvious which &quot;state&quot; should be passed: <tt class="docutils literal">_PyRuntimeState</tt>, <tt class="docutils literal">PyThreadState</tt>, a structure containing both, or something
else?</p> <p>Moreover, it was unclear how to get the runtime from <tt class="docutils literal">PyThreadState</tt>, and how to get <tt class="docutils literal">PyThreadState</tt> from the runtime.</p> <p>I started to <strong>pass runtime to some functions</strong> (<tt class="docutils literal">_PyRuntimeState</tt>): <a class="reference external" href="https://bugs.python.org/issue36710">Pass _PyRuntimeState as an argument rather than using the _PyRuntime global variable</a>.</p> <p>Then I pushed more changes to <strong>pass tstate to some other functions</strong> (<tt class="docutils literal">PyThreadState</tt>): <a class="reference external" href="https://bugs.python.org/issue38644">Pass explicitly tstate to function calls</a>.</p> <p>I added <tt class="docutils literal">PyInterpreterState.runtime</tt> so getting <tt class="docutils literal">_PyRuntimeState</tt> from <tt class="docutils literal">PyThreadState</tt> is now done using: <tt class="docutils literal"><span class="pre">tstate-&gt;interp-&gt;runtime</span></tt>. It's no longer needed to pass <tt class="docutils literal">runtime</tt> <strong>and</strong> <tt class="docutils literal">tstate</tt> to internal functions: <tt class="docutils literal">tstate</tt> is enough.</p> <p>Slowly, I modified the internals to only pass <tt class="docutils literal">tstate</tt> to internal functions: <strong>tstate should become the root object to access all Python states</strong>.</p> <p>I ended with a thread on the python-dev mailing list to summarize this work: <a class="reference external" href="https://mail.python.org/archives/list/python-dev&#64;python.org/thread/PQBGECVGVYFTVDLBYURLCXA3T7IPEHHO/#Q4IPXMQIM5YRLZLHADUGSUT4ZLXQ6MYY">Pass the Python thread state to internal C functions</a>.
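The access pattern can be sketched with toy Python classes (my illustration; these are not CPython's real structures): once tstate is passed explicitly, everything else is reachable from it, mirroring tstate-&gt;interp-&gt;runtime:

```python
from dataclasses import dataclass

# Toy stand-ins for CPython's internal structures (illustration only).
@dataclass
class ToyRuntimeState:
    finalizing: bool = False

@dataclass
class ToyInterpreterState:
    runtime: ToyRuntimeState

@dataclass
class ToyThreadState:
    interp: ToyInterpreterState

# Instead of reading a process-wide global, the function receives the
# thread state explicitly and reaches the runtime through it.
def runtime_is_finalizing(tstate):
    return tstate.interp.runtime.finalizing

runtime = ToyRuntimeState()
tstate = ToyThreadState(ToyInterpreterState(runtime))
print(runtime_is_finalizing(tstate))  # → False
runtime.finalizing = True
print(runtime_is_finalizing(tstate))  # → True
```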
The feedback was quite positive: most core developers agreed that passing tstate explicitly is a good practice and the work should be continued.</p> </div> <div class="section" id="pyruntimestate-and-pyinterpreterstate"> <h2>_PyRuntimeState and PyInterpreterState</h2> <p>Currently, some <tt class="docutils literal">_PyRuntimeState</tt> fields are shared by all interpreters, whereas they should be moved into <tt class="docutils literal">PyInterpreterState</tt>: it's still a work in progress.</p> <p>For example, I continued the work started by Eric Snow to move the garbage collector state from <tt class="docutils literal">_PyRuntimeState</tt> to <tt class="docutils literal">PyInterpreterState</tt>: <a class="reference external" href="https://bugs.python.org/issue36854">GC operates out of global runtime state.</a>.</p> <p>As explained above, another example is <tt class="docutils literal">gilstate</tt> that should also be moved to <tt class="docutils literal">PyInterpreterState</tt>, but that's a complex change that should be well prepared to not break anything.</p> </div> <div class="section" id="more-subinterpreter-work"> <h2>More subinterpreter work</h2> <p>Implementing subinterpreters also requires cleaning up various parts of Python internals.</p> <p>For example, I modified Python so Py_NewInterpreter() and Py_EndInterpreter() (create and finalize a subinterpreter) share more code with Py_Initialize() and Py_Finalize() (create and finalize the <strong>main</strong> interpreter): <a class="reference external" href="https://bugs.python.org/issue38858">new_interpreter() should reuse more Py_InitializeFromConfig() code</a>.</p> <p>There are still many issues to be fixed: <strong>it's moving slowly but steadily!</strong></p> </div> Graphics bugs in Firefox and GNOME2019-10-10T17:00:00+02:002019-10-10T17:00:00+02:00Victor Stinnertag:vstinner.github.io,2019-10-10:/graphics-bugs-firefox-gnome.html<p>After explaining how to <a class="reference external"
href="https://vstinner.github.io/debug-hybrid-graphics-issues-linux.html">Debug Hybrid Graphics issues on Linux</a>, here is the story of four graphics bugs that I had in GNOME and Firefox on my Fedora 30 between May 2018 and September 2019: bugs in gnome-shell, Gtk, Firefox and mutter.</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/34298393&#64;N06/14488759356/"> <img alt="Glitch" src="https://vstinner.github.io/images/glitch.jpg" /> </a> <div class="section" id="gnome-shell-freezes"> <h2>gnome-shell freezes</h2> <p>In May 2018, six months after …</p></div><p>After explaining how to <a class="reference external" href="https://vstinner.github.io/debug-hybrid-graphics-issues-linux.html">Debug Hybrid Graphics issues on Linux</a>, here is the story of four graphics bugs that I had in GNOME and Firefox on my Fedora 30 between May 2018 and September 2019: bugs in gnome-shell, Gtk, Firefox and mutter.</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/34298393&#64;N06/14488759356/"> <img alt="Glitch" src="https://vstinner.github.io/images/glitch.jpg" /> </a> <div class="section" id="gnome-shell-freezes"> <h2>gnome-shell freezes</h2> <p>In May 2018, six months after I got my Lenovo P50 laptop, gnome-shell was &quot;sometimes&quot; freezing for between 1 and 5 seconds. It was annoying because key strokes created repeated keys writing &quot;helloooooooooooooooooooooo&quot; instead of &quot;hello&quot; for example.</p> <p>My colleagues led me to the <tt class="docutils literal"><span class="pre">#fedora-desktop</span></tt> channel of the GIMP IRC server where I met my colleague <strong>Jonas Ådahl</strong> (jadahl) who almost immediately identified my issue! Extract of the IRC chat:</p> <pre class="literal-block"> 15:03 &lt;vstinner&gt; hello. i upgraded from F27 to F28, and it seems like I switched from Xorg to Wayland.
sometimes, the desktop hangs a few milliseconds (less than 2 secondes) 15:03 &lt;vstinner&gt; bentiss told me that &quot;libinput error: client bug: timer event7 keyboard: offset negative (-39ms)&quot; can occur when shell is too slow 15:04 &lt;vstinner&gt; journalctl shows me frenquently the bug https://gitlab.gnome.org/GNOME/gnome-shell/issues/1 &quot;Object Shell.GenericContainer (0x559e6bfddc60), has been already finalized. Impossible to get any property from it.&quot; 15:04 &lt;vstinner&gt; i also get &quot;Window manager warning: last_user_time (3093467) is greater than comparison timestamp (3093466). This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW. Trying to work around...&quot; errors in logs (from shell) 15:05 &lt;vstinner&gt; bentiss: ah, i also get &quot;libinput error: client bug: timer event7 trackpoint: offset negative (-352ms)&quot; errors 15:06 &lt;vstinner&gt; it's a recent laptop, Lenovo P50: 32 GB of RAM, 4 physical CPUs (8 threads) Intel(R) Core(TM) i7-6820HQ CPU &#64; 2.70GHz 15:06 &lt;vstinner&gt; so. what can i do to debug such performance issue? may it come from shell? what does it mean if shell is slow? can it be a GPU issue? a javascript issue? ... 15:13 &lt;jadahl&gt; vstinner: whats your hardware? Do you have a hybrid gpu system? 15:13 &lt;jadahl&gt; ah, yes P50 15:14 &lt;jadahl&gt; vstinner: there is a branch on mutter upstream that fixes that issue. want to compile it to test? </pre> <p>Ten minutes after I asked my question, Jonas asked the right question: <strong>Do you have a hybrid gpu system?</strong></p> <p>I was able to work around the issue by connecting my laptop to my TV using the HDMI port:</p> <pre class="literal-block"> 15:22 &lt; jadahl&gt; for example, IIRC if you have a monitor connected to the HDMI, the issue will go away since the secondary GPU is always awake anyway ...
15:31 &lt; vstinner&gt; jadahl: i plugged a HDMI cable to my TV and it seems like the issue is gone 15:31 &lt; vstinner&gt; jadahl: impressive </pre> <p>When an external monitor is used (like a TV plugged on the HDMI port), my NVIDIA GPU is always active, which works around the bug I had in gnome-shell.</p> <p>Jonas provided me an RPM package for Fedora including his work-in-progress fix: <a class="reference external" href="https://gitlab.gnome.org/GNOME/mutter/merge_requests/106">Upload HW cursor sprite on-demand</a>. I confirmed that this change fixed my bug. His mutter change has been merged upstream.</p> </div> <div class="section" id="firefox-crash-when-selecting-text"> <h2>Firefox crash when selecting text</h2> <p>In March 2019, Firefox with Wayland crashed on <tt class="docutils literal">wl_abort()</tt> when selecting more than 4000 characters in a <tt class="docutils literal">&lt;textarea&gt;</tt>. I found the bug in Gmail when selecting the whole email text to remove it. Pressing <strong>CTRL + A</strong> or Right-click + Select All <strong>crashed the whole Firefox process!</strong></p> <p>I reported the bug to Firefox: <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1539773">Firefox with Wayland crash on wl_abort() when selecting more than 4000 characters in a &lt;textarea&gt;</a>.</p> <p>Running gdb on Firefox caused me some trouble since it's a very large binary with many libraries. I also read the <a class="reference external" href="https://cgit.freedesktop.org/wayland/wayland-protocols/tree/unstable/text-input/text-input-unstable-v3.xml#n138">Wayland protocol specifications</a>.
I managed to analyze the bug, and so I reported it to Gtk as well, <a class="reference external" href="https://gitlab.gnome.org/GNOME/gtk/issues/1783">On Wayland, notify_surrounding_text() crash on wl_abort() if text is longer than 4000 bytes</a>:</p> <blockquote> According to gdb, <tt class="docutils literal">wl_proxy_marshal_array_constructor_versioned()</tt> calls <tt class="docutils literal">wl_abort()</tt> because the buffer is too short. It seems like <tt class="docutils literal">wl_buffer_put()</tt> fails with <tt class="docutils literal">E2BIG</tt>.</blockquote> <p>Quickly, I identified that <strong>my Gtk bug had already been fixed 3 months earlier by Carlos Garnacho</strong> (<a class="reference external" href="https://gitlab.gnome.org/GNOME/gtk/merge_requests/438">imwayland: Respect maximum length of 4000 Bytes on strings being sent</a>) and <strong>the fix is part of gtk-3.24.3</strong> (the &quot;Overview of Changes in GTK+ 3.24.3&quot; mentions &quot;wayland: Respect length limits in text protocol&quot;).</p> <p>I requested an upgrade of Gtk in Fedora, but it was not possible since the newer version changed the theme. I was asked to cherry-pick the fix and that's what I did: <a class="reference external" href="https://src.fedoraproject.org/rpms/gtk3/pull-request/5">imwayland: Respect maximum length of 4000 Bytes on strings</a>.</p> <p>My PR was merged and a new package was built. I tested it and confirmed that it fixed the crash: <a class="reference external" href="https://bodhi.fedoraproject.org/updates/FEDORA-2019-d67ec97b0b">FEDORA-2019-d67ec97b0b</a>.
Soon, the package was pushed to the public Fedora package repository.</p> <p><strong>That's the cool part about open source: if you have the skills to hack the code, you can fix an annoying bug which is affecting you!</strong></p> </div> <div class="section" id="firefox-wayland-window-partially-or-not-updated-when-switching-between-two-tabs"> <h2>Firefox: [Wayland] Window partially or not updated when switching between two tabs</h2> <div class="section" id="analyze-the-bug"> <h3>Analyze the bug</h3> <p>In September 2019, after a large system upgrade (install 6 packages, upgrade 234 packages, remove 5 packages), Firefox sometimes did not update the window content when I switched from one tab to another. Example:</p> <img alt="Firefox bug of window partially updated" src="https://vstinner.github.io/images/firefox_bug_1.jpg" /> <p>It took me a few hours to analyze the bug to be able to produce a useful bug report.</p> <p>I followed the advice of Fedora's guide <a class="reference external" href="https://fedoraproject.org/wiki/How_to_debug_Firefox_problems">How to debug Firefox problems</a>.</p> <p>First, I tried to <strong>understand which GPU driver is used</strong>. I ended up blacklisting the nouveau driver in the Linux kernel, to ensure that Firefox was using my Intel IGP. I still reproduced the bug.</p> <p>I <strong>disabled all Firefox extensions</strong>: bug reproduced.</p> <p>Then I created a new Firefox profile and started Firefox in <strong>safe mode</strong>: bug reproduced.</p> <p>I tested the latest Firefox binary from mozilla.org (Firefox 69.0): bug reproduced.</p> <p>Finally, <strong>I tested Firefox Nightly</strong> from mozilla.org (Firefox 71.0a1): bug reproduced.</p> <p>OK, it was enough data to produce an interesting bug report.
I reported <a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1580152">[Wayland] Window partially or not updated when switching between two tabs</a> to Firefox.</p> </div> <div class="section" id="identify-the-regression-using-fedora-packages"> <h3>Identify the regression using Fedora packages</h3> <p>Then I looked at <tt class="docutils literal">/var/log/dnf.log</tt> and I tried to identify which package update could explain the regression.</p> <p>I downgraded <strong>gtk3</strong>-3.24.11-1.fc30.x86_64 to gtk3.x86_64 3.24.10-1.fc30: bug reproduced.</p> <p>I rebooted on the oldest available <strong>Linux kernel</strong>, version 5.2.8-200.fc30.x86_64: bug reproduced. I checked journalctl logs to see which Linux version I was running when the bug was first seen: Linux 5.2.9-200.fc30.x86_64.</p> <p>I don't know why, but <strong>downgrading Firefox was only my 3rd test</strong>.</p> <p>I downgraded firefox-69.0-2.fc30.x86_64 to firefox-68.0.2-1.fc30.x86_64: the bug was gone! OK, so <strong>the regression came from the Firefox package</strong>, and it was introduced between package versions 68.0.2-1.fc30 and 69.0-2.fc30.</p> <p>On IRC, I met my colleague <strong>Martin Stránský</strong> who packages Firefox for Fedora. He told me that he was aware of my bug and might have a fix for it. Great!</p> <p>Only 9 days later, <strong>Martin Stránský</strong>'s fix was merged in Firefox upstream, released in Firefox Nightly, and a new package was shipped in Fedora 30!
Thanks Martin for your efficiency!</p> <p>The final Firefox change is quite large and intrusive: <a class="reference external" href="https://hg.mozilla.org/releases/mozilla-beta/rev/3281a617f22b">[Wayland] Fix rendering glitches on wayland</a></p> </div> </div> <div class="section" id="xwayland-crash-in-xwl-glamor-gbm-create-pixmap"> <h2>Xwayland crash in xwl_glamor_gbm_create_pixmap()</h2> <p>In September 2019, while I was debugging the previous Firefox bug, I started my IRC client hexchat. Suddenly, <strong>Xwayland crashed, which closed my whole GNOME session</strong>! I was testing various GPU configurations to analyze the Firefox bug.</p> <p>ABRT managed to rebuild a useless traceback but identified an existing bug report. It added my comment to the <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1729200#c20">[abrt] xorg-x11-server-Xwayland: OsLookupColor(): Segmentation fault at address 0x28</a> report.</p> <p>On July 26, 2019 (1 month before I got the bug), <strong>Olivier Fourdan</strong> added <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1729200#c9">an interesting comment</a>:</p> <blockquote> <tt class="docutils literal">glamor_get_modifiers+0x767</tt> is <tt class="docutils literal">xwl_glamor_gbm_create_pixmap()</tt> so this is the same as <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1729925">bug 1729925</a> fixed upstream with <a class="reference external" href="https://gitlab.freedesktop.org/xorg/xserver/merge_requests/242">xwayland: Do not free a NULL GBM bo</a>.</blockquote> <p>So in fact, my bug was already fixed by <strong>Olivier Fourdan</strong> in Xwayland upstream, but the fix had not landed in Fedora yet.</p> </div> <div class="section" id="thanks"> <h2>Thanks!</h2> <p>I would like to thank the following developers who fixed my Fedora 30. What a coincidence, all four are my colleagues!
It seems like Red Hat is investing in the Linux desktop :-)</p> <p><a class="reference external" href="https://blogs.gnome.org/carlosg/">Carlos Garnacho</a> (Red Hat).</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/183829480&#64;N06/48623543091/in/pool-14662216&#64;N23/"> <img alt="Carlos Garnacho" src="https://vstinner.github.io/images/carlos_garnacho.jpg" /> </a> <p><a class="reference external" href="https://gitlab.gnome.org/jadahl">Jonas Ådahl</a> (Red Hat).</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/183829480&#64;N06/48623189663/in/pool-14662216&#64;N23/"> <img alt="Jonas Ådahl" src="https://vstinner.github.io/images/jonas_adahl.jpg" /> </a> <p><a class="reference external" href="http://people.redhat.com/stransky/">Martin Stránský</a> (Red Hat).</p> <a class="reference external image-reference" href="http://people.redhat.com/stransky/"> <img alt="Martin Stránský" src="https://vstinner.github.io/images/mstransky.jpg" /> </a> <p><a class="reference external" href="https://en.wikipedia.org/wiki/Olivier_Fourdan">Olivier Fourdan</a> (Red Hat).</p> <a class="reference external image-reference" href="https://en.wikipedia.org/wiki/Olivier_Fourdan"> <img alt="Olivier Fourdan" src="https://vstinner.github.io/images/olivier_fourdan.jpg" /> </a> </div> Debug Hybrid Graphics issues on Linux2019-09-11T15:50:00+02:002019-09-11T15:50:00+02:00Victor Stinnertag:vstinner.github.io,2019-09-11:/debug-hybrid-graphics-issues-linux.html<p><a class="reference external" href="https://wiki.archlinux.org/index.php/Hybrid_graphics">Hybrid Graphics</a> is a complex hardware and software solution to achieve longer laptop battery life: an <strong>integrated</strong> graphics device is used by default, and a <strong>discrete</strong> graphics device with higher graphics performance is enabled on demand.</p> <a class="reference external image-reference"
href="https://www.theregister.co.uk/2010/02/09/inside_nvidia_optimus/"> <img alt="Hybrid Graphics" src="https://vstinner.github.io/images/hybrid_graphics.jpg" /> </a> <p>If it is designed and implemented carefully, users should not notice that a laptop …</p><p><a class="reference external" href="https://wiki.archlinux.org/index.php/Hybrid_graphics">Hybrid Graphics</a> is a complex hardware and software solution to achieve longer laptop battery life: an <strong>integrated</strong> graphics device is used by default, and a <strong>discrete</strong> graphics device with higher graphics performance is enabled on demand.</p> <a class="reference external image-reference" href="https://www.theregister.co.uk/2010/02/09/inside_nvidia_optimus/"> <img alt="Hybrid Graphics" src="https://vstinner.github.io/images/hybrid_graphics.jpg" /> </a> <p>If it is designed and implemented carefully, users should not notice that a laptop has two graphical devices.</p> <p>Sadly, the Linux implementation is not perfect yet.
I had to debug different graphics issues on GNOME over the last months, so I decided to write down an article about this technology.</p> <p>This article is about the <strong>GNOME</strong> desktop environment with <strong>Wayland</strong> running on <strong>Fedora</strong> 30, with Linux kernel <strong>vgaswitcheroo</strong> in muxless mode (more about that below).</p> <div class="section" id="hybrid-graphics-1"> <h2>Hybrid Graphics</h2> <p>Hybrid Graphics are known under different names:</p> <ul class="simple"> <li>Linux kernel <a class="reference external" href="https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html">vgaswitcheroo</a></li> <li><a class="reference external" href="https://wiki.archlinux.org/index.php/PRIME">PRIME</a> in Linux open source GPU drivers (nouveau, ati, amdgpu and intel), the &quot;muxless&quot; flavor of hybrid graphics</li> <li><a class="reference external" href="https://wiki.archlinux.org/index.php/bumblebee">Bumblebee</a>: <a class="reference external" href="https://wiki.archlinux.org/index.php/NVIDIA_Optimus">NVIDIA Optimus</a> for Linux</li> <li>&quot;AMD Dynamic Switchable Graphics&quot; for Radeon</li> <li>&quot;Dual GPUs&quot;</li> <li>etc.</li> </ul> <p>Nowadays, most manufacturers use the <strong>muxless</strong> model:</p> <blockquote> Dual GPUs but <strong>only one of them is connected to outputs</strong>. The other one is merely used to <strong>offload rendering</strong>, its results are copied over PCIe into the framebuffer. On Linux this is supported with DRI PRIME.</blockquote> <p>In 2010, the first generation hybrid model used the <strong>muxed</strong> model:</p> <blockquote> Dual GPUs with a hardware multiplexer chip to switch outputs between GPUs.
This model makes the user choose (at boot time or at login time) between the two power/graphics profiles, and the choice is almost fixed throughout the user session.</blockquote> <p>Note: The development to support hybrid graphics in Linux started in 2010.</p> </div> <div class="section" id="does-my-linux-have-hybrid-graphics"> <h2>Does my Linux have Hybrid Graphics?</h2> <p>On Linux, Hybrid Graphics is used if the <tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/</tt> directory exists.</p> <p>No Hybrid Graphics, single graphics device:</p> <pre class="literal-block"> $ sudo cat /sys/kernel/debug/vgaswitcheroo/switch cat: /sys/kernel/debug/vgaswitcheroo/switch: No such file or directory </pre> <p>Hybrid Graphics with two graphics devices:</p> <pre class="literal-block"> $ sudo cat /sys/kernel/debug/vgaswitcheroo/switch 0:IGD:+:Pwr:0000:00:02.0 1:DIS: :DynOff:0000:01:00.0 </pre> <p>Command to list graphics devices:</p> <pre class="literal-block"> $ lspci|grep VGA 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06) 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2) </pre> </div> <div class="section" id="hardware"> <h2>Hardware</h2> <p>My employer gave me a Lenovo P50 laptop for work in December 2017. It is my only computer at home, so I needed a powerful laptop (even if it's heavy for traveling to conferences).
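The vgaswitcheroo detection commands above can be folded into one small shell sketch. The switch-file format is the one shown above; the function takes the file path as a parameter because reading the real /sys/kernel/debug path requires root:

```shell
#!/bin/sh
# Sketch: report which graphics device is active, from a vgaswitcheroo
# "switch" file as shown above.
active_gpu() {
    switch_file="$1"
    if [ -r "$switch_file" ]; then
        # The "+" field marks the active device, e.g. "0:IGD:+:Pwr:0000:00:02.0";
        # print the second colon-separated field (IGD or DIS).
        grep ':+:' "$switch_file" | cut -d: -f2
    else
        # No vgaswitcheroo: single graphics device (or debugfs not readable)
        echo "NONE"
    fi
}

active_gpu /sys/kernel/debug/vgaswitcheroo/switch
```

On a hybrid laptop like the P50 this prints IGD (run with sudo); on a single-GPU machine, or without root, it prints NONE.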
The CPU, RAM and battery are great, but the hybrid graphics caused me some headaches.</p> <p>My Lenovo P50 has two GPUs:</p> <pre class="literal-block"> $ lspci|grep VGA 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06) 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2) </pre> <ul class="simple"> <li>The <strong>Integrated Graphics Device</strong> is an <strong>Intel</strong> IGP (Intel HD Graphics 530)</li> <li>The <strong>Discrete Graphics Device</strong> is an <strong>NVIDIA</strong> GPU (NVIDIA Quadro M1000M)</li> </ul> <p>I didn't know that the laptop had two graphics devices when I chose the laptop model. I discovered hybrid graphics when I started to debug graphics issues.</p> </div> <div class="section" id="bios"> <h2>BIOS</h2> <p>Hybrid graphics can be configured in the BIOS:</p> <ul class="simple"> <li><strong>Discrete Graphics mode</strong> achieves higher graphics performance.</li> <li><strong>Hybrid Graphics mode</strong> (default) runs as Integrated Graphics mode to achieve longer battery life, and Discrete Graphics is enabled on demand.</li> </ul> <p>On my Lenovo P50, using the <strong>Discrete Graphics mode</strong> removes &quot;00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530&quot; from <tt class="docutils literal">lspci</tt> command output: the <strong>Intel IGP is fully disabled</strong>.
The Linux kernel only sees the NVIDIA GPU.</p> </div> <div class="section" id="linux-kernel"> <h2>Linux kernel</h2> <p>On Linux, hybrid graphics is handled by <strong>vgaswitcheroo</strong>:</p> <pre class="literal-block"> $ sudo cat /sys/kernel/debug/vgaswitcheroo/switch 0:IGD:+:Pwr:0000:00:02.0 1:DIS: :DynPwr:0000:01:00.0 </pre> <ul class="simple"> <li><tt class="docutils literal">IGD</tt> stands for <strong>Integrated</strong> Graphics Device</li> <li><tt class="docutils literal">DIS</tt> stands for <strong>DIScrete</strong> Graphics Device</li> <li>&quot;+&quot; marks the <strong>active</strong> card</li> <li><tt class="docutils literal">Pwr</tt>: the graphics device is <strong>always active</strong></li> <li><tt class="docutils literal">DynPwr</tt>: the graphics device is activated <strong>on demand</strong></li> </ul> <p>The last field (e.g. <tt class="docutils literal">0000:00:02.0</tt>) is based on the PCI identifier:</p> <pre class="literal-block"> $ lspci|grep VGA 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06) 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2) </pre> <p>On my laptop, hybrid graphics is detected by an <a class="reference external" href="https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface">ACPI</a> &quot;Device-Specific Method&quot; (DSM):</p> <pre class="literal-block"> $ journalctl -b -k|grep 'VGA switcheroo' Sep 11 02:29:54 apu kernel: VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle </pre> <p>See: <a class="reference external" href="https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html">VGA Switcheroo (Linux kernel documentation)</a>.</p> </div> <div class="section" id="opengl"> <h2>OpenGL</h2> <p><a class="reference external" href="https://en.wikipedia.org/wiki/Mesa_(computer_graphics)">Mesa</a> provides the <tt class="docutils literal">glxinfo</tt> utility to get information about the OpenGL driver currently used:</p> <pre
class="literal-block"> $ glxinfo|grep -E 'Device|direct rendering' direct rendering: Yes Device: Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2) (0x191b) </pre> <p>In this example, the integrated Intel IGP is used.</p> <p>In Firefox, go to the <strong>about:support</strong> page and search for the <tt class="docutils literal">Graphics</tt> section to get information about compositing, WebGL, GPU, etc.</p> </div> <div class="section" id="dri-prime-environment-variable"> <h2>DRI_PRIME environment variable</h2> <p>Set the DRI_PRIME=1 environment variable to run an application with the <strong>discrete</strong> GPU.</p> <p>Example:</p> <pre class="literal-block"> $ DRI_PRIME=1 glxinfo|grep -E 'Device|rendering' direct rendering: Yes Device: NV117 (0x13b1) </pre> </div> <div class="section" id="switcheroo-control"> <h2>switcheroo-control</h2> <p><a class="reference external" href="https://github.com/hadess/switcheroo-control">switcheroo-control</a> is a daemon controlling <tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/switch</tt> (Linux kernel). It can be accessed over DBus.</p> <p>When the daemon starts, it looks for the <tt class="docutils literal">xdg.force_integrated=VALUE</tt> parameter in the Linux command line.
If <em>VALUE</em> is <tt class="docutils literal">1</tt>, <tt class="docutils literal">true</tt> or <tt class="docutils literal">on</tt>, or if <tt class="docutils literal">xdg.force_integrated=VALUE</tt> is not found in the command line, the daemon writes <tt class="docutils literal">DIGD</tt> into <tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/switch</tt> (delayed <strong>switch to the integrated graphics device</strong>: my Intel IGP).</p> <p>If <tt class="docutils literal">xdg.force_integrated=0</tt> is found in the command line, the daemon leaves <tt class="docutils literal">/sys/kernel/debug/vgaswitcheroo/switch</tt> unchanged.</p> <p>systemd:</p> <ul class="simple"> <li>Check if the service is running: <tt class="docutils literal">sudo systemctl status <span class="pre">switcheroo-control.service</span></tt></li> <li>Disable the service: <tt class="docutils literal">sudo systemctl disable <span class="pre">switcheroo-control.service</span></tt> and <tt class="docutils literal">sudo systemctl stop <span class="pre">switcheroo-control.service</span></tt></li> </ul> <p>On Fedora, switcheroo-control is installed by default.</p> <p>It is unclear to me if this daemon is still useful for my setup. It seems like the Linux kernel switcheroo uses the integrated Intel IGP by default anyway.</p> </div> <div class="section" id="disable-the-discrete-gpu-by-blacklisting-its-driver"> <h2>Disable the discrete GPU by blacklisting its driver</h2> <p>To debug graphical bugs, I wanted to ensure that the discrete NVIDIA GPU was never used.</p> <p>The solution I found was to fully disable the nouveau driver in the Linux kernel: add <tt class="docutils literal">modprobe.blacklist=nouveau</tt> to the Linux kernel command line. On Fedora, you can use:</p> <pre class="literal-block"> sudo grubby --update-kernel=ALL --args=&quot;modprobe.blacklist=nouveau&quot; </pre> <p>To reenable nouveau, remove the parameter.
On Fedora:</p> <pre class="literal-block"> sudo grubby --update-kernel=ALL --remove-args=&quot;modprobe.blacklist=nouveau&quot; </pre> </div> <div class="section" id="demo"> <h2>Demo!</h2> <p>For this test, my laptop is not connected to anything (no power cable, no external monitor, no dock).</p> <p>When my laptop is idle (no 3D application is running), the NVIDIA GPU is <strong>suspended</strong>:</p> <pre class="literal-block"> $ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/enable 0 $ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status suspended </pre> <p>I explicitly run a 3D application on it:</p> <pre class="literal-block"> DRI_PRIME=1 glxgears </pre> <p>The NVIDIA GPU becomes <strong>active</strong>:</p> <pre class="literal-block"> $ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/enable 2 $ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status active </pre> <p>I stop the 3D application. A few seconds later, the NVIDIA GPU is <strong>suspended</strong> again:</p> <pre class="literal-block"> $ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/enable 0 $ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status suspended </pre> </div> <div class="section" id="graphics-devices-and-monitors"> <h2>Graphics devices and monitors</h2> <p>When I disabled the nouveau driver using <tt class="docutils literal">modprobe.blacklist=nouveau</tt> kernel command line parameter, I was no longer able to use external monitors. 
I understood that:</p> <ul class="simple"> <li>The <strong>Intel</strong> IGP is connected to the <strong>internal</strong> laptop screen</li> <li>The <strong>NVIDIA</strong> GPU is connected to the <strong>external</strong> monitors (DisplayPort and HDMI ports)</li> </ul> <p>When my laptop has <strong>no external monitor</strong> connected, the <strong>discrete</strong> NVIDIA GPU is <strong>activated on demand</strong> (suspended when idle).</p> <p>When I connect my laptop to <strong>two external monitors</strong> (using my dock), the <strong>discrete</strong> NVIDIA GPU is <strong>always active</strong>:</p> <pre class="literal-block"> $ cat /sys/bus/pci/drivers/nouveau/0000\:01\:00.0/power/runtime_status active </pre> </div> <div class="section" id="links"> <h2>Links</h2> <ul class="simple"> <li><a class="reference external" href="https://wiki.archlinux.org/index.php/Hybrid_graphics">https://wiki.archlinux.org/index.php/Hybrid_graphics</a></li> <li><a class="reference external" href="https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html">https://www.kernel.org/doc/html/latest/gpu/vga-switcheroo.html</a></li> <li><a class="reference external" href="https://wiki.archlinux.org/index.php/PRIME">https://wiki.archlinux.org/index.php/PRIME</a></li> <li><a class="reference external" href="https://help.ubuntu.com/community/HybridGraphics">https://help.ubuntu.com/community/HybridGraphics</a></li> <li><a class="reference external" href="https://en.wikipedia.org/wiki/Nvidia_Optimus">https://en.wikipedia.org/wiki/Nvidia_Optimus</a></li> <li><a class="reference external" href="https://en.wikipedia.org/wiki/AMD_Hybrid_Graphics">https://en.wikipedia.org/wiki/AMD_Hybrid_Graphics</a></li> <li><a class="reference external" href="https://nouveau.freedesktop.org/wiki/Optimus">https://nouveau.freedesktop.org/wiki/Optimus</a></li> </ul> </div> Split Include/ directory in Python 3.82019-06-19T12:00:00+02:002019-06-19T12:00:00+02:00Victor
Stinnertag:vstinner.github.io,2019-06-19:/split-include-directory-python38.html<a class="reference external image-reference" href="https://www.flickr.com/photos/mortengade/2747989334/"> <img alt="Private way. Trespassers and those disposing rubbish will be prosecuted." src="https://vstinner.github.io/images/private_way.jpg" /> </a> <p>In September 2017, during the CPython sprint at Facebook, I proposed my idea to create <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">A New C API for CPython</a>. I'm still working on the Python C API at: <a class="reference external" href="http://pythoncapi.readthedocs.io/">pythoncapi.readthedocs.io</a>.</p> <p>My analysis is that the C API leaks too many implementation details which prevent to optimize Python …</p><a class="reference external image-reference" href="https://www.flickr.com/photos/mortengade/2747989334/"> <img alt="Private way. Trespassers and those disposing rubbish will be prosecuted." src="https://vstinner.github.io/images/private_way.jpg" /> </a> <p>In September 2017, during the CPython sprint at Facebook, I proposed my idea to create <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">A New C API for CPython</a>. 
I'm still working on the Python C API at: <a class="reference external" href="http://pythoncapi.readthedocs.io/">pythoncapi.readthedocs.io</a>.</p> <p>My analysis is that the C API leaks too many implementation details, which prevents optimizing Python and makes the implementation of PyPy (cpyext) more painful.</p> <p>In Python 3.8, I created the <tt class="docutils literal">Include/cpython/</tt> sub-directory to stop adding new APIs to the stable API by mistake.</p> <p>I moved more private functions into the internal C API: the <tt class="docutils literal">Include/internal/</tt> directory.</p> <p>I also converted some macros like <tt class="docutils literal">Py_INCREF()</tt> and <tt class="docutils literal">Py_DECREF()</tt> to static inline functions to have well defined parameter and return types, and to avoid macro pitfalls.</p> <p>Finally, I removed 3 functions from the C API.</p> <div class="section" id="include-internal"> <h2>Include/internal/</h2> <p>In Python 3.7, <strong>Eric Snow</strong> created the <tt class="docutils literal">Include/internal/</tt> sub-directory for the CPython &quot;internal C API&quot;: API which should not be used outside the CPython code base. In Python 3.6, these APIs were surrounded by:</p> <pre class="literal-block"> #ifdef Py_BUILD_CORE ... #endif </pre> <p>In Python 3.8, I continued this work by moving more private functions into this directory: see <a class="reference external" href="https://bugs.python.org/issue35081">bpo-35081</a>.</p> <p>I started a thread on python-dev: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2018-October/155587.html">[Python-Dev] Rename Include/internal/ to Include/pycore/</a>. But it was decided to keep the <tt class="docutils literal">Include/internal/</tt> name. It was decided that internal header files must not be included implicitly by the generic <tt class="docutils literal">#include &lt;Python.h&gt;</tt>, but included explicitly.
For example, when I moved <tt class="docutils literal">_PyObject_GC_TRACK()</tt> and <tt class="docutils literal">_PyObject_GC_UNTRACK()</tt> to the internal C API, I had to add <tt class="docutils literal">#include &quot;pycore_object.h&quot;</tt> to 32 C files!</p> <p><a class="reference external" href="https://bugs.python.org/issue35296">I also modified make install</a> to install this internal C API, so it can be used for specific needs like debuggers or profilers which have to access CPython internals (access structure fields) but cannot call functions. For example, <strong>Eric Snow</strong> moved the <tt class="docutils literal">PyInterpreterState</tt> structure to the internal C API.</p> <p>Installing the internal C API eases the migration of APIs to internal: if an API is still needed after it's moved, it's now possible to opt-in to use it.</p> <p>Using the internal C API requires defining the <tt class="docutils literal">Py_BUILD_CORE_MODULE</tt> macro and using a different include, like <tt class="docutils literal">#include &quot;internal/pycore_pystate.h&quot;</tt>. It's more complicated on purpose, to ensure that it's not used by mistake.</p> <p>Python 3.8 now provides 21 internal header files:</p> <pre class="literal-block"> pycore_accu.h pycore_getopt.h pycore_pyhash.h pycore_atomic.h pycore_gil.h pycore_pylifecycle.h pycore_ceval.h pycore_hamt.h pycore_pymem.h pycore_code.h pycore_initconfig.h pycore_pystate.h pycore_condvar.h pycore_object.h pycore_traceback.h pycore_context.h pycore_pathconfig.h pycore_tupleobject.h pycore_fileutils.h pycore_pyerrors.h pycore_warnings.h </pre> </div> <div class="section" id="include-cpython"> <h2>Include/cpython/</h2> <p>The <a class="reference external" href="https://www.python.org/dev/peps/pep-0384/">PEP 384 &quot;Defining a Stable ABI&quot;</a> introduced the <tt class="docutils literal">Py_LIMITED_API</tt> macro to exclude functions from the Python C API.
The problem is that when a new API is added, it has to be explicitly excluded using <tt class="docutils literal">#ifndef Py_LIMITED_API</tt>. If the author forgets it, the function is added to the stable API by mistake.</p> <p>I proposed to move the API which should be excluded from the stable ABI to a new subdirectory. I created a <a class="reference external" href="https://discuss.python.org/t/poll-what-is-your-favorite-name-for-the-new-include-subdirectory/477">poll on the sub-directory name</a>:</p> <ul class="simple"> <li><tt class="docutils literal">Include/cpython/</tt></li> <li><tt class="docutils literal">Include/board/</tt></li> <li><tt class="docutils literal">Include/impl/</tt></li> <li><tt class="docutils literal">Include/pycapi/</tt> (the name that I proposed initially)</li> <li><tt class="docutils literal">Include/unstable/</tt></li> <li>other (add comment)</li> </ul> <p>The <tt class="docutils literal">Include/cpython/</tt> name won with 100% of the 3 votes (and a few more supporters in the python-dev discussion and in the bug tracker) :-)</p> <p>I created <a class="reference external" href="https://bugs.python.org/issue35134">bpo-35134: Add a new Include/cpython/ subdirectory for the &quot;CPython API&quot; with implementation details</a>.</p> <p>My initial description of the directory content:</p> <blockquote> The new subdirectory will contain <tt class="docutils literal">#ifndef Py_LIMITED_API</tt> code, not the “Stable ABI” of <a class="reference external" href="https://www.python.org/dev/peps/pep-0384/">PEP 384</a>, but more “implementation details” of CPython.</blockquote> <p>The change is backward compatible: <tt class="docutils literal">#include &lt;Python.h&gt;</tt> will still provide exactly the same API. For example, <tt class="docutils literal">object.h</tt> automatically includes <tt class="docutils literal">cpython/object.h</tt>.
But <tt class="docutils literal">Include/cpython/</tt> headers must not be included directly (it would fail with a compilation error).</p> <p>For example, <tt class="docutils literal">Include/object.h</tt> now ends with:</p> <pre class="literal-block"> #ifndef Py_LIMITED_API # define Py_CPYTHON_OBJECT_H # include &quot;cpython/object.h&quot; # undef Py_CPYTHON_OBJECT_H #endif </pre> <p><tt class="docutils literal">Include/cpython/object.h</tt> structure (content replaced with <tt class="docutils literal">...</tt>):</p> <pre class="literal-block"> #ifndef Py_CPYTHON_OBJECT_H # error &quot;this header file must not be included directly&quot; #endif #ifdef __cplusplus extern &quot;C&quot; { #endif ... #ifdef __cplusplus } #endif </pre> <p>In Python 3.8, the work is not complete. I tried to double- or even triple-check my changes to ensure that I didn't remove an API by mistake. This work is still ongoing in Python 3.9.</p> </div> <div class="section" id="summary-of-include-directories"> <h2>Summary of Include/ directories</h2> <p>The header files have been reorganized to better separate the different kinds of APIs:</p> <ul class="simple"> <li><tt class="docutils literal"><span class="pre">Include/*.h</span></tt> should be the portable public stable C API.</li> <li><tt class="docutils literal"><span class="pre">Include/cpython/*.h</span></tt> should be the unstable C API specific to CPython; public API, with some private API prefixed by <tt class="docutils literal">_Py</tt> or <tt class="docutils literal">_PY</tt>.</li> <li><tt class="docutils literal"><span class="pre">Include/internal/*.h</span></tt> is the private internal C API very specific to CPython. This API comes with no backward compatibility guarantee and should not be used outside CPython. It is only exposed for very specific needs like debuggers and profilers which have to access CPython internals without calling functions. 
This API is now installed by <tt class="docutils literal">make install</tt>.</li> </ul> </div> <div class="section" id="convert-macros-to-static-inline-functions"> <h2>Convert macros to static inline functions</h2> <p>In <a class="reference external" href="https://bugs.python.org/issue35059">bpo-35059</a>, I converted some macros to static inline functions:</p> <ul class="simple"> <li><tt class="docutils literal">Py_INCREF()</tt>, <tt class="docutils literal">Py_DECREF()</tt></li> <li><tt class="docutils literal">Py_XINCREF()</tt>, <tt class="docutils literal">Py_XDECREF()</tt></li> <li><tt class="docutils literal">PyObject_INIT()</tt>, <tt class="docutils literal">PyObject_INIT_VAR()</tt></li> <li>Private functions: <tt class="docutils literal">_PyObject_GC_TRACK()</tt>, <tt class="docutils literal">_PyObject_GC_UNTRACK()</tt>, <tt class="docutils literal">_Py_Dealloc()</tt></li> </ul> <p>Compared to macros, static inline functions have multiple advantages:</p> <ul class="simple"> <li>Parameter types and return type are well defined;</li> <li>They don't have issues specific to macros: see <a class="reference external" href="https://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html">GCC Macro Pitfalls</a>;</li> <li>Variables have a well defined local scope.</li> </ul> <p>Python 3.7 uses ugly macros which rely on commas and semicolons. Example:</p> <pre class="literal-block"> #define _Py_REF_DEBUG_COMMA , #define _Py_CHECK_REFCNT(OP) /* a semicolon */; #define _Py_NewReference(op) ( \ _Py_INC_TPALLOCS(op) _Py_COUNT_ALLOCS_COMMA \ _Py_INC_REFTOTAL _Py_REF_DEBUG_COMMA \ Py_REFCNT(op) = 1) </pre> <p><a class="reference external" href="https://www.python.org/dev/peps/pep-0007/#c-dialect">Python 3.6 requires the C99 standard of the C dialect</a>. 
It was time to start using it :-)</p> </div> <div class="section" id="removed-functions"> <h2>Removed functions</h2> <p><a class="reference external" href="https://bugs.python.org/issue35713">bpo-35713</a>: I removed the <tt class="docutils literal">PyByteArray_Init()</tt> and <tt class="docutils literal">PyByteArray_Fini()</tt> functions. They had done nothing since Python 2.7.4 and Python 3.2.0, were excluded from the limited API (stable ABI), and were not documented.</p> <p><a class="reference external" href="https://bugs.python.org/issue36728">bpo-36728</a>: I also removed the <tt class="docutils literal">PyEval_ReInitThreads()</tt> function. It should not be called explicitly: use <tt class="docutils literal">PyOS_AfterFork_Child()</tt> instead.</p> </div> Python 3.8 sys.unraisablehook2019-06-15T01:00:00+02:002019-06-15T01:00:00+02:00Victor Stinnertag:vstinner.github.io,2019-06-15:/sys-unraisablehook-python38.html<a class="reference external image-reference" href="https://www.flickr.com/photos/dawnmanser/8046201692/"> <img alt="Hidden kitten" src="https://vstinner.github.io/images/hidden_kitten.jpg" /> </a> <p>I added a new <a class="reference external" href="https://docs.python.org/dev/library/sys.html#sys.unraisablehook">sys.unraisablehook</a> function that allows setting a custom hook to control how &quot;unraisable exceptions&quot; are handled. It is already testable in <a class="reference external" href="https://pythoninsider.blogspot.com/2019/06/python-380b1-is-now-available-for.html">Python 3.8 beta1</a>, released last week!</p> <p>An &quot;unraisable exception&quot; is an error which happens when Python cannot report it to the caller. 
Examples …</p><a class="reference external image-reference" href="https://www.flickr.com/photos/dawnmanser/8046201692/"> <img alt="Hidden kitten" src="https://vstinner.github.io/images/hidden_kitten.jpg" /> </a> <p>I added a new <a class="reference external" href="https://docs.python.org/dev/library/sys.html#sys.unraisablehook">sys.unraisablehook</a> function that allows setting a custom hook to control how &quot;unraisable exceptions&quot; are handled. It is already testable in <a class="reference external" href="https://pythoninsider.blogspot.com/2019/06/python-380b1-is-now-available-for.html">Python 3.8 beta1</a>, released last week!</p> <p>An &quot;unraisable exception&quot; is an error which happens when Python cannot report it to the caller. Examples: object finalizer error (<tt class="docutils literal">__del__()</tt>), weak reference callback failure, error during a GC collection. At the C level, the <tt class="docutils literal">PyErr_WriteUnraisable()</tt> function is called to handle such an exception.</p> <p>Designing the new hook was tricky, and so was its implementation.</p> <p>The photo shows an exception waiting to catch you ;-)</p> <div class="section" id="kill-python-at-the-first-unraisable-exception"> <h2>Kill Python at the first unraisable exception</h2> <p>One month ago, <strong>Thomas Grainger</strong> opened <a class="reference external" href="https://bugs.python.org/issue36829">bpo-36829</a>: &quot;CLI option to make PyErr_WriteUnraisable abort the current process&quot;. He wrote:</p> <blockquote> Currently it's quite easy for these <strong>errors</strong> to go <strong>unnoticed</strong>. (...) The point for me is that CI will fail if it happens, then <strong>I can use gdb</strong> to find out the cause</blockquote> <p><strong>Zackery Spytz</strong> wrote the <a class="reference external" href="https://github.com/python/cpython/pull/13175">PR 13175</a> to add the <tt class="docutils literal"><span class="pre">-X</span> abortunraisable</tt> command line option. 
When this option is used, <tt class="docutils literal">PyErr_WriteUnraisable()</tt> calls <tt class="docutils literal"><span class="pre">Py_FatalError(&quot;Unraisable</span> exception&quot;)</tt> which calls <tt class="docutils literal">abort()</tt>: it raises the <tt class="docutils literal">SIGABRT</tt> signal which kills the process by default.</p> </div> <div class="section" id="handle-unraisable-exception-in-python-sys-unraisablehook"> <h2>Handle unraisable exception in Python: sys.unraisablehook</h2> <p>I concur with Thomas that it's easy to miss such exceptions, but I dislike killing the process. It's not practical to have to use a low-level debugger like gdb to handle such a bug.</p> <p>I proposed a different design: add a new <tt class="docutils literal">sys.unraisablehook</tt> hook that allows using arbitrary Python code to handle an &quot;unraisable exception&quot;.</p> <p>I wrote a <a class="reference external" href="https://bugs.python.org/issue36829#msg341868">hook example</a> which displays the Python stack where the exception occurred using the <tt class="docutils literal">traceback</tt> module.</p> <p>I chose to pass a single object as argument to <tt class="docutils literal">sys.unraisablehook</tt>. 
The object has 4 attributes:</p> <ul class="simple"> <li>exc_type: Exception type.</li> <li>exc_value: Exception value, can be None.</li> <li>exc_traceback: Exception traceback, can be None.</li> <li>object: Object causing the exception, can be None.</li> </ul> <p>I wanted to design an <strong>extensible API</strong>: keep backward compatibility even if tomorrow we want to add a new attribute to the object to pass more information.</p> </div> <div class="section" id="adding-source-parameter-to-the-warnings-module"> <h2>Adding source parameter to the warnings module</h2> <p>To explain the rationale of my proposed <tt class="docutils literal">sys.unraisablehook</tt> design (a single object with attributes), let me tell you my bad experience with the <tt class="docutils literal">warnings</tt> module.</p> <div class="section" id="use-tracemalloc-for-resourcewarning"> <h3>Use tracemalloc for ResourceWarning</h3> <p>In March 2016, I was tired of debugging <tt class="docutils literal">ResourceWarning</tt> warnings: it's hard to guess where the bug comes from. 
The warning is logged where the resource is released, but I was interested in where the resource was allocated.</p> <p>My <a class="reference external" href="https://docs.python.org/dev/library/tracemalloc.html">tracemalloc</a> module provides a convenient <a class="reference external" href="https://docs.python.org/dev/library/tracemalloc.html#tracemalloc.get_object_traceback">get_object_traceback()</a> function which provides the traceback where any Python object has been allocated.</p> <p>I opened <a class="reference external" href="https://bugs.python.org/issue26604">bpo-26604</a>: &quot;ResourceWarning: Use tracemalloc to display the traceback where an object was allocated when a ResourceWarning is emitted&quot;.</p> </div> <div class="section" id="warnings-hooks-cannot-be-extended"> <h3>warnings hooks cannot be extended</h3> <p>The problem is that the <tt class="docutils literal">showwarning()</tt> and <tt class="docutils literal">formatwarning()</tt> functions of <tt class="docutils literal">warnings</tt> can be overridden. They use a fixed number of positional parameters:</p> <pre class="literal-block"> def showwarning(message, category, filename, lineno, file=None, line=None): ... def formatwarning(message, category, filename, lineno, line=None): ... </pre> <p>If they are called with an additional parameter, they fail with a <tt class="docutils literal">TypeError</tt>. I wanted to add a new <tt class="docutils literal">source</tt> parameter to these functions.</p> </div> <div class="section" id="reuse-existing-warningmessage-class"> <h3>Reuse existing WarningMessage class</h3> <p>To extend the warnings module, I chose to rely on the existing <tt class="docutils literal">WarningMessage</tt> class which can be used to &quot;pack&quot; all parameters as a single object. 
This class was used by the <tt class="docutils literal">catch_warnings</tt> context manager.</p> <p>I had to add new private <tt class="docutils literal">_showwarnmsg()</tt> and <tt class="docutils literal">_formatwarnmsg()</tt> functions. They are called with a <tt class="docutils literal">WarningMessage</tt> instance. The implementation has to detect when <tt class="docutils literal">showwarning()</tt> or <tt class="docutils literal">formatwarning()</tt> is overridden: in this case, the overridden function must be called with the legacy API. The backward compatibility requirement makes the implementation complex.</p> </div> <div class="section" id="regression"> <h3>Regression</h3> <p>After Python 3.6 was released with my new feature, <a class="reference external" href="https://bugs.python.org/issue35178">bpo-35178</a> was reported. The <tt class="docutils literal">warnings</tt> module called a custom <tt class="docutils literal">formatwarning()</tt> with the <tt class="docutils literal">line</tt> argument passed as a keyword argument, whereas other arguments are passed as positional arguments. 
The <a class="reference external" href="https://github.com/python/cpython/commit/be7c460fb50efe3b88a00281025d76acc62ad2fd">fix was trivial</a>, but it shows that backward compatibility is hard.</p> </div> <div class="section" id="example"> <h3>Example</h3> <p>By the way, here is an example of the feature using a <tt class="docutils literal">filebug.py</tt> script:</p> <pre class="literal-block"> def func(): f = open(__file__) f = None func() </pre> <p>The feature adds the &quot;Object allocated at&quot; traceback, whereas the existing <tt class="docutils literal">f = None</tt> output is worthless.</p> <pre class="literal-block"> $ python3 -Wd -X tracemalloc=5 filebug.py filebug.py:3: ResourceWarning: unclosed file &lt;_io.TextIOWrapper name='filebug.py' mode='r' encoding='UTF-8'&gt; f = None Object allocated at (most recent call first): File &quot;filebug.py&quot;, lineno 2 f = open(__file__) File &quot;filebug.py&quot;, lineno 5 func() </pre> </div> </div> <div class="section" id="limitations-of-my-unraisablehook-idea"> <h2>Limitations of my unraisablehook idea</h2> <p>To come back to <a class="reference external" href="https://bugs.python.org/issue36829">bpo-36829</a>, I identified a limitation in my <tt class="docutils literal">sys.unraisablehook</tt> idea: unraisable exceptions which occur very late during Python finalization cannot be handled by a custom hook.</p> <p>Thomas said that he is fine with having to use <tt class="docutils literal">gdb</tt> to debug an issue during Python finalization.</p> <p>In my experience, using <tt class="docutils literal">gdb</tt> on a system Python is unpleasant, since it's usually deeply optimized (PGO + LTO optimizations). gdb fails to read variables which are only displayed as <tt class="docutils literal">&lt;optimized out&gt;</tt>. 
By the way, that's why I fixed the <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.8.html#debug-build-uses-the-same-abi-as-release-build">debug build of Python to be ABI compatible with a release build</a>, but that's a different story.</p> <p>Thomas's idea of killing the process allows detecting unraisable exceptions whenever they occur.</p> </div> <div class="section" id="api-discussed-on-python-dev"> <h2>API discussed on python-dev</h2> <p>I started a discussion on python-dev to get more feedback: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157436.html">bpo-36829: Add sys.unraisablehook()</a>.</p> <div class="section" id="new-exception-while-handling-an-exception"> <h3>New exception while handling an exception</h3> <p><strong>Nathaniel Smith</strong> asked what happens if a custom hook raises a new exception.</p> <p>This problem is easy to fix: <tt class="docutils literal">PyErr_WriteUnraisable()</tt> calls the default hook to handle the new exception (I already implemented this solution).</p> </div> <div class="section" id="positional-arguments"> <h3>Positional arguments</h3> <p><strong>Serhiy Storchaka</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157439.html">preferred</a> passing 5 positional arguments (exc_type, exc_value, exc_tb, obj and msg):</p> <blockquote> Currently we have no plans for adding more details, and I do not think that we will need to do this in future.</blockquote> <p>Later, he added:</p> <blockquote> If you have plans for adding new details in future, I propose to add a 6th parameter &quot;context&quot; or &quot;extra&quot; (always None currently). 
It is as extensible as packing all arguments into a single structure, but you do not need to introduce the structure type and create its instance until you need to pass additional info.</blockquote> </div> <div class="section" id="reuse-sys-excepthook"> <h3>Reuse sys.excepthook</h3> <p><strong>Steve Dower</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157453.html">proposed to reuse sys.excepthook</a>, rather than adding a new hook, and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157465.html">create a new exception to pass extra info</a>.</p> <p><strong>Nathaniel</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157460.html">explained</a> that <tt class="docutils literal">sys.excepthook</tt> and <tt class="docutils literal">sys.unraisablehook</tt> have different behavior and so need to be different.</p> </div> <div class="section" id="object-resurrection"> <h3>Object resurrection</h3> <p><strong>Steve Dower</strong> was <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157452.html">concerned by object resurrection</a> and proposed to only pass <tt class="docutils literal">repr(obj)</tt> to the hook.</p> <p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157463.html">I explained</a> that an object can only be resurrected after its finalization, which is different from deallocation. Accessing a finalized object should not crash Python. 
The deallocation makes an object unusable, except that deallocation only happens once the last reference to an object is gone, and so the object is no longer accessible.</p> <p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157467.html">Nathaniel added</a> that <tt class="docutils literal">repr()</tt> would limit features of the hook:</p> <blockquote> A clever hook might want the actual object, so it can pretty-print it, or open an interactive debugger and let you examine it, or something.</blockquote> </div> <div class="section" id="naming"> <h3>Naming</h3> <p><strong>Gregory P. Smith</strong> proposed the term &quot;uncatchable&quot; rather than &quot;unraisable&quot;.</p> </div> <div class="section" id="keyword-only-arguments"> <h3>Keyword-only arguments</h3> <p><strong>Barry Warsaw</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157457.html">suggested</a> considering keyword-only arguments to help future-proof the call signature.</p> </div> <div class="section" id="avoid-redundant-exc-type-and-exc-traceback-parameters"> <h3>Avoid redundant exc_type and exc_traceback parameters</h3> <p><strong>Petr Viktorin</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157459.html">asked</a> why the <tt class="docutils literal">(exc_type, exc_value, exc_traceback)</tt> triple is needed, whereas <em>exc_type</em> could be obtained from <tt class="docutils literal">type(exc_value)</tt> and <em>exc_traceback</em> from <tt class="docutils literal">exc_value.__traceback__</tt>.</p> <p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2019-May/157462.html">I made some tests</a>. <em>exc_value</em> can be <tt class="docutils literal">NULL</tt> sometimes. 
In some cases, <em>exc_traceback</em> can be set, whereas <tt class="docutils literal">exc_value.__traceback__</tt> is not set (<tt class="docutils literal">None</tt>).</p> </div> </div> <div class="section" id="productive-discussion"> <h2>Productive discussion!</h2> <p>As usual, the python-dev discussion was very productive. Each corner case was discussed and the API was challenged.</p> <p>Thanks to Petr's remark, I enhanced the existing hook to instantiate an exception if <em>exc_value</em> is <tt class="docutils literal">NULL</tt>, create a traceback if <em>exc_traceback</em> is <tt class="docutils literal">NULL</tt>, and set <tt class="docutils literal">exc_value.__traceback__</tt> to the traceback. If one of these actions fails, the failure is silently ignored.</p> <p>I also paid more attention to object resurrection.</p> <p>After one week of discussion, I was not convinced by the other alternative propositions, whereas multiple core devs wrote that they liked my API.</p> <p>I decided to push my <a class="reference external" href="https://github.com/python/cpython/commit/ef9d9b63129a2f243591db70e9a2dd53fab95d86">commit ef9d9b63</a>:</p> <pre class="literal-block"> commit ef9d9b63129a2f243591db70e9a2dd53fab95d86 Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt; Date: Wed May 22 11:28:22 2019 +0200 bpo-36829: Add sys.unraisablehook() (GH-13187) Add new sys.unraisablehook() function which can be overridden to control how &quot;unraisable exceptions&quot; are handled. It is called when an exception has occurred but there is no way for Python to handle it. For example, when a destructor raises an exception or during garbage collection (gc.collect()). 
</pre> </div> <div class="section" id="new-err-msg-attribute"> <h2>New err_msg attribute</h2> <p>Unraisable exceptions were logged with no context, only a hardcoded &quot;Exception ignored in:&quot; error message.</p> <p>Early in the <tt class="docutils literal">sys.unraisablehook</tt> discussion, <strong>Serhiy</strong> proposed to add a new <em>err_msg</em> parameter to pass an optional error message.</p> <p>I implemented this idea in <a class="reference external" href="https://bugs.python.org/issue36829">bpo-36829</a> with <a class="reference external" href="https://github.com/python/cpython/commit/71c52e3048dd07567f0c690eab4e5d57be66f534">commit 71c52e30</a>:</p> <pre class="literal-block"> commit 71c52e3048dd07567f0c690eab4e5d57be66f534 Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt; Date: Mon May 27 08:57:14 2019 +0200 bpo-36829: Add _PyErr_WriteUnraisableMsg() (GH-13488) </pre> <p>I was able to add a new parameter as a new <em>err_msg</em> attribute without breaking backward compatibility!</p> </div> <div class="section" id="test-support-catch-unraisable-exception"> <h2>test.support.catch_unraisable_exception()</h2> <p>I wrote a new context manager catching unraisable exceptions: <tt class="docutils literal">test.support.catch_unraisable_exception()</tt>. 
The exception is stored and so can be used for tests in the context manager, but cleared at context manager exit.</p> <p>I modified tests to use this new context manager:</p> <ul class="simple"> <li>test_coroutines</li> <li>test_cprofile</li> <li>test_exceptions</li> <li>test_generators</li> <li>test_io</li> <li>test_raise</li> <li>test_ssl</li> <li>test_thread</li> <li>test_yield_from</li> </ul> <p>Example:</p> <pre class="literal-block"> class BrokenDel: def __del__(self): raise ValueError(&quot;del is broken&quot;) obj = BrokenDel() with support.catch_unraisable_exception() as cm: del obj self.assertEqual(cm.unraisable.object, BrokenDel.__del__) </pre> </div> <div class="section" id="test-io-memory-leak-regression"> <h2>test_io memory leak regression</h2> <p>I modified test_io to ignore expected unraisable exceptions:</p> <pre class="literal-block"> commit c15a682603a47f5aef5025f6a2e3babb699273d6 Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt; Date: Thu Jun 13 00:23:49 2019 +0200 bpo-37223: test_io: silence destructor errors (GH-14031) </pre> <p>This change introduced a memory leak, <a class="reference external" href="https://bugs.python.org/issue37261">bpo-37261</a>:</p> <pre class="literal-block"> test_io leaked [23208, 23204, 23208] references, sum=69620 test_io leaked [7657, 7655, 7657] memory blocks, sum=22969 </pre> <p>The problem was this <tt class="docutils literal">catch_unraisable_exception</tt> method:</p> <pre class="literal-block"> def __exit__(self, *exc_info): del self.unraisable sys.unraisablehook = self._old_hook </pre> <p>Sometimes, <tt class="docutils literal">del self.unraisable</tt> triggered a new unraisable exception. 
At this point, the <tt class="docutils literal">catch_unraisable_exception</tt> hook was still registered:</p> <pre class="literal-block"> def _hook(self, unraisable): self.unraisable = unraisable </pre> <p>In the end, the <tt class="docutils literal">del self.unraisable</tt> statement <em>indirectly</em> set the <tt class="docutils literal">self.unraisable</tt> attribute again.</p> <div class="section" id="first-fix"> <h3>First fix</h3> <p>First, I suspected that the <tt class="docutils literal">io.BufferedRWPair</tt> object which triggered the first unraisable exception was <strong>resurrected</strong>, and that <tt class="docutils literal">del self.unraisable</tt> called its finalizer or deallocator again, which triggered the <em>same</em> unraisable exception again.</p> <p>My first attempt to fix the issue was to clear the <tt class="docutils literal">sys.unraisablehook</tt> by setting it to <tt class="docutils literal">None</tt>, and only later delete the attribute:</p> <pre class="literal-block"> def __exit__(self, *exc_info): self.unraisablehook = None sys.unraisablehook = self._old_hook del self.unraisable </pre> <p>If <tt class="docutils literal">self.unraisablehook = None</tt> triggers a new unraisable exception, it is silently ignored.</p> </div> <div class="section" id="second-correct-fix"> <h3>Second correct fix</h3> <p>But when I chatted with <strong>Pablo Galindo</strong>, he told me that an object cannot be finalized twice thanks to <strong>Antoine Pitrou</strong>'s <a class="reference external" href="https://www.python.org/dev/peps/pep-0442/">PEP 442: Safe object finalization</a>.</p> <p>I looked again into gdb. Oh. In fact, it's more subtle. <tt class="docutils literal">del self.unraisable</tt> clears the last reference to <tt class="docutils literal">BufferedRWPair</tt> which calls its <strong>deallocator</strong>. 
The deallocator indirectly calls the <tt class="docutils literal">BufferedWriter</tt> finalizer; the <tt class="docutils literal">BufferedWriter</tt> was stored in the <tt class="docutils literal">BufferedRWPair</tt>. This finalizer triggers a new unraisable exception.</p> <p><tt class="docutils literal">BufferedRWPair</tt> does not trigger two unraisable exceptions: the second one comes from a different object (<tt class="docutils literal">BufferedWriter</tt>).</p> <p>My final fix is to restore the old hook before deleting the <tt class="docutils literal">unraisable</tt> attribute:</p> <pre class="literal-block"> def __exit__(self, *exc_info): sys.unraisablehook = self._old_hook del self.unraisable </pre> <p>And fix test_io using two nested context managers:</p> <pre class="literal-block"> # Ignore BufferedWriter (of the BufferedRWPair) unraisable exception with support.catch_unraisable_exception(): # Ignore BufferedRWPair unraisable exception with support.catch_unraisable_exception(): pair = None support.gc_collect() support.gc_collect() </pre> <p>I also documented corner cases in the <tt class="docutils literal">sys.unraisablehook</tt> documentation:</p> <blockquote> <p><tt class="docutils literal">sys.unraisablehook</tt> can be overridden to control how unraisable exceptions are handled.</p> <p>Storing <em>exc_value</em> using a custom hook can create a <strong>reference cycle</strong>. It should be cleared explicitly to break the reference cycle when the exception is no longer needed.</p> <p>Storing <em>object</em> using a custom hook <strong>can resurrect</strong> it if it is set to an object which is being finalized. 
Avoid storing <em>object</em> after the custom hook completes to avoid resurrecting objects.</p> </blockquote> </div> </div> <div class="section" id="regrtest-now-detects-unraisable-exceptions"> <h2>regrtest now detects unraisable exceptions</h2> <p>Once I fixed tests to silence all expected unraisable exceptions, I created <a class="reference external" href="https://bugs.python.org/issue37069">bpo-37069</a> to modify regrtest to install a custom hook. I merged my <a class="reference external" href="https://github.com/python/cpython/commit/95f61c8b1619e736bd5e29a0da0183234634b6e8">commit 95f61c8b</a>:</p> <pre class="literal-block"> commit 95f61c8b1619e736bd5e29a0da0183234634b6e8 Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt; Date: Thu Jun 13 01:09:04 2019 +0200 bpo-37069: regrtest uses sys.unraisablehook (GH-13759) regrtest now uses sys.unraisablehook() to mark a test as &quot;environment altered&quot; (ENV_CHANGED) if it emits an &quot;unraisable exception&quot;. Moreover, regrtest logs a warning in this case. Use &quot;python3 -m test --fail-env-changed&quot; to catch unraisable exceptions in tests. </pre> <p>A test is marked as &quot;environment altered&quot; (ENV_CHANGED) if the test triggers an unraisable exception. Using the <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt> option (used by default on all Python CIs), a test is marked as failed in this case.</p> </div> <div class="section" id="hook-features"> <h2>Hook features</h2> <p>sys.unraisablehook allows setting a custom hook to handle unraisable exceptions. 
It opens many interesting features:</p> <ul class="simple"> <li>Log the exception into system logs, over the network, or open a popup.</li> <li>Inspect the Python stack: <tt class="docutils literal">traceback.print_stack()</tt></li> <li>Inspect the <em>object</em> content (the object which caused the exception)</li> <li>Get the traceback where <em>object</em> has been allocated: <tt class="docutils literal">tracemalloc.get_object_traceback()</tt></li> </ul> <p>By the way, reimplementing Thomas's initial idea became trivial:</p> <pre class="literal-block"> import signal, sys def abort_hook(unraisable): signal.raise_signal(signal.SIGABRT) sys.unraisablehook = abort_hook </pre> </div> <div class="section" id="threading-excepthook"> <h2>threading.excepthook</h2> <p>Since I was happy with <tt class="docutils literal">sys.unraisablehook</tt>, I decided to work on the 14-year-old issue <a class="reference external" href="https://bugs.python.org/issue1230540">bpo-1230540</a>: I proposed to add <a class="reference external" href="https://docs.python.org/dev/library/threading.html#threading.excepthook">threading.excepthook()</a>, but that's a different story!</p> </div> asyncio WSASend() memory leak2019-03-06T20:00:00+01:002019-03-06T20:00:00+01:00Victor Stinnertag:vstinner.github.io,2019-03-06:/asyncio-proactor-wsasend-memory-leak.html<a class="reference external image-reference" href="https://www.flickr.com/photos/jronaldlee/5996590138/"> <img alt="Leaking tap" src="https://vstinner.github.io/images/leaking_tap.jpg" /> </a> <p>I fixed multiple bugs in asyncio <tt class="docutils literal">ProactorEventLoop</tt> previously. But test_asyncio still failed sometimes. 
I noticed a memory leak in <tt class="docutils literal">test_asyncio</tt> which would haunt me for a year in 2018...</p> <p><strong>Yet another example of a test failure which looks harmless but hides a critical bug.</strong> The bug is that sending a …</p><a class="reference external image-reference" href="https://www.flickr.com/photos/jronaldlee/5996590138/"> <img alt="Leaking tap" src="https://vstinner.github.io/images/leaking_tap.jpg" /> </a> <p>I fixed multiple bugs in asyncio <tt class="docutils literal">ProactorEventLoop</tt> previously. But test_asyncio still failed sometimes. I noticed a memory leak in <tt class="docutils literal">test_asyncio</tt> which would haunt me for a year in 2018...</p> <p><strong>Yet another example of a test failure which looks harmless but hides a critical bug.</strong> The bug is that sending a network packet on Windows using asyncio <tt class="docutils literal">ProactorEventLoop</tt> can leak the packet. With such a bug, it is easy to imagine a very quick increase of the memory footprint of a network server...</p> <p>I'm curious why nobody noticed it before me. For me, the only explanation is that nobody was running a server using <tt class="docutils literal">ProactorEventLoop</tt>. Before Python 3.8, <tt class="docutils literal">SelectorEventLoop</tt> was the default asyncio event loop on Windows. <a class="reference external" href="https://bugs.python.org/issue34687">bpo-34687</a>: Andrew Svetlov, Yury Selivanov and I agreed to make <tt class="docutils literal">ProactorEventLoop</tt> the default in Python 3.8! The <tt class="docutils literal">Lib/asyncio/windows_events.py</tt> change of my <a class="reference external" href="https://github.com/python/cpython/commit/6ea29c5e90dde6c240bd8e0815614b52ac307ea1">commit 6ea29c5e</a>:</p> <pre class="literal-block"> -DefaultEventLoopPolicy = WindowsSelectorEventLoopPolicy +DefaultEventLoopPolicy = WindowsProactorEventLoopPolicy </pre> <p>The bug wasn't a regression. 
It was only discovered 5 years after the code had been written, thanks to new tests.</p> <p><strong>UPDATE:</strong> I updated the article to add the &quot;Regression? Nope&quot; section and elaborate the Conclusion.</p> <p>Previous article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsarecv-cancellation-data-loss.html">asyncio: WSARecv() cancellation causing data loss</a>.</p> <div class="section" id="yet-another-random-buildbot-failure"> <h2>Yet another random buildbot failure</h2> <p>One day at the end of January 2018, I noticed a new failure on the &quot;AMD64 Windows8.1 Refleaks 3.x&quot; buildbot worker. I reported <a class="reference external" href="https://bugs.python.org/issue32710">bpo-32710</a>:</p> <blockquote> <p>AMD64 Windows8.1 Refleaks 3.x: <a class="reference external" href="http://buildbot.python.org/all/#/builders/80/builds/118">http://buildbot.python.org/all/#/builders/80/builds/118</a></p> <p>test_asyncio leaked [4, 4, 3] memory blocks, sum=11</p> <p>I reproduced the issue. I'm running test.bisect to try to isolate this bug.</p> </blockquote> <p>Only 15 minutes later, thanks to my <tt class="docutils literal">test.bisect</tt> tool, I identified the leaking test, <strong>test_sendfile_close_peer_in_middle_of_receiving()</strong>:</p> <pre class="literal-block">
It seems to be related to sendfile():

C:\vstinner\python\master&gt;python -m test -R 3:3 test_asyncio \
    -m test.test_asyncio.test_events.ProactorEventLoopTests.test_sendfile_close_peer_in_middle_of_receiving
...
test_asyncio leaked [1, 2, 1] memory blocks, sum=4
</pre> <p>The test is identified, so it should take a few hours, maximum, to fix the bug, no? We will see...</p> </div> <div class="section" id="april"> <h2>April</h2> <p>3 months later, I asked:</p> <blockquote> The test is still leaking memory blocks.
Any progress on investigating the issue?</blockquote> <p>Nobody replied.</p> <p>At that time, I was busy fixing a bunch of various other bugs reported by buildbots which were easier to fix, and I was kind of exhausted by asyncio: I didn't want to touch it.</p> </div> <div class="section" id="june"> <h2>June</h2> <p>Oh, I found this bug again while working on my <a class="reference external" href="https://github.com/python/cpython/pull/7827">PR 7827</a> (detect handle leaks on Windows in regrtest).</p> <p>In 2018, I was very busy fixing dozens of multiprocessing bugs (fixing tests, but also fixing some bugs in multiprocessing itself).</p> <p>For example, I noticed another memory leak on AMD64 Windows8.1 Refleaks 3.7, <a class="reference external" href="https://bugs.python.org/issue33735#msg318425">bpo-33735</a>:</p> <blockquote> <p><a class="reference external" href="http://buildbot.python.org/all/#/builders/132/builds/154">http://buildbot.python.org/all/#/builders/132/builds/154</a></p> <p>test_multiprocessing_spawn leaked [1, 2, 1] memory blocks, sum=4</p> </blockquote> <p>This test_multiprocessing_spawn leak and the test_asyncio leak on Windows Refleaks haunted me in 2018...</p> <p>In fact, it wasn't a real leak. After a few runs, <a class="reference external" href="https://bugs.python.org/issue33735#msg320948">the test stopped leaking</a>:</p> <pre class="literal-block">
$ ./python -m test test_multiprocessing_spawn \
    -m test.test_multiprocessing_spawn.WithProcessesTestPool.test_imap_unordered \
    -R 1:30
...
test_multiprocessing_spawn leaked [4, 5, 1, 5, 1, 2, 0, 0, 0, ..., 0, 0, 0] memory blocks, sum=18
test_multiprocessing_spawn failed in 42 sec 470 ms
</pre> <p>I fixed the test with <a class="reference external" href="https://github.com/python/cpython/commit/23401fb960bb94e6ea62d2999527968d53d3fc65">commit 23401fb9</a>.</p> <p>I fixed other multiprocessing bugs like <a class="reference external" href="https://bugs.python.org/issue33929">bpo-33929</a>.</p> <p>These multiprocessing bugs kept me busy.</p> </div> <div class="section" id="july-december"> <h2>July-December</h2> <p>Nothing. Nobody looked at the issue.</p> <p>Again, I was busy fixing various test failures reported by buildbots.</p> </div> <div class="section" id="update-in-january-2019"> <h2>Update in January 2019</h2> <p>In January 2019, after months of hard work fixing every single buildbot failure, I realized <strong>suddenly</strong> that the <tt class="docutils literal">test_asyncio</tt> leak, <a class="reference external" href="https://bugs.python.org/issue32710">bpo-32710</a>, was one of the last known unfixed test failures! So I decided to have a new look at it.</p> <p>Update on <tt class="docutils literal">test_asyncio.test_sendfile.ProactorEventLoopTests</tt>:</p> <ul class="simple"> <li><tt class="docutils literal">test_sendfile_close_peer_in_the_middle_of_receiving()</tt> leaks 1 reference per run: this leak was the obvious bug <a class="reference external" href="https://bugs.python.org/issue35682">bpo-35682</a>; I had already fixed it with <a class="reference external" href="https://github.com/python/cpython/commit/80fda712c83f5dd9560d42bf2aa65a72b18b7759">commit 80fda712</a>.</li> <li><tt class="docutils literal">test_sendfile_fallback_close_peer_in_the_middle_of_receiving()</tt> leaks 1 reference per run: <strong>I don't understand why</strong>.</li> </ul> <p>Note: I had to copy/paste these test names a lot of times. Pleeease, for my comfort, use shorter test names!
:-) (I had to copy/paste them, I don't think that a regular human is able to type these very long names!)</p> <p>I spent a lot of time investigating the <tt class="docutils literal">test_sendfile_fallback_close_peer_in_the_middle_of_receiving()</tt> leak and I didn't understand the issue.</p> <p>The main loop is <tt class="docutils literal">BaseEventLoop._sendfile_fallback()</tt>. For the specific case of this test, the loop can be simplified to:</p> <pre class="literal-block">
proto = _SendfileFallbackProtocol(transp)
try:
    while True:
        data = b'x' * (1024 * 64)
        await proto.drain()
        transp.write(data)
finally:
    await proto.restore()
</pre> <p>The server closes the connection after it gets 1024 bytes. The client socket gets a <tt class="docutils literal">ConnectionAbortedError</tt> exception in <tt class="docutils literal">_ProactorBaseWritePipeTransport._loop_writing()</tt> which calls <tt class="docutils literal">_fatal_error()</tt>:</p> <pre class="literal-block">
except OSError as exc:
    self._fatal_error(exc, 'Fatal write error on pipe transport')
</pre> <p><tt class="docutils literal">_fatal_error()</tt> calls <tt class="docutils literal">_force_close()</tt> which sets <tt class="docutils literal">_closing</tt> to <tt class="docutils literal">True</tt>, and calls <tt class="docutils literal">protocol.connection_lost()</tt>. In the meantime, <tt class="docutils literal">drain()</tt> raises <tt class="docutils literal">ConnectionError</tt> because <tt class="docutils literal">is_closing()</tt> is true:</p> <pre class="literal-block">
async def drain(self):
    if self._transport.is_closing():
        raise ConnectionError(&quot;Connection closed by peer&quot;)
    ...
</pre> <p>Said differently: <strong>everything works as expected</strong>.</p> </div> <div class="section" id="regression-caused-by-my-previous-proactor-fix"> <h2>Regression caused by my previous proactor fix?</h2> <p>I suspected my own <a class="reference external" href="https://github.com/python/cpython/commit/79790bc35fe722a49977b52647f9b5fe1deda2b7">commit 79790bc3</a> pushed 7 months ago to fix a race condition in WSARecv() causing data loss (that's my previous article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsarecv-cancellation-data-loss.html">asyncio: WSARecv() cancellation causing data loss</a>).</p> <p>Hint: nah, it's unrelated. Moreover, this change was pushed in May, whereas I reported the <a class="reference external" href="https://bugs.python.org/issue32710">bpo-32710 leak</a> in January.</p> </div> <div class="section" id="short-script-reproducing-the-leak"> <h2>Short script reproducing the leak</h2> <p><strong>Identifying a leak of a single reference is really hard</strong> since the test uses hundreds of Python objects! My blocker issue was to repeat the test enough times to trigger the leak N times, rather than getting a leak of exactly a single Python reference. The problem was that the test failed when run more than once.</p> <p>All my previous attempts to identify the bug failed:</p> <ul class="simple"> <li>Use <tt class="docutils literal">gc.get_referrers()</tt> to track references between Python objects.</li> <li>Use <tt class="docutils literal">tracemalloc</tt> to track memory usage: the leak is too small, it's lost in the results &quot;noise&quot;.</li> </ul> <p>I decided to do what I should have done first: <strong>remove as much code as possible</strong> to reduce the code that I had to audit.
I removed most Python imports, I manually inlined function calls, I removed a lot of code which was unused in the test, etc.</p> <p>After a few hours, I managed to reduce the giant pile of code used by the test into a very short script of only 159 lines of Python code: <a class="reference external" href="https://bugs.python.org/file48030/test_aiosend.py">test_aiosend.py</a>. The script doesn't call the asyncio <tt class="docutils literal">sendfile()</tt> implementation, but uses its own copy of the code, simplified to do exactly what the test needs:</p> <pre class="literal-block">
async def sendfile(transp):
    proto = _SendfileFallbackProtocol(transp)
    try:
        data = b'x' * (1024 * 24)
        while True:
            await proto.drain()
            transp.write(data)
    finally:
        await proto.restore()
</pre> <p>with a local copy of the code of the <tt class="docutils literal">_SendfileFallbackProtocol</tt> class.</p> <p>Having all the code involved in the bug in a single file makes it way more efficient to follow the control flow and understand what happens.</p> <p>The original code is waaaaay more complex, scattered across multiple Python files in the <tt class="docutils literal">Lib/asyncio</tt> and <tt class="docutils literal">Lib/test/test_asyncio/</tt> directories.</p> </div> <div class="section" id="root-bug-identified-wsasend"> <h2>Root bug identified: WSASend()</h2> <p><strong>It took me 1 year, a few sleepless nights and multiple attempts to understand the leak, but I eventually found it!</strong> WSASend() doesn't release the memory if it fails immediately. I expected something way more complex, but it's that simple...</p> <p>Using the <tt class="docutils literal">test_aiosend.py</tt> script that I created, I was finally able to repeat the test in a loop.
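</p> <p>The approach can be sketched in pure Python: repeating a leaking operation many times between two <tt class="docutils literal">tracemalloc</tt> snapshots makes even a tiny leak stand out of the measurement noise. A minimal self-contained sketch, where <tt class="docutils literal">leaky_operation()</tt> is a hypothetical stand-in for the real leaking code:</p>

```python
import tracemalloc

_retained = []

def leaky_operation():
    # hypothetical stand-in for the leak: each call retains a 64 KiB buffer
    _retained.append(bytearray(64 * 1024))

def find_leak(func, runs=100):
    tracemalloc.start()
    func()  # warm up: first-call allocations are not leaks
    before = tracemalloc.take_snapshot()
    for _ in range(runs):
        func()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # statistics are sorted by the absolute value of size_diff, so the
    # first entry points to the line which allocated the leaked memory
    return after.compare_to(before, 'lineno')[0]

top = find_leak(leaky_operation)
print(top)
```

<p>With 100 runs, the top entry reports roughly 100 x 64 KiB of growth at the allocation line: this is the kind of unambiguous signal that a leak of a single reference per run can never produce.</p> <p>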
Thanks to that, it became obvious using <tt class="docutils literal">tracemalloc</tt> that the leaked memory was the memory passed to <tt class="docutils literal">WSASend()</tt>.</p> <p>I pushed <a class="reference external" href="https://github.com/python/cpython/commit/a234e148394c2c7419372ab65b773d53a57f3625">commit a234e148</a> to fix <tt class="docutils literal">WSASend()</tt>:</p> <pre class="literal-block">
commit a234e148394c2c7419372ab65b773d53a57f3625
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date:   Tue Jan 8 14:23:09 2019 +0100

    bpo-32710: Fix leak in Overlapped_WSASend() (GH-11469)

    Fix a memory leak in asyncio in the ProactorEventLoop when ReadFile()
    or WSASend() overlapped operation fail immediately: release the
    internal buffer.
</pre> <p>I was very disappointed by the simplicity of the fix, <strong>it only adds a single line</strong>:</p> <pre class="literal-block">
diff --git a/Modules/overlapped.c b/Modules/overlapped.c
index 69875a7f37da..bbaa4fb3008f 100644
--- a/Modules/overlapped.c
+++ b/Modules/overlapped.c
&#64;&#64; -1011,6 +1012,7 &#64;&#64; Overlapped_WSASend(OverlappedObject *self, PyObject *args)
     case ERROR_IO_PENDING:
         Py_RETURN_NONE;
     default:
+        PyBuffer_Release(&amp;self-&gt;user_buffer);
         self-&gt;type = TYPE_NOT_STARTED;
         return SetFromWindowsErr(err);
     }
</pre> <p>So what? One year to add a single line? That's unfair!</p> <p>My commit contains a very similar fix for <tt class="docutils literal">do_ReadFile()</tt> used by <tt class="docutils literal">Overlapped_ReadFile()</tt> and <tt class="docutils literal">Overlapped_ReadFileInto()</tt>.</p> </div> <div class="section" id="fixing-more-memory-leaks"> <h2>Fixing more memory leaks</h2> <p>By the way, the <tt class="docutils literal">_overlapped.Overlapped</tt> type has no traverse function: adding one may help the garbage collector.
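</p> <p>The kind of cycle involved is easy to reproduce in pure Python: storing an exception on an object keeps the exception's traceback alive, and the traceback's frame references the object itself, so only the cyclic garbage collector can reclaim the group. A minimal sketch, where <tt class="docutils literal">MiniFuture</tt> is a hypothetical stand-in for <tt class="docutils literal">asyncio.Future</tt>:</p>

```python
import gc
import weakref

class MiniFuture:
    # hypothetical stand-in for asyncio.Future, not the real class
    def __init__(self):
        self.exception = None

def fail(fut):
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # exc.__traceback__ references this frame, and the frame's locals
        # reference fut: cycle fut -> exc -> traceback -> frame -> fut
        fut.exception = exc

gc.collect()   # start from a clean state
gc.disable()   # make the demonstration deterministic
fut = MiniFuture()
fail(fut)
ref = weakref.ref(fut)
del fut
alive_before_gc = ref() is not None   # the cycle keeps the object alive
gc.collect()                          # only the cyclic GC can reclaim it
alive_after_gc = ref() is not None
gc.enable()
print(alive_before_gc, alive_after_gc)
```

<p>A C extension type without <tt class="docutils literal">tp_traverse</tt> is invisible to this collector, so any cycle going through such an object can never be reclaimed.</p> <p>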
Asyncio is famous for building reference cycles by design in <tt class="docutils literal">Future.set_exception()</tt>.</p> <p>I wrote <a class="reference external" href="https://github.com/python/cpython/pull/11489">PR 11489</a> to implement <tt class="docutils literal">tp_traverse</tt> for the <tt class="docutils literal">_overlapped.Overlapped</tt> type. <a class="reference external" href="https://github.com/python/cpython/pull/11489#pullrequestreview-191093765">Serhiy Storchaka added</a>:</p> <blockquote> I suspect that there are leaks when self-&gt;type is set to TYPE_NOT_STARTED.</blockquote> <p>And he was right! I modified my PR to fix all memory leaks. After my PR had been reviewed, I merged it, <a class="reference external" href="https://github.com/python/cpython/commit/5485085b324a45307c1ff4ec7d85b5998d7d5e0d">commit 5485085b</a>:</p> <pre class="literal-block">
commit 5485085b324a45307c1ff4ec7d85b5998d7d5e0d
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date:   Fri Jan 11 14:35:14 2019 +0100

    bpo-32710: Fix _overlapped.Overlapped memory leaks (GH-11489)

    Fix memory leaks in asyncio ProactorEventLoop on overlapped operation
    failures.

    Changes:

    * Implement the tp_traverse slot in the _overlapped.Overlapped type
      to help to break reference cycles and identify referrers in the
      garbage collector.
    * Always clear overlapped on failure: not only set type to
      TYPE_NOT_STARTED, but release also resources.
</pre> </div> <div class="section" id="regression-nope"> <h2>Regression? Nope</h2> <p>Was the memory leak a regression? Nope.
The bug existed since the creation of the <tt class="docutils literal">overlapped.c</tt> file in the &quot;Tulip&quot; project in 2013, <a class="reference external" href="https://github.com/python/asyncio/commit/27c403531670f52cad8388aaa2a13a658f753fd5">commit 27c40353</a>:</p> <pre class="literal-block">
commit 27c403531670f52cad8388aaa2a13a658f753fd5
Author: Richard Oudkerk &lt;shibturn&#64;gmail.com&gt;
Date:   Mon Jan 21 20:34:38 2013 +0000

    New experimental iocp branch.
</pre> <p>Tulip was the old name of the asyncio project, when it was still an external project on <tt class="docutils literal">code.google.com</tt>. In the meantime, <tt class="docutils literal">code.google.com</tt> has been closed and the project moved to <a class="reference external" href="https://github.com/python/asyncio/">https://github.com/python/asyncio/</a> (now read-only).</p> <p><a class="reference external" href="https://github.com/python/asyncio/blob/27c403531670f52cad8388aaa2a13a658f753fd5/overlapped.c#L632-L658">Extract of the original Overlapped_WSASend() implementation</a>, I added a comment to show the location of the bug:</p> <pre class="literal-block">
    if (!PyArg_Parse(bufobj, &quot;y*&quot;, &amp;self-&gt;write_buffer))
        return NULL;

#if SIZEOF_SIZE_T &gt; SIZEOF_LONG
    if (self-&gt;write_buffer.len &gt; (Py_ssize_t)PY_ULONG_MAX) {
        PyBuffer_Release(&amp;self-&gt;write_buffer);
        PyErr_SetString(PyExc_ValueError, &quot;buffer to large&quot;);
        return NULL;
    }
#endif

    ...

    self-&gt;error = err = (ret &lt; 0 ? WSAGetLastError() : ERROR_SUCCESS);
    switch (err) {
        case ERROR_SUCCESS:
        case ERROR_MORE_DATA:
        case ERROR_IO_PENDING:
            /********* !!! BUG HERE, BUFFER NOT RELEASED !!! ***********/
            Py_RETURN_NONE;
        ...
    }
</pre> <p><strong>I fixed the memory leak 6 years after the code had been written!</strong></p> <p>So... why was this bug only discovered in 2018? Multiple very old asyncio bugs were discovered only recently thanks to more realistic and more advanced <strong>functional tests</strong>.
First tests of asyncio were mostly tiny unit tests mocking most parts of the code. It made sense in the early days of asyncio, when the code was not mature.</p> <p>By the way, the <a class="reference external" href="https://github.com/python/cpython/blob/1f58f4fa6a0e3c60cee8df4a35c8dcf3903acde8/Lib/test/test_asyncio/test_sendfile.py#L446-L457">code of the test</a> which helped to discover the bug is:</p> <pre class="literal-block">
def test_sendfile_close_peer_in_the_middle_of_receiving(self):
    srv_proto, cli_proto = self.prepare_sendfile(close_after=1024)
    with self.assertRaises(ConnectionError):
        self.run_loop(
            self.loop.sendfile(cli_proto.transport, self.file))
    self.run_loop(srv_proto.done)

    self.assertTrue(1024 &lt;= srv_proto.nbytes &lt; len(self.DATA),
                    srv_proto.nbytes)
    self.assertTrue(1024 &lt;= self.file.tell() &lt; len(self.DATA),
                    self.file.tell())
    self.assertTrue(cli_proto.transport.is_closing())
</pre> <p>Note: The test name has been made even longer in the meantime (adding &quot;the&quot;) :-)</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>For such complex bugs, <strong>a reliable debugging method is to remove as much code as possible</strong> to reduce the number of lines of code that should be read. <tt class="docutils literal">tracemalloc</tt> remains efficient to identify a memory leak when a test can be run in a loop to make the leak more obvious (I was blocked at the beginning because the test failed when run a second time in a loop).</p> <p>Lessons learned? You should try to <strong>investigate every single failure of your CI</strong>. It is important to have a test suite with functional tests. &quot;Mock tests&quot; are fine to quickly write reliable tests, but they are not enough: functional tests make the difference.</p> <p>Thanks <strong>Richard Oudkerk</strong> for your great code to use Windows native APIs in <strong>asyncio</strong> and <strong>multiprocessing</strong>!
I like <a class="reference external" href="https://en.wikipedia.org/wiki/Input/output_completion_port">Windows IOCP</a>, even if the asyncio implementation is quite complex :-)</p> <p>Ok, <tt class="docutils literal">_overlapped.Overlapped</tt> should now have a few fewer memory leaks :-)</p> </div> asyncio: WSARecv() cancellation causing data loss (2019-01-31T15:20:00+01:00, Victor Stinner, tag:vstinner.github.io,2019-01-31:/asyncio-proactor-wsarecv-cancellation-data-loss.html)<a class="reference external image-reference" href="https://www.flickr.com/photos/joybot/6026542856/"> <img alt="Unlocked lock" src="https://vstinner.github.io/images/lock.jpg" /> </a> <p>In December 2017, <strong>Yury Selivanov</strong> pushed the long awaited <tt class="docutils literal">start_tls()</tt> function.</p> <p>A newly added test failed on Windows. Later, the test started to fail randomly on Linux as well. In fact, it was a well hidden race condition in the asynchronous handshake of <tt class="docutils literal">SSLProtocol</tt> which would take 5 months of work to be identified and fixed. The bug wasn't a recent regression, but was only spotted thanks to newly added tests.</p> <p>Even after this bug had been fixed, the same test still failed randomly on Windows!
Once I found how to reproduce the bug, I understood that it was a <strong>very scary bug</strong>: <tt class="docutils literal">WSARecv()</tt> cancellation randomly caused <strong>data loss</strong>! Again, it was a very well hidden bug which had likely existed since the early days of the <tt class="docutils literal">ProactorEventLoop</tt> implementation.</p> <p>Previous article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-connect-pipe-race-condition.html">Asyncio: Proactor ConnectPipe() Race Condition</a>. Next article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsasend-memory-leak.html">asyncio: WSASend() memory leak</a>.</p> <div class="section" id="new-start-tls-function"> <h2>New start_tls() function</h2> <p>The &quot;starttls&quot; feature has been requested since the creation of asyncio. On October 24, 2013, <strong>Guido van Rossum</strong> created <a class="reference external" href="https://github.com/python/asyncio/issues/79">asyncio issue #79</a>:</p> <blockquote> <strong>Glyph [Lefkowitz]</strong> and <strong>Antoine [Pitrou]</strong> really want a API to upgrade an existing Transport/Protocol pair to SSL/TLS, without having to create a new protocol.</blockquote> <p>On March 23, 2015, <strong>Giovanni Cannata</strong> created <a class="reference external" href="https://bugs.python.org/issue23749">bpo-23749</a> which is basically the same feature request. I <a class="reference external" href="https://bugs.python.org/issue23749#msg239022">replied</a>:</p> <blockquote> asyncio got a new SSL implementation which makes possible to implement STARTTLS. Are you interested to implement it?</blockquote> <p><strong>Elizabeth Myers</strong>, <strong>Antoine Pitrou</strong>, <strong>Guido van Rossum</strong> and <strong>Yury Selivanov</strong> designed the feature. Yury <a class="reference external" href="https://bugs.python.org/issue23749#msg253495">wrote a prototype</a> in 2015 for PostgreSQL.
In 2017, <strong>Barry Warsaw</strong> <a class="reference external" href="https://bugs.python.org/issue23749#msg293912">wrote his own implementation for SMTP</a>.</p> <p>At the end of 2017, <strong>four years</strong> after Guido van Rossum created the feature request, <strong>Yury Selivanov</strong> implemented the feature and pushed the <a class="reference external" href="https://github.com/python/cpython/commit/f111b3dcb414093a4efb9d74b69925e535ddc470">commit f111b3dc</a>:</p> <pre class="literal-block">
commit f111b3dcb414093a4efb9d74b69925e535ddc470
Author: Yury Selivanov &lt;yury&#64;magic.io&gt;
Date:   Sat Dec 30 00:35:36 2017 -0500

    bpo-23749: Implement loop.start_tls() (#5039)
</pre> </div> <div class="section" id="sslprotocol-race-condition"> <h2>SSLProtocol Race Condition</h2> <div class="section" id="test-fails-on-appveyor-windows-temporary-fix"> <h3>Test fails on AppVeyor (Windows): temporary fix</h3> <p>On December 30, 2017, just after Yury pushed his implementation of <tt class="docutils literal">start_tls()</tt> (the same day), <strong>Antoine Pitrou</strong> reported <a class="reference external" href="https://bugs.python.org/issue32458">bpo-32458</a>: it seems test_asyncio fails sporadically on AppVeyor:</p> <pre class="literal-block">
ERROR: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File &quot;C:\projects\cpython\lib\test\test_asyncio\test_sslproto.py&quot;, line 284, in test_start_tls_server_1
    asyncio.wait_for(main(), loop=self.loop, timeout=10))
  File &quot;C:\projects\cpython\lib\asyncio\base_events.py&quot;, line 440, in run_until_complete
    return future.result()
  File &quot;C:\projects\cpython\lib\asyncio\tasks.py&quot;, line 398, in wait_for
    raise futures.TimeoutError()
concurrent.futures._base.TimeoutError
</pre> <p><strong>Yury Selivanov</strong> <a class="reference external"
href="https://bugs.python.org/issue32458#msg309254">wrote</a>:</p> <blockquote> I'm leaving on a two-weeks vacation today. To avoid risking breaking the workflow, I'll mask this tests on AppVeyor. I'll investigate this when I get back.</blockquote> <p>and skipped the test as a <strong>temporary fix</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a">commit 0c36bed1</a>:</p> <pre class="literal-block">
commit 0c36bed1c46d07ef91d3e02e69e974e4f3ecd31a
Author: Yury Selivanov &lt;yury&#64;magic.io&gt;
Date:   Sat Dec 30 15:40:20 2017 -0500

    bpo-32458: Temporarily mask start-tls proactor test on Windows (#5054)
</pre> </div> <div class="section" id="bug-reproduced-on-linux"> <h3>Bug reproduced on Linux</h3> <p>On May 23, 2018, five months after the bug had been reported, <a class="reference external" href="https://bugs.python.org/issue32458#msg317468">I wrote</a>:</p> <blockquote> test_start_tls_server_1() just failed on my Linux. It likely depends on the system load.</blockquote> <p>Christian Heimes <a class="reference external" href="https://bugs.python.org/issue32458#msg317760">added</a>:</p> <blockquote> [On Linux,] It's failing reproducible with OpenSSL 1.1.1 and TLS 1.3 enabled.
I haven't seen it failing with TLS 1.2 yet.</blockquote> <p>On May 28, 2018, I found a reliable way to <a class="reference external" href="https://bugs.python.org/issue32458#msg317833">reproduce the issue on Linux</a>:</p> <blockquote> <p>Open 3 terminals and run these commands in parallel:</p> <ol class="arabic simple"> <li><tt class="docutils literal">./python <span class="pre">-m</span> test test_asyncio <span class="pre">-m</span> test_start_tls_server_1 <span class="pre">-F</span></tt></li> <li><tt class="docutils literal">./python <span class="pre">-m</span> test <span class="pre">-j16</span> <span class="pre">-r</span></tt></li> <li><tt class="docutils literal">./python <span class="pre">-m</span> test <span class="pre">-j16</span> <span class="pre">-r</span></tt></li> </ol> <p>It's a <strong>race condition</strong> which doesn't depend on the OS, but on the system load.</p> </blockquote> </div> <div class="section" id="root-issue-identified"> <h3>Root issue identified</h3> <p>Once I found how to reproduce the bug, I was able to investigate it. I created <a class="reference external" href="https://bugs.python.org/issue33674">bpo-33674</a>.</p> <p>I found a race condition in <tt class="docutils literal">SSLProtocol</tt> of <tt class="docutils literal">asyncio/sslproto.py</tt>.
Sometimes, <tt class="docutils literal">_sslpipe.feed_ssldata()</tt> is called before <tt class="docutils literal">_sslpipe.shutdown()</tt>.</p> <ul class="simple"> <li><tt class="docutils literal">SSLProtocol.connection_made()</tt> -&gt; <tt class="docutils literal">SSLProtocol._start_handshake()</tt>: <tt class="docutils literal">self._loop.call_soon(self._process_write_backlog)</tt></li> <li><tt class="docutils literal">SSLProtocol.data_received()</tt>: direct call to <tt class="docutils literal">self._sslpipe.feed_ssldata(data)</tt></li> <li>Later, <tt class="docutils literal">self._process_write_backlog()</tt> calls <tt class="docutils literal">self._sslpipe.do_handshake()</tt></li> </ul> <p>The first <strong>write</strong> is <strong>delayed</strong> by <tt class="docutils literal">call_soon()</tt>, whereas the first <strong>read</strong> is a <strong>direct call</strong> to the SSL pipe.</p> <p>Workaround:</p> <pre class="literal-block">
diff --git a/Lib/asyncio/sslproto.py b/Lib/asyncio/sslproto.py
index 2bfa45dd15..4a5dbb38a1 100644
--- a/Lib/asyncio/sslproto.py
+++ b/Lib/asyncio/sslproto.py
&#64;&#64; -592,7 +592,7 &#64;&#64; class SSLProtocol(protocols.Protocol):
         # (b'', 1) is a special value in _process_write_backlog() to do
         # the SSL handshake
         self._write_backlog.append((b'', 1))
-        self._loop.call_soon(self._process_write_backlog)
+        self._process_write_backlog()
         self._handshake_timeout_handle = \
             self._loop.call_later(self._ssl_handshake_timeout,
                                   self._check_handshake_timeout)
</pre> <p>Yury Selivanov wrote:</p> <blockquote> <p><strong>The fix is correct and the bug is now obvious</strong>: <tt class="docutils literal">data_received()</tt> occurs pretty much any time after <tt class="docutils literal">connection_made()</tt> call; if <tt class="docutils literal">call_soon()</tt> is used in <tt class="docutils literal">connection_made()</tt>, <tt class="docutils literal">data_received()</tt> may find the protocol in an incorrect state.</p>
<p><strong>Kudos Victor for debugging this.</strong></p> </blockquote> <p>I pushed <a class="reference external" href="https://github.com/python/cpython/commit/be00a5583a2cb696335c527b921d1868266a42c6">commit be00a558</a>:</p> <pre class="literal-block">
commit be00a5583a2cb696335c527b921d1868266a42c6
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date:   Tue May 29 01:33:35 2018 +0200

    bpo-33674: asyncio: Fix SSLProtocol race (GH-7175)

    Fix a race condition in SSLProtocol.connection_made() of
    asyncio.sslproto: start immediately the handshake instead of using
    call_soon(). Previously, data_received() could be called before the
    handshake started, causing the handshake to hang or fail.
</pre> <p>... the change is basically a single line change:</p> <pre class="literal-block">
-        self._loop.call_soon(self._process_write_backlog)
+        self._process_write_backlog()
</pre> <p>I closed <a class="reference external" href="https://bugs.python.org/issue32458">bpo-32458</a> and <strong>Yury Selivanov</strong> closed <a class="reference external" href="https://bugs.python.org/issue33674">bpo-33674</a>.</p> </div> <div class="section" id="not-a-regression"> <h3>Not a regression</h3> <p>The SSLProtocol race condition wasn't new: it existed since January 2015, <a class="reference external" href="https://github.com/python/cpython/commit/231b404cb026649d4b7172e75ac394ef558efe60">commit 231b404c</a>:</p> <pre class="literal-block">
commit 231b404cb026649d4b7172e75ac394ef558efe60
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Wed Jan 14 00:19:09 2015 +0100

    Issue #22560: New SSL implementation based on ssl.MemoryBIO

    The new SSL implementation is based on the new ssl.MemoryBIO which is only
    available on Python 3.5. On Python 3.4 and older, the legacy SSL implementation
    (using SSL_write, SSL_read, etc.) is used. The proactor event loop only
    supports the new implementation.

    The new asyncio.sslproto module adds _SSLPipe, SSLProtocol and
    _SSLProtocolTransport classes.
    _SSLPipe allows to &quot;wrap&quot; or &quot;unwrap&quot; a socket (switch between
    cleartext and SSL/TLS). Patch written by Antoine Pitrou.

    sslproto.py is based on gruvi/ssl.py of the gruvi project written by
    Geert Jansen.

    This change adds SSL support to ProactorEventLoop on Python 3.5 and
    newer! It becomes also possible to implement STARTTTLS: switch a
    cleartext socket to SSL.
</pre> <p>This is the new cool asynchronous SSL implementation written by <strong>Antoine Pitrou</strong> and <strong>Geert Jansen</strong>. It took <strong>3 years</strong> and <strong>new functional tests</strong> to discover the race condition.</p> </div> </div> <div class="section" id="wsarecv-cancellation-causing-data-loss"> <h2>WSARecv() cancellation causing data loss</h2> <div class="section" id="yet-another-very-boring-buildbot-test-failure"> <h3>Yet another very boring buildbot test failure</h3> <p>On May 30, 2018, the day after I fixed the SSLProtocol race condition, I created <a class="reference external" href="https://bugs.python.org/issue33694">bpo-33694</a>.</p> <p>test_asyncio.test_start_tls_server_1() got multiple fixes recently (see <a class="reference external" href="https://bugs.python.org/issue32458">bpo-32458</a> and <a class="reference external" href="https://bugs.python.org/issue33674">bpo-33674</a>)...
but it still fails on the x86 Windows7 3.x buildbot at revision bb9474f1fb2fc7c7ed9f826b78262d6a12b5f9e8 which contains all these fixes.</p> <p>The test fails even when test_asyncio is re-run alone (not when other tests run in parallel).</p> <p>Example of failure:</p> <pre class="literal-block">
ERROR: test_start_tls_server_1 (test.test_asyncio.test_sslproto.ProactorStartTLSTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File &quot;...\lib\test\test_asyncio\test_sslproto.py&quot;, line 467, in test_start_tls_server_1
    self.loop.run_until_complete(run_main())
  File &quot;...\lib\asyncio\base_events.py&quot;, line 566, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.
</pre> <p>The test also fails on x86 Windows7 3.7. Moreover, 3.7 got an additional failure:</p> <pre class="literal-block">
ERROR: test_pipe_handle (test.test_asyncio.test_windows_utils.PipeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File &quot;...\lib\test\test_asyncio\test_windows_utils.py&quot;, line 73, in test_pipe_handle
    raise RuntimeError('expected ERROR_INVALID_HANDLE')
RuntimeError: expected ERROR_INVALID_HANDLE
</pre> <p><strong>Yury Selivanov</strong> <a class="reference external" href="https://bugs.python.org/issue33694#msg318193">failed to reproduce the issue</a> in a Windows 7 VM (on macOS) using:</p> <ol class="arabic simple"> <li>run <tt class="docutils literal">test_asyncio</tt></li> <li>run <tt class="docutils literal">test_asyncio.test_sslproto</tt></li> <li>run <tt class="docutils literal">test_asyncio.test_sslproto <span class="pre">-m</span> test_start_tls_server_1</tt></li> </ol> <p><strong>Andrew Svetlov</strong> <a class="reference external"
href="https://bugs.python.org/issue33694#msg318194">added</a>:</p> <blockquote> I used <tt class="docutils literal">SNDBUF</tt> to enforce send buffer overloading. It is not required by sendfile tests but I thought that better to have non-mocked way to test such situations. We can remove the socket buffers size manipulation at all without any problem.</blockquote> <p>But Yury Selivanov <a class="reference external" href="https://bugs.python.org/issue33694#msg318195">replied</a>:</p> <blockquote> When I tried to do that I think <strong>I was having more failures</strong> with that test. But really up to you.</blockquote> <p>Over the following days, I reported more and more similar failures on Windows buildbots and on AppVeyor (our Windows CI).</p> </div> <div class="section" id="root-issue-identified-pause-reading"> <h3>Root issue identified: pause_reading()</h3> <p>Since this bug became more and more frequent, I decided to work on it. Yury and Andrew had failed to reproduce it.</p> <p>On June 7, 2018, I managed to <strong>reproduce the bug on Linux</strong> by <a class="reference external" href="https://bugs.python.org/issue33694#msg318869">inserting a sleep at the right place</a>... I understood one hour later that my patch was wrong: &quot;it introduces a bug in the test&quot;.</p> <p>On the other hand, I found the root cause: calling <tt class="docutils literal">pause_reading()</tt> and <tt class="docutils literal">resume_reading()</tt> on the transport is not safe. Sometimes, we lose data. See the <strong>ugly hack</strong> described in the TODO comment below:</p> <pre class="literal-block">
class _ProactorReadPipeTransport(_ProactorBasePipeTransport,
                                 transports.ReadTransport):
    &quot;&quot;&quot;Transport for read pipes.&quot;&quot;&quot;

    (...)
    def pause_reading(self):
        if self._closing or self._paused:
            return
        self._paused = True

        if self._read_fut is not None and not self._read_fut.done():
            # TODO: This is an ugly hack to cancel the current read future
            # *and* avoid potential race conditions, as read cancellation
            # goes through `future.cancel()` and `loop.call_soon()`.
            # We then use this special attribute in the reader callback to
            # exit *immediately* without doing any cleanup/rescheduling.
            self._read_fut.__asyncio_cancelled_on_pause__ = True
            self._read_fut.cancel()
            self._read_fut = None
            self._reschedule_on_resume = True

        if self._loop.get_debug():
            logger.debug(&quot;%r pauses reading&quot;, self)
</pre> <p>If you remove the &quot;ugly hack&quot;, the test no longer hangs...</p> <p>Extract of <tt class="docutils literal">_ProactorReadPipeTransport.set_transport()</tt>:</p> <pre class="literal-block">
if self.is_reading():
    # reset reading callback / buffers / self._read_fut
    self.pause_reading()
    self.resume_reading()
</pre> <p>This method <strong>cancels the pending overlapped</strong> <tt class="docutils literal">WSARecv()</tt>, and then creates a new overlapped <tt class="docutils literal">WSARecv()</tt>.</p> <p>Even after <tt class="docutils literal">CancelIoEx(old overlapped)</tt>, the IOCP loop still gets an event for the completion of the cancelled overlapped <tt class="docutils literal">WSARecv()</tt>. Problem: <strong>since the Python future is cancelled, the event is ignored and so 176 bytes of data are lost</strong>.</p> <p>I'm surprised that an overlapped <tt class="docutils literal">WSARecv()</tt> <strong>cancelled</strong> by <tt class="docutils literal">CancelIoEx()</tt> still returns data when IOCP polls for events.</p> <p>Something else: the bug occurs when <tt class="docutils literal">CancelIoEx()</tt> (on the current overlapped <tt class="docutils literal">WSARecv()</tt>) fails internally with <tt class="docutils literal">ERROR_NOT_FOUND</tt>.
According to overlapped.c, it means:</p> <pre class="literal-block">
/* CancelIoEx returns ERROR_NOT_FOUND if the I/O completed in-between */
</pre> <p><tt class="docutils literal">HasOverlappedIoCompleted()</tt> returns 0 in that case.</p> <p>The problem is that currently, <tt class="docutils literal">Overlapped.cancel()</tt> also returns <tt class="docutils literal">None</tt> in that case, and later the asyncio IOCP loop ignores the completion event and so <strong>drops incoming received data</strong>.</p> </div> <div class="section" id="release-blocker-bug"> <h3>Release blocker bug?</h3> <p>Yury, Andrew, Ned: I set the priority to release blocker because I'm scared by what I saw. START TLS has a race condition in its ProactorEventLoop implementation. But the bug doesn't seem to be specific to START TLS: it is rather in <tt class="docutils literal">transport.set_transport()</tt>, and even more generally in <tt class="docutils literal">transport.pause_reading()</tt> / <tt class="docutils literal">transport.resume_reading()</tt>. The bug is quite severe: we lose data and it's really hard to know why (I spent a few hours adding many print statements and trying to reproduce it with a tiny, reliable unit test). As an asyncio user, I expect transports to be 100% reliable, and I would first look into my own code (like looking into the <tt class="docutils literal">start_tls()</tt> implementation in my case).</p> <p>If the bug were specific to <tt class="docutils literal">start_tls()</tt>, I would suggest to &quot;just&quot; disable start_tls() on ProactorEventLoop (sorry, Windows!).
But since the data loss seems to concern basically any application using <tt class="docutils literal">ProactorEventLoop</tt>, I don't see any simple workaround.</p> <p><strong>My hope is that a fix can be written shortly</strong> to not block the 3.7.0 final release for too long :-(</p> <p>Yury, Andrew: Can you please just confirm that it's a regression and that a release blocker is justified?</p> </div> <div class="section" id="functional-test-reproducing-the-bug"> <h3>Functional test reproducing the bug</h3> <p>I wrote the <a class="reference external" href="https://bugs.python.org/file47632/race.py">race.py script</a>: a simple echo client and server sending packets in both directions, pausing and resuming reading on the client transport every 100 ms to trigger the bug.</p> <p>Using <tt class="docutils literal">ProactorEventLoop</tt> and 2000 packets of 16 KiB, I could easily reproduce the bug.</p> <p>So again, the bug is not specific to <tt class="docutils literal">start_tls()</tt>; <tt class="docutils literal">start_tls()</tt> was just one way to spot it.</p> <p>The bug is in the Proactor transport: the cancellation of an overlapped <tt class="docutils literal">WSARecv()</tt> sometimes drops packets. The bug occurs when <tt class="docutils literal">CancelIoEx()</tt> fails with <tt class="docutils literal">ERROR_NOT_FOUND</tt>, which means that the I/O (<tt class="docutils literal">WSARecv()</tt>) completed.</p> <p>One solution would be to not cancel <tt class="docutils literal">WSARecv()</tt> on pause_reading(): wait until the current <tt class="docutils literal">WSARecv()</tt> completes, store the data somewhere but don't pass it to <tt class="docutils literal">protocol.data_received()</tt>, and don't schedule a new <tt class="docutils literal">WSARecv()</tt>. Once reading is resumed: call <tt class="docutils literal">protocol.data_received()</tt> and schedule a new <tt class="docutils literal">WSARecv()</tt>.</p> <p>That would be a workaround.
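</p> <p>This buffering workaround can be sketched with a tiny, platform-independent model. The class and method names below are hypothetical (not asyncio's actual implementation): the pending read is never cancelled; data that arrives while reading is paused is kept aside and delivered to the protocol on resume, so nothing is lost:</p>

```python
class BufferingReadTransport:
    """Sketch of a read transport that never cancels the pending read."""

    def __init__(self, protocol):
        self._protocol = protocol
        self._paused = False
        self._pending = []          # chunks received while paused

    def _read_completed(self, data):
        # Called when an in-flight read (e.g. an overlapped WSARecv())
        # completes: while paused, keep the data instead of dropping it.
        if self._paused:
            self._pending.append(data)
        else:
            self._protocol.data_received(data)

    def pause_reading(self):
        # Note: the pending read is NOT cancelled.
        self._paused = True

    def resume_reading(self):
        self._paused = False
        pending, self._pending = self._pending, []
        for data in pending:
            self._protocol.data_received(data)
        # ...then a new read would be scheduled here.
```

The key property is that the data and its ordering survive any pause/resume sequence, at the cost of buffering at most one in-flight read while paused.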
I don't know how to really fix <tt class="docutils literal">WSARecv()</tt> cancellation without losing data. A good start would be to modify <tt class="docutils literal">Overlapped.cancel()</tt> to return a boolean indicating whether the overlapped I/O completed, even if we just cancelled it. Currently, the corner case (<tt class="docutils literal">CancelIoEx()</tt> fails with <tt class="docutils literal">ERROR_NOT_FOUND</tt>) is silently ignored, and then the IOCP loop silently ignores the completion event of the I/O...</p> </div> <div class="section" id="fix-the-bug-no-longer-cancel-wsarecv"> <h3>Fix the bug: no longer cancel WSARecv()</h3> <p>On June 8, 2018, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/79790bc35fe722a49977b52647f9b5fe1deda2b7">commit 79790bc3</a>:</p> <pre class="literal-block">
commit 79790bc35fe722a49977b52647f9b5fe1deda2b7
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date:   Fri Jun 8 00:25:52 2018 +0200

    bpo-33694: Fix race condition in asyncio proactor (GH-7498)

    The cancellation of an overlapped WSARecv() has a race condition
    which causes data loss because of the current implementation of
    proactor in asyncio.

    No longer cancel overlapped WSARecv() in _ProactorReadPipeTransport
    to work around the race condition.

    Remove the optimized recv_into() implementation to get simple
    implementation of pause_reading() using the single _pending_data
    attribute.

    Move _feed_data_to_bufferred_proto() to protocols.py.

    Remove set_protocol() method which became useless.
</pre> <p>I fixed the root issue (in Python 3.7 and future Python 3.8).</p> <p>I used my <tt class="docutils literal">race.py</tt> script to validate that the issue is fixed for real.</p> </div> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>I fixed one race condition in the asynchronous handshake of <tt class="docutils literal">SSLProtocol</tt>.</p> <p>I found and fixed a data loss bug caused by <tt class="docutils literal">WSARecv()</tt> cancellation.</p> <p>Lessons learnt from these two bugs:</p> <ul class="simple"> <li>You should <strong>write an extensive test suite</strong> for your code.</li> <li>You should <strong>keep an eye on your continuous integration (CI)</strong>: any tiny test failure can hide a very severe bug.</li> </ul> </div> Asyncio: Proactor ConnectPipe() Race Condition2019-01-30T18:00:00+01:002019-01-30T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2019-01-30:/asyncio-proactor-connect-pipe-race-condition.html<a class="reference external image-reference" href="https://www.flickr.com/photos/phrawr/7612947262/"> <img alt="Pipes" src="https://vstinner.github.io/images/pipes.jpg" /> </a> <p>Between December 2014 and January 2015, once I succeeded to fix the root issue of the random asyncio crashes on Windows (<a class="reference external" href="https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html">Proactor Cancellation From Hell</a>), I fixed more race conditions and bugs in <tt class="docutils literal">ProactorEventLoop</tt>:</p> <ul class="simple"> <li><tt class="docutils literal">ConnectPipe()</tt> Race Condition</li> <li>Race Condition in <tt class="docutils literal">BaseSubprocessTransport._try_finish()</tt></li> <li>Close the transport on failure: ResourceWarning</li> <li>Cleanup code …</li></ul><a class="reference external image-reference" href="https://www.flickr.com/photos/phrawr/7612947262/"> <img alt="Pipes" src="https://vstinner.github.io/images/pipes.jpg" /> </a> <p>Between December 2014 
and January 2015, once I had fixed the root issue of the random asyncio crashes on Windows (<a class="reference external" href="https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html">Proactor Cancellation From Hell</a>), I fixed more race conditions and bugs in <tt class="docutils literal">ProactorEventLoop</tt>:</p> <ul class="simple"> <li><tt class="docutils literal">ConnectPipe()</tt> Race Condition</li> <li>Race Condition in <tt class="docutils literal">BaseSubprocessTransport._try_finish()</tt></li> <li>Close the transport on failure: ResourceWarning</li> <li>Cleanup code handling pipes</li> </ul> <p>Previous article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html">Proactor Cancellation From Hell</a>. Next article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-wsarecv-cancellation-data-loss.html">asyncio: WSARecv() cancellation causing data loss</a>.</p> <div class="section" id="connectpipe-race-condition"> <h2>ConnectPipe() Race Condition</h2> <p>Once I had fixed the root issue of the random asyncio crashes on Windows (<a class="reference external" href="https://vstinner.github.io/asyncio-proactor-cancellation-from-hell.html">Proactor Cancellation From Hell</a>), I started to look at the ConnectPipe special case: <a class="reference external" href="https://github.com/python/asyncio/issues/204">asyncio issue #204: Investigate IocpProactor.accept_pipe() special case (don't register overlapped)</a> (issue created on 25 Aug 2014).</p> <p>On January 21, 2015, I opened <a class="reference external" href="https://bugs.python.org/issue23293">bpo-23293: race condition related to IocpProactor.connect_pipe()</a>.</p> <p>While fixing <a class="reference external" href="https://bugs.python.org/issue23095">bpo-23095 (race condition when cancelling a _WaitHandleFuture)</a>, I saw that <tt class="docutils literal">IocpProactor.connect_pipe()</tt>
causes &quot;GetQueuedCompletionStatus() returned an unexpected event&quot; messages to be logged, but also hangs the test suite.</p> <p><tt class="docutils literal">IocpProactor._register()</tt> contains the comment:</p> <pre class="literal-block">
# Even if GetOverlappedResult() was called, we have to wait for the
# notification of the completion in GetQueuedCompletionStatus().
# Register the overlapped operation to keep a reference to the
# OVERLAPPED object, otherwise the memory is freed and Windows may
# read uninitialized memory.
#
# For an unknown reason, ConnectNamedPipe() behaves differently:
# the completion is not notified by GetOverlappedResult() if we
# already called GetOverlappedResult(). For this specific case, we
# don't expect notification (register is set to False).
</pre> <p><tt class="docutils literal">IocpProactor.close()</tt> contains this comment:</p> <pre class="literal-block">
# The operation was started with connect_pipe() which
# queues a task to Windows' thread pool. This cannot
# be cancelled, so just forget it.
</pre> <p><tt class="docutils literal">IocpProactor.connect_pipe()</tt> is implemented with <tt class="docutils literal">QueueUserWorkItem()</tt> which <strong>starts a thread that cannot be interrupted</strong>.
Because of that, this function requires special cases in the <tt class="docutils literal">_register()</tt> and <tt class="docutils literal">close()</tt> methods of <tt class="docutils literal">IocpProactor</tt>.</p> <p>I proposed a solution to reimplement <tt class="docutils literal">IocpProactor.connect_pipe()</tt> <strong>without a thread</strong>: <a class="reference external" href="https://code.google.com/p/tulip/issues/detail?id=197">asyncio issue #197: Rewrite IocpProactor.connect_pipe() with non-blocking calls to avoid non interruptible QueueUserWorkItem()</a>.</p> <p>On January 22, 2015, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/7ffa2c5fdda8a9cc254edf67c4458b15db1252fa">commit 7ffa2c5f</a>:</p> <pre class="literal-block">
commit 7ffa2c5fdda8a9cc254edf67c4458b15db1252fa
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Thu Jan 22 22:55:08 2015 +0100

    Issue #23293, asyncio: Rewrite IocpProactor.connect_pipe()
</pre> <p>The change adds <tt class="docutils literal">_overlapped.ConnectPipe()</tt> which tries to connect to the pipe for asynchronous I/O (overlapped): <strong>call CreateFile() in a loop until it doesn't fail with ERROR_PIPE_BUSY</strong>.
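</p> <p>Such a retry loop can be sketched as a coroutine. This is a simplified model: <tt class="docutils literal">try_connect()</tt> below is a hypothetical stand-in for <tt class="docutils literal">_overlapped.ConnectPipe()</tt>, assumed to raise <tt class="docutils literal">BlockingIOError</tt> while the pipe is busy:</p>

```python
import asyncio

# Delay bounds, following the 1 ms / 100 ms values described above.
CONNECT_PIPE_INIT_DELAY = 0.001
CONNECT_PIPE_MAX_DELAY = 0.100

async def connect_pipe(try_connect):
    # try_connect() stands in for _overlapped.ConnectPipe(): it returns
    # a pipe handle, or raises BlockingIOError while the pipe is busy
    # (ERROR_PIPE_BUSY on Windows).
    delay = CONNECT_PIPE_INIT_DELAY
    while True:
        try:
            return try_connect()
        except BlockingIOError:
            # Retry after an increasing delay, capped at 100 ms:
            # no extra thread is needed, only the event loop.
            delay = min(delay * 2, CONNECT_PIPE_MAX_DELAY)
            await asyncio.sleep(delay)
```

Because the waiting happens in <tt class="docutils literal">asyncio.sleep()</tt>, cancelling the coroutine cancels the connection attempt cleanly, which is exactly what the uninterruptible thread-pool approach could not offer.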
Use an increasing delay between 1 ms and 100 ms.</p> </div> <div class="section" id="race-condition-in-basesubprocesstransport-try-finish"> <h2>Race Condition in BaseSubprocessTransport._try_finish()</h2> <p>If the process exited before the <tt class="docutils literal">_post_init()</tt> method was called, scheduling the call to <tt class="docutils literal">_call_connection_lost()</tt> with <tt class="docutils literal">call_soon()</tt> is wrong: <tt class="docutils literal">connection_made()</tt> must be called before <tt class="docutils literal">connection_lost()</tt>.</p> <p>The fix reuses the <tt class="docutils literal">BaseSubprocessTransport._call()</tt> method to schedule the call to <tt class="docutils literal">_call_connection_lost()</tt>, to ensure that <tt class="docutils literal">connection_made()</tt> and <tt class="docutils literal">connection_lost()</tt> are called in the correct order.</p> <p>On December 18, 2014, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/1b9763d0a9c62c13dc2a06770032e5906b610c96">commit 1b9763d0</a>.
The explanation is long, but the change is basically a single-line change; extract:</p> <pre class="literal-block">
-        self._loop.call_soon(self._call_connection_lost, None)
+        self._call(self._call_connection_lost, None)
</pre> <p><strong>Properly ordering callbacks in asyncio is challenging!</strong> The order matters for the semantics of asyncio: it is part of the design of <a class="reference external" href="https://www.python.org/dev/peps/pep-3156/">PEP 3156 -- Asynchronous IO Support Rebooted: the &quot;asyncio&quot; Module</a>.</p> </div> <div class="section" id="close-the-transport-on-failure-resourcewarning"> <h2>Close the transport on failure: ResourceWarning</h2> <p>On January 15, 2015, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/4bf22e033e975f61c33752db5a3764dc0f7d0b03">commit 4bf22e03</a>, extract:</p> <pre class="literal-block">
-        yield from transp._post_init()
+        try:
+            yield from transp._post_init()
+        except:
+            transp.close()
+            raise
</pre> <p>Later, I would spend a lot of time (and push many more changes) to ensure that resources are properly released, especially to close transports on failure, similar to this change.</p> <p>I would add many <strong>ResourceWarning</strong> warnings in destructors when a transport, subprocess or event loop is not closed explicitly.</p> <p>For example, notice the <tt class="docutils literal">ResourceWarning</tt> in the current destructor of <tt class="docutils literal">_SelectorTransport</tt>:</p> <pre class="literal-block">
class _SelectorTransport(transports._FlowControlMixin,
                         transports.Transport):

    def __del__(self, _warn=warnings.warn):
        if self._sock is not None:
            _warn(f&quot;unclosed transport {self!r}&quot;,
                  ResourceWarning, source=self)
            self._sock.close()
</pre> <p>I even enhanced Python 3.6 to be able to provide the <strong>traceback where the leaked resource has been allocated</strong>, thanks to my <tt class="docutils literal">tracemalloc</tt> module.
Example with <tt class="docutils literal">filebug.py</tt>:</p> <pre class="literal-block">
def func():
    f = open(__file__)
    f = None

func()
</pre> <p>Output with Python 3.6:</p> <pre class="literal-block">
$ python3 -Wd -X tracemalloc=5 filebug.py
filebug.py:3: ResourceWarning: unclosed file &lt;_io.TextIOWrapper name='filebug.py' mode='r' encoding='UTF-8'&gt;
  f = None
Object allocated at (most recent call first):
  File &quot;filebug.py&quot;, lineno 2
    f = open(__file__)
  File &quot;filebug.py&quot;, lineno 5
    func()
</pre> <p>The line where the warning is emitted is usually useless to understand the bug, whereas the traceback is very useful to identify the leaked resource.</p> <p>See <a class="reference external" href="https://pythondev.readthedocs.io/debug_tools.html#resourcewarning">my ResourceWarning documentation</a>.</p> </div> <div class="section" id="cleanup-code-handling-pipes"> <h2>Cleanup code handling pipes</h2> <p>Thanks to the new implementation of <tt class="docutils literal">connect_pipe()</tt>, I was able to push changes to simplify the code and remove various hacks in code handling pipes.</p> <p><a class="reference external" href="https://github.com/python/cpython/commit/2b77c5467f376257ae22cbfbcb3a0e5e6349e92d">commit 2b77c546</a>:</p> <pre class="literal-block">
commit 2b77c5467f376257ae22cbfbcb3a0e5e6349e92d
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Thu Jan 22 23:50:03 2015 +0100

    asyncio, Tulip issue 204: Fix IocpProactor.accept_pipe()

    Overlapped.ConnectNamedPipe() now returns a boolean: True if the pipe
    is connected (if ConnectNamedPipe() failed with
    ERROR_PIPE_CONNECTED), False if the connection is in progress.

    This change removes multiple hacks in IocpProactor.
</pre> <p><a class="reference external" href="https://github.com/python/cpython/commit/3d2256f671b7ed5c769dd34b27ae597cbc69047c">commit 3d2256f6</a>:</p> <pre class="literal-block">
commit 3d2256f671b7ed5c769dd34b27ae597cbc69047c
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Mon Jan 26 11:02:59 2015 +0100

    Issue #23293, asyncio: Cleanup IocpProactor.close()

    The special case for connect_pipe() is not more needed. connect_pipe()
    doesn't use overlapped operations anymore.
</pre> <p><a class="reference external" href="https://github.com/python/cpython/commit/a19b7b3fcafe52b98245e14466ffc4d6750ca4f1">commit a19b7b3f</a>:</p> <pre class="literal-block">
commit a19b7b3fcafe52b98245e14466ffc4d6750ca4f1
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Mon Jan 26 15:03:20 2015 +0100

    asyncio: Fix ProactorEventLoop.start_serving_pipe()

    If a client connected before the server was closed: drop the client
    (close the pipe) and exit.
</pre> <p><a class="reference external" href="https://github.com/python/cpython/commit/e0fd157ba0cc92e435e7520b4ff641ca68d72244">commit e0fd157b</a>:</p> <pre class="literal-block">
commit e0fd157ba0cc92e435e7520b4ff641ca68d72244
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Mon Jan 26 15:04:03 2015 +0100

    Issue #23293, asyncio: Rewrite IocpProactor.connect_pipe() as a coroutine

    Use a coroutine with asyncio.sleep() instead of call_later() to ensure
    that the schedule call is cancelled.

    Add also a unit test cancelling connect_pipe().
</pre> <p><a class="reference external" href="https://github.com/python/cpython/commit/41063d2a59a24e257cd9ce62137e36c862e3ab1e">commit 41063d2a</a>:</p> <pre class="literal-block">
commit 41063d2a59a24e257cd9ce62137e36c862e3ab1e
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Mon Jan 26 22:30:49 2015 +0100

    asyncio, Tulip issue 204: Fix IocpProactor.recv()

    If ReadFile() fails with ERROR_BROKEN_PIPE, the operation is not
    pending: don't register the overlapped.

    I don't know if WSARecv() can fail with ERROR_BROKEN_PIPE. Since
    Overlapped.WSARecv() already handled ERROR_BROKEN_PIPE, let me guess
    that it has the same behaviour than ReadFile().
</pre> </div> Asyncio: Proactor Cancellation From Hell2019-01-28T20:20:00+01:002019-01-28T20:20:00+01:00Victor Stinnertag:vstinner.github.io,2019-01-28:/asyncio-proactor-cancellation-from-hell.html<img alt="South Park Hell" src="https://vstinner.github.io/images/south_park_hell.jpg" /> <p>Between 2014 and 2015, I was working on the new shiny <tt class="docutils literal">asyncio</tt> module (module added to Python 3.4 released in March 2014). I helped to stabilize the Windows implementation because... well, nobody else was paying attention to it, and I was worried that test_asyncio <strong>randomly crashed</strong> on Windows.</p> <p>One …</p><img alt="South Park Hell" src="https://vstinner.github.io/images/south_park_hell.jpg" /> <p>Between 2014 and 2015, I was working on the new shiny <tt class="docutils literal">asyncio</tt> module (module added to Python 3.4 released in March 2014). I helped to stabilize the Windows implementation because...
well, nobody else was paying attention to it, and I was worried that test_asyncio <strong>randomly crashed</strong> on Windows.</p> <p>One bug really annoyed me: I started to fix it in July 2014, but I only succeeded in <strong>fixing the root issue</strong> in January 2015: <strong>six months later</strong>!</p> <p>It was really difficult to find documentation on IOCP and asynchronous programming on Windows. <strong>I had to ask someone who had access to the Windows source code for help</strong> to understand the bug...</p> <p><strong>Spoiler:</strong> cancelling an overlapped <tt class="docutils literal">RegisterWaitForSingleObject()</tt> with <tt class="docutils literal">UnregisterWait()</tt> is asynchronous. The asynchronous part is not well documented and it took me months of debugging to understand it. Moreover, the bug was well hidden for various reasons that we will see below.</p> <p>Next article: <a class="reference external" href="https://vstinner.github.io/asyncio-proactor-connect-pipe-race-condition.html">Asyncio: Proactor ConnectPipe() Race Condition</a>.</p> <div class="section" id="fix-cancel-when-called-twice"> <h2>Fix cancel() when called twice</h2> <p>July 2014, <a class="reference external" href="https://github.com/python/asyncio/issues/195">asyncio issue #195</a>: while working on a <tt class="docutils literal">SIGINT</tt> signal handler for the <tt class="docutils literal">ProactorEventLoop</tt> on Windows (<a class="reference external" href="https://github.com/python/asyncio/issues/195">asyncio issue #191</a>), I hit a bug on Windows: <tt class="docutils literal">_WaitHandleFuture.cancel()</tt> crashed if the wait event was already unregistered by <tt class="docutils literal">finish_wait_for_handle()</tt>.
The bug was that <tt class="docutils literal">UnregisterWait()</tt> was called twice.</p> <p>I pushed <a class="reference external" href="https://github.com/python/cpython/commit/fea6a100dc51012cb0187374ad31de330ebc0035">commit fea6a100</a> to fix this crash:</p> <pre class="literal-block">
commit fea6a100dc51012cb0187374ad31de330ebc0035
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Fri Jul 25 00:54:53 2014 +0200

    Improve stability of the proactor event loop, especially operations on
    overlapped objects

    (...)
</pre> <p>Main changes:</p> <ul class="simple"> <li>Fix a crash: <strong>don't call UnregisterWait() twice if a _WaitHandleFuture is cancelled twice</strong>.</li> <li>Fix another crash: <tt class="docutils literal">_OverlappedFuture.cancel()</tt> no longer cancels the overlapped if it is already cancelled or completed. Also log an error if the cancellation failed.</li> <li><tt class="docutils literal">IocpProactor.close()</tt> now cancels futures rather than directly cancelling the underlying overlapped objects.</li> <li>Add a destructor to the <tt class="docutils literal">IocpProactor</tt> class which closes it.</li> </ul> </div> <div class="section" id="clear-reference-from-overlappedfuture-to-overlapped"> <h2>Clear reference from _OverlappedFuture to overlapped</h2> <p>July 2014, I created <a class="reference external" href="https://github.com/python/asyncio/issues/196">asyncio issue #196</a>: <tt class="docutils literal">_OverlappedFuture.set_result()</tt> should clear its reference to the overlapped object.</p> <p>It is important to explicitly clear references to Python objects as soon as possible to release resources. Otherwise, an object can remain alive longer than expected.</p> <p>I noticed that _OverlappedFuture kept a reference to the underlying overlapped object even after the asynchronous operation completed. I started to work on a fix but had many issues fixing this bug completely...
it was just the beginning of a long journey.</p> <div class="section" id="clear-the-reference-on-cancellation-and-error"> <h3>Clear the reference on cancellation and error</h3> <p>I pushed a first fix: <a class="reference external" href="https://github.com/python/cpython/commit/18a28dc5c28ae9a953f537486780159ddb768702">commit 18a28dc5</a> clears the reference to the overlapped in the <tt class="docutils literal">cancel()</tt> and <tt class="docutils literal">set_exception()</tt> methods of <tt class="docutils literal">_OverlappedFuture</tt>:</p> <pre class="literal-block">
commit 18a28dc5c28ae9a953f537486780159ddb768702
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Fri Jul 25 13:05:20 2014 +0200

    * _OverlappedFuture.cancel() now clears its reference to the
      overlapped object. Make also the _OverlappedFuture.ov attribute
      private.
    * _OverlappedFuture.set_exception() now cancels the overlapped
      operation.
    * (...)
</pre> <p>I started with this change because it didn't make the tests less stable.</p> </div> <div class="section" id="clear-the-reference-in-poll"> <h3>Clear the reference in poll()</h3> <p>Clearing the reference to the overlapped in <tt class="docutils literal">cancel()</tt> and <tt class="docutils literal">set_exception()</tt> <strong>worked well</strong>. But when I tried to do the same on success (in <tt class="docutils literal">set_result()</tt>), <strong>I got random errors</strong>. Example:</p> <pre class="literal-block">
C:\haypo\tulip&gt;\python33\python.exe runtests.py test_pipe
...
Exception RuntimeError: '&lt;_overlapped.Overlapped object at 0x00000000035E7660&gt;
still has pending operation at deallocation, the process may crash' ignored
...
Fatal read error on pipe transport
protocol: &lt;asyncio.streams.StreamReaderProtocol object at 0x00000000035EE668&gt;
transport: &lt;_ProactorDuplexPipeTransport fd=348&gt;
Traceback (most recent call last):
  File &quot;C:\haypo\tulip\asyncio\proactor_events.py&quot;, line 159, in _loop_reading
    data = fut.result()  # deliver data later in &quot;finally&quot; clause
  File &quot;C:\haypo\tulip\asyncio\futures.py&quot;, line 271, in result
    raise self._exception
  File &quot;C:\haypo\tulip\asyncio\windows_events.py&quot;, line 488, in _poll
    value = callback(transferred, key, ov)
  File &quot;C:\haypo\tulip\asyncio\windows_events.py&quot;, line 279, in finish_recv
    return ov.getresult()
OSError: [WinError 996] Overlapped I/O event is not in a signaled state
...
</pre> <p>It seems that the problem only occurs in the fast path of <tt class="docutils literal">IocpProactor._register()</tt>, when the overlapped is not added to <tt class="docutils literal">_cache</tt>.</p> <p>Clearing the reference in <tt class="docutils literal">_poll()</tt>, once <tt class="docutils literal">GetQueuedCompletionStatus()</tt> has read the status, <strong>works</strong>! I pushed a second fix: <a class="reference external" href="https://github.com/python/cpython/commit/65dd69a3da16257bd86b92900e5ec5a8dd26f1d9">commit 65dd69a3</a> changes <tt class="docutils literal">_poll()</tt>:</p> <pre class="literal-block">
commit 65dd69a3da16257bd86b92900e5ec5a8dd26f1d9
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Fri Jul 25 22:36:05 2014 +0200

    IocpProactor._poll() clears the reference to the overlapped operation
    when the operation is done.

    (...)
</pre> </div> <div class="section" id="ignore-false-alarms"> <h3>Ignore false alarms</h3> <p>I tried to add the overlapped into <tt class="docutils literal">_cache</tt> but <strong>then the event loop started to hang or to fail with new errors</strong>.</p> <p>I analyzed an overlapped <tt class="docutils literal">WSARecv()</tt> which had been cancelled.
Just after calling <tt class="docutils literal">CancelIoEx()</tt>, <tt class="docutils literal">HasOverlappedIoCompleted()</tt> returns 0.</p> <p>Even after <tt class="docutils literal">GetQueuedCompletionStatus()</tt> has read the status, <tt class="docutils literal">HasOverlappedIoCompleted()</tt> still returns 0.</p> <p><strong>After hours of debugging, I eventually found the main issue!</strong></p> <p>Sometimes <tt class="docutils literal">GetQueuedCompletionStatus()</tt> returns an overlapped operation which has not completed yet. I modified <tt class="docutils literal">IocpProactor._poll()</tt> to ignore the false alarm, <a class="reference external" href="https://github.com/python/cpython/commit/51e44ea66aefb4229e506263acf40d35596d279c">commit 51e44ea6</a>:</p> <pre class="literal-block">
commit 51e44ea66aefb4229e506263acf40d35596d279c
Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt;
Date:   Sat Jul 26 00:58:34 2014 +0200

    _OverlappedFuture.set_result() now clears its reference to the
    overlapped object.

    IocpProactor._poll() now also ignores false alarms:
    GetQueuedCompletionStatus() returns the overlapped but it is still
    pending.
</pre> <p>The fix adds this comment:</p> <pre class="literal-block">
# FIXME: why do we get false alarms?
</pre> </div> <div class="section" id="keep-a-reference-of-overlapped"> <h3>Keep a reference of overlapped</h3> <p>To stabilize the code, I modified <tt class="docutils literal">ProactorIocp</tt> to keep a reference to the overlapped object (it already kept a reference previously, but not in all cases). <strong>Otherwise the memory may be reused and GetQueuedCompletionStatus() may use random bytes and behave badly</strong>.
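</p> <p>The &quot;keep a reference&quot; rule can be modelled with a small sketch (hypothetical class and method names; the real logic lives in <tt class="docutils literal">IocpProactor</tt>): every overlapped is stored in a dictionary keyed by the address of its OVERLAPPED structure, and only removed when the completion event for that address is seen, so the memory stays alive as long as the kernel may still write to it:</p>

```python
class OverlappedCache:
    """Sketch of registering overlapped operations until completion."""

    def __init__(self):
        self._cache = {}   # address -> (future, overlapped, callback)

    def register(self, address, future, ov, callback):
        # Keep the reference even if the result is already known:
        # the kernel may still touch the OVERLAPPED memory.
        self._cache[address] = (future, ov, callback)

    def on_completion_event(self, address):
        # Called for each event reported by GetQueuedCompletionStatus().
        entry = self._cache.pop(address, None)
        if entry is None:
            # Unexpected event: log it in debug mode instead of crashing.
            return None
        future, ov, callback = entry
        return callback(ov)
```

The dictionary plays two roles at once: it keeps the OVERLAPPED object alive, and it lets the loop detect unexpected completion events instead of reading freed memory.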
I pushed <a class="reference external" href="https://github.com/python/cpython/commit/42d3bdeed6e34117b787d61a471563a0dba6a894">commit 42d3bdee</a>:</p> <pre class="literal-block"> commit 42d3bdeed6e34117b787d61a471563a0dba6a894 Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Mon Jul 28 00:18:43 2014 +0200 ProactorIocp._register() now registers the overlapped in the _cache dictionary, even if we already got the result. We need to keep a reference to the overlapped object, otherwise the memory may be reused and GetQueuedCompletionStatus() may use random bytes and behaves badly. There is still a hack for ConnectNamedPipe(): the overlapped object is not registered into _cache if the overlapped object completed directly. Log also an error in debug mode in ProactorIocp._loop() if we get an unexpected event. Add a protection in ProactorIocp.close() to avoid blocking, even if it should not happen. I still don't understand exactly why some the completion of some overlapped objects are not notified. </pre> <p>The change adds a long comment:</p> <pre class="literal-block"> # Even if GetOverlappedResult() was called, we have to wait for the # notification of the completion in GetQueuedCompletionStatus(). # Register the overlapped operation to keep a reference to the # OVERLAPPED object, otherwise the memory is freed and Windows may # read uninitialized memory. # # For an unknown reason, ConnectNamedPipe() behaves differently: # the completion is not notified by GetOverlappedResult() if we # already called GetOverlappedResult(). For this specific case, we # don't expect notification (register is set to False). 
</pre> <p>I pushed another change to attempt to stabilize the code, <a class="reference external" href="https://github.com/python/cpython/commit/313a9809043ed2ed1ad25282af7169e08cdc92a3">commit 313a9809</a>:</p> <pre class="literal-block"> commit 313a9809043ed2ed1ad25282af7169e08cdc92a3 Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Tue Jul 29 12:58:23 2014 +0200 * _WaitHandleFuture.cancel() now notify IocpProactor through the overlapped object that the wait was cancelled. * Optimize IocpProactor.wait_for_handle() gets the result if the wait is signaled immediatly. (...) </pre> </div> <div class="section" id="asyncio-issue-196-closed"> <h3>asyncio issue #196 closed</h3> <p>The initial issue &quot;_OverlappedFuture.set_result() should clear its reference to the overlapped object&quot; has been fixed, so <strong>I closed this issue</strong>. I didn't know at this point that not all bugs were fixed yet...</p> <p>I also opened the new <a class="reference external" href="https://github.com/python/asyncio/issues/204">asyncio issue #204</a> to investigate the <tt class="docutils literal">accept_pipe()</tt> special case. We will analyze this funny bug in another article.</p> </div> </div> <div class="section" id="bpo-23095-race-condition-when-cancelling-a-waithandlefuture"> <h2>bpo-23095: race condition when cancelling a _WaitHandleFuture</h2> <p>On December 21, 2014, five months after a long series of changes to stabilize asyncio... <strong>asyncio was still crashing randomly on Windows</strong>! I created <a class="reference external" href="https://bugs.python.org/issue23095">bpo-23095: race condition when cancelling a _WaitHandleFuture</a>.</p> <p>On Windows using the IOCP (proactor) event loop, I noticed race conditions when running the test suite of Trollius (my old deprecated asyncio port to Python 2). For example, sometimes the return code of a process was <tt class="docutils literal">None</tt>, whereas this case <strong>must never happen</strong>.
It looks like the <tt class="docutils literal">wait_for_handle()</tt> method doesn't behave properly.</p> <p>When I run the test suite of asyncio in debug mode (PYTHONASYNCIODEBUG=1), sometimes I see the message &quot;GetQueuedCompletionStatus() returned an unexpected event&quot; which <strong>should never occur either</strong>.</p> <p>I added debug traces. I saw that <tt class="docutils literal">IocpProactor.wait_for_handle()</tt> later calls <tt class="docutils literal">PostQueuedCompletionStatus()</tt> through its internal C callback (<tt class="docutils literal">PostToQueueCallback</tt>). It looks like <strong>sometimes the callback is called even though the wait was cancelled/acked</strong> by <tt class="docutils literal">UnregisterWait()</tt>.</p> <p>... I didn't understand the logic between <tt class="docutils literal">RegisterWaitForSingleObject()</tt>, <tt class="docutils literal">UnregisterWait()</tt> and the callback ....</p> <p>It looks like sometimes the overlapped object created in Python (<tt class="docutils literal">ov = _overlapped.Overlapped(NULL)</tt>) is destroyed before <tt class="docutils literal">PostToQueueCallback()</tt> is called. In the unit tests, <strong>it doesn't crash because a different overlapped object is created and it gets the same memory address</strong> (the memory allocator reuses a just-freed memory block).</p> <p>The implementation of <tt class="docutils literal">wait_for_handle()</tt> had an optimization: it immediately polls the wait to check if it has already completed. I tried to remove it, but I got different issues.
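</p>
<p>The optimization can be pictured as follows. This is an illustrative model with hypothetical names, not the real asyncio code:</p>

```python
# Illustrative sketch of the "poll first" fast path: check whether the
# wait already completed before registering for a later notification.
def wait_for_handle(poll, register):
    """poll() returns True if the handle is already signaled;
    register() arranges a completion callback for later."""
    if poll():
        return "done"     # fast path: result available immediately, no registration
    register()            # slow path: register and wait for the notification
    return "pending"
```

<p>In the real code, it is this kind of fast path which skipped adding the overlapped to <tt class="docutils literal">_cache</tt>, so the two paths behave subtly differently.</p>
<p>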
If I understood correctly, <strong>this optimization hides other bugs and reduces the probability of getting the race condition</strong>.</p> <p><tt class="docutils literal">wait_for_handle()</tt> is used to wait for the completion of a subprocess, so it is exercised by all unit tests running subprocesses, but also by the <tt class="docutils literal">test_wait_for_handle()</tt> and <tt class="docutils literal">test_wait_for_handle_cancel()</tt> tests. I suspect that running <tt class="docutils literal">test_wait_for_handle()</tt> or <tt class="docutils literal">test_wait_for_handle_cancel()</tt> triggers the bug.</p> <p>Removing <tt class="docutils literal">_winapi.CloseHandle(self._iocp)</tt> in <tt class="docutils literal">IocpProactor.close()</tt> works around the bug. The bug looks to be an unexpected call to <tt class="docutils literal">PostToQueueCallback()</tt> which calls <tt class="docutils literal">PostQueuedCompletionStatus()</tt> on an IOCP. Not closing the IOCP means using a different IOCP for each test, so the unexpected call to <tt class="docutils literal">PostQueuedCompletionStatus()</tt> has no effect on the following tests.</p> <p>I rewrote some parts of the IOCP code in asyncio. Maybe I introduced this issue during the refactoring. Maybe <strong>it already existed before but nobody noticed it: asyncio had fewer unit tests before</strong>.</p> </div> <div class="section" id="fixing-the-root-issue-overlapped-cancellation-from-hell"> <h2>Fixing the root issue: Overlapped Cancellation From Hell</h2> <p>I looked into Twisted's implementation of the proactor, but it didn't support subprocesses.</p> <p>I looked at libuv: it supported processes, but not cancelling a wait on a process handle...</p> <p><strong>I had to ask for help from someone who had access to the Windows source code</strong> to understand the bug...</p> <p><strong>After six months of intense debugging, I eventually identified the root issue</strong> (I pushed the first fix on July 25, 2014).
I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/d0a28dee78d099fcadc71147cba4affb6efa0c97">commit d0a28dee</a> (<a class="reference external" href="https://bugs.python.org/issue23095">bpo-23095</a>):</p> <pre class="literal-block"> commit d0a28dee78d099fcadc71147cba4affb6efa0c97 Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Wed Jan 21 23:39:51 2015 +0100 Issue #23095, asyncio: Rewrite _WaitHandleFuture.cancel() </pre> <p>This change fixes a race condition related to <tt class="docutils literal">_WaitHandleFuture.cancel()</tt> leading to a Python crash or &quot;GetQueuedCompletionStatus() returned an unexpected event&quot; logs. Previously, <strong>it was possible that the cancelled wait completed while the overlapped object was already destroyed</strong>. Sometimes, a different overlapped was allocated at the same address, emitting a log about an unexpected completion (but no crash).</p> <p><tt class="docutils literal">_WaitHandleFuture.cancel()</tt> now <strong>waits until the handle wait is cancelled</strong> (until the cancellation completes) before clearing its reference to the overlapped object. To wait until the cancellation completes, <tt class="docutils literal">UnregisterWaitEx()</tt> is used with an event (instead of using <tt class="docutils literal">UnregisterWait()</tt>).</p> <p>To wait for this event, a new <tt class="docutils literal">_WaitCancelFuture</tt> class was added. It's a simplified version of <tt class="docutils literal">_WaitHandleFuture</tt>. For example, its <tt class="docutils literal">cancel()</tt> method calls <tt class="docutils literal">UnregisterWait()</tt>, not <tt class="docutils literal">UnregisterWaitEx()</tt>.
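</p>
<p>The core idea of the fix can be modelled in a few lines of Python. This is a toy model with invented names, not the real implementation: <tt class="docutils literal">cancel()</tt> must block until the cancellation is acknowledged before dropping the reference to the overlapped memory.</p>

```python
import threading

class WaitHandleModel:
    """Toy model of the fixed _WaitHandleFuture.cancel() logic."""

    def __init__(self):
        self.overlapped = bytearray(32)      # stands in for the OVERLAPPED memory
        self._cancel_done = threading.Event()

    def _unregister_wait_ex(self):
        # The real fix passes an event to UnregisterWaitEx(); Windows
        # signals it once no callback can touch the OVERLAPPED anymore.
        # Here a timer thread simulates that asynchronous acknowledgement.
        threading.Timer(0.01, self._cancel_done.set).start()

    def cancel(self):
        self._unregister_wait_ex()
        self._cancel_done.wait()             # wait until the cancellation completes
        self.overlapped = None               # only now is it safe to release the memory
```

<p>The ordering is the whole point: release the memory only after the acknowledgement, never before.</p>
<p>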
<tt class="docutils literal">_WaitCancelFuture</tt> should not be cancelled.</p> <p>The overlapped object is <strong>kept alive</strong> in <tt class="docutils literal">_WaitHandleFuture</tt> <strong>until the wait is unregistered</strong>.</p> <p>Later, I pushed a few more changes to fix corner cases.</p> <p><a class="reference external" href="https://github.com/python/cpython/commit/1ca9392c7083972c1953c02e6f2cca54934ce0a6">commit 1ca9392c</a>:</p> <pre class="literal-block"> commit 1ca9392c7083972c1953c02e6f2cca54934ce0a6 Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Thu Jan 22 00:17:54 2015 +0100 Issue #23095, asyncio: IocpProactor.close() must not cancel pending _WaitCancelFuture futures </pre> <p><a class="reference external" href="https://github.com/python/cpython/commit/752aba7f999b08c833979464a36840de8be0baf0">commit 752aba7f</a>:</p> <pre class="literal-block"> commit 752aba7f999b08c833979464a36840de8be0baf0 Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Thu Jan 22 22:47:13 2015 +0100 asyncio: IocpProactor.close() doesn't cancel anymore futures which are already cancelled </pre> <p><a class="reference external" href="https://github.com/python/cpython/commit/24dfa3c1d6b21e731bd167a13153968bba8fa5ce">commit 24dfa3c1</a>:</p> <pre class="literal-block"> commit 24dfa3c1d6b21e731bd167a13153968bba8fa5ce Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Mon Jan 26 22:30:28 2015 +0100 Issue #23095, asyncio: Fix _WaitHandleFuture.cancel() If UnregisterWaitEx() fais with ERROR_IO_PENDING, it doesn't mean that the wait is unregistered yet. We still have to wait until the wait is cancelled. 
</pre> <p>I think that <em>this</em> issue can now be closed: <tt class="docutils literal">UnregisterWaitEx()</tt> really does what we need in asyncio.</p> <p>I don't like the complexity of the IocpProactor._unregister() method and of the _WaitCancelFuture class, but it looks like that's how we are supposed to wait until a wait on a handle is cancelled...</p> <p>The Windows IOCP API is much more complex than I expected. It's probably because some parts (especially <tt class="docutils literal">RegisterWaitForSingleObject()</tt>) are implemented with threads in user land, not in the kernel.</p> <p>In short, I'm very happy to have fixed this very complex but also very annoying IOCP bug in asyncio.</p> <p>I got a nice comment from <a class="reference external" href="https://bugs.python.org/issue23095#msg234453">Guido van Rossum</a>:</p> <blockquote> <strong>Congrats with the fix, and thanks for your perseverance!</strong></blockquote> </div> <div class="section" id="summary-of-the-race-condition"> <h2>Summary of the race condition</h2> <p>Events of the crashing unit test:</p> <ul class="simple"> <li>The loop (ProactorEventLoop) spawns a subprocess.</li> <li>The loop creates a _WaitHandleFuture object which creates an overlapped to wait until the process completes (call <tt class="docutils literal">RegisterWaitForSingleObject()</tt>): <strong>allocate</strong> memory for the overlapped.</li> <li>The wait future is cancelled (call <tt class="docutils literal">UnregisterWait()</tt>).</li> <li>The overlapped is destroyed: <strong>free</strong> overlapped memory.</li> <li>The overlapped completes: <strong>write</strong> into the overlapped memory.</li> </ul> <p>The main issue is the order of the last two events.</p> <p>Sometimes, the overlapped completed before the memory was freed: everything is fine.</p> <p>Sometimes, the overlapped completed after the memory was freed: Python crashed (segmentation fault).</p> <p>Sometimes, another _WaitHandleFuture was created in the
meanwhile and created a second overlapped which was allocated at the same memory address as the freed memory of the previous overlapped. In this case, when the first overlapped completed, Python didn't crash but logged an unexpected completion message.</p> <p>Sometimes, the write was done in freed memory: the write didn't crash Python, but caused bugs which didn't make sense.</p> <p>There were even more cases causing even more surprising behaviors.</p> <p>Summary of the fix:</p> <ul class="simple"> <li>(... similar steps for the beginning ...)</li> <li>The wait future is cancelled: <strong>create an event</strong> to wait until the cancellation completes (call <tt class="docutils literal">UnregisterWaitEx()</tt>).</li> <li>Wait for the event.</li> <li>The event is signalled, which means that the cancellation completed: <strong>write</strong> into the overlapped memory.</li> <li>The overlapped is destroyed: <strong>free</strong> overlapped memory.</li> </ul> </div> Locale Bugfixes in Python 3 (2019-01-09, Victor Stinner)<a class="reference external image-reference" href="https://www.flickr.com/photos/svensson/40467591/"> <img alt="Unicode Mixed Bag" src="https://vstinner.github.io/images/unicode_bag.jpg" /> </a> <p>This article describes a few locale bugs that I fixed
in Python 3 between 2012 (Python 3.3) and 2018 (Python 3.7):</p> <ul class="simple"> <li>Support non-ASCII decimal point and thousands separator</li> <li>Crash with non-ASCII decimal point</li> <li>LC_NUMERIC encoding different than LC_CTYPE encoding</li> <li>LC_MONETARY encoding different than LC_CTYPE encoding</li> <li>Tests non-ASCII locales</li> </ul> <p>See also my previous locale bugfixes: <a class="reference external" href="https://vstinner.github.io/python3-locales-encodings.html">Python 3, locales and encodings</a></p> <div class="section" id="introduction"> <h2>Introduction</h2> <p>Each language and each country has different ways to represent dates, monetary values, numbers, etc. Unix has &quot;locales&quot; to configure applications for a specific language and a specific country. For example, there are <tt class="docutils literal">fr_BE</tt> for Belgium (French) and <tt class="docutils literal">fr_FR</tt> for France (French).</p> <p>In practice, each locale uses its own encoding, and problems arise when an application uses a different encoding than the locale. There is the LC_NUMERIC locale for numbers, the LC_MONETARY locale for monetary values, and LC_CTYPE for the encoding. Not only is it possible to configure an application to use LC_NUMERIC with a different encoding than LC_CTYPE, but some users use such a configuration!</p> <p>In an application which only uses bytes for text, as Python 2 mostly does, it's usually fine: in the worst case, users see <a class="reference external" href="https://en.wikipedia.org/wiki/Mojibake">mojibake</a>, but the application doesn't &quot;crash&quot; (exit and/or data loss).
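</p>
<p>In Python, the numeric formatting fields of the current locale can be inspected with <tt class="docutils literal">locale.localeconv()</tt>. A minimal example using the &quot;C&quot; locale, which is always available:</p>

```python
import locale

# The "C" locale is guaranteed to exist, so this example is portable.
locale.setlocale(locale.LC_ALL, 'C')
conv = locale.localeconv()
decimal_point = conv['decimal_point']   # '.' in the C locale
thousands_sep = conv['thousands_sep']   # '' (empty) in the C locale
```

<p>With a locale such as <tt class="docutils literal"><span class="pre">fr_FR.UTF-8</span></tt>, these fields become locale-specific and possibly non-ASCII, which is where the bugs described in this article come from.</p>
<p>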
On the other hand, <strong>Python 3 is designed to use Unicode for text and fails with hard Unicode errors if it fails to decode bytes or to encode text</strong>.</p> </div> <div class="section" id="support-non-ascii-decimal-point-and-thousands-separator"> <h2>Support non-ASCII decimal point and thousands separator</h2> <p>The Unicode type has been reimplemented in Python 3.3 to use &quot;compact string&quot;: <a class="reference external" href="https://www.python.org/dev/peps/pep-0393/">PEP 393 &quot;Flexible String Representation&quot;</a>. The new implementation is more complex, and the format() function was limited to ASCII for the decimal point and thousands separator (when formatting a number using the &quot;n&quot; type).</p> <p>In January 2012, Stefan Krah noticed the regression (compared to Python 3.2) and reported <a class="reference external" href="https://bugs.python.org/issue13706">bpo-13706</a>. I fixed the code to support non-ASCII in format (<a class="reference external" href="https://github.com/python/cpython/commit/a4ac600d6f9c5b74b97b99888b7cf3a7973cadc8">commit a4ac600d</a>).
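</p>
<p>For context, the &quot;n&quot; type formats a number like &quot;d&quot; but uses the decimal point and thousands separator of the current LC_NUMERIC locale. A short example in the always-available &quot;C&quot; locale:</p>

```python
import locale

# In the C locale, thousands_sep is empty, so "n" adds no grouping.
locale.setlocale(locale.LC_ALL, 'C')
text = format(1234567, 'n')   # '1234567': no separator in the C locale
```

<p>With a locale whose separator is non-ASCII (for example a no-break space), the separator must be decoded correctly from the locale encoding, which is precisely what the fixes below are about.</p>
<p>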
But when I did more tests, I noticed that the &quot;n&quot; type doesn't properly decode the decimal point and thousands separator, which come from the <tt class="docutils literal">localeconv()</tt> function, which uses byte strings.</p> <p>I fixed <tt class="docutils literal">format(int, &quot;n&quot;)</tt> with <a class="reference external" href="https://github.com/python/cpython/commit/41a863cb81608c779d60b49e7be8a115816734fc">commit 41a863cb</a>, which decodes the decimal point and the thousands separator (<tt class="docutils literal">localeconv()</tt> fields) from the locale encoding, rather than latin1, using <tt class="docutils literal">PyUnicode_DecodeLocale()</tt>:</p> <pre class="literal-block"> commit 41a863cb81608c779d60b49e7be8a115816734fc Author: Victor Stinner &lt;victor.stinner&#64;haypocalc.com&gt; Date: Fri Feb 24 00:37:51 2012 +0100 Issue #13706: Fix format(int, &quot;n&quot;) for locale with non-ASCII thousands separator * Decode thousands separator and decimal point using PyUnicode_DecodeLocale() (from the locale encoding), instead of decoding them implicitly from latin1 * Remove _PyUnicode_InsertThousandsGroupingLocale(), it was not used * Change _PyUnicode_InsertThousandsGrouping() API to return the maximum character if unicode is NULL * (...) </pre> <p>Note: I decided to not fix Python 3.2:</p> <blockquote> Hum, <strong>it is not trivial to redo the work on Python 3.2</strong>.
I prefer to leave the code unchanged to not introduce a regression, and I wait until a Python 3.2 user complains (the bug exists since Python 3.0 and nobody complained).</blockquote> </div> <div class="section" id="crash-with-non-ascii-decimal-point"> <h2>Crash with non-ASCII decimal point</h2> <p>Six years later, in June 2018, I noticed that Python crashes when running tests on locales:</p> <pre class="literal-block">
$ ./python
Python 3.8.0a0 (heads/master-dirty:bcd3a1a18d, Jun 23 2018, 10:31:03)
[GCC 8.1.1 20180502 (Red Hat 8.1.1-1)] on linux
&gt;&gt;&gt; import locale
&gt;&gt;&gt; locale.str(2.5)
'2.5'
&gt;&gt;&gt; '{:n}'.format(2.5)
'2.5'
&gt;&gt;&gt; locale.setlocale(locale.LC_ALL, '')
'fr_FR.UTF-8'
&gt;&gt;&gt; locale.str(2.5)
'2,5'
&gt;&gt;&gt; '{:n}'.format(2.5)
python: Objects/unicodeobject.c:474: _PyUnicode_CheckConsistency: Assertion `maxchar &lt; 128' failed.
Aborted (core dumped)
</pre> <p>I reported the issue as <a class="reference external" href="https://bugs.python.org/issue33954">bpo-33954</a>. The bug only occurs for a decimal point larger than U+00FF (a code point greater than 255). It was a bug in my <a class="reference external" href="https://bugs.python.org/issue13706">bpo-13706</a> fix: <a class="reference external" href="https://github.com/python/cpython/commit/a4ac600d6f9c5b74b97b99888b7cf3a7973cadc8">commit a4ac600d</a>.</p> <p>I pushed a second fix to properly support all cases, <a class="reference external" href="https://github.com/python/cpython/commit/59423e3ddd736387cef8f7632c71954c1859bed0">commit 59423e3d</a>:</p> <pre class="literal-block"> commit 59423e3ddd736387cef8f7632c71954c1859bed0 Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt; Date: Mon Nov 26 13:40:01 2018 +0100 bpo-33954: Fix _PyUnicode_InsertThousandsGrouping() (GH-10623) Fix str.format(), float.__format__() and complex.__format__() methods for non-ASCII decimal point when using the &quot;n&quot; formatter.
Changes: * Rewrite _PyUnicode_InsertThousandsGrouping(): it now requires a _PyUnicodeWriter object for the buffer and a Python str object for digits. * Rename FILL() macro to unicode_fill(), convert it to static inline function, add &quot;assert(0 &lt;= start);&quot; and rework its code. </pre> </div> <div class="section" id="lc-numeric-encoding-different-than-lc-ctype-encoding"> <h2>LC_NUMERIC encoding different than LC_CTYPE encoding</h2> <p>In August 2017, Petr Viktorin identified a bug in Koji (the server building Fedora packages): <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1484497">UnicodeDecodeError in localeconv() makes test_float fail in Koji</a></p> <blockquote> &quot;This is tripped by Python's test suite, namely test_float.GeneralFloatCases.test_float_with_comma&quot;</blockquote> <p>He wrote a short reproducer script:</p> <pre class="literal-block">
import locale
locale.setlocale(locale.LC_ALL, 'C.UTF-8')
locale.setlocale(locale.LC_NUMERIC, 'fr_FR.ISO8859-1')
print(locale.localeconv())
</pre> <p>Two months later, Charalampos Stratakis reported the bug upstream: <a class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a>. The problem arises when <strong>the LC_NUMERIC locale uses a different encoding than the LC_CTYPE encoding</strong>.</p> <p>The bug was already known:</p> <ul class="simple"> <li>2015-12-05: Serhiy Storchaka reported <a class="reference external" href="https://bugs.python.org/issue25812">bpo-25812</a> with the uk_UA locale</li> <li>2016-11-03: Guillaume Pasquet reported <a class="reference external" href="https://bugs.python.org/issue28604">bpo-28604</a> with the en_GB locale</li> </ul> <p>Moreover, <strong>the bug had been known since 2009</strong>: Stefan Krah reported a very similar bug, <a class="reference external" href="https://bugs.python.org/issue7442">bpo-7442</a>.
I was even involved in this issue in 2013, but then I forgot about it (as usual, I am working on too many issues in parallel :-)).</p> <p>In 2010, PostgreSQL <a class="reference external" href="https://www.postgresql.org/message-id/20100422015552.4B7E07541D0&#64;cvs.postgresql.org">had the same issue</a> and <a class="reference external" href="https://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/pg_locale.c?r1=1.53&amp;r2=1.54">fixed the bug by temporarily changing the LC_CTYPE locale to the LC_NUMERIC locale</a>.</p> <p>In January 2018, I came back to this 9-year-old bug. I was fixing bugs in the implementation of my <a class="reference external" href="https://www.python.org/dev/peps/pep-0540/">PEP 540 &quot;Add a new UTF-8 Mode&quot;</a>. I pushed a large change to fix locale encodings in <a class="reference external" href="https://bugs.python.org/issue29240">bpo-29240</a>, <a class="reference external" href="https://github.com/python/cpython/commit/7ed7aead9503102d2ed316175f198104e0cd674c">commit 7ed7aead</a>:</p> <pre class="literal-block"> commit 7ed7aead9503102d2ed316175f198104e0cd674c Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Mon Jan 15 10:45:49 2018 +0100 bpo-29240: Fix locale encodings in UTF-8 Mode (#5170) Modify locale.localeconv(), time.tzname, os.strerror() and other functions to ignore the UTF-8 Mode: always use the current locale encoding. Changes: (...) </pre> <p>Stefan Krah asked:</p> <blockquote> I have the exact same questions as Marc-Andre. This is one of the reasons why I blocked the _decimal change. I don't fully understand the role of the new glibc, since #7442 has existed for ages -- and <strong>it is a open question whether it is a bug or not</strong>.</blockquote> <p>I replied:</p> <blockquote> <p>Past 10 years, I repeated to every single user I met that &quot;Python 3 is right, your system setup is wrong&quot;. But that's a waste of time.
People continue to associate Python3 and Unicode to annoying bugs, because they don't understand how locales work.</p> <p>Instead of having to repeat to each user that &quot;hum, maybe your config is wrong&quot;, <strong>I prefer to support this non convential setup and work as expected (&quot;it just works&quot;)</strong>. With my latest implementation, setlocale() is only done when LC_CTYPE and LC_NUMERIC are different, which is the corner case which &quot;shouldn't occur in practice&quot;.</p> </blockquote> <p>Marc-Andre Lemburg added:</p> <blockquote> Sounds like a good compromise :-)</blockquote> <p>After doing more tests on FreeBSD, Linux and macOS, I pushed <a class="reference external" href="https://github.com/python/cpython/commit/cb064fc2321ce8673fe365e9ef60445a27657f54">commit cb064fc2</a> to fix <a class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a> by temporarily changing the LC_CTYPE locale to the LC_NUMERIC locale:</p> <pre class="literal-block"> commit cb064fc2321ce8673fe365e9ef60445a27657f54 Author: Victor Stinner &lt;victor.stinner&#64;gmail.com&gt; Date: Mon Jan 15 15:58:02 2018 +0100 bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174) * Add _Py_GetLocaleconvNumeric() function: decode decimal_point and thousands_sep fields of localeconv() from the LC_NUMERIC encoding, rather than decoding from the LC_CTYPE encoding. * Modify locale.localeconv() and &quot;n&quot; formatter of str.format() (for int, float and complex to use _Py_GetLocaleconvNumeric() internally. </pre> <p>I dislike my own fix because temporarily changing the LC_CTYPE locale impacts all threads, not only the current thread. But we failed to find another solution.
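</p>
<p>The approach of the fix can be sketched in Python. The real fix lives in C inside CPython; the helper name below is invented for illustration only:</p>

```python
import locale

def localeconv_numeric_fields():
    """Sketch of the approach: if LC_NUMERIC and LC_CTYPE differ,
    temporarily switch LC_CTYPE so that the numeric fields are decoded
    with the right encoding. Like the real fix, such a switch would
    affect all threads, which is why it is only done when needed."""
    ctype = locale.setlocale(locale.LC_CTYPE)      # query without changing
    numeric = locale.setlocale(locale.LC_NUMERIC)
    changed = (numeric != ctype)
    try:
        if changed:
            locale.setlocale(locale.LC_CTYPE, numeric)
        lc = locale.localeconv()
        return lc['decimal_point'], lc['thousands_sep']
    finally:
        if changed:
            locale.setlocale(locale.LC_CTYPE, ctype)  # always restore
```

<p>When LC_NUMERIC and LC_CTYPE are identical, the helper takes the no-switch path, which is the common case in practice.</p>
<p>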
<strong>The LC_CTYPE locale is only changed if the LC_NUMERIC locale is different than the LC_CTYPE locale and if the decimal point or the thousands separator is non-ASCII.</strong></p> <p>Note: I proposed a change to fix the same bug in the <tt class="docutils literal">decimal</tt> module, <a class="reference external" href="https://github.com/python/cpython/pull/5191">PR #5191</a>, but I abandoned my patch.</p> </div> <div class="section" id="lc-monetary-encoding-different-than-lc-ctype-encoding"> <h2>LC_MONETARY encoding different than LC_CTYPE encoding</h2> <p>Fixing <a class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a> drained all my energy, but sadly... there was a similar bug with LC_MONETARY!</p> <p>On 2016-11-03, Guillaume Pasquet reported <a class="reference external" href="https://bugs.python.org/issue28604">bpo-28604: Exception raised by python3.5 when using en_GB locale</a>.</p> <p>The fix is similar to the LC_NUMERIC fix: temporarily change the LC_CTYPE locale to the LC_MONETARY locale, <a class="reference external" href="https://github.com/python/cpython/commit/02e6bf7f2025cddcbde6432f6b6396198ab313f4">commit 02e6bf7f</a>:</p> <pre class="literal-block"> commit 02e6bf7f2025cddcbde6432f6b6396198ab313f4 Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt; Date: Tue Nov 20 16:20:16 2018 +0100 bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) locale.localeconv() now sets temporarily the LC_CTYPE locale to the LC_MONETARY locale if the two locales are different and monetary strings are non-ASCII. This temporary change affects other threads. Changes: * locale.localeconv() can now set LC_CTYPE to LC_MONETARY to decode monetary fields. * (...) </pre> </div> <div class="section" id="tests-non-ascii-locales"> <h2>Tests non-ASCII locales</h2> <p>To test my bugfixes, I used manual tests.
The first issue was to identify locales with problematic characters: a non-ASCII decimal point or thousands separator, for example. I wrote my own &quot;test suite&quot; for Windows, Linux, macOS and FreeBSD on my website: <a class="reference external" href="https://vstinner.readthedocs.io/unicode.html#test-non-ascii-characters-with-locales">Test non-ASCII characters with locales</a>.</p> <p>Example with localeconv() on Fedora 27:</p> <table border="1" class="docutils"> <colgroup> <col width="15%" /> <col width="8%" /> <col width="16%" /> <col width="25%" /> <col width="36%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">LC_ALL locale</th> <th class="head">Encoding</th> <th class="head">Field</th> <th class="head">Bytes</th> <th class="head">Text</th> </tr> </thead> <tbody valign="top"> <tr><td>es_MX.utf8</td> <td>UTF-8</td> <td>thousands_sep</td> <td><tt class="docutils literal">0xE2 0x80 0x89</tt></td> <td>U+2009</td> </tr> <tr><td>fr_FR.UTF-8</td> <td>UTF-8</td> <td>currency_symbol</td> <td><tt class="docutils literal">0xE2 0x82 0xAC</tt></td> <td>U+20AC (€)</td> </tr> <tr><td>ps_AF.utf8</td> <td>UTF-8</td> <td>thousands_sep</td> <td><tt class="docutils literal">0xD9 0xAC</tt></td> <td>U+066C (٬)</td> </tr> <tr><td>uk_UA.koi8u</td> <td>KOI8-U</td> <td>currency_symbol</td> <td><tt class="docutils literal">0xC7 0xD2 0xCE 0x2E</tt></td> <td>U+0433 U+0440 U+043d U+002E (грн.)</td> </tr> <tr><td>uk_UA.koi8u</td> <td>KOI8-U</td> <td>thousands_sep</td> <td><tt class="docutils literal">0x9A</tt></td> <td>U+00A0</td> </tr> </tbody> </table> <p>Manual tests became more and more complex, since there are so many cases: each operating system uses different locale names and the result depends on the libc version. After months of manual tests, I wrote my small personal <strong>portable</strong> locale test suite: <a class="reference external" href="https://github.com/vstinner/misc/blob/master/python/test_all_locales.py">test_all_locales.py</a>.
It supports:</p> <ul class="simple"> <li>FreeBSD 11</li> <li>macOS</li> <li>Fedora (Linux)</li> </ul> <p>Example:</p> <pre class="literal-block">
def test_zh_TW_Big5(self):
    loc = &quot;zh_TW.Big5&quot; if BSD else &quot;zh_TW.big5&quot;
    if FREEBSD:
        currency_symbol = u'\uff2e\uff34\uff04'
        decimal_point = u'\uff0e'
        thousands_sep = u'\uff0c'
        date_str = u'\u661f\u671f\u56db 2\u6708'
    else:
        currency_symbol = u'NT$'
        decimal_point = u'.'
        thousands_sep = u','
        if MACOS:
            date_str = u'\u9031\u56db 2\u6708'
        else:
            date_str = u'\u9031\u56db \u4e8c\u6708'

    self.set_locale(loc, &quot;Big5&quot;)
    lc = locale.localeconv()
    self.assertLocaleEqual(lc['currency_symbol'], currency_symbol)
    self.assertLocaleEqual(lc['decimal_point'], decimal_point)
    self.assertLocaleEqual(lc['thousands_sep'], thousands_sep)
    self.assertLocaleEqual(time.strftime('%A %B', FEBRUARY), date_str)
</pre> <p>The best would be to integrate these tests directly into the Python test suite, but it's neither portable nor future-proof, since most constants are hardcoded and depend on the operating system and the libc version.</p> </div> Python 3, locales and encodings (2018-09-06, Victor Stinner)<img alt="I □ Unicode" src="https://vstinner.github.io/images/i-square-unicode.jpg" /> <p>Recently, I worked on a change which looked simple: move the code to initialize the <tt class="docutils literal">sys.stdout</tt> encoding before <tt class="docutils literal">Py_Initialize()</tt>.
While I was on it, I also decided to move the code which selects the Python &quot;filesystem encoding&quot;. I didn't expect that I would spend 2 weeks on these issues... This article tells the story of my recent journey in locales and encodings on AIX, HP-UX, Windows, Linux, macOS, Solaris and FreeBSD.</p> <p>Table of Contents:</p> <ul class="simple"> <li>Lying HP-UX</li> <li>Standard streams and filesystem encodings</li> <li>POSIX locale on FreeBSD</li> <li>C locale on Windows</li> <li>Back to stdio encoding</li> <li>Back to filesystem encoding</li> <li>Use surrogatepass on Windows</li> <li>Filesystem encoding documentation</li> <li>Final FreeBSD 10 issue</li> <li>Configuration of locales and encodings</li> </ul> <div class="section" id="lying-hp-ux"> <h2>Lying HP-UX</h2> <p>On 2018-08-14, Michael Osipov reported <a class="reference external" href="https://bugs.python.org/issue34403">bpo-34403</a>: &quot;test_utf8_mode.test_cmd_line() fails on HP-UX due to false assumptions&quot;:</p> <pre class="literal-block">
======================================================================
FAIL: test_cmd_line (test.test_utf8_mode.UTF8ModeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
(...)
AssertionError: &quot;['h\\xc3\\xa9\\xe2\\x82\\xac']&quot; != &quot;['h\\udcc3\\udca9\\udce2\\udc82\\udcac']&quot;
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
: roman8:['h\xc3\xa9\xe2\x82\xac']
</pre> <p>Interesting, HP-UX uses &quot;roman8&quot; as its locale encoding. What is this &quot;new&quot; encoding? Wikipedia: <a class="reference external" href="https://en.wikipedia.org/wiki/HP_Roman#Roman-8">HP Roman-8</a>. Oh, that's even older than the common ISO 8859 encodings like Latin1!</p> <p>Michael Felt was working on a similar test_utf8_mode failure on AIX, so they tried to debug the issue together, but failed to find its root cause.
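As a side note, CPython ships an hp_roman8 codec, so Roman-8's disagreement with Latin-1 can be observed directly from Python (this is just an illustration, not part of the eventual fix):

```python
# Roman-8 and Latin-1 agree on ASCII but diverge on high bytes:
# 0xA7 is U+00CF (Ï) in HP Roman-8 but U+00A7 (§) in Latin-1.
raw = b"\xa7"
print(ascii(raw.decode("hp_roman8")))  # '\xcf'
print(ascii(raw.decode("latin-1")))    # '\xa7'
```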
Osipov proposed to give up and just skip the test on HP-UX...</p> <p>I showed up and proposed a fix for the unit test: <a class="reference external" href="https://github.com/python/cpython/pull/8967/files">PR 8967</a>. The test was hardcoding the expected locale encoding. I modified the test to query the locale encoding at runtime instead.</p> <p>Bad surprise: the test still failed. <a class="reference external" href="https://bugs.python.org/issue34403#msg324219">I commented</a>:</p> <blockquote> Hum, it looks like a bug in the C library of HP-UX.</blockquote> <p>I wrote a C program calling mbstowcs() to check what is the actual encoding used by the C library: <a class="reference external" href="https://bugs.python.org/file47767/c_locale.c">c_locale.c</a>. <a class="reference external" href="https://bugs.python.org/issue34403#msg324225">Result</a>:</p> <blockquote> Well, it confirms what I expected: <tt class="docutils literal">nl_langinfo(CODESET)</tt> announces <tt class="docutils literal">&quot;roman8&quot;</tt>, but <tt class="docutils literal">mbstowcs()</tt> uses Latin1 encoding in practice.</blockquote> <p>So I wrote a workaround similar to the one used on FreeBSD and Solaris: check if the libc announces an encoding different from the real encoding, and if so, force the usage of the ASCII encoding in Python. See my <a class="reference external" href="https://github.com/python/cpython/commit/d500e5307aec9c5d535f66d567fadb9c587a9a36">commit d500e530</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Tue Aug 28 17:27:36 2018 +0200

bpo-34403: On HP-UX, force ASCII for C locale (GH-8969)

On HP-UX with C or POSIX locale, sys.getfilesystemencoding()
now returns &quot;ascii&quot; instead of &quot;roman8&quot; (when the UTF-8 Mode
is disabled and the C locale is not coerced).

nl_langinfo(CODESET) announces &quot;roman8&quot; whereas it uses the
Latin1 encoding in practice.
</pre> <p>Extract of the heuristic code:</p> <pre class="literal-block">
if (strcmp(encoding, &quot;roman8&quot;) == 0) {
    unsigned char ch = (unsigned char)0xA7;
    wchar_t wch;
    size_t res = mbstowcs(&amp;wch, (char*)&amp;ch, 1);
    if (res != (size_t)-1 &amp;&amp; wch == L'\xA7') {
        /* On HP-UX with the C locale or the POSIX locale,
           nl_langinfo(CODESET) announces &quot;roman8&quot;, whereas
           mbstowcs() uses Latin1 encoding in practice.
           Force ASCII in this case.

           Roman8 decodes 0xA7 to U+00CF.
           Latin1 decodes 0xA7 to U+00A7. */
        return 1;
    }
}
</pre> <p>Python 3.8 will handle Unicode on HP-UX better. The test_utf8_mode failure was just a hint for a real underlying bug!</p> </div> <div class="section" id="standard-streams-and-filesystem-encodings"> <h2>Standard streams and filesystem encodings</h2> <p>While reworking the Python initialization, I tried to move <strong>all</strong> configuration parameters to a new <tt class="docutils literal">_PyCoreConfig</tt> structure. But I knew that I had missed at least the standard streams encoding (ex: <tt class="docutils literal">sys.stdout.encoding</tt>). My first attempt to move the code failed: it broke many tests. I created <a class="reference external" href="https://bugs.python.org/issue34485">bpo-34485</a>: &quot;_PyCoreConfig: add stdio_encoding and stdio_errors&quot;.</p> <p>While I was working on the stdio encoding, I recalled that the Python filesystem encoding is also initialized &quot;late&quot;.
I also created <a class="reference external" href="https://bugs.python.org/issue34523">bpo-34523</a>: &quot;Choose the filesystem encoding before Python initialization (add _PyCoreConfig.filesystem_encoding)&quot; to move this code as well.</p> <p>I quickly had an implementation, but it didn't go as well as expected...</p> </div> <div class="section" id="posix-locale-on-freebsd"> <h2>POSIX locale on FreeBSD</h2> <p><a class="reference external" href="https://bugs.python.org/issue34485">bpo-34485</a>: For me, the &quot;C&quot; and &quot;POSIX&quot; locales were the same locale: C is an alias to POSIX, or the opposite; it didn't really matter to me. But Python handles them differently in some corner cases. For example, Nick Coghlan's PEP 538 (C locale coercion) is only enabled if the LC_CTYPE locale is equal to &quot;C&quot;, not if it's equal to &quot;POSIX&quot;.</p> <p>In Python 3.5, I changed the stdin and stdout error handlers from strict to surrogateescape if the LC_CTYPE locale is &quot;C&quot;: <a class="reference external" href="https://bugs.python.org/issue19977">bpo-19977</a>. But when I tested my stdio and filesystem changes on Linux, FreeBSD and Windows, I noticed that I had forgotten to handle the &quot;POSIX&quot; locale. On FreeBSD, <tt class="docutils literal">LC_ALL=POSIX</tt> and <tt class="docutils literal">LC_ALL=C</tt> behave differently:</p> <ul class="simple"> <li>With the <tt class="docutils literal">LC_ALL=POSIX</tt> environment, <tt class="docutils literal">setlocale(LC_CTYPE, &quot;&quot;)</tt> returns <tt class="docutils literal">&quot;POSIX&quot;</tt></li> <li>With the <tt class="docutils literal">LC_ALL=C</tt> environment, <tt class="docutils literal">setlocale(LC_CTYPE, &quot;&quot;)</tt> returns <tt class="docutils literal">&quot;C&quot;</tt></li> </ul> <p>I fixed that to also use the &quot;surrogateescape&quot; error handler for the POSIX locale on FreeBSD.
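As a quick reminder of what this error handler does: surrogateescape maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF, so arbitrary bytes survive a decode/encode round-trip:

```python
# 0xFF is invalid in UTF-8: strict decoding would fail, while
# surrogateescape smuggles the byte through as U+DCFF.
raw = b"h\xc3\xa9\xff"  # "hé" in UTF-8, then a stray 0xFF byte
text = raw.decode("utf-8", "surrogateescape")
print(ascii(text))  # 'h\xe9\udcff'
# Encoding with the same handler restores the original bytes.
assert text.encode("utf-8", "surrogateescape") == raw
```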
<a class="reference external" href="https://github.com/python/cpython/commit/315877dc361d554bec34b4b62c270479ad36a1be">Commit 315877dc</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Wed Aug 29 09:58:12 2018 +0200

bpo-34485: stdout uses surrogateescape on POSIX locale (GH-8986)

Standard streams like sys.stdout now use the &quot;surrogateescape&quot;
error handler, instead of &quot;strict&quot;, on the POSIX locale (when the
C locale is not coerced and the UTF-8 Mode is disabled).

Add tests on sys.stdout.errors with LC_ALL=POSIX.
</pre> <p>The most important change is just one line:</p> <pre class="literal-block">
- if (strcmp(ctype_loc, &quot;C&quot;) == 0) {
+ if (strcmp(ctype_loc, &quot;C&quot;) == 0 || strcmp(ctype_loc, &quot;POSIX&quot;) == 0) {
      return &quot;surrogateescape&quot;;
  }
</pre> <p><a class="reference external" href="https://bugs.python.org/issue34527">bpo-34527</a>: Since I was testing various configurations, I also noticed that my UTF-8 Mode (PEP 540) had the same bug. Python 3.7 enables it if the LC_CTYPE locale is equal to &quot;C&quot;, but not if it's equal to &quot;POSIX&quot;. I also changed that (<a class="reference external" href="https://github.com/python/cpython/commit/5cb258950ce9b69b1f65646431c464c0c17b1510">commit 5cb25895</a>).</p> </div> <div class="section" id="c-locale-on-windows"> <h2>C locale on Windows</h2> <p>While testing my changes on Windows, I noticed that Python starts with the LC_CTYPE locale equal to &quot;C&quot;, whereas <tt class="docutils literal">locale.setlocale(locale.LC_CTYPE, &quot;&quot;)</tt> changes the LC_CTYPE locale to something like <tt class="docutils literal">English_United States.1252</tt> (English with the code page 1252).
Example with Python 3.6:</p> <pre class="literal-block">
C:\&gt; python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
&gt;&gt;&gt; import locale
&gt;&gt;&gt; locale.setlocale(locale.LC_CTYPE, None)
'C'
&gt;&gt;&gt; locale.setlocale(locale.LC_CTYPE, &quot;&quot;)
'English_United States.1252'
&gt;&gt;&gt; locale.setlocale(locale.LC_CTYPE, None)
'English_United States.1252'
</pre> <p>On UNIX, Python 2 starts with the default C locale, whereas Python 3 always sets the LC_CTYPE locale to my preference. Example on Fedora 28 with <tt class="docutils literal"><span class="pre">LANG=fr_FR.UTF-8</span></tt>:</p> <pre class="literal-block">
$ python2 -c 'import locale; print(locale.setlocale(locale.LC_CTYPE, None))'
C
$ python3 -c 'import locale; print(locale.setlocale(locale.LC_CTYPE, None))'
fr_FR.UTF-8
</pre> <p>I modified Python on Windows to behave as on UNIX, <a class="reference external" href="https://github.com/python/cpython/commit/177d921c8c03d30daa32994362023f777624b10d">commit 177d921c</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Wed Aug 29 11:25:15 2018 +0200

bpo-34485, Windows: LC_CTYPE set to user preference (GH-8988)

On Windows, the LC_CTYPE is now set to the user preferred locale at
startup: _Py_SetLocaleFromEnv(LC_CTYPE) is now called during the
Python initialization. Previously, the LC_CTYPE locale was &quot;C&quot; at
startup, but changed when calling setlocale(LC_CTYPE, &quot;&quot;) or
setlocale(LC_ALL, &quot;&quot;).

pymain_read_conf() now also calls _Py_SetLocaleFromEnv(LC_CTYPE) to
behave as _Py_InitializeCore(). Moreover, it doesn't save/restore the
LC_ALL anymore.

On Windows, standard streams like sys.stdout now always use
surrogateescape error handler by default (ignore the locale).
</pre> <p>Example:</p> <pre class="literal-block">
C:\&gt; python3.6 -c &quot;import locale; print(locale.setlocale(locale.LC_CTYPE, None))&quot;
C
C:\&gt; python3.8 -c &quot;import locale; print(locale.setlocale(locale.LC_CTYPE, None))&quot;
English_United States.1252
</pre> <p>On Windows, Python 3.8 now starts with the LC_CTYPE locale set to my preference, as was already done on UNIX.</p> </div> <div class="section" id="back-to-stdio-encoding"> <h2>Back to stdio encoding</h2> <p>After all previous changes and fixes, I was able to push my <a class="reference external" href="https://github.com/python/cpython/commit/dfe0dc74536dfb6f331131d9b2b49557675bb6b7">commit dfe0dc74</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Wed Aug 29 11:47:29 2018 +0200

bpo-34485: Add _PyCoreConfig.stdio_encoding (GH-8881)

* Add stdio_encoding and stdio_errors fields to _PyCoreConfig.
* Add unit tests on stdio_encoding and stdio_errors.
</pre> </div> <div class="section" id="back-to-filesystem-encoding"> <h2>Back to filesystem encoding</h2> <p><a class="reference external" href="https://github.com/python/cpython/commit/b2457efc78b74a1d6d1b77d11a939e886b8a4e2c">Commit b2457efc</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Wed Aug 29 13:25:36 2018 +0200

bpo-34523: Add _PyCoreConfig.filesystem_encoding (GH-8963)

_PyCoreConfig_Read() is now responsible to choose the filesystem
encoding and error handler. Using Py_Main(), the encoding is now
chosen even before calling Py_Initialize().

_PyCoreConfig.filesystem_encoding is now the reference, instead of
Py_FileSystemDefaultEncoding, for the Python filesystem encoding.

Changes:

* Add filesystem_encoding and filesystem_errors to _PyCoreConfig
* _PyCoreConfig_Read() now reads the locale encoding for the file
  system encoding.
* PyUnicode_EncodeFSDefault() and PyUnicode_DecodeFSDefaultAndSize()
  now use the interpreter configuration rather than
  Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
  global configuration variables.
* Add _Py_SetFileSystemEncoding() and _Py_ClearFileSystemEncoding()
  private functions to only modify Py_FileSystemDefaultEncoding and
  Py_FileSystemDefaultEncodeErrors in coreconfig.c.
* _Py_CoerceLegacyLocale() now takes an int rather than _PyCoreConfig
  for the warning.
</pre> </div> <div class="section" id="use-surrogatepass-on-windows"> <h2>Use surrogatepass on Windows</h2> <p>While working on the filesystem encoding change, I had a bug in _freeze_importlib.exe which failed at startup:</p> <pre class="literal-block">
ValueError: only 'strict' and 'surrogateescape' error handlers are supported, not 'surrogatepass'
</pre> <p>I used the following workaround in <tt class="docutils literal">_freeze_importlib.c</tt>:</p> <pre class="literal-block">
#ifdef MS_WINDOWS
    /* bpo-34523: initfsencoding() is not called if _install_importlib=0,
       so interp-&gt;fscodec_initialized value remains 0.

       PyUnicode_EncodeFSDefault() doesn't support the &quot;surrogatepass&quot;
       error handler in such case, whereas it's the default error handler
       on Windows.

       Force the &quot;strict&quot; error handler to work around this bootstrap
       issue. */
    config.filesystem_errors = &quot;strict&quot;;
#endif
</pre> <p>But I wasn't fully happy with the workaround. When running more manual tests, I found that the <tt class="docutils literal">PYTHONLEGACYWINDOWSFSENCODING</tt> environment variable wasn't handled properly.
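For context, the difference between the error handlers on a lone surrogate (exactly what surrogateescape produces) looks like this:

```python
# A lone surrogate such as U+DC80 is not valid in strict UTF-8.
s = "\udc80"
try:
    s.encode("utf-8")  # the default "strict" handler refuses it
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
# surrogatepass encodes it anyway, as the 3-byte sequence ED B2 80.
data = s.encode("utf-8", "surrogatepass")
print(data)  # b'\xed\xb2\x80'
assert data.decode("utf-8", "surrogatepass") == s
```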
I pushed a first fix, <a class="reference external" href="https://github.com/python/cpython/commit/c5989cd87659acbfd4d19dc00dbe99c3a0fc9bd2">commit c5989cd8</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Wed Aug 29 19:32:47 2018 +0200

bpo-34523: Py_DecodeLocale() use UTF-8 on Windows (GH-8998)

Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding on
Windows if Py_LegacyWindowsFSEncodingFlag is zero.

pymain_read_conf() now sets Py_LegacyWindowsFSEncodingFlag in its
loop, but restore its value at exit.
</pre> <p>My intent was to be able to use the <tt class="docutils literal">surrogatepass</tt> error handler. If <tt class="docutils literal">Py_DecodeLocale()</tt> is hardcoded to use UTF-8 on Windows, we should get access to the <tt class="docutils literal">surrogatepass</tt> error handler. Previously, the <tt class="docutils literal">mbstowcs()</tt> function was used, and it only supports the <tt class="docutils literal">strict</tt> and <tt class="docutils literal">surrogateescape</tt> error handlers.</p> <p>I pushed a second big change to add support for the <tt class="docutils literal">surrogatepass</tt> error handler in locale codecs, <a class="reference external" href="https://github.com/python/cpython/commit/3d4226a832cabc630402589cc671cc4035d504e5">commit 3d4226a8</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Wed Aug 29 22:21:32 2018 +0200

bpo-34523: Support surrogatepass in locale codecs (GH-8995)

Add support for the &quot;surrogatepass&quot; error handler in
PyUnicode_DecodeFSDefault() and PyUnicode_EncodeFSDefault() for the
UTF-8 encoding.

Changes:

* _Py_DecodeUTF8Ex() and _Py_EncodeUTF8Ex() now support the
  surrogatepass error handler (_Py_ERROR_SURROGATEPASS).
* _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx() now use the
  _Py_error_handler enum instead of &quot;int surrogateescape&quot; to pass
  the error handler.
These functions now return -3 if the error handler is unknown.
* Add unit tests on _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx()
  in test_codecs.
* Rename get_error_handler() to _Py_GetErrorHandler() and expose it
  as a private function.
* _freeze_importlib doesn't need config.filesystem_errors=&quot;strict&quot;
  workaround anymore.
</pre> <p>The <tt class="docutils literal">PyUnicode_DecodeFSDefault()</tt> and <tt class="docutils literal">PyUnicode_EncodeFSDefault()</tt> functions use <tt class="docutils literal">Py_DecodeLocale()</tt> and <tt class="docutils literal">Py_EncodeLocale()</tt> before the Python codec of the filesystem encoding is loaded. With this big change, <tt class="docutils literal">Py_DecodeLocale()</tt> and <tt class="docutils literal">Py_EncodeLocale()</tt> now really behave like the Python codec.</p> <p>Previously, Python started with the <tt class="docutils literal">surrogateescape</tt> error handler, and switched to the <tt class="docutils literal">surrogatepass</tt> error handler once the Python codec was loaded.</p> </div> <div class="section" id="filesystem-encoding-documentation"> <h2>Filesystem encoding documentation</h2> <p>One &quot;last&quot; change: I documented how Python selects the filesystem encoding, <a class="reference external" href="https://github.com/python/cpython/commit/de427556746aa41a8b5198924ce423021bc0c718">commit de427556</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Wed Aug 29 23:26:55 2018 +0200

bpo-34523: Py_FileSystemDefaultEncoding NULL by default (GH-9003)

* Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors
  default value is now NULL: initfsencoding() set them during Python
  initialization.
* Document how Python chooses the filesystem encoding and error
  handler.
* Add an assertion to _PyCoreConfig_Read().
</pre> <p>Documentation:</p> <pre class="literal-block">
/* Python filesystem encoding and error handler:
   sys.getfilesystemencoding() and sys.getfilesystemencodeerrors().

   Default encoding and error handler:

   * if Py_SetStandardStreamEncoding() has been called: they have the
     highest priority;
   * PYTHONIOENCODING environment variable;
   * The UTF-8 Mode uses UTF-8/surrogateescape;
   * locale encoding: ANSI code page on Windows, UTF-8 on Android,
     LC_CTYPE locale encoding on other platforms;
   * On Windows, &quot;surrogateescape&quot; error handler;
   * &quot;surrogateescape&quot; error handler if the LC_CTYPE locale is &quot;C&quot; or &quot;POSIX&quot;;
   * &quot;surrogateescape&quot; error handler if the LC_CTYPE locale has been
     coerced (PEP 538);
   * &quot;strict&quot; error handler.

   Supported error handlers: &quot;strict&quot;, &quot;surrogateescape&quot; and
   &quot;surrogatepass&quot;. The surrogatepass error handler is only supported
   if Py_DecodeLocale() and Py_EncodeLocale() use directly the UTF-8
   codec; it's only used on Windows.

   initfsencoding() updates the encoding to the Python codec name.
   For example, &quot;ANSI_X3.4-1968&quot; is replaced with &quot;ascii&quot;.

   On Windows, sys._enablelegacywindowsfsencoding() sets the
   encoding/errors to mbcs/replace at runtime.

   See Py_FileSystemDefaultEncoding and Py_FileSystemDefaultEncodeErrors. */
char *filesystem_encoding;
char *filesystem_errors;
</pre> </div> <div class="section" id="final-freebsd-10-issue"> <h2>Final FreeBSD 10 issue</h2> <p><a class="reference external" href="https://bugs.python.org/issue34544">bpo-34544</a>: The stdio and filesystem encodings are now properly selected before Py_Initialize(), the LC_CTYPE locale should be properly initialized, the &quot;POSIX&quot; locale is now properly handled, but the FreeBSD 10 buildbot still complained about my recent changes...
Many <tt class="docutils literal">test_c_locale_coerce</tt> tests started to fail with:</p> <blockquote> Fatal Python error: get_locale_encoding: failed to get the locale encoding: nl_langinfo(CODESET) failed</blockquote> <p>Sadly, I wasn't able to reproduce the issue on my FreeBSD 11 VM. I also got access to the FreeBSD CURRENT buildbot, but I also failed to reproduce the bug there. I was supposed to get access to the FreeBSD 10 buildbot, but there was a DNS issue.</p> <p>I had to <em>guess</em> the origin of the bug and I attempted a fix, <a class="reference external" href="https://github.com/python/cpython/commit/f01b2a1b84ee08df73a78cf1017eecf15e3cb995">commit f01b2a1b</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Mon Sep 3 14:38:21 2018 +0200

bpo-34544: Fix setlocale() in pymain_read_conf() (GH-9041)

bpo-34485, bpo-34544: On some FreeBSD, nl_langinfo(CODESET) fails
if LC_ALL or LC_CTYPE is set to an invalid locale name. Replace
_Py_SetLocaleFromEnv(LC_CTYPE) with _Py_SetLocaleFromEnv(LC_ALL) to
initialize properly locales.

Partially revert commit 177d921c8c03d30daa32994362023f777624b10d.
</pre> <p>... but it didn't work.</p> <p>I decided to install a FreeBSD 10 VM and one week later... I finally succeeded in reproducing the issue!</p> <p>The bug was that the <tt class="docutils literal">_Py_CoerceLegacyLocale()</tt> function didn't restore the LC_CTYPE to its previous value if it attempted to coerce the LC_CTYPE locale but no locale worked.</p> <p>Previously, it didn't matter, since the LC_CTYPE locale was initialized again later, or it was saved/restored indirectly.
But with my latest changes, the LC_CTYPE was left unchanged.</p> <p>The fix is just to restore LC_CTYPE if <tt class="docutils literal">_Py_CoerceLegacyLocale()</tt> fails, <a class="reference external" href="https://github.com/python/cpython/commit/8ea09110d413829f71d979d8c7073008cb87fb03">commit 8ea09110</a>:</p> <pre class="literal-block">
Author: Victor Stinner &lt;vstinner&#64;redhat.com&gt;
Date: Mon Sep 3 17:05:18 2018 +0200

_Py_CoerceLegacyLocale() restores LC_CTYPE on fail (GH-9044)

bpo-34544: If _Py_CoerceLegacyLocale() fails to coerce the C locale,
restore the LC_CTYPE locale to its previous value.
</pre> <p>Finally, I succeeded in doing what I initially wanted: remove the code which saved/restored the LC_ALL locale. <tt class="docutils literal">pymain_read_conf()</tt> is now really responsible for setting the LC_CTYPE locale, and it doesn't modify the LC_ALL locale anymore.</p> </div> <div class="section" id="configuration-of-locales-and-encodings"> <h2>Configuration of locales and encodings</h2> <p>Python has <strong>many</strong> options to configure the locales and encodings.</p> <p>Main options of Python 3.7:</p> <ul class="simple"> <li>Legacy Windows stdio (PEP 528)</li> <li>Legacy Windows filesystem encoding (PEP 529)</li> <li>C locale coercion (PEP 538)</li> <li>UTF-8 mode (PEP 540)</li> </ul> <p>The combination of C locale coercion and UTF-8 mode is non-obvious and should be carefully tested!</p> <p>Environment variables:</p> <ul class="simple"> <li><tt class="docutils literal">PYTHONCOERCECLOCALE=0</tt></li> <li><tt class="docutils literal">PYTHONCOERCECLOCALE=1</tt></li> <li><tt class="docutils literal">PYTHONCOERCECLOCALE=warn</tt></li> <li><tt class="docutils literal"><span class="pre">PYTHONIOENCODING=:&lt;errors&gt;</span></tt></li> <li><tt class="docutils literal"><span class="pre">PYTHONIOENCODING=&lt;encoding&gt;:&lt;errors&gt;</span></tt></li> <li><tt class="docutils literal"><span
class="pre">PYTHONIOENCODING=&lt;encoding&gt;</span></tt></li> <li><tt class="docutils literal">PYTHONLEGACYWINDOWSFSENCODING=1</tt></li> <li><tt class="docutils literal">PYTHONLEGACYWINDOWSSTDIO=1</tt></li> <li><tt class="docutils literal">PYTHONUTF8=0</tt></li> <li><tt class="docutils literal">PYTHONUTF8=1</tt></li> </ul> <p>Command line options:</p> <ul class="simple"> <li><tt class="docutils literal"><span class="pre">-X</span> utf8=0</tt></li> <li><tt class="docutils literal"><span class="pre">-X</span> utf8</tt> or <tt class="docutils literal"><span class="pre">-X</span> utf8=1</tt></li> <li><tt class="docutils literal"><span class="pre">-E</span></tt> or <tt class="docutils literal"><span class="pre">-I</span></tt> (ignore <tt class="docutils literal">PYTHON*</tt> environment variables)</li> </ul> <p>Global configuration variables:</p> <ul class="simple"> <li><tt class="docutils literal">Py_FileSystemDefaultEncodeErrors</tt></li> <li><tt class="docutils literal">Py_FileSystemDefaultEncoding</tt></li> <li><tt class="docutils literal">Py_LegacyWindowsFSEncodingFlag</tt></li> <li><tt class="docutils literal">Py_LegacyWindowsStdioFlag</tt></li> <li><tt class="docutils literal">Py_UTF8Mode</tt></li> </ul> <p>_PyCoreConfig:</p> <ul class="simple"> <li><tt class="docutils literal">coerce_c_locale</tt></li> <li><tt class="docutils literal">coerce_c_locale_warn</tt></li> <li><tt class="docutils literal">filesystem_encoding</tt></li> <li><tt class="docutils literal">filesystem_errors</tt></li> <li><tt class="docutils literal">stdio_encoding</tt></li> <li><tt class="docutils literal">stdio_errors</tt></li> </ul> <p>The LC_CTYPE locale depends on 3 environment variables:</p> <ul class="simple"> <li><tt class="docutils literal">LC_ALL</tt></li> <li><tt class="docutils literal">LC_CTYPE</tt></li> <li><tt class="docutils literal">LANG</tt></li> </ul> <p>Depending on the platform, the following configuration gives a different LC_CTYPE locale:</p> <ul class="simple"> <li><tt 
class="docutils literal">LC_ALL= LC_CTYPE= LANG=</tt> (no variable set)</li> <li><tt class="docutils literal">LC_ALL= LC_CTYPE=C LANG=</tt> (C locale)</li> <li><tt class="docutils literal">LC_ALL= LC_CTYPE=POSIX LANG=</tt> (POSIX locale)</li> </ul> <p>In case of doubt, I also tested:</p> <ul class="simple"> <li><tt class="docutils literal">LC_ALL=C LC_CTYPE= LANG=</tt> (C locale)</li> <li><tt class="docutils literal">LC_ALL=POSIX LC_CTYPE= LANG=</tt> (POSIX locale)</li> </ul> <p>The LC_CTYPE encoding (locale encoding) can be queried using <tt class="docutils literal">nl_langinfo(CODESET)</tt>. On FreeBSD, Solaris, HP-UX and maybe other platforms, <tt class="docutils literal">nl_langinfo(CODESET)</tt> announces an encoding which is different from the codec used by the <tt class="docutils literal">mbstowcs()</tt> and <tt class="docutils literal">wcstombs()</tt> functions, and so Python forces the usage of the ASCII encoding.</p> <p>The test matrix of all these configurations and all platforms is quite big. Honestly, I would not bet that Python 3.8 will behave properly in all possible cases. At least, I tried to fix all the issues that I spotted! Moreover, I added many tests which should help to detect bugs and prevent regressions.</p> </div> <p><strong><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></strong> (2018-03-27, Victor Stinner)</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/99444752&#64;N06/9368903367/"> <img alt="Sunrise" src="https://vstinner.github.io/images/sunrise.jpg" /> </a> <p>Since Python 3.0 was released in 2008, each time a user reported an encoding issue, someone showed up and asked why Python does not &quot;simply&quot; always use UTF-8. Well, it's not that easy. <strong>UTF-8 is the best encoding in most cases, but it is still not the best encoding in all cases</strong>, even in 2018. The locale encoding remains the best default filesystem encoding for Python. I would say that <strong>the locale encoding is the least bad filesystem encoding</strong>.</p> <p>This article tells the story of my <a class="reference external" href="https://www.python.org/dev/peps/pep-0540/">PEP 540: Add a new UTF-8 Mode</a>, which adds an opt-in option to <strong>&quot;use UTF-8 everywhere&quot;</strong>. Moreover, the UTF-8 Mode is enabled by the POSIX locale: <strong>Python 3.7 now uses UTF-8 for the POSIX locale</strong>. My PEP 540 is complementary to Nick Coghlan's PEP 538.</p> <p>When I started to write this article, I wrote something like: &quot;Hey! I added a new option to use UTF-8, enjoy!&quot;. Written like that, it seems like using UTF-8 was an obvious choice and that it was really easy to write such a PEP. No. <strong>Nothing was obvious, nothing was simple.</strong></p> <p>It took me one year to design and implement my PEP 540, and to get it accepted. I wrote five articles before this one to show that the PEP 540 only came after a long painful journey, starting with Python 3.0, to choose the best Python encoding.
My PEP relies on all the great work done previously.</p> <p><strong>This article is the sixth and last in a series of articles telling the history and rationale of the Python 3 Unicode model for the operating system:</strong></p> <ul class="simple"> <li><ol class="first arabic"> <li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li> </ol> </li> <li><ol class="first arabic" start="2"> <li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li> </ol> </li> <li><ol class="first arabic" start="3"> <li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li> </ol> </li> <li><ol class="first arabic" start="4"> <li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li> </ol> </li> <li><ol class="first arabic" start="5"> <li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li> </ol> </li> <li><ol class="first arabic" start="6"> <li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li> </ol> </li> </ul> <div class="section" id="fallback-to-utf-8-if-getting-the-locale-encoding-fails"> <h2>Fallback to UTF-8 if getting the locale encoding fails?</h2> <p>May 2010, I reported <a class="reference external" href="https://bugs.python.org/issue8610">bpo-8610</a>: &quot;Python3/POSIX: errors if file system encoding is None&quot;. I asked what should be the default encoding when getting the locale encoding fails. I proposed to fall back to UTF-8.
<a class="reference external" href="https://bugs.python.org/issue8610#msg105008">I wrote</a>:</p> <blockquote> <strong>UTF-8 is also an optimist choice</strong>: I bet that more and more operating systems will move to UTF-8.</blockquote> <p><a class="reference external" href="https://bugs.python.org/issue8610#msg105010">Marc-Andre commented</a>:</p> <blockquote> Ouch, that was a poor choice. <strong>In Python we have a tradition to avoid guessing</strong>, if possible. Since we cannot guarantee that the file system will indeed use UTF-8, it would have been safer to use ASCII. Not sure why this reasoning wasn't applied for the file system encoding.</blockquote> <p>In practice, Python already used UTF-8 when the filesystem encoding was set to <tt class="docutils literal">None</tt>. I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/b744ba1d14c5487576c95d0311e357b707600b47">commit b744ba1d</a> into the Python 3.2 development branch to make the default encoding (UTF-8) more obvious. But before Python 3.2 was released, I removed the fallback with my <a class="reference external" href="https://github.com/python/cpython/commit/e474309bb7f0ba6e6ae824c215c45f00db691889">commit e474309b</a> (Oct 2010):</p> <blockquote> <p><tt class="docutils literal">initfsencoding()</tt>: <tt class="docutils literal">get_codeset()</tt> failure is now a fatal error</p> <p>Don't fallback to UTF-8 anymore to avoid mojibake. I never got any error from this function.</p> </blockquote> </div> <div class="section" id="the-utf8-option-proposed-for-windows"> <h2>The utf8 option proposed for Windows</h2> <p>August 2016, <a class="reference external" href="https://bugs.python.org/issue27781">bpo-27781</a>: when <strong>Steve Dower</strong> <a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">was working on changing the filesystem encoding to UTF-8</a>, I was not sure that Windows should use UTF-8 by default.
I was more in favor of <strong>making the backward incompatible change an opt-in option</strong>. <a class="reference external" href="https://bugs.python.org/issue27781#msg272950">I wrote</a>:</p> <blockquote> <p><strong>If you go in this direction, I would like to follow you for the UNIX/BSD side to make the switch portable. I was thinking about &quot;-X utf8&quot; which avoids to change the command line parser.</strong></p> <p>If we agree on a plan, <strong>I would like to write it down as a PEP since I expect a lot of complains and questions which I would prefer to only answer once</strong> (see for example the length of your thread on python-ideas where each people repeated the same things multiple times ;-))</p> </blockquote> <p><a class="reference external" href="https://bugs.python.org/issue27781#msg272962">I added</a>:</p> <blockquote> I mean that <tt class="docutils literal">python3 <span class="pre">-X</span> utf8</tt> should force <tt class="docutils literal">sys.getfilesystemencoding()</tt> to UTF-8 on UNIX/BSD, it would ignore the current locale setting.</blockquote> <p>Since Steve chose to <strong>change the default to UTF-8</strong> on Windows, my <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> option idea was ignored in this issue.</p> </div> <div class="section" id="the-utf8-option-proposed-for-the-posix-locale"> <h2>The utf8 option proposed for the POSIX locale</h2> <p>September 2016: <strong>Jan Niklas Hasse</strong> opened <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a> about Docker images, <strong>&quot;sys.getfilesystemencoding() should default to utf-8&quot;</strong>.</p> <p><a class="reference external" href="https://bugs.python.org/issue28180#msg276707">I proposed my option again</a>:</p> <blockquote> I proposed to add <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option for UNIX to force utf8 encoding. 
Would it work for you?</blockquote> <p><strong>Jan Niklas Hasse</strong> <a class="reference external" href="https://bugs.python.org/issue28180#msg276709">answered</a>:</p> <blockquote> Unfortunately no, as this would mean I'll have to change all my python invocations in my scripts and it wouldn't work for executable files with</blockquote> <p>December 2016, <a class="reference external" href="https://bugs.python.org/issue28180#msg283408">I added</a>:</p> <blockquote> <p>Usually, when a new option is added to Python, we add a command line option (-X utf8) but also an environment variable: <strong>I propose PYTHONUTF8=1</strong>.</p> <p>Use your favorite method to define the env var &quot;system wide&quot; in your docker containers.</p> <p>Note: Technically, I'm not sure that it's possible to support -E option with PYTHONUTF8, since -E comes from the command line, and we first need to decode command line arguments with an encoding to parse these options.... Chicken-and-egg issue ;-)</p> </blockquote> <p><strong>Nick Coghlan</strong> <a class="reference external" href="https://vstinner.github.io/posix-locale.html">wrote his PEP 538 &quot;Coercing the C locale to a UTF-8 based locale&quot;</a>, which was approved in May 2017 and finally implemented in June 2017.</p> <p>Again, my utf8 idea was ignored in this issue.</p> </div> <div class="section" id="first-version-of-my-pep-540-add-a-new-utf-8-mode"> <h2>First version of my PEP 540: Add a new UTF-8 Mode</h2> <p>January 2017, as a follow-up of <a class="reference external" href="https://bugs.python.org/issue27781">bpo-27781</a> and <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>, I wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0540/">PEP 540: Add a new UTF-8 Mode</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044089.html">I posted it to python-ideas for comments</a>.</p> 
<p>Abstract:</p> <blockquote> Add a new UTF-8 mode, opt-in option to use UTF-8 for operating system data instead of the locale encoding. Add <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option and <tt class="docutils literal">PYTHONUTF8</tt> environment variable.</blockquote> <p>Ten hours and a few messages later, I <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044099.html">wrote a second version</a>:</p> <blockquote> I modified my PEP: <strong>the POSIX locale now enables the UTF-8 mode</strong>.</blockquote> <p><strong>INADA Naoki</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044112.html">wrote</a>:</p> <blockquote> <p>I want UTF-8 mode is <strong>enabled by default (opt-out option) even if locale is not POSIX</strong>, like <cite>PYTHONLEGACYWINDOWSFSENCODING</cite>.</p> <p>Users depends on locale know what locale is and how to configure it. They can understand difference between locale mode and UTF-8 mode and they can opt-out UTF-8 mode.</p> <p><strong>But many people lives in &quot;UTF-8 everywhere&quot; world</strong>, and don't know about locale.</p> </blockquote> <p>Always ignoring the locale to <strong>always use UTF-8 would be a backward incompatible change</strong>. 
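The two opt-in switches from the abstract can be exercised by spawning a child interpreter and reading sys.flags.utf8_mode (a minimal sketch, assuming Python 3.7 or newer):

```python
# Sketch: the two documented ways to enable the UTF-8 Mode (Python >= 3.7),
# checked via sys.flags.utf8_mode in a child interpreter.
import os
import subprocess
import sys

code = "import sys; print(sys.flags.utf8_mode)"

# 1) The -X utf8 command line option.
with_option = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", code],
    capture_output=True, text=True,
).stdout.strip()

# 2) The PYTHONUTF8=1 environment variable.
env = dict(os.environ, PYTHONUTF8="1")
with_envvar = subprocess.run(
    [sys.executable, "-c", code],
    capture_output=True, text=True, env=env,
).stdout.strip()

print(with_option, with_envvar)  # both print "1": the UTF-8 Mode is enabled
```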
I wasn't brave enough to propose it; I only wanted to propose an opt-in option, except for the specific case of the POSIX locale.</p> <p>Not only did people have different opinions, but most of them had strong opinions on how to handle Unicode and were not ready for compromises.</p> </div> <div class="section" id="third-version-of-my-pep-540"> <h2>Third version of my PEP 540</h2> <p>One week and 59 emails later, I <a class="reference external" href="https://bugs.python.org/issue29240">implemented my PEP 540</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044197.html">I wrote a third version of my PEP</a>:</p> <blockquote> <p>I made multiple changes since the first version of my PEP:</p> <ul class="simple"> <li>The <strong>UTF-8 Strict mode now only uses strict for inputs and outputs</strong>: it keeps surrogateescape for operating system data. Read the &quot;Use the strict error handler for operating system data&quot; alternative for the rationale.</li> <li>The POSIX locale now enables the UTF-8 mode. See the &quot;Don't modify the encoding of the POSIX locale&quot; alternative for the rationale.</li> <li>Specify the priority between -X utf8, PYTHONUTF8, PYTHONIOENCODING, etc.</li> </ul> <p>The PEP version 3 has a longer rationale with more examples. (...)</p> </blockquote> <p>The new thread also got 19 emails, total: <strong>78 emails in one month</strong>. 
The same month, Nick Coghlan's PEP 538 was also under discussion.</p> </div> <div class="section" id="silence-during-one-year"> <h2>Silence during one year</h2> <p>Because of the tone of the python-ideas threads and because I didn't know how to deal with Nick Coghlan's PEP 538, <strong>I decided to do nothing for one year</strong> (January to December 2017).</p> <p>April 2017, Nick <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147795.html">proposed</a> <strong>INADA Naoki</strong> as the BDFL Delegate for his PEP 538 and my PEP 540. Guido <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147796.html">agreed to delegate</a>.</p> <p>May 2017, Naoki approved Nick's PEP 538, and Nick implemented it.</p> </div> <div class="section" id="pep-540-version-3-posted-to-python-dev"> <h2>PEP 540 version 3 posted to python-dev</h2> <p>At the end of 2017, when I looked at my contributions in Python 3.7 in the <a class="reference external" href="https://docs.python.org/dev/whatsnew/3.7.html">What’s New In Python 3.7</a> document, I didn't see any significant contribution. I wanted to propose something. 
Moreover, the deadline for the Python 3.7 feature freeze (first beta version) was getting close, end of January 2018: see the <a class="reference external" href="https://www.python.org/dev/peps/pep-0537/">PEP 537: Python 3.7 Release Schedule</a>.</p> <p>December 2017, I decided to move to the next step: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151054.html">I sent my PEP to the python-dev mailing list</a>.</p> <p>Guido van Rossum <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151069.html">complained about the length of the PEP</a>:</p> <blockquote> <p>I've been discussing this PEP offline with Victor, but he suggested we should discuss it in public instead.</p> <p><strong>I am very worried about this long and rambling PEP, and I propose that it not be accepted without a major rewrite to focus on clarity of the specification. The &quot;Unicode just works&quot; summary is more a wish than a proper summary of the PEP.</strong></p> <p>(...)</p> <p>So I guess PEP acceptance week is over. :-(</p> </blockquote> </div> <div class="section" id="pep-rewritten-from-scratch"> <h2>PEP rewritten from scratch</h2> <p>Even if <strong>I was not fully convinced myself that my PEP was a good idea</strong>, I wanted to get an official vote, to know if my idea should be implemented or abandoned. 
I decided to rewrite my PEP from scratch:</p> <ul class="simple"> <li><a class="reference external" href="https://github.com/python/peps/blob/f92b5fbdc2bcd9b182c1541da5a0f4ce32195fb6/pep-0540.txt">PEP version 3 (before rewrite)</a>: 1,017 lines</li> <li><a class="reference external" href="https://github.com/python/peps/blob/0bb19ff93af9855db327e9a02f3e86b6f932a25a/pep-0540.txt">PEP version 4 (after rewrite)</a>: 263 lines (26% of the previous version)</li> </ul> <p>I reduced the rationale to the strict minimum, to explain <strong>key points</strong> of the PEP:</p> <ul class="simple"> <li>Locale encoding and UTF-8</li> <li>Passthrough undecodable bytes: surrogateescape</li> <li>Strict UTF-8 for correctness</li> <li>No change by default for best backward compatibility</li> </ul> </div> <div class="section" id="reading-jpeg-pictures-with-surrogateescape"> <h2>Reading JPEG pictures with surrogateescape</h2> <p>December 2017, I sent the <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151074.html">shorter PEP version 4 to python-dev</a>.</p> <p>INADA Naoki, the BDFL-delegate, <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151081.html">spotted a design issue</a>:</p> <blockquote> <p>And I have one worrying point. With UTF-8 mode, <strong>open()'s default</strong> encoding/error handler <strong>is UTF-8/surrogateescape</strong>.</p> <p>(...)</p> <p>And <strong>opening binary file without &quot;b&quot; option is very common mistake</strong> of new developers. 
If default error handler is surrogateescape, <strong>they lose a chance to notice their bug</strong>.</p> </blockquote> <p>He <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151101.html">gave a concrete example</a>:</p> <blockquote> <p>With PEP 538 (C.UTF-8 locale), <tt class="docutils literal">open()</tt> uses UTF-8/strict, not UTF-8/surrogateescape.</p> <p>For example, this code raises <tt class="docutils literal">UnicodeDecodeError</tt> with PEP 538 if the file is JPEG file.</p> <pre class="literal-block">
with open(fn) as f:
    f.read()
</pre> </blockquote> <p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151132.html">I replied</a>:</p> <blockquote> <p>While I'm not strongly convinced that <tt class="docutils literal">open()</tt> error handler must be changed for <tt class="docutils literal">surrogateescape</tt>, first <strong>I would like to make sure that it's really a very bad idea</strong> before changing it :-)</p> <p>(...)</p> <p>Using a JPEG image, the example is obviously wrong.</p> <p>But using surrogateescape on open() has been chosen to <strong>read text files which are mostly correctly encoded to UTF-8, except a few bytes</strong>.</p> <p>I'm not sure how to explain the issue. The Mercurial wiki page has a good example of this issue that they call the <a class="reference external" href="https://www.mercurial-scm.org/wiki/EncodingStrategy#The_.22makefile_problem.22">&quot;Makefile problem&quot;</a>.</p> </blockquote> <p><strong>Guido van Rossum</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151134.html">finally convinced me</a>:</p> <blockquote> You will quickly get decoding errors, and that is <strong>INADA</strong>'s point. (Unless you use <tt class="docutils literal"><span class="pre">encoding='Latin-1'</span></tt>.) 
His worry is that the surrogateescape error handler makes it so that you won't get decoding errors, and then <strong>the failure mode is much harder to debug</strong>.</blockquote> <p>I <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151136.html">wrote a 5th version of my PEP</a>:</p> <blockquote> <p>I made the following two changes to the PEP 540:</p> <ul class="simple"> <li>open() error handler remains <tt class="docutils literal">&quot;strict&quot;</tt></li> <li>Remove the &quot;Strict UTF8 mode&quot; which doesn't make much sense anymore</li> </ul> </blockquote> </div> <div class="section" id="last-question-on-locale-getpreferredencoding"> <h2>Last question on locale.getpreferredencoding()</h2> <p>December 2017, <strong>INADA Naoki</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151144.html">asked</a>:</p> <blockquote> Or <tt class="docutils literal">locale.getpreferredencoding()</tt> returns <tt class="docutils literal"><span class="pre">'UTF-8'</span></tt> in UTF-8 mode too?</blockquote> <p>Oh, that's a good question! 
I <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151148.html">looked at the code</a> and agreed to return UTF-8:</p> <blockquote> <p>I checked the stdlib, and I found many places where <tt class="docutils literal">locale.getpreferredencoding()</tt> is used to get the user preferred encoding:</p> <ul class="simple"> <li>builtin <tt class="docutils literal">open()</tt>: default encoding</li> <li><tt class="docutils literal">cgi.FieldStorage</tt>: encode the query string</li> <li><tt class="docutils literal">encoding._alias_mbcs()</tt>: check if the requested encoding is the ANSI code page</li> <li><tt class="docutils literal">gettext.GNUTranslations</tt>: <tt class="docutils literal">lgettext()</tt> and <tt class="docutils literal">lngettext()</tt> methods</li> <li><tt class="docutils literal">xml.etree.ElementTree</tt>: <tt class="docutils literal"><span class="pre">ElementTree.write(encoding='unicode')</span></tt></li> </ul> <p>In the UTF-8 mode, I would expect that cgi, gettext and xml.etree all use the UTF-8 encoding by default. So <strong>locale.getpreferredencoding() should return UTF-8 if the UTF-8 mode is enabled</strong>.</p> </blockquote> <p>I <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151151.html">sent a 6th version of my PEP</a>:</p> <blockquote> locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 Mode.</blockquote> <p>Moreover, I also wrote a new and much better &quot;Relationship with the locale coercion (PEP 538)&quot; section, replacing the &quot;Annex: Differences between PEP 538 and PEP 540&quot; section. 
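This behavior change can be observed directly (a sketch, assuming Python 3.7 or newer; the exact casing of the result has varied across Python versions):

```python
# Sketch: locale.getpreferredencoding() reports UTF-8 when the UTF-8 Mode
# is enabled, whatever the current locale says (Python >= 3.7).
import subprocess
import sys

code = "import locale; print(locale.getpreferredencoding())"
out = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", code],
    capture_output=True, text=True,
).stdout.strip()
print(out)  # UTF-8 (the exact casing depends on the Python version)
```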
Many people, confused by the relationship between PEP 538 and PEP 540, had asked for this new section.</p> <p>Finally, one year after the first PEP version, INADA Naoki <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-December/151193.html">approved my PEP</a>!</p> </div> <div class="section" id="first-incomplete-implementation"> <h2>First incomplete implementation</h2> <p>I started to work on the implementation of my PEP 540 in March 2017. Once the PEP was approved, I asked INADA Naoki for a review. <a class="reference external" href="https://github.com/python/cpython/pull/855#issuecomment-351089573">He asked me to fix the command line parsing</a> to properly handle the <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> option:</p> <blockquote> And when <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> option is found, we can decode from <tt class="docutils literal">char **argv</tt> again. Since <tt class="docutils literal">mbstowcs()</tt> doesn't guarantee round tripping, it is better than re-encode <tt class="docutils literal">wchar_t **argv</tt>.</blockquote> <p>Properly implementing the <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> option was tricky. Parsing the command line was done on <tt class="docutils literal">wchar_t*</tt> C strings (Unicode), which required decoding the <tt class="docutils literal">char** argv</tt> C array of byte strings (bytes). Python starts by decoding byte strings from the locale encoding. If the utf8 option is detected, <tt class="docutils literal">argv</tt> byte strings must be decoded again, but now from UTF-8. 
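The "decode argv twice" dance can be sketched in pure Python (hypothetical byte strings; CPython does the real work in C on the char** argv array):

```python
# Sketch of why argv must be decoded twice (hypothetical data): a first pass
# with the locale encoding only looks for "-X utf8"; once the option is
# found, the original bytes are decoded again, this time from UTF-8.
argv_bytes = [b"python3", b"-X", b"utf8", b"caf\xc3\xa9.txt"]

# First pass: assume an ASCII locale; surrogateescape preserves the
# undecodable bytes as lone surrogates instead of failing.
first_pass = [arg.decode("ascii", "surrogateescape") for arg in argv_bytes]
utf8_mode = "-X" in first_pass and "utf8" in first_pass

# Second pass: the UTF-8 Mode was requested, so decode the bytes again.
if utf8_mode:
    argv = [arg.decode("utf-8", "surrogateescape") for arg in argv_bytes]

print(argv[-1])  # b"caf\xc3\xa9.txt" decodes correctly only on the second pass
```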
The problem was that the code was not designed for that, and it required refactoring a lot of code in <tt class="docutils literal">Py_Main()</tt>.</p> <p><a class="reference external" href="https://github.com/python/cpython/pull/855#issuecomment-351252873">I replied</a>:</p> <blockquote> <p><tt class="docutils literal">main()</tt> and <tt class="docutils literal">Py_Main()</tt> are very complex. With the <a class="reference external" href="https://www.python.org/dev/peps/pep-0432/">PEP 432</a>, <strong>Nick Coghlan</strong>, <strong>Eric Snow</strong> and me are working on making this code better. See for example <a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a>.</p> <p>(...)</p> <p>For all these reasons, <strong>I propose to merge this incomplete PR and write a different PR for the most complex part</strong>, re-encode wchar_t* command line arguments, implement Py_UnixMain() or another even better option?</p> </blockquote> <p>I wanted to get my code merged as soon as possible to make sure that it would get into the first Python 3.7 beta, to get a longer testing period before Python 3.7 final.</p> <p>December 2017, <a class="reference external" href="https://bugs.python.org/issue29240">bpo-29240</a>, I pushed my <a class="reference external" href="https://github.com/python/cpython/commit/91106cd9ff2f321c0f60fbaa09fd46c80aa5c266">commit 91106cd9</a>:</p> <blockquote> <p>PEP 540: Add a new UTF-8 Mode</p> <ul class="simple"> <li>Add <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option, <tt class="docutils literal">PYTHONUTF8</tt> environment variable and a new <tt class="docutils literal">sys.flags.utf8_mode</tt> flag.</li> <li><tt class="docutils literal">locale.getpreferredencoding()</tt> now returns 'UTF-8' in the UTF-8 mode. 
As a side effect, open() now uses the UTF-8 encoding by default in this mode.</li> </ul> </blockquote> </div> <div class="section" id="split-py-main-into-subfunctions"> <h2>Split Py_Main() into subfunctions</h2> <p>November 2017, I created <a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a> to split the big <tt class="docutils literal">Py_Main()</tt> function into smaller subfunctions. My motivation was to be able to properly implement my PEP 540.</p> <p>It took me <strong>3 months of work and 45 commits</strong> to completely clean up <tt class="docutils literal">Py_Main()</tt> and put almost all Python configuration options into the private C <tt class="docutils literal">_PyCoreConfig</tt> structure.</p> </div> <div class="section" id="parse-again-the-command-line-when-x-utf8-is-used"> <h2>Parse again the command line when -X utf8 is used</h2> <p>December 2017, <a class="reference external" href="https://bugs.python.org/issue32030">bpo-32030</a>, thanks to the <tt class="docutils literal">Py_Main()</tt> refactoring, I was able to finish the implementation of my PEP.</p> <p>I pushed my <a class="reference external" href="https://github.com/python/cpython/commit/9454060e84a669dde63824d9e2fcaf295e34f687">commit 9454060e</a>:</p> <blockquote> <p><tt class="docutils literal">Py_Main()</tt> re-reads config if encoding changes</p> <p>If the encoding change (C locale coerced or UTF-8 Mode changed), <tt class="docutils literal">Py_Main()</tt> now reads again the configuration with the new encoding.</p> </blockquote> <p>If the encoding changed after reading the Python configuration, the configuration is cleaned up and <strong>read again with the new encoding.</strong> The key feature allowed by the refactoring is the ability to properly clean up the whole configuration.</p> </div> <div class="section" id="utf-8-mode-and-the-locale-encoding"> <h2>UTF-8 Mode and the locale encoding</h2> <p>January 2018, while working on <a 
class="reference external" href="https://bugs.python.org/issue31900">bpo-31900</a> &quot;localeconv() should decode numeric fields from LC_NUMERIC encoding, not from LC_CTYPE encoding&quot;, I tested various combinations of locales and encodings. <strong>I found bugs with the UTF-8 mode.</strong></p> <p>When the UTF-8 mode is enabled explicitly by <tt class="docutils literal"><span class="pre">-X</span> utf8</tt>, the intent is to use UTF-8 &quot;everywhere&quot;. Right. But <strong>there are some places where the current locale encoding is really the correct encoding</strong>, like the <tt class="docutils literal">time.strftime()</tt> function.</p> <p><a class="reference external" href="https://bugs.python.org/issue29240">bpo-29240</a>: I pushed a first fix, <a class="reference external" href="https://github.com/python/cpython/commit/cb3ae5588bd7733e76dc09277bb7626652d9bb64">commit cb3ae558</a>:</p> <blockquote> <p>Ignore UTF-8 Mode in the <tt class="docutils literal">time</tt> module</p> <p><tt class="docutils literal">time.strftime()</tt> must use the current <tt class="docutils literal">LC_CTYPE</tt> encoding, not UTF-8 if the UTF-8 mode is enabled.</p> </blockquote> <p>I tested more cases and found... <strong>more bugs</strong>. 
More functions must really use the current locale encoding, rather than UTF-8 if the UTF-8 Mode is enabled.</p> <p>I pushed a second fix, <a class="reference external" href="https://github.com/python/cpython/commit/7ed7aead9503102d2ed316175f198104e0cd674c">commit 7ed7aead</a>:</p> <blockquote> <p>Fix locale encodings in UTF-8 Mode</p> <p>Modify <tt class="docutils literal">locale.localeconv()</tt>, <tt class="docutils literal">time.tzname</tt>, <tt class="docutils literal">os.strerror()</tt> and other functions to ignore the UTF-8 Mode: always use the current locale encoding.</p> </blockquote> <p>The second fix documented the encoding used by the public C functions <a class="reference external" href="https://docs.python.org/dev/c-api/sys.html#c.Py_DecodeLocale">Py_DecodeLocale()</a> and <a class="reference external" href="https://docs.python.org/dev/c-api/sys.html#c.Py_EncodeLocale">Py_EncodeLocale()</a>:</p> <blockquote> <p>Encoding, highest priority to lowest priority:</p> <ul class="simple"> <li><tt class="docutils literal"><span class="pre">UTF-8</span></tt> on macOS and Android;</li> <li><tt class="docutils literal"><span class="pre">UTF-8</span></tt> if the Python UTF-8 mode is enabled;</li> <li><tt class="docutils literal">ASCII</tt> if the <tt class="docutils literal">LC_CTYPE</tt> locale is <tt class="docutils literal">&quot;C&quot;</tt>, <tt class="docutils literal">nl_langinfo(CODESET)</tt> returns the <tt class="docutils literal">ASCII</tt> encoding (or an alias), and <tt class="docutils literal">mbstowcs()</tt> and <tt class="docutils literal">wcstombs()</tt> functions uses the <tt class="docutils literal"><span class="pre">ISO-8859-1</span></tt> encoding.</li> <li>the current locale encoding.</li> </ul> </blockquote> <p>The fix was complex to write because I had to extend Py_DecodeLocale() and Py_EncodeLocale() to internally support the <tt class="docutils literal">strict</tt> error handler. 
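The difference between the two error handlers can be illustrated at the Python level (a rough sketch; the C API goes through Py_DecodeLocale() and friends, but the error handler semantics are the same):

```python
# Rough Python-level sketch of the two error handlers that
# Py_DecodeLocale()/Py_EncodeLocale() had to support internally.
data = b"caf\xe9"  # Latin-1 bytes; not valid UTF-8

# surrogateescape: undecodable bytes become lone surrogates, and the
# original bytes survive an encode round trip (what os.fsdecode() relies on).
text = data.decode("utf-8", "surrogateescape")
print(text.encode("utf-8", "surrogateescape") == data)  # True: lossless

# strict: undecodable bytes raise immediately, making bugs visible early
# (the behavior wanted for functions using the current locale encoding).
try:
    data.decode("utf-8", "strict")
except UnicodeDecodeError:
    print("strict decoding failed")
```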
I also extended the API to report an error message (called &quot;reason&quot;) on failure.</p> <p>For example, <tt class="docutils literal">Py_DecodeLocale()</tt> has the prototype:</p> <pre class="literal-block">
wchar_t* Py_DecodeLocale(const char* arg, size_t *wlen)
</pre> <p>whereas the new extended and more generic <tt class="docutils literal">_Py_DecodeLocaleEx()</tt> has a much more complex prototype:</p> <pre class="literal-block">
int _Py_DecodeLocaleEx(const char* arg,
                       wchar_t **wstr, size_t *wlen,
                       const char **reason,
                       int current_locale, int surrogateescape)
</pre> <p>To decode, there are two main use cases:</p> <ul class="simple"> <li>(FILENAME) Use UTF-8 if the UTF-8 Mode is enabled, or the locale encoding otherwise. See the <tt class="docutils literal">Py_DecodeLocale()</tt> documentation for the exact encoding used; the truth is more complex.</li> <li>(LOCALE) Always use the current locale encoding</li> </ul> <p>(FILENAME) examples:</p> <ul class="simple"> <li><tt class="docutils literal">Py_DecodeLocale()</tt>, <tt class="docutils literal">PyUnicode_DecodeFSDefaultAndSize()</tt>: use the <tt class="docutils literal">surrogateescape</tt> error handler</li> <li><tt class="docutils literal">os.fsdecode()</tt></li> <li><tt class="docutils literal">os.listdir()</tt></li> <li><tt class="docutils literal">os.environ</tt></li> <li><tt class="docutils literal">sys.argv</tt></li> <li>etc.</li> </ul> <p>(LOCALE) examples:</p> <ul class="simple"> <li><tt class="docutils literal">PyUnicode_DecodeLocale()</tt>: the error handler is passed as an argument and must be <tt class="docutils literal">strict</tt> or <tt class="docutils literal">surrogateescape</tt></li> <li><tt class="docutils literal">time.strftime()</tt></li> <li><tt class="docutils literal">locale.localeconv()</tt></li> <li><tt class="docutils literal">time.tzname</tt></li> <li><tt class="docutils literal">os.strerror()</tt></li> <li><tt class="docutils literal">readline</tt> module: internal <tt 
class="docutils literal">decode()</tt> function</li> <li>etc.</li> </ul> </div> <div class="section" id="summary-of-pep-540-history"> <h2>Summary of PEP 540 history</h2> <ul class="simple"> <li>Version 1: first version sent to python-ideas</li> <li>Version 2: the POSIX locale now enables the UTF-8 mode</li> <li>Version 3: the UTF-8 Strict mode now only uses the <tt class="docutils literal">strict</tt> error handler for inputs and outputs</li> <li>Version 4: PEP rewritten from scratch to be shorter</li> <li>Version 5: open() error handler remains <tt class="docutils literal">strict</tt>, and the &quot;Strict UTF8 mode&quot; has been removed</li> <li>Version 6: locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8 Mode.</li> </ul> <p>Abstract of the final approved PEP:</p> <blockquote> <p>Add a new &quot;UTF-8 Mode&quot; to enhance Python's use of UTF-8. When UTF-8 Mode is active, Python will:</p> <ul class="simple"> <li>use the <tt class="docutils literal"><span class="pre">utf-8</span></tt> encoding, regardless of the locale currently set by the current platform, and</li> <li>change the <tt class="docutils literal">stdin</tt> and <tt class="docutils literal">stdout</tt> error handlers to <tt class="docutils literal">surrogateescape</tt>.</li> </ul> <p>This mode is off by default, but is automatically activated when using the &quot;POSIX&quot; locale.</p> <p>Add the <tt class="docutils literal"><span class="pre">-X</span> utf8</tt> command line option and <tt class="docutils literal">PYTHONUTF8</tt> environment variable to control UTF-8 Mode.</p> </blockquote> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>It's now time for a well-deserved nap... 
until the next major Unicode issue in Python.</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/manager_2000/2911858714/"> <img alt="Tiger nap" src="https://vstinner.github.io/images/tiger_nap.jpg" /> </a> <p>(I love tigers: my favorite animals!)</p> </div> Python 3.7 and the POSIX locale (2018-03-23, Victor Stinner)<a class="reference external image-reference" href="https://www.flickr.com/photos/rj65/15010849568/"> <img alt="Bee" src="https://vstinner.github.io/images/bee.jpg" /> </a> <p>During the childhood of Python 3, encoding issues were common, even on well configured systems. Python used UTF-8 rather than the locale encoding, and so commonly produced <a class="reference external" href="https://en.wikipedia.org/wiki/Mojibake">mojibake</a>. For these reasons, when users complained about the Python behaviour with the POSIX locale, bug reports were closed with a message like: &quot;your system is not properly configured, please fix your locale&quot;.</p> <p>I only started to make a modest change for the POSIX locale in Python 3.5 at the end of 2013: use <tt class="docutils literal">surrogateescape</tt> for stdin and stdout. 
We would have to wait for Nick Coghlan, in 2017, for significant changes in Python 3.7.</p> <p>This article explains the slow transition, <strong>six years</strong> from the first bug report (2011) to the significant change (2017), from &quot;you must fix your locale&quot; to &quot;maybe Python can do something for you&quot;.</p> <p><strong>This article is the fifth in a series of articles telling the history and rationale of the Python 3 Unicode model for the operating system:</strong></p> <ul class="simple"> <li><ol class="first arabic"> <li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li> </ol> </li> <li><ol class="first arabic" start="2"> <li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li> </ol> </li> <li><ol class="first arabic" start="3"> <li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li> </ol> </li> <li><ol class="first arabic" start="4"> <li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li> </ol> </li> <li><ol class="first arabic" start="5"> <li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li> </ol> </li> <li><ol class="first arabic" start="6"> <li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li> </ol> </li> </ul> <div class="section" id="first-rejected-attempt-2011"> <h2>First rejected attempt, 2011</h2> <p>December 2011, <strong>Martin Packman</strong>, a Bazaar developer, reported <a class="reference external" href="https://bugs.python.org/issue13643">bpo-13643</a>, proposing to use UTF-8 in 
Python if the locale encoding is ASCII:</p> <blockquote> <p>Currently when running Python on a non-OSX posix environment under either the <strong>C locale</strong>, or with an invalid or missing locale, it's <strong>not possible to operate using unicode filenames outside the ascii range</strong>. Using bytes works, as does reading expecting unicode, using the surrogates hack.</p> <p>This makes robustly working with non-ascii filenames on different platforms needlessly annoying, given <strong>no modern nix should have problems just using UTF-8 in these cases</strong>.</p> <p>See the <a class="reference external" href="https://bugs.launchpad.net/bzr/+bug/794353">downstream bzr bug for more</a>.</p> <p>One option is to <strong>just use UTF-8</strong> for encoding and decoding filenames <strong>when otherwise ascii would be used</strong>. As a strict superset, this shouldn't break too many existing assumptions, and <strong>it's unlikely that non-UTF-8 filenames will accidentally be mangled due to a locale setting blip.</strong> See the attached patch for this behaviour change. 
It does not include a test currently, but it's possible to write one using subprocess and overriden <tt class="docutils literal">LANG</tt> and <tt class="docutils literal">LC_ALL</tt> vars.</p> </blockquote> <p><a class="reference external" href="https://bugs.python.org/issue13643#msg149928">He added</a>:</p> <blockquote> <p>This is more about <strong>un-encodable filenames</strong>.</p> <p>At the moment work with non-ascii filenames in Python robustly requires two branches, one using unicode and one that encodes to bytestrings and deals with the case where the name can't be represented in the declared filesystem encoding.</p> <p><strong>That may be something that just had to be lived with</strong>, but it's a little annoying when even without a UTF-8 locale for a particular process, that's what most systems will want on disk.</p> </blockquote> <p>At this time, I was still traumatised by the <tt class="docutils literal">PYTHONFSENCODING</tt> mess: using a filesystem encoding different than the locale encoding caused many issues (see <a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a>). <a class="reference external" href="https://bugs.python.org/issue13643#msg149926">I wrote</a>:</p> <blockquote> It was already discussed: using a different encoding for filenames and for other things is really not a good idea. 
(...)</blockquote> <p>and <a class="reference external" href="https://bugs.python.org/issue13643#msg149927">I added</a>:</p> <blockquote> The right fix is to <strong>fix your locale, not Python</strong>.</blockquote> <p>Antoine Pitrou <a class="reference external" href="https://bugs.python.org/issue13643#msg149949">suggested to fix the operating system, not Python</a>:</p> <blockquote> <p>So <strong>why don't these supposedly &quot;modern&quot; systems at least set the appropriate environment variables</strong> for Python to infer the proper character encoding? (since these &quot;modern&quot; systems don't have a well-defined encoding...)</p> <p>Answer: because they are not modern at all, <strong>they are antiquated, inadapted and obsolete pieces of software designed and written by clueless Anglo-American people</strong>. Please report bugs against these systems. <strong>The culprit is not Python, it's the Unix crap</strong> and the utterly clueless attitude of its maintainers (&quot;filesystems are just bytes&quot;, yeah, whatever...).</p> </blockquote> <p><strong>Martin Pool</strong> <a class="reference external" href="https://bugs.python.org/issue13643#msg149951">wrote</a>:</p> <blockquote> The standard encoding is UTF-8. Python shouldn't need to have a variable set to tell it this.</blockquote> <p><a class="reference external" href="https://bugs.python.org/issue13643#msg149952">Antoine replied</a>:</p> <blockquote> How so? I don't know of any Linux or Unix spec which says so.</blockquote> <p>Four days and 34 messages later, <strong>Terry J. 
Reedy</strong> <a class="reference external" href="https://bugs.python.org/issue13643#msg150204">closed the issue</a>:</p> <blockquote> <p>Martin, after reading most all of the <strong>unusually large sequence of messages</strong>, I am closing this because <strong>three of the core developers</strong> with the most experience in this area are <strong>dead-set against your proposal</strong>.</p> <p>That does not make it 'wrong', but does mean that it will not be approved and implemented without new data and more persuasive arguments than those presented so far. I do not see that continued repetition of what has been said so far will change anything.</p> </blockquote> <p>Getting many messages in a short time is common when discussing Unicode issues :-)</p> <p>Earlier, in March 2011, <strong>Armin Ronacher</strong> and <strong>Carl Meyer</strong> reported a similar issue: <a class="reference external" href="https://bugs.python.org/issue11574">bpo-11574</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2011-March/109361.html">[Python-Dev] Low-Level Encoding Behavior on Python 3</a>.
I closed the issue as &quot;won't fix&quot; in April 2012.</p> </div> <div class="section" id="second-attempt-2013"> <h2>Second attempt, 2013</h2> <p>November 2013, <strong>Sworddragon</strong> reported <a class="reference external" href="https://bugs.python.org/issue19846">bpo-19846</a>: <tt class="docutils literal">LANG=C python3 <span class="pre">-c</span> <span class="pre">'print(&quot;\xe4&quot;)'</span></tt> fails with an <tt class="docutils literal">UnicodeEncodeError</tt>.</p> <p><strong>Antoine Pitrou</strong> wrote a patch to use UTF-8 when the locale encoding is ASCII, the same approach as the first attempt, <a class="reference external" href="https://bugs.python.org/issue13643">bpo-13643</a>.</p> <p><strong>The patch was incomplete and so caused many issues.</strong> Python used the C codec of the locale encoding during Python initialization, and so Python had to use the locale encoding as its filesystem encoding.</p> <p>I listed all functions that should be modified to fix issues and get a fully working solution. Nobody came up with a full implementation, likely because <strong>too many changes were required</strong>.</p> <p>One month and 66 messages (almost double the previous attempt) later, again, <a class="reference external" href="https://bugs.python.org/issue19846#msg205675">I closed the issue</a>:</p> <blockquote> <p>I'm closing the issue as invalid, because <strong>Python 3 behaviour is correct</strong> and must not be changed.</p> <p>Standard streams (sys.stdin, sys.stdout, sys.stderr) uses the locale encoding. (...)
These encodings and error handlers can be overriden by the <strong>PYTHONIOENCODING</strong>.</p> </blockquote> <p>My <a class="reference external" href="https://bugs.python.org/issue19846#msg205675">full long comment</a> describes encodings used on each platform.</p> </div> <div class="section" id="use-surrogateescape-for-stdin-and-stdout-in-python-3-5"> <h2>Use surrogateescape for stdin and stdout in Python 3.5</h2> <p>December 2013: Just after closing the second attempt <a class="reference external" href="https://bugs.python.org/issue19846">bpo-19846</a>, I created <a class="reference external" href="https://bugs.python.org/issue19977">bpo-19977</a> to propose to use the <tt class="docutils literal">surrogateescape</tt> error handler in <tt class="docutils literal">sys.stdin</tt> and <tt class="docutils literal">sys.stdout</tt> for the POSIX locale.</p> <p><strong>R. David Murray</strong> <a class="reference external" href="https://bugs.python.org/issue19977#msg206131">disliked my idea</a>:</p> <blockquote> <p><strong>Reintroducing moji-bake intentionally doesn't sound like a particularly good idea</strong>, wasn't that what python3 was supposed to help prevent?</p> <p>It does seem like a <strong>utf-8 default is the Way of the Future</strong>. 
Or even the present, most places.</p> </blockquote> <p>March 2014, since <strong>Serhiy Storchaka</strong> and <strong>Nick Coghlan</strong> supported my idea, I pushed my <a class="reference external" href="https://github.com/python/cpython/commit/7143029d4360637aadbd7ddf386ea5c64fb83095">commit 7143029d</a> in Python 3.5:</p> <blockquote> Issue #19977: When the <tt class="docutils literal">LC_TYPE</tt> locale is the POSIX locale (<tt class="docutils literal">C</tt> locale), <tt class="docutils literal">sys.stdin</tt> and <tt class="docutils literal">sys.stdout</tt> are now using the <tt class="docutils literal">surrogateescape</tt> error handler, instead of the <tt class="docutils literal">strict</tt> error handler.</blockquote> <p>Previously, <strong>Python 3 was very strict on encodings</strong>: all core developers were convinced that we could force developers to fix their applications. This change is one of the <strong>first Python 3 changes which could produce &quot;mojibake&quot; on purpose</strong>.</p> <p><strong>Six years after the Python 3.0 release, we started to understand that while developers can fix their code, we cannot ask users to fix their configuration (&quot;fix their locale&quot;).</strong></p> </div> <div class="section" id="read-etc-locale-conf"> <h2>Read /etc/locale.conf?</h2> <p>April 2014, <strong>Nick Coghlan</strong> created <a class="reference external" href="https://bugs.python.org/issue21368">bpo-21368</a>: &quot;Check for systemd locale on startup if current locale is set to POSIX&quot;.</p> <blockquote> If a modern Linux system is using systemd as the process manager, then there will likely be <strong>a &quot;/etc/locale.conf&quot; file</strong> providing settings like LANG - due to problematic requirements in the POSIX specification, <strong>this file</strong> (when available) is <strong>likely to be a better &quot;source of truth&quot; regarding the system encoding</strong> than the environment where the interpreter process is
started, at least when the latter is claiming ASCII as the default encoding.</blockquote> <p><a class="reference external" href="https://bugs.python.org/issue21368#msg217328">I disliked the idea</a>:</p> <blockquote> I don't think that Python should read such configuration file. If you consider that something is wrong here, <strong>please report the issue to the C library</strong>.</blockquote> <p>Since no consensus was found, no action was taken.</p> </div> <div class="section" id="misconfigured-locales-in-docker-images"> <h2>Misconfigured locales in Docker images</h2> <p>September 2016: <strong>Jan Niklas Hasse</strong> opened <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>, <strong>&quot;sys.getfilesystemencoding() should default to utf-8&quot;</strong>.</p> <blockquote> <strong>Working with Docker I often end up with an environment where the locale isn't correctly set.</strong> In these cases <strong>it would be great if sys.getfilesystemencoding() could default to 'utf-8'</strong> instead of <tt class="docutils literal">'ascii'</tt>, as it's the encoding of the future and ascii is a subset of it anyway.</blockquote> <p>December 2016, <strong>Jan Niklas Hasse</strong> <a class="reference external" href="https://bugs.python.org/issue28180#msg282972">mentioned</a> the <tt class="docutils literal"><span class="pre">C.UTF-8</span></tt> locale:</p> <blockquote> <p><a class="reference external" href="https://sourceware.org/glibc/wiki/Proposals/C.UTF-8#Defaults">glibc C.UTF-8 article</a> mentions that <strong>C.UTF-8 should be glibc's default</strong>.</p> <p>This bug report <a class="reference external" href="https://sourceware.org/bugzilla/show_bug.cgi?id=17318">also mentions Python</a>. 
It <strong>hasn't been fixed yet</strong>, though :/</p> </blockquote> <p><strong>Marc-Andre Lemburg</strong> <a class="reference external" href="https://bugs.python.org/issue28180#msg282977">added</a>:</p> <blockquote> <p>If we just restrict this to the file system encoding (and not the whole LANG setting), how about:</p> <ul class="simple"> <li>default the file system encoding to 'utf-8' and use the surrogate escape handler as default error handler</li> <li>add a <tt class="docutils literal">PYTHONFSENCODING</tt> env var to set the file system encoding to something else (*)</li> </ul> <p>(*) I believe we discussed this at some point already, but don't remember the outcome.</p> </blockquote> <p>The removed <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable, using a filesystem encoding different than the locale encoding, caused many issues: see <a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a>.</p> <p><strong>Nick Coghlan</strong> <cite>proposed to experiment using the C.UTF-8 locale</cite> in Fedora 26:</p> <blockquote> <p><strong>For Fedora 26,</strong> I'm going to explore the feasibility of patching our system 3.6 installation such that the python3 command itself (rather than the shared library) <strong>checks for &quot;LC_CTYPE=C&quot;</strong> as almost the first thing it does, and forcibly <strong>sets LANG and LC_ALL to C.UTF-8</strong> if it gets an answer it doesn't like. 
If we're able to do that successfully in the more constrained environment of a specific recent Fedora release, then I think it will bode well for doing something similar by default in CPython 3.7</p> <p><a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1404918">Downstream Fedora issue proposing the above idea for F26</a>.</p> </blockquote> <p>Fedora 26 integrated a downstream change in Python 3.6: see <a class="reference external" href="https://fedoraproject.org/wiki/Releases/26/ChangeSet#Python_3_C.UTF-8_locale">Python 3 C.UTF-8 locale</a>.</p> </div> <div class="section" id="pep-538-coercing-the-c-locale-to-a-utf-8-based-locale"> <h2>PEP 538: Coercing the C locale to a UTF-8 based locale</h2> <a class="reference external image-reference" href="http://www.curiousefficiency.org/"> <img alt="Nick Coghlan" src="https://vstinner.github.io/images/nick_coghlan.jpg" /> </a> <p>December 2016, as a follow-up of <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>, <strong>Nick Coghlan</strong> wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0538/">PEP 538: Coercing the legacy C locale to a UTF-8 based locale</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-ideas/2017-January/044130.html">posted it to python-ideas list</a> and <a class="reference external" href="https://mail.python.org/pipermail/linux-sig/2017-January/000014.html">to the linux-sig list</a>.</p> <p>April 2017, Nick <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147795.html">proposed</a> <strong>INADA Naoki</strong> as the BDFL Delegate for his PEP. 
Guido <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147796.html">agreed to delegate</a>.</p> <p>May 2017, after 5 months of discussions and changes, INADA Naoki <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-May/148035.html">approved the PEP</a>.</p> <p>June 2017, <a class="reference external" href="https://bugs.python.org/issue28180">bpo-28180</a>: Nick Coghlan pushed the <a class="reference external" href="https://github.com/python/cpython/commit/6ea4186de32d65b1f1dc1533b6312b798d300466">commit 6ea4186d</a>:</p> <blockquote> bpo-28180: Implementation for PEP 538 (#659)</blockquote> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>A first attempt to use a different encoding for the POSIX locale was rejected in 2011. A second attempt was also rejected in 2013.</p> <p>I modified Python 3.5 in 2014 to use the <tt class="docutils literal">surrogateescape</tt> error handler in <tt class="docutils literal">stdin</tt> and <tt class="docutils literal">stdout</tt> for the POSIX locale. Six years after the Python 3.0 release, we started to understand that while developers can fix their code, we cannot ask users to &quot;fix their locale&quot; (configure their locale properly).</p> <p>In 2016, the problem occurred again with misconfigured locales in Docker images.
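On a modern Python (3.7 or newer) on a POSIX system, the eventual fix can be observed directly: even when a process starts in the C locale, the locale coercion (and, as a fallback, the UTF-8 Mode of PEP 540) makes the standard streams use UTF-8 instead of ASCII. A minimal sketch, assuming a POSIX system:

```python
# Sketch: start a child Python in the C locale and check which
# encoding its standard streams ended up using. On Python 3.7+,
# locale coercion (PEP 538) or the UTF-8 Mode (PEP 540) kicks in.
import subprocess
import sys

proc = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env={"LC_CTYPE": "C"},   # simulate a misconfigured locale
    capture_output=True,
    text=True,
)
print(proc.stdout.strip())   # utf-8 on Python 3.7+ (capitalization may vary)
```

On Python 3.6 and older, the same child process would report an ASCII encoding instead.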
In 2017, Nick Coghlan wrote the PEP 538 &quot;Coercing the legacy C locale to a UTF-8 based locale&quot;, which was approved by INADA Naoki and implemented in Python 3.7.</p> </div> Python 3.6 now uses UTF-8 on Windows2018-03-22T17:00:00+01:002018-03-22T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-22:/python36-utf8-windows.html<p>September 2016, a few days before the CPython core dev sprint, <strong>Steve Dower</strong> proposed two major backward incompatible changes for Python 3.6 on Windows: <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> and <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529: Change Windows filesystem encoding to UTF-8</a>. On first read, I was sure that …</p><p>September 2016, a few days before the CPython core dev sprint, <strong>Steve Dower</strong> proposed two major backward incompatible changes for Python 3.6 on Windows: <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> and <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529: Change Windows filesystem encoding to UTF-8</a>. On first read, I was sure that PEP 529 would break all applications on Windows.
This article tells the story behind the approval of these two PEPs.</p> <p><strong>This article is the fourth in a series of articles telling the history and rationale of the Python 3 Unicode model for the operating system:</strong></p> <ul class="simple"> <li><ol class="first arabic"> <li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li> </ol> </li> <li><ol class="first arabic" start="2"> <li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li> </ol> </li> <li><ol class="first arabic" start="3"> <li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li> </ol> </li> <li><ol class="first arabic" start="4"> <li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li> </ol> </li> <li><ol class="first arabic" start="5"> <li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li> </ol> </li> <li><ol class="first arabic" start="6"> <li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li> </ol> </li> </ul> <div class="section" id="pep-529"> <h2>PEP 529</h2> <p>September 2016, <strong>Steve Dower</strong>, who works for Microsoft, wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529: Change Windows filesystem encoding to UTF-8</a> and <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-September/146051.html">posted it to python-dev</a> for comments.</p> <a class="reference external image-reference" href="http://stevedower.id.au/blog/"> <img alt="Steve Dower"
src="https://vstinner.github.io/images/steve_dower.jpg" /> </a> <p>Abstract:</p> <blockquote> <p><strong>Historically, Python uses the ANSI APIs</strong> for interacting with the Windows operating system, often via C Runtime functions. However, these have been long discouraged in favor of the UTF-16 APIs. Within the operating system, all text is represented as UTF-16, and the ANSI APIs perform encoding and decoding using the active code page. See Naming Files, Paths, and Namespaces for more details.</p> <p>This PEP proposes <strong>changing the default filesystem encoding on Windows to utf-8</strong>, and changing all filesystem functions to use the Unicode APIs for filesystem paths. This will not affect code that uses strings to represent paths, however those that use bytes for paths will now be able to correctly round-trip all valid paths in Windows filesystems. <strong>Currently, the conversions between Unicode (in the OS) and bytes (in Python) were lossy</strong> and would fail to round-trip characters outside of the user's active code page.</p> <p>Notably, this does not impact the encoding of the contents of files. These will continue to default to <tt class="docutils literal">locale.getpreferredencoding()</tt> (for text files) or plain bytes (for binary files). This only affects the encoding used when users pass a bytes object to Python where it is then passed to the operating system as a path name.</p> </blockquote> </div> <div class="section" id="my-analysis"> <h2>My analysis</h2> <p>Here is my analysis on the rationale for the PEP 529 change.</p> <p><strong>On Unix, the native type for filenames is bytes</strong>. A filename is seen by the Linux kernel as an opaque object. The ext4 filesystem stores filenames as bytes. If a Python 2 application uses Unicode for filenames, filesystem operations can fail with a Unicode error (encoding or decoding error) depending on the locale encoding. 
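Both behaviors mentioned here, the Unicode error of a naive port and the lossless handling that Python 3 eventually adopted with the surrogateescape error handler (PEP 383), can be sketched at the codec level. This is a sketch for a POSIX system with a UTF-8 locale; the byte value is illustrative:

```python
# Sketch: a Unix filename is just bytes. Strict decoding can fail,
# while os.fsdecode()/os.fsencode() (which use the surrogateescape
# error handler on POSIX) round-trip any byte sequence losslessly.
import os

raw = b"caf\xe9.txt"       # ISO-8859-1 bytes for "café.txt": invalid UTF-8

try:
    raw.decode("utf-8")    # strict decoding, as a naive port would do
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc)

name = os.fsdecode(raw)    # undecodable bytes become lone surrogates
assert os.fsencode(name) == raw   # lossless round-trip
```

On a UTF-8 system, the undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9, which `os.fsencode()` turns back into the original byte.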
If the locale encoding is ASCII, Unicode errors are likely to occur at the first non-ASCII filename. For example, Mercurial handles filenames as bytes.</p> <p>On Python 3, handling filenames as Unicode works thanks to the <tt class="docutils literal">surrogateescape</tt> error handler. <strong>Most Python 2 applications ported to Python 3 keep their Python 2 support, and so still handle filenames as bytes.</strong></p> <p>Problems arise when such software is used on Windows.</p> <p><strong>On Windows, the native type for filenames is Unicode</strong>. Many functions come in two flavors: &quot;ANSI&quot; (bytes) and &quot;Wide&quot; (Unicode) versions. In my opinion, the ANSI flavor mostly exists for backward compatibility. In Python 3.5, passing a filename as bytes uses the ANSI flavor, whereas the Wide flavor is used for Unicode filenames. The ANSI flavor uses the ANSI code page, which is very limited compared to Unicode, usually only 256 code points or fewer. Some filenames not encodable to the ANSI code page simply cannot be opened, renamed, etc. using the ANSI API.</p> <p>The other issue is that <strong>some developers only develop on Unix</strong> (ex: Linux or macOS) <strong>and never test their application on Windows</strong>.</p> <p>For a better rationale, read the <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/#background">Background section</a> of Steve Dower's PEP :-)</p> </div> <div class="section" id="discussion-at-the-cpython-sprint-and-guido-s-approval"> <h2>Discussion at the CPython sprint and Guido's approval</h2> <p>Honestly, <strong>on first read, I was sure that PEP 529 would break all applications on Windows</strong>.</p> <p>Fortunately, thanks to the PSF and Instagram, I was able to attend my first CPython sprint at Instagram headquarters: <a class="reference external" href="https://vstinner.github.io/cpython-sprint-2016.html">CPython sprint, september 2016</a>.
There, I talked with <strong>Steve, who reassured me and explained his PEP to me</strong>. Later, we talked with <strong>Guido van Rossum</strong>.</p> <p>Even though I liked the idea of using UTF-8, I was still not fully confident that the change would not break the world. <strong>We agreed to try the change during the Python 3.6 beta phase</strong>, but revert it if something bad happened.</p> <a class="reference external image-reference" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html"> <img alt="CPython developers at the Facebook sprint" src="https://vstinner.github.io/images/cpython_sprint_2016_photo.jpg" /> </a> <p>Following this talk, <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-September/146277.html">Guido accepted the PEP under conditions</a>:</p> <blockquote> <p>I'm hijacking this thread to <strong>provisionally accept PEP 529</strong>. (I'll also do this for PEP 528, in its own thread.)</p> <p><strong>I've talked things over with Steve and Victor and we're going to do an experiment</strong> (as <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/#beta-experiment">now written up in the PEP</a>) to tease out any issues with this change during the beta. <strong>If serious problems crop up we may have to roll back the changes and reject the PEP</strong> -- we won't get another chance at getting this right. (That would also mean that using the binary filesystem APIs will remain deprecated and will eventually be disallowed; as long as the PEP remains accepted they are undeprecated.)</p> <p>Congrats Steve! Thanks for the massive amount of work on the implementation and the thinking that went into the design. Thanks everyone else for their feedback.</p> <p class="attribution">&mdash;Guido</p> </blockquote> <p><strong>I was honoured that Guido listened to my Unicode experience</strong> to make a decision on the PEP ;-)</p> <p>Steve chose the right timing to get his PEP accepted.
Thanks to the sprint, which made it possible to quickly discuss such a backward incompatible change, <strong>the PEP was approved in just 12 days</strong>! For comparison, some of my PEPs, like my <a class="reference external" href="https://www.python.org/dev/peps/pep-0446/">PEP 446: Make newly created file descriptors non-inheritable</a> (another backward incompatible change), took 8 months to get accepted.</p> </div> <div class="section" id="pep-528-windows-console"> <h2>PEP 528: Windows console</h2> <p>Just before the PEP 529, Steve Dower also wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a>. This change only impacts the Windows console, so there is a lower risk of breaking the world.</p> <p>This PEP was also <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-September/146278.html">quickly approved by Guido</a> during the CPython sprint. Steve implemented it in Python 3.6.</p> <p>Even if it's a smaller change, it is <strong>yet another change towards using UTF-8 everywhere</strong>.</p> </div> <div class="section" id="great-success"> <h2>Great success!</h2> <p>Fortunately, I was wrong about the risk of breaking the world.
<strong>No user complained about these two backward incompatible changes: Python 3.6 on Windows is a success!</strong></p> <p>Python 3.6 now has <strong>better Unicode support</strong> on Windows thanks to PEP 528 and PEP 529!</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>September 2016: Steve Dower proposed two major backward incompatible changes for Python 3.6 on Windows: <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> and <a class="reference external" href="https://www.python.org/dev/peps/pep-0529/">PEP 529: Change Windows filesystem encoding to UTF-8</a>.</p> <p>On first read, I was sure that PEP 529 (filesystem encoding) would break all applications on Windows.</p> <p>Thanks to the CPython core dev sprint, I was able to talk with Steve, who reassured me and explained his PEP 529 to me. We agreed with Guido van Rossum to try the change during the Python 3.6 beta phase, but revert it if something bad happened. I was honoured that Guido listened to my Unicode experience to make a decision on the PEP.</p> <p>The <a class="reference external" href="https://www.python.org/dev/peps/pep-0528/">PEP 528: Change Windows console encoding to UTF-8</a> was also quickly approved, another change towards using UTF-8 everywhere.</p> <p>No user complained about these two backward incompatible changes: Python 3.6 on Windows is a success!</p> <p>Python 3.6 now has better Unicode support on Windows thanks to PEP 528 and PEP 529!</p> </div> Python 3.2 Painful History of the Filesystem Encoding2018-03-15T23:00:00+01:002018-03-15T23:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-15:/painful-history-python-filesystem-encoding.html<p>Between Python 3.0 released in 2008 and Python 3.4 released in 2014, the Python filesystem encoding changed multiple times.
<strong>It took 6 years to choose the best Python filesystem encoding on each platform.</strong></p> <p><strong>I was officially promoted to core developer</strong> in January 2010 by <strong>Martin von …</strong></p><p>Between Python 3.0 released in 2008 and Python 3.4 released in 2014, the Python filesystem encoding changed multiple times. <strong>It took 6 years to choose the best Python filesystem encoding on each platform.</strong></p> <p><strong>I was officially promoted to core developer</strong> in January 2010 by <strong>Martin von Loewis</strong>. I spent the whole year of 2010 fixing dozens of encoding issues during the development of Python 3.2, following my Unicode work started in 2008.</p> <p>This article is focused on the long discussions to choose the best Python filesystem encoding on each platform in 2010 for Python 3.2.</p> <p><strong>This article is the third in a series of articles telling the history and rationale of the Python 3 Unicode model for the operating system:</strong></p> <ul class="simple"> <li><ol class="first arabic"> <li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li> </ol> </li> <li><ol class="first arabic" start="2"> <li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li> </ol> </li> <li><ol class="first arabic" start="3"> <li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li> </ol> </li> <li><ol class="first arabic" start="4"> <li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li> </ol> </li> <li><ol class="first arabic" start="5"> <li><a class="reference external"
href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li> </ol> </li> <li><ol class="first arabic" start="6"> <li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li> </ol> </li> </ul> <a class="reference external image-reference" href="https://commons.wikimedia.org/wiki/File:Longleat-maze.jpg"> <img alt="Maze" src="https://vstinner.github.io/images/maze.jpg" /> </a> <div class="section" id="python-3-0-loves-utf-8"> <h2>Python 3.0 loves UTF-8</h2> <p>When Python 3.0 was released, it was unclear which encodings should be used for:</p> <ul class="simple"> <li>File content: <tt class="docutils literal"><span class="pre">open().read()</span></tt></li> <li>Filenames: <tt class="docutils literal">os.listdir()</tt>, <tt class="docutils literal">open()</tt>, etc.</li> <li>Command line arguments: <tt class="docutils literal">sys.argv</tt> and <tt class="docutils literal">subprocess.Popen</tt> arguments</li> <li>Environment variables: <tt class="docutils literal">os.environ</tt></li> <li>etc.</li> </ul> <p>Python 3.0 was forked from Python 2.6 and functions were modified to use Unicode. 
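The encoding layers listed above still exist in today's Python 3, and the encoding each one uses can be inspected at runtime. A small sketch (the printed values depend on the platform and locale):

```python
# Inspect which encoding each layer of Python 3 uses.
import locale
import sys

print(locale.getpreferredencoding(False))  # file content: open() default
print(sys.getfilesystemencoding())         # filenames, sys.argv, os.environ
print(sys.stdout.encoding)                 # standard streams
```

On a typical modern Linux system with a UTF-8 locale, all three lines print a UTF-8 encoding; in Python 3.0 the answers could differ from each other depending on the locale.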
Many Python 3 functions only used UTF-8 because their implementation was modified to use the default encoding, which is UTF-8: it was not a deliberate choice.</p> <p><strong>While UTF-8 is a good choice in most cases, it is not the best choice in all cases.</strong> Almost everything worked well in Python 3.0 when all data used UTF-8, but Python 3.0 failed badly if the locale encoding was not UTF-8.</p> <p>Python 3.1, 3.2 and 3.3 would get a lot of changes to adjust encodings in all corners of the standard library.</p> <p>Python 3.1 got the <tt class="docutils literal">surrogateescape</tt> error handler (PEP 383) which reduced Unicode errors: read my previous article <a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a>.</p> </div> <div class="section" id="add-sys-setfilesystemencoding"> <h2>Add sys.setfilesystemencoding()</h2> <p>September 2008, <a class="reference external" href="https://bugs.python.org/issue3187">bpo-3187</a>: To fix <tt class="docutils literal">os.listdir(str)</tt> to support undecodable filenames, <strong>Martin v. Löwis</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg74080">proposed a new function to change the filesystem encoding</a>:</p> <blockquote> Here is a patch that solves the issue in a different way: it introduces sys.setfilesystemencoding.
<strong>If applications invoke sys.setfilesystemencoding(&quot;iso-8859-1&quot;), all file names can be successfully converted into a character string.</strong></blockquote> <p>The ISO-8859-1 encoding has a very interesting property for bytes: it maps exactly the <tt class="docutils literal">0x00 - 0xff</tt> byte range to the U+0000 - U+00ff Unicode range, so the decoder cannot fail:</p> <pre class="literal-block"> $ python3.6 -q &gt;&gt;&gt; all(ord((b'%c' % byte).decode('iso-8859-1')) == byte for byte in range(256)) True &gt;&gt;&gt; all(ord(('%c' % char).encode('iso-8859-1')) == char for char in range(256)) True </pre> <p>Guido van Rossum <a class="reference external" href="https://bugs.python.org/issue3187#msg74173">commented</a>:</p> <blockquote> <p>I will check in Victor's changes (with some edits).</p> <p>Together this means that the various <strong>suggested higher-level solutions</strong> (like returning path-like objects, or some kind of roundtripping almost-but-not-quite-utf-8 encoding) <strong>can be implemented in pure Python</strong>.</p> </blockquote> <p>October 2008, <strong>Martin v. Löwis</strong> pushed the <a class="reference external" href="https://github.com/python/cpython/commit/04dc25c53728f5c2fe66d9e66af67da0c9b8959d">commit 04dc25c5</a>:</p> <pre class="literal-block"> Issue #3187: Add sys.setfilesystemencoding.
</pre> <p>Python 3.0 was the first major release with this function.</p> <p>In retrospect, I see this function as asking developers and users to be smart and choose the encoding themselves.</p> <p>While the ISO-8859-1 encoding trick is tempting, we will see later that <tt class="docutils literal">setfilesystemencoding()</tt> is broken by design and so cannot be used in practice.</p> </div> <div class="section" id="what-if-getting-the-locale-encoding-fails"> <h2>What if getting the locale encoding fails?</h2> <p>May 2010, I reported <a class="reference external" href="https://bugs.python.org/issue8610">bpo-8610</a>, &quot;Python3/POSIX: errors if file system encoding is None&quot;:</p> <blockquote> On POSIX (but not on Mac OS X), Python3 calls get_codeset() to get the file system encoding. If this function fails, sys.getfilesystemencoding() returns None.</blockquote> <p>I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/b744ba1d14c5487576c95d0311e357b707600b47">commit b744ba1d</a>:</p> <blockquote> Issue #8610: Load file system codec at startup, and <strong>display a fatal error on failure</strong>. <strong>Set the file system encoding to utf-8</strong> (instead of None) <strong>if getting the locale encoding failed</strong>, or if nl_langinfo(CODESET) function is missing.</blockquote> <p>This change <strong>adds the function initfsencoding()</strong>: logic to initialize the filesystem encoding.</p> <p>In practice, Python already used UTF-8 when the filesystem encoding was set to <tt class="docutils literal">None</tt>, but this change made the default more obvious. The change also made the error case better defined: Python exits immediately with a fatal error.</p> </div> <div class="section" id="support-locale-encodings-different-than-utf-8"> <h2>Support locale encodings different than UTF-8</h2> <p>My biggest Unicode project in Python 3 was to <strong>fix the encoding</strong> in all corners of the standard library.
This task kept me busy between Python 3.0 and Python 3.4, at least.</p> <p>May 2010, I created <a class="reference external" href="https://bugs.python.org/issue8611">bpo-8611</a>:</p> <blockquote> <strong>Python3 is unable to start</strong> (bootstrap failure) on a POSIX system <strong>if the locale encoding is different than utf8 and the Python path</strong> (standard library path where the encoding module is stored) <strong>contains a non-ASCII character</strong>. (Windows and Mac OS X are not affected by this issue because the file system encoding is hardcoded.)</blockquote> <p>For example, <a class="reference external" href="https://bugs.python.org/issue8242">bpo-8242</a> &quot;Improve support of PEP 383 (surrogates) in Python3&quot; is a meta issue tracking multiple issues:</p> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue7606">bpo-7606</a>: test_xmlrpc fails with non-ascii path</li> <li><a class="reference external" href="https://bugs.python.org/issue8092">bpo-8092</a>: utf8, backslashreplace and surrogates</li> <li><a class="reference external" href="https://bugs.python.org/issue8383">bpo-8383</a>: pickle is unable to encode unicode surrogates</li> <li><a class="reference external" href="https://bugs.python.org/issue8390">bpo-8390</a>: tarfile: use surrogates for undecode fields</li> <li><a class="reference external" href="https://bugs.python.org/issue8391">bpo-8391</a>: os.execvpe() doesn't support surrogates in env</li> <li><a class="reference external" href="https://bugs.python.org/issue8393">bpo-8393</a>: subprocess: support undecodable current working directory on POSIX OS</li> <li><a class="reference external" href="https://bugs.python.org/issue8394">bpo-8394</a>: ctypes.dlopen() doesn't support surrogates</li> <li><a class="reference external" href="https://bugs.python.org/issue8412">bpo-8412</a>: os.system() doesn't support surrogates nor bytes</li> <li><a class="reference external" 
href="https://bugs.python.org/issue8467">bpo-8467</a>: subprocess: surrogates of the error message (Python implementation on non-Windows)</li> <li><a class="reference external" href="https://bugs.python.org/issue8468">bpo-8468</a>: bz2: support surrogates in filename, and bytes/bytearray filename</li> <li><a class="reference external" href="https://bugs.python.org/issue8477">bpo-8477</a>: _ssl: support surrogates in filenames, and bytes/bytearray filenames</li> <li><a class="reference external" href="https://bugs.python.org/issue8485">bpo-8485</a>: Don't accept bytearray as filenames, or simplify the API</li> </ul> <p>I fixed all these issues, and reported most of them.</p> <p>October 2010, five months later, I finally succeeded in closing the issue!</p> <blockquote> Starting at r85691, the full test suite of Python 3.2 pass with ASCII, ISO-8859-1 and UTF-8 locale encodings in a non-ascii directory. <strong>The work on this issue is done.</strong></blockquote> <p>At that time, I didn't know that it would take me a few more years to really fix <strong>all</strong> encoding issues. For example, it would take me <strong>3 years</strong> to modify the core of the import machinery to pass filenames as Unicode on Windows: <a class="reference external" href="https://bugs.python.org/issue3080">bpo-3080</a> <strong>Full unicode import system</strong>.</p> </div> <div class="section" id="add-pythonfsencoding-environment-variable"> <h2>Add PYTHONFSENCODING environment variable</h2> <p>May 2010, while discussing how to fix <a class="reference external" href="https://bugs.python.org/issue8610">bpo-8610</a> &quot;Python3/POSIX: errors if file system encoding is None&quot;, I asked what the best encoding is when reading the locale encoding fails.
As a follow-up, <strong>Marc-Andre Lemburg</strong> created <a class="reference external" href="https://bugs.python.org/issue8622">bpo-8622</a>:</p> <blockquote> <p>As discussed on issue8610, we need a way to <strong>override the automatic detection of the file system encoding</strong> - for much the same reasons we also do for the I/O encoding: the detection mechanism isn't fail-safe.</p> <p>We should add a new environment variable with the same functionality as <tt class="docutils literal">PYTHONIOENCODING</tt>:</p> <pre class="literal-block"> PYTHONFSENCODING: Encoding[:errors] used for file system. </pre> </blockquote> <p>I implemented the idea since I liked it. August 2010, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/94908bbc1503df830d1d615e7b57744ae1b41079">commit 94908bbc</a>:</p> <blockquote> <p>Issue #8622: Add <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable to override the filesystem encoding.</p> <p><tt class="docutils literal">initfsencoding()</tt> displays also a better error message if <tt class="docutils literal">get_codeset()</tt> failed.</p> </blockquote> </div> <div class="section" id="remove-sys-setfilesystemencoding"> <h2>Remove sys.setfilesystemencoding()</h2> <p>August 2010, just after adding <tt class="docutils literal">PYTHONFSENCODING</tt>, I opened <a class="reference external" href="https://bugs.python.org/issue9632">bpo-9632</a> to remove the <tt class="docutils literal">sys.setfilesystemencoding()</tt> function:</p> <blockquote> <p>The <tt class="docutils literal">sys.setfilesystemencoding()</tt> function is <strong>dangerous</strong> because it introduces a lot of inconsistencies: this function is <strong>unable to reencode all filenames</strong> of all objects (eg. Python is unable to find filenames in user objects or 3rd party libraries). Eg. 
if you change the filesystem from utf8 to ascii, it will not be possible to use existing non-ascii (unicode) filenames: they will raise UnicodeEncodeError.</p> <p>As <tt class="docutils literal">sys.setdefaultencoding()</tt> in Python2, I think that <tt class="docutils literal">sys.setfilesystemencoding()</tt> is the <strong>root of evil</strong> :-) <strong>PYTHONFSENCODING</strong> (issue #8622) <strong>is the right solution</strong> to set the filesystem encoding.</p> </blockquote> <p><strong>Marc-Andre Lemburg</strong> complained that applications embedding Python may want to set the encoding used by Python. I proposed to use the <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable as a workaround, even if it was not the best option.</p> <p>One month later, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/5b519e02016ea3a51f784dee70eead3be4ab1aff">commit 5b519e02</a>:</p> <blockquote> Issue #9632: Remove <tt class="docutils literal">sys.setfilesystemencoding()</tt> function: use <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable to set the filesystem encoding at Python startup. <tt class="docutils literal">sys.setfilesystemencoding()</tt> created inconsistencies because it was unable to reencode all filenames of all objects.</blockquote> </div> <div class="section" id="reencode-filenames-when-setting-the-filesystem-encoding"> <h2>Reencode filenames when setting the filesystem encoding</h2> <p>August 2010, I created <a class="reference external" href="https://bugs.python.org/issue9630">bpo-9630</a>: &quot;Reencode filenames when setting the filesystem encoding&quot;.</p> <p>Since the beginning of 2010, I identified a design flaw in the Python initialization. Python starts by <strong>decoding strings from the default encoding UTF-8</strong>. Later, Python reads the locale encoding and loads the Python codec of this encoding.
Then Python <strong>decodes strings from the locale encoding</strong>. Problem: if the locale encoding is not UTF-8, <strong>encoding strings decoded from UTF-8 to the locale encoding can fail</strong> in different ways.</p> <p>I wrote a patch to &quot;reencode&quot; filenames of all module and code objects once the filesystem encoding is set, in <tt class="docutils literal">initfsencoding()</tt>.</p> <p>When I wrote the patch, I knew that it was an <strong>ugly hack and not the proper design</strong>. I proposed to try to avoid importing any Python module before the Python codec of the locale encoding is loaded, but there was a practical issue. Python only has built-in implementations (written in C) of the most popular encodings like ASCII and UTF-8. Some encodings like ISO-8859-15 are only implemented in Python.</p> <p>I also proposed to &quot;unload all modules, clear all caches and delete all code objects&quot; after setting the filesystem encoding. This option would be very inefficient and make Python startup slower, whereas Python 3 startup was already way slower than Python 2 startup.</p> <p>September 2010, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/c39211f51e377919952b139c46e295800cbc2a8d">commit c39211f5</a>:</p> <blockquote> <p>Issue #9630: Redecode filenames when setting the filesystem encoding</p> <p>Redecode the filenames of:</p> <blockquote> <ul class="simple"> <li>all modules: __file__ and __path__ attributes</li> <li>all code objects: co_filename attribute</li> <li>sys.path</li> <li>sys.meta_path</li> <li>sys.executable</li> <li>sys.path_importer_cache (keys)</li> </ul> </blockquote> <p>Keep weak references to all code objects until <tt class="docutils literal">initfsencoding()</tt> is called, to be able to redecode co_filename attribute of all code objects.</p> </blockquote> <p>The list of weak references to code objects really looked like a hack and I disliked it, but I failed to find a better way to fix
Python startup.</p> </div> <div class="section" id="pythonfsencoding-dead-end"> <h2>PYTHONFSENCODING dead end</h2> <p>Even with my latest big and ugly &quot;redecode filenames when setting the filesystem encoding&quot; fix, there were <strong>issues when the filesystem encoding was different than the locale encoding</strong>. I identified 4 bugs:</p> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue9992">bpo-9992</a>, <tt class="docutils literal">sys.argv</tt>: decoded from the <strong>locale</strong> encoding, but subprocess encodes process arguments to the <strong>filesystem</strong> encoding</li> <li><a class="reference external" href="https://bugs.python.org/issue10014">bpo-10014</a>, <tt class="docutils literal">sys.path</tt>: decoded from the <strong>locale</strong> encoding, but import encodes paths to the <strong>filesystem</strong> encoding</li> <li><a class="reference external" href="https://bugs.python.org/issue10039">bpo-10039</a>, the script name: read on the command line (ex: <tt class="docutils literal">python script.py</tt>) which is decoded from the locale encoding, whereas it is used to fill <tt class="docutils literal">sys.path[0]</tt> and import encodes paths to the <strong>filesystem</strong> encoding.</li> <li><a class="reference external" href="https://bugs.python.org/issue9988">bpo-9988</a>, <tt class="docutils literal">PYTHONWARNINGS</tt> environment variable: decoded from the <strong>locale</strong> encoding, but <tt class="docutils literal">subprocess</tt> encodes environment variables to the <strong>filesystem</strong> encoding.</li> </ul> <p>October 2010, I wrote an email to the python-dev list: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2010-October/104509.html">Inconsistencies if locale and filesystem encodings are different</a>. 
I proposed two solutions:</p> <ul class="simple"> <li>(a) use the same encoding to encode and decode values (it can be different for each issue).</li> <li>(b) <strong>remove the PYTHONFSENCODING variable</strong> and raise an error if locale and filesystem encodings are different (ensure that both encodings are the same).</li> </ul> <p><strong>Marc-Andre Lemburg</strong> <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2010-October/104511.html">replied</a>:</p> <blockquote> <p>You have to differentiate between the meaning of a file system encoding and the locale:</p> <p>A file system encoding defines how the applications interact with the file system.</p> <p>A locale defines how the user expects to interact with the application.</p> <p>It is well possible that the two are different. Mac OS X is just one example. Another common example is having a Unix account using the C locale (=ASCII) while working on a UTF-8 file system.</p> </blockquote> <p>This email is a good example of the dilemma we had when having to choose <strong>one</strong> encoding. There is a big temptation to use multiple encodings, but in the end, <strong>data are not isolated</strong>. A filename can be found in command line arguments (<tt class="docutils literal">python3 script.py file.txt</tt>), in environment variables (<tt class="docutils literal">LOG_FILE=log.txt</tt>), in file content (ex: <tt class="docutils literal">Makefile</tt> or a configuration file), etc.
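<p>The core problem can be sketched in a few lines of Python. The filename and the Latin-1/UTF-8 pair below are hypothetical; any two different encodings exhibit the same mismatch:</p>

```python
# A byte filename decoded with one encoding (the locale encoding,
# here Latin-1) and encoded back with another (the filesystem
# encoding, here UTF-8) does not roundtrip to the original bytes.
original = b'caf\xe9'                 # 'café' encoded in Latin-1
decoded = original.decode('latin-1')  # what the locale decoder produces
reencoded = decoded.encode('utf-8')   # what the filesystem encoder emits
print(reencoded == original)          # False: b'caf\xc3\xa9' != b'caf\xe9',
                                      # so the original file cannot be found
```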
Using multiple encodings does not work in practice.</p> <img alt="Dead end" src="https://vstinner.github.io/images/dead_end.jpg" /> </div> <div class="section" id="remove-pythonfsencoding"> <h2>Remove PYTHONFSENCODING</h2> <p>September 2010, I reported <a class="reference external" href="https://bugs.python.org/issue9992">bpo-9992</a>: Command-line arguments are not correctly decoded if locale and filesystem encodings are different.</p> <p>I proposed a patch to use the <strong>locale encoding</strong> to decode and encode command line arguments, rather than using the <strong>filesystem encoding</strong>.</p> <p><strong>Martin v. Löwis</strong> proposed to use the <strong>locale encoding</strong> for the command line arguments, environment variables and all filenames. <a class="reference external" href="https://bugs.python.org/issue9992#msg118352">My summary</a>:</p> <blockquote> <p>You mean that we should use the following encoding:</p> <ul class="simple"> <li>Mac OS X: UTF-8</li> <li>Windows: unicode for command line/env, mbcs to decode filenames</li> <li>others OSes: <strong>locale encoding</strong></li> </ul> <p>To do that, we have to:</p> <ul class="simple"> <li>&quot;others OSes&quot;: <strong>delete the PYTHONFSENCODING variable</strong></li> <li>Mac OS X: use UTF-8 to decode the command line arguments (we can use <tt class="docutils literal">PyUnicode_DecodeUTF8()</tt> + <tt class="docutils literal">PyUnicode_AsWideCharString()</tt> before Python is initialized)</li> </ul> </blockquote> <p>October 2010, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/8f6b6b0cc3febd15e33a96bd31dcb3cbef2ad1ac">commit 8f6b6b0c</a>:</p> <blockquote> Issue #9992: Remove PYTHONFSENCODING environment variable.</blockquote> <p>Two days later, I pushed an important change to <strong>use the locale encoding</strong> and remove the ugly <tt class="docutils literal">redecode_filenames()</tt> hack, <a class="reference external"
href="https://github.com/python/cpython/commit/f3170ccef8809e4a3f82fe9f82dc7a4a486c28c1">commit f3170cce</a>:</p> <blockquote> <p>Use locale encoding if <tt class="docutils literal">Py_FileSystemDefaultEncoding</tt> is not set</p> <ul class="simple"> <li><tt class="docutils literal">PyUnicode_EncodeFSDefault()</tt>, <tt class="docutils literal">PyUnicode_DecodeFSDefaultAndSize()</tt> and <tt class="docutils literal">PyUnicode_DecodeFSDefault()</tt> use the locale encoding instead of UTF-8 if <tt class="docutils literal">Py_FileSystemDefaultEncoding</tt> is <tt class="docutils literal">NULL</tt></li> <li><tt class="docutils literal">redecode_filenames()</tt> functions and <tt class="docutils literal">_Py_code_object_list</tt> (issue #9630) are no more needed: remove them</li> </ul> </blockquote> <p>This change was made possible by enhancements to <tt class="docutils literal">PyUnicode_EncodeFSDefault()</tt> and <tt class="docutils literal">PyUnicode_DecodeFSDefaultAndSize()</tt>. Previously, <strong>these functions used UTF-8</strong> before the filesystem encoding was set. With my change, these functions <strong>now use the C implementation of the locale encoding</strong>: use <tt class="docutils literal">mbstowcs()</tt> to decode and <tt class="docutils literal">wcstombs()</tt> to encode. In practice, the code is more complex because Python uses the <tt class="docutils literal">surrogateescape</tt> error handler.</p> <p>Using the C implementation of the locale encoding fixed a lot of &quot;bootstrap&quot; issues of the Python initialization. It works because <strong>the Python codec of the locale encoding is 100% compatible with the C implementation</strong> of the locale codec.</p> </div> <div class="section" id="encodings-used-by-python-3-2"> <h2>Encodings used by Python 3.2</h2> <p>February 2011, Python 3.2 was released.
Summary of the filesystem encodings used:</p> <ul class="simple"> <li><strong>ANSI code page</strong> on Windows;</li> <li><strong>UTF-8</strong> on macOS;</li> <li><strong>locale encoding</strong> on other platforms.</li> </ul> <p>Note: UTF-8 is used if the <tt class="docutils literal">nl_langinfo(CODESET)</tt> function is not available.</p> </div> <div class="section" id="force-ascii-encoding-on-freebsd-and-solaris"> <h2>Force ASCII encoding on FreeBSD and Solaris</h2> <p>November 2012, I created <a class="reference external" href="https://bugs.python.org/issue16455">bpo-16455</a>:</p> <blockquote> <p>On FreeBSD and OpenIndiana, <tt class="docutils literal">sys.getfilesystemencoding()</tt> returns <tt class="docutils literal">'ascii'</tt> when the locale is not set, whereas the locale encoding is <tt class="docutils literal"><span class="pre">ISO-8859-1</span></tt> in practice.</p> <p>This inconsistency causes different issues.</p> </blockquote> <p>December 2012, I pushed the <a class="reference external" href="https://github.com/python/cpython/commit/d45c7f8d74d30de0a558b10e04541b861428b7c1">commit d45c7f8d</a>:</p> <blockquote> Issue #16455: On FreeBSD and Solaris, if the locale is C, the ASCII/surrogateescape codec is now used, instead of the locale encoding, to decode the command line arguments. This change fixes inconsistencies with os.fsencode() and os.fsdecode() because these operating systems announce an ASCII locale encoding, whereas the ISO-8859-1 encoding is used in practice.</blockquote> <p>Extract of the main comment:</p> <blockquote> <p>Workaround FreeBSD and OpenIndiana locale encoding issue with the C locale. On these operating systems, <strong>nl_langinfo(CODESET) announces an alias of the ASCII encoding, whereas mbstowcs() and wcstombs() functions use the ISO-8859-1 encoding</strong>. The problem is that os.fsencode() and <tt class="docutils literal">os.fsdecode()</tt> use the <tt class="docutils literal">locale.getpreferredencoding()</tt> codec.
For example, if command line arguments are decoded by <tt class="docutils literal">mbstowcs()</tt> and encoded back by <tt class="docutils literal">os.fsencode()</tt>, we get a <tt class="docutils literal">UnicodeEncodeError</tt> instead of retrieving the original byte string.</p> <p>The workaround is enabled if <tt class="docutils literal">setlocale(LC_CTYPE, NULL)</tt> returns <tt class="docutils literal">&quot;C&quot;</tt>, <tt class="docutils literal">nl_langinfo(CODESET)</tt> announces <tt class="docutils literal">&quot;ascii&quot;</tt> (or an alias to ASCII), and at least one byte in range 0x80-0xff can be decoded from the locale encoding. The workaround is also enabled on error, for example if getting the locale failed.</p> </blockquote> <p>Python 3.4 was the first major release to get the fix (March 2014), but I also backported the change to the Python 3.2 and 3.3 branches.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p><strong>It took 6 years</strong> to fix Python to use the best filesystem encoding.</p> <p>Python 3.0 mostly used UTF-8 everywhere, but it was not a deliberate choice and it caused many issues when the locale encoding was not UTF-8. Python 3.1 got the <tt class="docutils literal">surrogateescape</tt> error handler (PEP 383) which reduced Unicode errors.</p> <p>October 2008, <strong>Martin v. Löwis</strong> added <tt class="docutils literal">sys.setfilesystemencoding()</tt> to Python 3.0.</p> <p>August 2010, I added a new <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable, an idea from <strong>Marc-Andre Lemburg</strong>.</p> <p>September 2010, I removed the <tt class="docutils literal">sys.setfilesystemencoding()</tt> function because it created mojibake by design.
I also pushed an ugly change to reencode filenames to fix many <tt class="docutils literal">PYTHONFSENCODING</tt> bugs.</p> <p>October 2010, I fixed all tests when Python is installed in a non-ASCII directory: the first milestone of supporting locale encodings different than UTF-8. I also removed the <tt class="docutils literal">PYTHONFSENCODING</tt> environment variable after a long discussion. Moreover, I pushed the most important Python 3.2 change: <strong>Python now uses the locale encoding as the filesystem encoding</strong>. This change fixed many issues.</p> <p>December 2012, I forced the filesystem encoding to ASCII on FreeBSD and Solaris when the announced locale encoding is wrong.</p> </div> Python 3.1 surrogateescape error handler (PEP 383)2018-03-15T18:00:00+01:002018-03-15T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-15:/pep-383.html<p>In my previous article, I wrote that <tt class="docutils literal">os.listdir(str)</tt> silently ignored undecodable filenames in Python 3.0 and that lying about the real content of a directory looks like a very bad idea.</p> <p><strong>Martin v. Löwis</strong> found a very smart solution to this problem: the <tt class="docutils literal">surrogateescape</tt> error handler.</p> <p><strong>This …</strong></p><p>In my previous article, I wrote that <tt class="docutils literal">os.listdir(str)</tt> silently ignored undecodable filenames in Python 3.0 and that lying about the real content of a directory looks like a very bad idea.</p> <p><strong>Martin v.
Löwis</strong> found a very smart solution to this problem: the <tt class="docutils literal">surrogateescape</tt> error handler.</p> <p><strong>This article is the second in a series of articles telling the history and rationale of the Python 3 Unicode model for the operating system:</strong></p> <ul class="simple"> <li><ol class="first arabic"> <li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li> </ol> </li> <li><ol class="first arabic" start="2"> <li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li> </ol> </li> <li><ol class="first arabic" start="3"> <li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li> </ol> </li> <li><ol class="first arabic" start="4"> <li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li> </ol> </li> <li><ol class="first arabic" start="5"> <li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li> </ol> </li> <li><ol class="first arabic" start="6"> <li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li> </ol> </li> </ul> <div class="section" id="first-attempt-to-propose-the-solution"> <h2>First attempt to propose the solution</h2> <p>September 2008, <a class="reference external" href="https://bugs.python.org/issue3187">bpo-3187</a>: While solutions to fix <tt class="docutils literal">os.listdir(str)</tt> were discussed, <strong>Martin v. 
Löwis</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg73992">proposed a different approach</a>:</p> <blockquote> <p>I'd like to propose yet another approach: make sure that <strong>conversion</strong> according to the file system encoding <strong>always succeeds</strong>. <strong>If an unconvertable byte is detected, map it into some private-use character.</strong> To reduce the chance of conflict with other people's private-use characters, we can use some of the plane 15 private-use characters, e.g. map byte 0xPQ to U+F30PQ (in two-byte Unicode mode, this would result in a surrogate pair).</p> <p>This would make all file names accessible to all text processing (including glob and friends); UI display would typically either report an encoding error, or arrange for some replacement glyph to be shown.</p> <p>There are certain variations of the approach possible, in case there is objection to a specific detail.</p> </blockquote> <p>He amended this proposal:</p> <blockquote> <p><strong>James Knight</strong> points out that UTF-8b can be used to give unambiguous round-tripping of characters in a UTF-8 locale. So I would like to amend my previous proposal:</p> <ul class="simple"> <li>for a non-UTF-8 encoding, use private-use characters for roundtripping</li> <li>if the locale's charset is UTF-8, use UTF-8b as the file system encoding.</li> </ul> </blockquote> <p><strong>But Martin's smart idea was lost</strong> in the middle of a long discussion.</p> <a class="reference external image-reference" href="https://github.com/loewis"> <img alt="Martin v. Löwis" src="https://vstinner.github.io/images/martin_von_loewis.jpg" /> </a> </div> <div class="section" id="pep-383"> <h2>PEP 383</h2> <p>April 2009, Martin v. Löwis proposed his idea again, now as the well-defined <a class="reference external" href="https://peps.python.org/pep-0383">PEP 383</a>: <strong>Non-decodable Bytes in System Character Interfaces</strong>.
He <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2009-April/088919.html">posted his PEP to python-dev</a> for comments.</p> <p>Abstract:</p> <blockquote> <p>File names, environment variables, and command line arguments are defined as being character data in POSIX; the C APIs however allow passing arbitrary bytes - whether these conform to a certain encoding or not.</p> <p><strong>This PEP proposes a means of dealing with such irregularities by embedding the bytes in character strings in such a way that allows recreation of the original byte string.</strong></p> </blockquote> <p>The <tt class="docutils literal">surrogateescape</tt> error handler is based on <strong>Markus Kuhn</strong>'s idea, which he called <strong>UTF-8b</strong>. Undecodable bytes in the range <tt class="docutils literal"><span class="pre">0x80-0xff</span></tt> are mapped to Unicode surrogate characters in the range <tt class="docutils literal">U+DC80</tt> - <tt class="docutils literal">U+DCFF</tt>.</p> <p>Example:</p> <pre class="literal-block"> &gt;&gt;&gt; b'nonascii\xff'.decode('ascii') UnicodeDecodeError: 'ascii' codec can't decode byte 0xff (...) &gt;&gt;&gt; b'nonascii\xff'.decode('ascii', 'surrogateescape') 'nonascii\udcff' &gt;&gt;&gt; 'nonascii\udcff'.encode('ascii', 'surrogateescape') b'nonascii\xff' </pre> <p>Using the <tt class="docutils literal">surrogateescape</tt> error handler, <strong>decoding cannot fail</strong>. For example, <tt class="docutils literal">os.listdir(str)</tt> no longer silently ignores undecodable filenames, since all filenames became decodable with any encoding.
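<p>The roundtrip can be demonstrated with <tt class="docutils literal">os.listdir()</tt> itself. This sketch uses a hypothetical temporary directory and assumes a POSIX platform, where filenames are byte strings:</p>

```python
import os
import tempfile

# Create a directory containing a file whose name is not valid UTF-8.
directory = tempfile.mkdtemp()
open(os.path.join(os.fsencode(directory), b'nonascii\xff'), 'w').close()

# os.listdir(str) no longer hides the file: the undecodable byte 0xff
# is mapped to the surrogate character U+DCFF...
names = os.listdir(directory)
print(names)  # ['nonascii\udcff']

# ...and os.fsencode() maps the surrogate back to the original byte.
print(os.fsencode(names[0]))  # b'nonascii\xff'
```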
Moreover, encoding filenames with <tt class="docutils literal">surrogateescape</tt> returns the original bytes unchanged.</p> <p><a class="reference external" href="https://mail.python.org/pipermail/python-dev/2009-April/089278.html">The PEP was accepted</a> by <strong>Guido van Rossum</strong> in less than one week!</p> </div> <div class="section" id="implementation"> <h2>Implementation</h2> <p>May 2009, Martin v. Löwis opened <a class="reference external" href="https://bugs.python.org/issue5915">bpo-5915</a> to get a review of his implementation.</p> <p>Two days later, after reviews by <strong>Benjamin Peterson</strong> and <strong>Antoine Pitrou</strong>, Martin pushed the <a class="reference external" href="https://github.com/python/cpython/commit/011e8420339245f9b55d41082ec6036f2f83a182">commit 011e8420</a>:</p> <blockquote> Issue #5915: Implement PEP 383, Non-decodable Bytes in System Character Interfaces.</blockquote> <p>Five days later, Martin renamed his &quot;utf8b&quot; error handler to its final name <strong>surrogateescape</strong>, <a class="reference external" href="https://github.com/python/cpython/commit/43c57785d3319249c03c3fa46c9df42a8ccd3e52">commit 43c57785</a>:</p> <blockquote> Rename utf8b error handler to surrogateescape.</blockquote> <p><strong>Python 3.1</strong> was the first release to get the <tt class="docutils literal">surrogateescape</tt> error handler.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>In Python 3.0, <tt class="docutils literal">os.listdir(str)</tt> silently ignored undecodable filenames, which was not ideal.</p> <p><strong>Martin v.
Löwis</strong> proposed applying <strong>Markus Kuhn</strong>'s idea called <strong>UTF-8b</strong> in Python as a new <tt class="docutils literal">surrogateescape</tt> error handler.</p> <p>Martin's PEP was approved in less than one week and implemented a few days later.</p> <p>Using the <tt class="docutils literal">surrogateescape</tt> error handler, decoding cannot fail: <tt class="docutils literal">os.listdir(str)</tt> no longer silently ignores undecodable filenames. Moreover, encoding filenames with <tt class="docutils literal">surrogateescape</tt> returns the original bytes unchanged.</p> <p>The <tt class="docutils literal">surrogateescape</tt> error handler fixed a lot of old and very complex Unicode issues on Unix. It is still widely used in Python 3.6 to <strong>not annoy users with Unicode errors</strong>.</p> </div> Python 3.0 listdir() Bug on Undecodable Filenames2018-03-09T13:00:00+01:002018-03-09T13:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-09:/python30-listdir-undecodable-filenames.html<p>Ten years ago, when Python 3.0 final was released, <tt class="docutils literal">os.listdir(str)</tt> <strong>silently ignored undecodable filenames</strong>:</p> <pre class="literal-block">
$ python3.0
&gt;&gt;&gt; os.mkdir(b'x')
&gt;&gt;&gt; open(b'x/nonascii\xff', 'w').close()
&gt;&gt;&gt; os.listdir('x')
[]
</pre> <p>You had to use bytes to see all filenames:</p> <pre class="literal-block">
&gt;&gt;&gt; os.listdir(b'x')
[b'nonascii\xff']
</pre> <p>If the locale is POSIX …</p><p>Ten years ago, when Python 3.0 final was released, <tt class="docutils literal">os.listdir(str)</tt> <strong>silently ignored undecodable filenames</strong>:</p> <pre class="literal-block">
$ python3.0
&gt;&gt;&gt; os.mkdir(b'x')
&gt;&gt;&gt; open(b'x/nonascii\xff', 'w').close()
&gt;&gt;&gt; os.listdir('x')
[]
</pre> <p>You had to use bytes to see all filenames:</p> <pre class="literal-block">
&gt;&gt;&gt; os.listdir(b'x')
[b'nonascii\xff']
</pre> <p>If the locale is POSIX
or C, listdir() silently ignored all non-ASCII filenames. Fortunately, <tt class="docutils literal">os.listdir()</tt> accepts <tt class="docutils literal">bytes</tt>, right? In fact, 4 months before the 3.0 final release, that was not the case.</p> <p>Lying about the real content of a directory looks like a very bad idea. Well, there is a rationale behind this design. Let me tell you this story, which is now 10 years old.</p> <p><strong>This article is the first in a series of articles telling the history and rationale of the Python 3 Unicode model for the operating system:</strong></p> <ul class="simple"> <li><ol class="first arabic"> <li><a class="reference external" href="https://vstinner.github.io/python30-listdir-undecodable-filenames.html">Python 3.0 listdir() Bug on Undecodable Filenames</a></li> </ol> </li> <li><ol class="first arabic" start="2"> <li><a class="reference external" href="https://vstinner.github.io/pep-383.html">Python 3.1 surrogateescape error handler (PEP 383)</a></li> </ol> </li> <li><ol class="first arabic" start="3"> <li><a class="reference external" href="https://vstinner.github.io/painful-history-python-filesystem-encoding.html">Python 3.2 Painful History of the Filesystem Encoding</a></li> </ol> </li> <li><ol class="first arabic" start="4"> <li><a class="reference external" href="https://vstinner.github.io/python36-utf8-windows.html">Python 3.6 now uses UTF-8 on Windows</a></li> </ol> </li> <li><ol class="first arabic" start="5"> <li><a class="reference external" href="https://vstinner.github.io/posix-locale.html">Python 3.7 and the POSIX locale</a></li> </ol> </li> <li><ol class="first arabic" start="6"> <li><a class="reference external" href="https://vstinner.github.io/python37-new-utf8-mode.html">Python 3.7 UTF-8 Mode</a></li> </ol> </li> </ul> <div class="section" id="the-os-walk-bug"> <h2>The os.walk() bug</h2> <a class="reference external image-reference"
href="http://www.dailymail.co.uk/news/article-3592525/Classic-crashes-Incredible-black-white-photos-chaos-roads-early-days-automobile-beautiful-vintage-motors-smashing-trees-careering-canals-plummeting-bridges.html"> <img alt="Boston Herald-Traveler photographer Leslie Jones had an eye for a dramatic scene, including when this seven-tonne dump truck plunged through the Warren Avenue bridge, in Boston" src="https://vstinner.github.io/images/car_accident_hole.jpg" /> </a> <p><a class="reference external" href="https://bugs.python.org/issue3187">bpo-3187</a>, June 2008: <strong>Helmut Jarausch</strong> tested the <strong>first beta release of Python 3.0</strong> and reported a bug on <tt class="docutils literal">os.walk()</tt> when he tried to walk into his home directory:</p> <pre class="literal-block">
Traceback (most recent call last):
  File &quot;WalkBug.py&quot;, line 5, in &lt;module&gt;
    for Dir, SubDirs, Files in os.walk('/home/jarausch') :
  File &quot;/usr/local/lib/python3.0/os.py&quot;, line 278, in walk
    for x in walk(path, topdown, onerror, followlinks):
  File &quot;/usr/local/lib/python3.0/os.py&quot;, line 268, in walk
    if isdir(join(top, name)):
  File &quot;/usr/local/lib/python3.0/posixpath.py&quot;, line 64, in join
    if b.startswith('/'):
TypeError: expected an object with the buffer interface
</pre> <p>In Python 3.0b1, <tt class="docutils literal">os.listdir(str)</tt> returned undecodable filenames as <tt class="docutils literal">bytes</tt>. The caller had to be prepared to get filenames of two types, <tt class="docutils literal">str</tt> and <tt class="docutils literal">bytes</tt>: it wasn't the case for <tt class="docutils literal">os.walk()</tt>, which failed with a <tt class="docutils literal">TypeError</tt>.</p> <p><strong>At first look, the bug seemed trivial to fix.
In fact, many solutions were proposed, and it took 4 months and 79 messages to fix the bug</strong>.</p> </div> <div class="section" id="i-proposed-a-new-filename-class"> <h2>I proposed a new Filename class</h2> <p>In August 2008, <a class="reference external" href="https://bugs.python.org/issue3187#msg71612">my first comment proposed</a> to use a custom &quot;Filename&quot; type which stores the original <tt class="docutils literal">bytes</tt> filename but also gives a Unicode view of the filename, in a single object, using a hypothetical <tt class="docutils literal">myformat()</tt> function:</p> <pre class="literal-block">
class Filename:
    def __init__(self, orig):
        self.as_bytes = orig
        self.as_str = myformat(orig)

    def __str__(self):
        return self.as_str

    def __bytes__(self):
        return self.as_bytes
</pre> <p><strong>Antoine Pitrou</strong> suggested inheriting from <tt class="docutils literal">str</tt>:</p> <blockquote> I agree that logically it's the right solution. It's also the most invasive. If that class is <strong>made a subclass of str</strong>, however, existing code shouldn't break more than it currently does.</blockquote> <p>I preferred to inherit from <tt class="docutils literal">bytes</tt> for practical reasons.
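</p> <p>As a rough illustration only (this is not the API that was actually proposed, and a lossy <tt class="docutils literal">decode</tt> stands in here for the hypothetical <tt class="docutils literal">myformat()</tt>), a <tt class="docutils literal">bytes</tt> subclass could have looked like:</p>

```python
# Illustrative sketch: a bytes subclass carrying a lossy str view.
class Filename(bytes):
    def __new__(cls, raw, encoding="utf-8"):
        self = super().__new__(cls, raw)
        # Lossy view: undecodable bytes become U+FFFD.
        self.as_str = raw.decode(encoding, "replace")
        return self

    def __str__(self):
        return self.as_str

name = Filename(b"nonascii\xff")
assert bytes(name) == b"nonascii\xff"   # original bytes preserved
assert str(name) == "nonascii\ufffd"    # human-readable view
```

<p>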
Antoine noted that the native type for filenames on Windows is <tt class="docutils literal">str</tt>, and so inheriting from <tt class="docutils literal">bytes</tt> could be an issue on Windows.</p> <p>Anyway, <a class="reference external" href="https://bugs.python.org/issue3187#msg71749">Guido van Rossum disliked the idea</a> (comment on InvalidFilename, a variant of the class):</p> <blockquote> I'm not interested in the InvalidFilename class; it's an API complification that might seem right for your situation but <strong>will hinder most other people</strong>.</blockquote> </div> <div class="section" id="guido-van-rossum-proposed-to-use-replace-error-handler"> <h2>Guido van Rossum proposed to use the replace error handler</h2> <p><strong>Guido van Rossum</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg71655">proposed to use the replace error handler</a> to prevent decoding errors. For example, <tt class="docutils literal">b'nonascii\xff'</tt> is decoded as <tt class="docutils literal">'nonascii�'</tt>.</p> <p>The problem is that this filename cannot be used to read the file content using <tt class="docutils literal">open()</tt> or to remove the file using <tt class="docutils literal">os.unlink()</tt>, since the operating system doesn't know the Unicode filename containing the &quot;�&quot; character.</p> <p>An important property is that <strong>encoding back the Unicode filename to bytes must return the same original bytes filename</strong>.</p> </div> <div class="section" id="defer-the-choice-to-the-caller-pass-a-callback"> <h2>Defer the choice to the caller: pass a callback</h2> <p>As no obvious choice arose, <a class="reference external" href="https://bugs.python.org/issue3187#msg71680">I proposed to use a callback to handle undecodable filenames</a>.
Pseudo-code:</p> <pre class="literal-block">
def listdir(path, fallback_decoder=default_fallback_decoder):
    charset = sys.getfilesystemcharset()
    dir_fd = opendir(path)
    try:
        for bytesname in readdir(dir_fd):
            try:
                name = str(bytesname, charset)
            except UnicodeDecodeError:
                name = fallback_decoder(bytesname)
            yield name
    finally:
        closedir(dir_fd)
</pre> <p>The default behaviour is to raise an exception on decoding error:</p> <pre class="literal-block">
def default_fallback_decoder(name):
    raise
</pre> <p>Example of a callback returning the raw bytes string unchanged (Python 3.0 beta1 behaviour):</p> <pre class="literal-block">
def return_undecodable_unchanged(name):
    return name
</pre> <p>Example using a custom filename class:</p> <pre class="literal-block">
class Filename:
    ...

def filename_decoder(name):
    return Filename(name)
</pre> <p><a class="reference external" href="https://bugs.python.org/issue3187#msg71699">Guido also disliked my callback idea</a>:</p> <blockquote> The callback variant is <strong>too complex</strong>; you could <strong>write it yourself by using os.listdir() with a bytes argument</strong>.</blockquote> </div> <div class="section" id="emit-a-warning-on-undecodable-filename"> <h2>Emit a warning on undecodable filename</h2> <a class="reference external image-reference" href="http://www.unicode.org/"> <img alt="Warning: venomous snakes" src="https://vstinner.github.io/images/warning_venomous_snakes.png" /> </a> <p>As ignoring undecodable filenames in <tt class="docutils literal">os.listdir(str)</tt> slowly became the most popular option, <strong>Benjamin Peterson</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg71700">proposed to emit a warning</a> if a filename cannot be decoded, to ease debugging:</p> <blockquote> (...) I don't like the idea of silently losing the contents of a directory. That's asking for difficult to discover bugs.
Could Python emit a warning in this case?</blockquote> <p>Guido van Rossum <a class="reference external" href="https://bugs.python.org/issue3187#msg71705">liked the idea</a>:</p> <blockquote> This may be the best compromise yet.</blockquote> <p><strong>Amaury Forgeot d'Arc</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg73535">asked</a>:</p> <blockquote> Does the warning warn multiple times? IIRC the default behaviour is to warn once.</blockquote> <p><strong>Benjamin Peterson</strong> <a class="reference external" href="https://bugs.python.org/issue3187#msg73535">replied</a>:</p> <blockquote> <strong>Making a warning happen more than once is tricky because it requires messing with the warnings filter.</strong> This of course takes away some of the user's control which is one of the main reasons for using the Python warning system in the first place.</blockquote> <p>Because of this issue, the warning idea was abandoned.</p> </div> <div class="section" id="support-bytes-and-fix-os-listdir"> <h2>Support bytes and fix os.listdir()</h2> <p>Guido repeated that the best workaround was to pass filenames as <tt class="docutils literal">bytes</tt>, which is the native type for filenames on Unix, but most functions only accepted filenames as <tt class="docutils literal">str</tt>.</p> <p>I started to write multiple patches to support passing filenames as <tt class="docutils literal">bytes</tt>:</p> <ul class="simple"> <li><tt class="docutils literal">posix_path_bytes.patch</tt>: enhance <tt class="docutils literal">posixpath.join()</tt></li> <li><tt class="docutils literal">io_byte_filename.patch</tt>: enhance <tt class="docutils literal">open()</tt></li> <li><tt class="docutils literal">fnmatch_bytes.patch</tt>: enhance <tt class="docutils literal">fnmatch.filter()</tt></li> <li><tt class="docutils literal">glob1_bytes.patch</tt>: enhance <tt class="docutils literal">glob.glob()</tt></li> <li><tt class="docutils
literal">getcwd_bytes.patch</tt>: <tt class="docutils literal">os.getcwd()</tt> returns bytes if unicode conversion fails</li> <li><tt class="docutils literal">merge_os_getcwd_getcwdu.patch</tt>: Remove <tt class="docutils literal">os.getcwdu()</tt>; <tt class="docutils literal">os.getcwd(bytes=True)</tt> returns bytes</li> <li><tt class="docutils literal">os_getcwdb.patch</tt>: Fix <tt class="docutils literal">os.getcwd()</tt> by using <tt class="docutils literal">PyUnicode_Decode()</tt> and add <tt class="docutils literal">os.getcwdb()</tt> which returns <tt class="docutils literal">bytes</tt></li> </ul> <p>Guido van Rossum created a <a class="reference external" href="https://codereview.appspot.com/3055">review of my combined patches</a>. Then I also combined my patches into a single <tt class="docutils literal">python3_bytes_filename.patch</tt> file.</p> <p><strong>After one month of development and 6 versions of the combined patch, Guido committed my big change</strong> as the <a class="reference external" href="https://github.com/python/cpython/commit/f0af3e30db9475ab68bcb1f1ce0b5581e214df76">commit f0af3e30</a>:</p> <pre class="literal-block">
commit f0af3e30db9475ab68bcb1f1ce0b5581e214df76
Author: Guido van Rossum &lt;guido&#64;python.org&gt;
Date:   Thu Oct 2 18:55:37 2008 +0000

    Issue #3187: Better support for &quot;undecodable&quot; filenames.
    Code by Victor Stinner, with small tweaks by GvR.
 Lib/fnmatch.py                |  27 ++++---
 Lib/genericpath.py            |   5 +-
 Lib/glob.py                   |  17 +++--
 Lib/io.py                     |  15 ++--
 Lib/posixpath.py              | 171 +++++++++++++++++++++++++++++++-----------
 Lib/test/test_fnmatch.py      |   9 +++
 Lib/test/test_posix.py        |   2 +-
 Lib/test/test_posixpath.py    | 150 ++++++++++++++++++++++++++++++++----
 Lib/test/test_unicode_file.py |   6 +-
 Misc/NEWS                     |  10 ++-
 Modules/posixmodule.c         |  90 +++++++++------------
 11 files changed, 358 insertions(+), 144 deletions(-)
</pre> <p>My change:</p> <ul class="simple"> <li>Modify <tt class="docutils literal">os.listdir(str)</tt> to <strong>silently ignore undecodable filenames</strong>, instead of returning them as <tt class="docutils literal">bytes</tt></li> <li>Add <tt class="docutils literal">os.getcwdb()</tt> function: similar to <tt class="docutils literal">os.getcwd()</tt> but returns the current working directory as <tt class="docutils literal">bytes</tt>.</li> <li>Support <tt class="docutils literal">bytes</tt> paths:<ul> <li><tt class="docutils literal">fnmatch.filter()</tt></li> <li><tt class="docutils literal">glob.glob1()</tt></li> <li><tt class="docutils literal">glob.iglob()</tt></li> <li><tt class="docutils literal">open()</tt></li> <li><tt class="docutils literal">os.path.isabs()</tt></li> <li><tt class="docutils literal">os.path.issep()</tt></li> <li><tt class="docutils literal">os.path.join()</tt></li> <li><tt class="docutils literal">os.path.split()</tt></li> <li><tt class="docutils literal">os.path.splitext()</tt></li> <li><tt class="docutils literal">os.path.basename()</tt></li> <li><tt class="docutils literal">os.path.dirname()</tt></li> <li><tt class="docutils literal">os.path.splitdrive()</tt></li> <li><tt class="docutils literal">os.path.ismount()</tt></li> <li><tt class="docutils literal">os.path.expanduser()</tt></li> <li><tt class="docutils literal">os.path.expandvars()</tt></li> <li><tt class="docutils literal">os.path.normpath()</tt></li> <li><tt class="docutils literal">os.path.abspath()</tt></li> <li><tt
class="docutils literal">os.path.realpath()</tt></li> </ul> </li> </ul> </div> <div class="section" id="more-bytes-patches"> <h2>More bytes patches</h2> <p>I checked whether other functions accepted filenames as <tt class="docutils literal">bytes</tt> and... I was disappointed. It took me some years to fix the full Python standard library. Examples of issues between 2008 and 2010:</p> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue4035">bpo-4035</a>: Support bytes in <tt class="docutils literal"><span class="pre">os.exec*()</span></tt></li> <li><a class="reference external" href="https://bugs.python.org/issue4036">bpo-4036</a>: Support bytes in <tt class="docutils literal">subprocess.Popen()</tt></li> <li><a class="reference external" href="https://bugs.python.org/issue8513">bpo-8513</a>: <tt class="docutils literal">subprocess</tt>: support bytes program name (POSIX)</li> <li><a class="reference external" href="https://bugs.python.org/issue8514">bpo-8514</a>: Add <tt class="docutils literal">fsencode()</tt> functions to os module</li> <li><a class="reference external" href="https://bugs.python.org/issue8603">bpo-8603</a>: Create a bytes version of <tt class="docutils literal">os.environ</tt> and <tt class="docutils literal">getenvb()</tt> -- Add <tt class="docutils literal">os.environb</tt></li> <li><a class="reference external" href="https://bugs.python.org/issue8412">bpo-8412</a>: <tt class="docutils literal">os.system()</tt> doesn't support surrogates nor bytes</li> <li><a class="reference external" href="https://bugs.python.org/issue8468">bpo-8468</a>: <tt class="docutils literal">bz2</tt> module: support surrogates in filename, and bytes/bytearray filename</li> <li><a class="reference external" href="https://bugs.python.org/issue8477">bpo-8477</a>: <tt class="docutils literal">ssl</tt> module: support surrogates in filenames, and bytes/bytearray filenames</li> <li><a class="reference external"
href="https://bugs.python.org/issue8640">bpo-8640</a>: <tt class="docutils literal">subprocess:</tt> canonicalize env to bytes on Unix (Python3)</li> <li><a class="reference external" href="https://bugs.python.org/issue8776">bpo-8776</a>: Bytes version of <tt class="docutils literal">sys.argv</tt> (REJECTED)</li> </ul> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>At first look, <strong>Helmut Jarausch</strong>'s <tt class="docutils literal">os.walk()</tt> bug looked trivial to fix.</p> <p>I proposed a <strong>new Filename class</strong> storing filenames as <tt class="docutils literal">bytes</tt> and <tt class="docutils literal">str</tt>, but Guido van Rossum rejected the idea because this API complification would <em>hinder most people</em>.</p> <p>Guido van Rossum proposed to <strong>use the replace error handler</strong>, but decoded filenames were not recognized by the operating system, making them useless for most cases.</p> <p>I proposed to <strong>use a callback to handle undecodable filenames</strong>, but Guido van Rossum also rejected this idea because it was too complex and could be written using os.listdir() with a bytes argument.</p> <p>Benjamin Peterson proposed to <strong>emit a warning</strong> when a filename cannot be decoded, but the idea was abandoned because of the complexity of making the warnings filters emit the warning multiple times.</p> <p>I wrote a big change modifying <tt class="docutils literal">os.listdir()</tt> to silently ignore undecodable filenames, and also modified a lot of functions to accept filenames as <tt class="docutils literal">bytes</tt>.
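</p> <p>The conversion helpers that grew out of this work, <tt class="docutils literal">os.fsencode()</tt> and <tt class="docutils literal">os.fsdecode()</tt> (see bpo-8514 above), are still the standard way to move between the two filename types; a minimal sketch:</p>

```python
import os

# os.fsdecode() never fails: undecodable bytes become surrogates.
# os.fsencode() restores the original bytes exactly.
raw = b"nonascii\xff"
name = os.fsdecode(raw)
assert os.fsencode(name) == raw
```

<p>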
I made further changes over the following years to fix the full Python standard library to accept <tt class="docutils literal">bytes</tt>.</p> <p>While it &quot;only&quot; took 4 months to fix the <tt class="docutils literal">os.listdir(str)</tt> issue, <strong>this kind of bug would keep me busy for the next 10 years</strong> (2008-2018)...</p> <p><strong>This article is the first in a series of articles telling the history and rationale of the Python 3 Unicode model for the operating system.</strong></p> </div> How I fixed a very old GIL race condition in Python 3.72018-03-08T10:00:00+01:002018-03-08T10:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-08:/python37-gil-change.html<p><strong>It took me 4 years to fix a nasty bug in the famous Python GIL</strong> (Global Interpreter Lock), one of the most critical parts of Python. I had to dig into the Git history to find a <strong>change made 26 years ago</strong> by <strong>Guido van Rossum</strong>: at this time, <em>threads were …</em></p><p><strong>It took me 4 years to fix a nasty bug in the famous Python GIL</strong> (Global Interpreter Lock), one of the most critical parts of Python. I had to dig into the Git history to find a <strong>change made 26 years ago</strong> by <strong>Guido van Rossum</strong>: at this time, <em>threads were something esoteric</em>.
Let me tell you my story.</p> <div class="section" id="fatal-python-error-caused-by-a-c-thread-and-the-gil"> <h2>Fatal Python error caused by a C thread and the GIL</h2> <p>In March 2014, <strong>Steve Dower</strong> reported the bug <a class="reference external" href="https://bugs.python.org/issue20891">bpo-20891</a> when a &quot;C thread&quot; uses the Python C API:</p> <blockquote> <p>In Python 3.4rc3, calling <tt class="docutils literal">PyGILState_Ensure()</tt> from a thread that was not created by Python and without any calls to <tt class="docutils literal">PyEval_InitThreads()</tt> will cause a fatal exit:</p> <p><tt class="docutils literal">Fatal Python error: take_gil: NULL tstate</tt></p> </blockquote> <p>My first comment:</p> <blockquote> IMO it's a bug in <tt class="docutils literal">PyEval_InitThreads()</tt>.</blockquote> <a class="reference external image-reference" href="https://twitter.com/kwinkunks/status/619496450834087938"> <img alt="Release the GIL!" src="https://vstinner.github.io/images/release_the_gil.png" /> </a> </div> <div class="section" id="pygilstate-ensure-fix"> <h2>PyGILState_Ensure() fix</h2> <p>I forgot the bug for 2 years. In March 2016, I modified Steve's test program to make it compatible with Linux (the test was written for Windows). I managed to reproduce the bug on my computer and I wrote a fix for <tt class="docutils literal">PyGILState_Ensure()</tt>.</p> <p>One year later, in November 2017, <strong>Marcin Kasperski</strong> asked:</p> <blockquote> Is this fix released? I can't find it in the changelog…</blockquote> <p>Oops, again, I had completely forgotten this issue! This time, not only did I <strong>apply my PyGILState_Ensure() fix</strong>, but I also wrote the <strong>unit test</strong> <tt class="docutils literal">test_embed.test_bpo20891()</tt>:</p> <blockquote> Ok, the bug is now fixed in Python 2.7, 3.6 and master (future 3.7).
On 3.6 and master, the fix comes with an unit test.</blockquote> <p>My fix for the master branch, commit <a class="reference external" href="https://github.com/python/cpython/commit/b4d1e1f7c1af6ae33f0e371576c8bcafedb099db">b4d1e1f7</a>:</p> <pre class="literal-block">
bpo-20891: Fix PyGILState_Ensure() (#4650)

When PyGILState_Ensure() is called in a non-Python thread before
PyEval_InitThreads(), only call PyEval_InitThreads() after calling
PyThreadState_New() to fix a crash.

Add an unit test in test_embed.
</pre> <p>And I closed the issue <a class="reference external" href="https://bugs.python.org/issue20891">bpo-20891</a>...</p> </div> <div class="section" id="random-crash-of-the-test-on-macos"> <h2>Random crash of the test on macOS</h2> <p>Everything was fine... but one week later, I noticed <strong>random</strong> crashes on macOS buildbots in my newly added unit test. I managed to reproduce the bug manually; example of a crash on the 3rd run:</p> <pre class="literal-block">
macbook:master haypo$ while true; do ./Programs/_testembed bpo20891 ||break; date; done
Lun 4 déc 2017 12:46:34 CET
Lun 4 déc 2017 12:46:34 CET
Lun 4 déc 2017 12:46:34 CET
Fatal Python error: PyEval_SaveThread: NULL tstate

Current thread 0x00007fffa5dff3c0 (most recent call first):
Abort trap: 6
</pre> <p><tt class="docutils literal">test_embed.test_bpo20891()</tt> on macOS showed a race condition in <tt class="docutils literal">PyGILState_Ensure()</tt>: the creation of the GIL lock itself... was not protected by a lock! Adding a new lock to check if Python currently has the GIL lock doesn't make sense...</p> <p>I proposed an incomplete fix for <tt class="docutils literal">PyThread_start_new_thread()</tt>:</p> <blockquote> I found a working fix: call <tt class="docutils literal">PyEval_InitThreads()</tt> in <tt class="docutils literal">PyThread_start_new_thread()</tt>. So the GIL is created as soon as a second thread is spawned.
The GIL cannot be created anymore while two threads are running. At least, with the <tt class="docutils literal">python</tt> binary. It doesn't fix the issue if a thread is not spawned by Python, but this thread calls <tt class="docutils literal">PyGILState_Ensure()</tt>.</blockquote> </div> <div class="section" id="why-not-always-create-the-gil"> <h2>Why not always create the GIL?</h2> <p><strong>Antoine Pitrou</strong> asked a simple question:</p> <blockquote> Why not <em>always</em> call <tt class="docutils literal">PyEval_InitThreads()</tt> at interpreter initialization? Are there any downsides?</blockquote> <p>Thanks to <tt class="docutils literal">git blame</tt> and <tt class="docutils literal">git log</tt>, I found the origin of the code creating the GIL &quot;on demand&quot;, <strong>a change made 26 years ago</strong>!</p> <pre class="literal-block">
commit 1984f1e1c6306d4e8073c28d2395638f80ea509b
Author: Guido van Rossum &lt;guido&#64;python.org&gt;
Date:   Tue Aug 4 12:41:02 1992 +0000

    * Makefile adapted to changes below.
    * split pythonmain.c in two: most stuff goes to pythonrun.c, in the library.
    * new optional built-in threadmodule.c, build upon Sjoerd's thread.{c,h}.
    * new module from Sjoerd: mmmodule.c (dynamically loaded).
    * new module from Sjoerd: sv (svgen.py, svmodule.c.proto).
    * new files thread.{c,h} (from Sjoerd).
    * new xxmodule.c (example only).
    * myselect.h: bzero -&gt; memset
    * select.c: bzero -&gt; memset; removed global variable
    (...)
+void
+init_save_thread()
+{
+#ifdef USE_THREAD
+    if (interpreter_lock)
+        fatal(&quot;2nd call to init_save_thread&quot;);
+    interpreter_lock = allocate_lock();
+    acquire_lock(interpreter_lock, 1);
+#endif
+}
+#endif
</pre> <p>My guess was that the intent of the dynamically created GIL was to reduce the &quot;overhead&quot; of the GIL for applications using only a single Python thread (which never spawn a new Python thread).</p> <p>Luckily, <strong>Guido van Rossum</strong> was around and was able to elaborate the rationale:</p> <blockquote> Yeah, the original reasoning was that <strong>threads were something esoteric and not used by most code</strong>, and at the time we definitely felt that <strong>always using the GIL would cause a (tiny) slowdown</strong> and <strong>increase the risk of crashes</strong> due to bugs in the GIL code. I'd be happy to learn that we no longer need to worry about this and <strong>can just always initialize it</strong>.</blockquote> </div> <div class="section" id="second-fix-for-py-initialize-proposed"> <h2>Second fix for Py_Initialize() proposed</h2> <p>I proposed a <strong>second fix</strong> for <tt class="docutils literal">Py_Initialize()</tt> to always create the GIL as soon as Python starts, and no longer &quot;on demand&quot;, to prevent any risk of a race condition:</p> <pre class="literal-block">
+    /* Create the GIL */
+    PyEval_InitThreads();
</pre> <p><strong>Nick Coghlan</strong> asked if I could run my patch through the performance benchmarks. I ran <a class="reference external" href="http://pyperformance.readthedocs.io/">pyperformance</a> on my <a class="reference external" href="https://github.com/python/cpython/pull/4700/">PR 4700</a>.
Differences of at least 5%:</p> <pre class="literal-block">
haypo&#64;speed-python$ python3 -m perf compare_to \
    2017-12-18_12-29-master-bd6ec4d79e85.json.gz \
    2017-12-18_12-29-master-bd6ec4d79e85-patch-4700.json.gz \
    --table --min-speed=5

+----------------------+--------------------------------------+-------------------------------------------------+
| Benchmark            | 2017-12-18_12-29-master-bd6ec4d79e85 | 2017-12-18_12-29-master-bd6ec4d79e85-patch-4700 |
+======================+======================================+=================================================+
| pathlib              | 41.8 ms                              | 44.3 ms: 1.06x slower (+6%)                     |
+----------------------+--------------------------------------+-------------------------------------------------+
| scimark_monte_carlo  | 197 ms                               | 210 ms: 1.07x slower (+7%)                      |
+----------------------+--------------------------------------+-------------------------------------------------+
| spectral_norm        | 243 ms                               | 269 ms: 1.11x slower (+11%)                     |
+----------------------+--------------------------------------+-------------------------------------------------+
| sqlite_synth         | 7.30 us                              | 8.13 us: 1.11x slower (+11%)                    |
+----------------------+--------------------------------------+-------------------------------------------------+
| unpickle_pure_python | 707 us                               | 796 us: 1.13x slower (+13%)                     |
+----------------------+--------------------------------------+-------------------------------------------------+

Not significant (55): 2to3; chameleon; chaos; (...)
</pre> <p>Oh, 5 benchmarks were slower. Performance regressions are not welcome in Python: we are working hard on <a class="reference external" href="https://lwn.net/Articles/725114/">making Python faster</a>!</p> </div> <div class="section" id="skip-the-failing-test-before-christmas"> <h2>Skip the failing test before Christmas</h2> <p>I didn't expect that 5 benchmarks would be slower.
It required further investigation, but I didn't have time for that, and I was too shy or ashamed to take the responsibility of pushing a performance regression.</p> <p>Before the Christmas holidays, no decision had been taken, while <tt class="docutils literal">test_embed.test_bpo20891()</tt> was still failing randomly on macOS buildbots. I <strong>was not comfortable touching a critical part of Python</strong>, its GIL, just before leaving for two weeks. So I decided to skip <tt class="docutils literal">test_bpo20891()</tt> until I was back.</p> <p>No gift for you, Python 3.7.</p> <a class="reference external image-reference" href="https://drawception.com/panel/drawing/0teL3336/charlie-brown-sad-about-small-christmas-tree/"> <img alt="Sad Christmas tree" src="https://vstinner.github.io/images/sad_christmas_tree.png" /> </a> </div> <div class="section" id="new-benchmark-run-and-second-fix-applied-to-master"> <h2>New benchmark run and second fix applied to master</h2> <p>At the end of January 2018, I ran the 5 benchmarks made slower by my PR again.
I ran these benchmarks manually on my laptop using CPU isolation:</p> <pre class="literal-block">
vstinner&#64;apu$ python3 -m perf compare_to ref.json patch.json --table
Not significant (5): unpickle_pure_python; sqlite_synth;
spectral_norm; pathlib; scimark_monte_carlo
</pre> <p>Ok, it confirms that my second fix has <strong>no significant impact on performance</strong> according to the <a class="reference external" href="http://pyperformance.readthedocs.io/">Python &quot;performance&quot; benchmark suite</a>.</p> <p>I decided to <strong>push my fix</strong> to the master branch, commit <a class="reference external" href="https://github.com/python/cpython/commit/2914bb32e2adf8dff77c0ca58b33201bc94e398c">2914bb32</a>:</p> <pre class="literal-block">
bpo-20891: Py_Initialize() now creates the GIL (#4700)

The GIL is no longer created &quot;on demand&quot; to fix a race condition when
PyGILState_Ensure() is called in a non-Python thread.
</pre> <p>Then I reenabled <tt class="docutils literal">test_embed.test_bpo20891()</tt> on the master branch.</p> </div> <div class="section" id="no-second-fix-for-python-2-7-and-3-6-sorry"> <h2>No second fix for Python 2.7 and 3.6, sorry!</h2> <p><strong>Antoine Pitrou</strong> considered that the backport to Python 3.6 <a class="reference external" href="https://github.com/python/cpython/pull/5421#issuecomment-361214537">should not be merged</a>:</p> <blockquote> I don't think so. People can already call <tt class="docutils literal">PyEval_InitThreads()</tt>.</blockquote> <p><strong>Guido van Rossum</strong> didn't want to backport this change either. So I only removed <tt class="docutils literal">test_embed.test_bpo20891()</tt> from the 3.6 branch.</p> <p>I didn't apply my second fix to Python 2.7 either, for the same reason.
Moreover, Python 2.7 has no unit test for this fix, since the test was too difficult to backport.</p> <p>At least, Python 2.7 and 3.6 got my first <tt class="docutils literal">PyGILState_Ensure()</tt> fix.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>Python still has some race conditions in corner cases. Such a bug was found in the creation of the GIL when a C thread starts using the Python API. I pushed a first fix, but a new and different race condition was found on macOS.</p> <p>I had to dig into the very old history (1992) of the Python GIL. Luckily, <strong>Guido van Rossum</strong> was also able to explain the rationale.</p> <p>After a glitch in benchmarks, we agreed to modify Python 3.7 to always create the GIL, instead of creating the GIL &quot;on demand&quot;. The change has no significant impact on performance.</p> <p>It was also decided to leave Python 2.7 and 3.6 unchanged, to prevent any risk of regression: they continue to create the GIL &quot;on demand&quot;.</p> <p><strong>It took me 4 years to fix a nasty bug in the famous Python GIL.</strong> I am never comfortable when touching such a <strong>critical part</strong> of Python. I am now happy that the bug is behind us: it is fully fixed in the upcoming Python 3.7!</p> <p>See <a class="reference external" href="https://bugs.python.org/issue20891">bpo-20891</a> for the full story. Thanks to all the developers who helped me fix this bug!</p> </div> Python 3.7 nanoseconds2018-03-06T16:30:00+01:002018-03-06T16:30:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-06:/python37-pep-564-nanoseconds.html<p>Thanks to my <a class="reference external" href="https://vstinner.github.io/python37-perf-counter-nanoseconds.html">latest change on time.perf_counter()</a>, all Python 3.7 clocks now use nanoseconds as integers internally.
It became possible to propose again my old idea of getting time as nanoseconds at the Python level, and so I wrote a new <a class="reference external" href="https://peps.python.org/pep-0564">PEP 564</a> &quot;Add new time functions with nanosecond …</p><p>Thanks to my <a class="reference external" href="https://vstinner.github.io/python37-perf-counter-nanoseconds.html">latest change on time.perf_counter()</a>, all Python 3.7 clocks now use nanoseconds as integers internally. It became possible to propose again my old idea of getting time as nanoseconds at the Python level, and so I wrote a new <a class="reference external" href="https://peps.python.org/pep-0564">PEP 564</a> &quot;Add new time functions with nanosecond resolution&quot;. While the PEP was discussed, I also deprecated <tt class="docutils literal">time.clock()</tt> and removed <tt class="docutils literal">os.stat_float_times()</tt>.</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/dkalo/2909921582/"> <img alt="Old clock" src="https://vstinner.github.io/images/clock.jpg" /> </a> <div class="section" id="time-clock"> <h2>time.clock()</h2> <p>Since I wrote the <a class="reference external" href="https://peps.python.org/pep-0418">PEP 418</a> &quot;Add monotonic time, performance counter, and process time functions&quot; in 2012, I have disliked <tt class="docutils literal">time.clock()</tt>. This clock is not portable: on Windows it measures wall-clock time, whereas it measures CPU time on Unix. Extract of the <a class="reference external" href="https://docs.python.org/dev/library/time.html#time.clock">time.clock() documentation</a>:</p> <blockquote> <em>Deprecated since version 3.3: The behaviour of this function depends on the platform: use perf_counter() or process_time() instead, depending on your requirements, to have a well defined behaviour.</em></blockquote> <p>My PEP 418 deprecated <tt class="docutils literal">time.clock()</tt> in the documentation.
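</p> <p>The two replacements with a well defined behaviour can be sketched like this (a minimal demo, not code from the PEP):</p>

```python
import time

t0 = time.perf_counter()   # elapsed time: well defined on every platform
c0 = time.process_time()   # CPU time of the current process
time.sleep(0.1)            # sleeping spends wall time but almost no CPU time
wall = time.perf_counter() - t0
cpu = time.process_time() - c0
assert wall >= 0.09        # the elapsed time includes the sleep
assert wall > cpu          # the CPU time barely moved during the sleep
```

<p>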
In <a class="reference external" href="https://bugs.python.org/issue31803">bpo-31803</a>, I modified <tt class="docutils literal">time.clock()</tt> and <tt class="docutils literal"><span class="pre">time.get_clock_info('clock')</span></tt> to also emit a <tt class="docutils literal">DeprecationWarning</tt> at runtime. I replaced <tt class="docutils literal">time.clock()</tt> with <tt class="docutils literal">time.perf_counter()</tt> in tests and demos. I also removed <tt class="docutils literal">hasattr(time, 'monotonic')</tt> in <tt class="docutils literal">test_time</tt> since <tt class="docutils literal">time.monotonic()</tt> has always been available since Python 3.5.</p> </div> <div class="section" id="os-stat-float-times"> <h2>os.stat_float_times()</h2> <p>The <tt class="docutils literal">os.stat_float_times()</tt> function was introduced in Python 2.3 to get file modification times with sub-second resolution (commit <a class="reference external" href="https://github.com/python/cpython/commit/f607bdaa77475ec8c94614414dc2cecf8fd1ca0a">f607bdaa</a>); the default was still to get time as seconds (an integer). The function was introduced to allow a smooth transition to time as a floating point number while keeping backward compatibility with Python 2.2.</p> <p><tt class="docutils literal">os.stat()</tt> was modified to return time as float by default in Python 2.5 (commit <a class="reference external" href="https://github.com/python/cpython/commit/fe33d0ba87f5468b50f939724b303969711f3be5">fe33d0ba</a>).
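</p> <p>Both representations can still be observed side by side today: since Python 3.3, <tt class="docutils literal">os.stat()</tt> results also expose integer nanosecond fields next to the float seconds (a quick sketch, unrelated to the removed function):</p>

```python
import os

st = os.stat(os.curdir)
assert isinstance(st.st_mtime, float)    # seconds, the default since Python 2.5
assert isinstance(st.st_mtime_ns, int)   # nanoseconds, added in Python 3.3
# both fields encode the same timestamp
assert 1e-5 > abs(st.st_mtime - st.st_mtime_ns / 10**9)
```

<p>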
Python 2.5 was released 11 years ago, so I consider that people have had enough time to migrate their code to float time :-) I modified <tt class="docutils literal">os.stat_float_times()</tt> in Python 3.1 to emit a <tt class="docutils literal">DeprecationWarning</tt> (commit <a class="reference external" href="https://github.com/python/cpython/commit/034d0aa2171688c40cee1a723ddcdb85bbce31e8">034d0aa2</a> of <a class="reference external" href="https://bugs.python.org/issue14711">bpo-14711</a>).</p> <p>Finally, I removed <tt class="docutils literal">os.stat_float_times()</tt> in Python 3.7: <a class="reference external" href="https://bugs.python.org/issue31827">bpo-31827</a>.</p> <p>Serhiy Storchaka proposed to also remove the last three items from <tt class="docutils literal">os.stat_result</tt>. For example, <tt class="docutils literal">stat_result[stat.ST_MTIME]</tt> could be replaced with <tt class="docutils literal">stat_result.st_mtime</tt>. But I tried to remove these items and it broke the <tt class="docutils literal">logging</tt> module, so I decided to leave them unchanged.</p> </div> <div class="section" id="pep-564-time-time-ns"> <h2>PEP 564: time.time_ns()</h2> <p>Six years ago (2012), I wrote the <a class="reference external" href="https://peps.python.org/pep-0410">PEP 410</a> &quot;Use decimal.Decimal type for timestamps&quot; which proposed a large and complex change in all Python functions returning time to support nanosecond resolution using the <tt class="docutils literal">decimal.Decimal</tt> type. The PEP was <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2012-February/116837.html">rejected for several reasons</a>.</p> <p>Since all clocks now use nanoseconds internally in Python 3.7, I proposed a new <a class="reference external" href="https://peps.python.org/pep-0564">PEP 564</a> &quot;Add new time functions with nanosecond resolution&quot;.
Abstract:</p> <blockquote> <p>Add six new &quot;nanosecond&quot; variants of existing functions to the <tt class="docutils literal">time</tt> module: <tt class="docutils literal">clock_gettime_ns()</tt>, <tt class="docutils literal">clock_settime_ns()</tt>, <tt class="docutils literal">monotonic_ns()</tt>, <tt class="docutils literal">perf_counter_ns()</tt>, <tt class="docutils literal">process_time_ns()</tt> and <tt class="docutils literal">time_ns()</tt>. While similar to the existing functions without the <tt class="docutils literal">_ns</tt> suffix, they provide nanosecond resolution: they return a number of nanoseconds as a Python <tt class="docutils literal">int</tt>.</p> <p>The <tt class="docutils literal">time.time_ns()</tt> resolution is 3 times better than the <tt class="docutils literal">time.time()</tt> resolution on Linux and Windows.</p> </blockquote> <p>People were not convinced by the need for nanosecond resolution, so I added an &quot;Issues caused by precision loss&quot; section with 2 examples:</p> <ul class="simple"> <li>Example 1: measure time delta in long-running process</li> <li>Example 2: compare times with different resolution</li> </ul> <p>As with my previous PEP 410, people proposed many alternatives, recorded in the PEP: sub-nanosecond resolution, modifying the <tt class="docutils literal">time.time()</tt> result type, different types, different APIs, a new module, etc.</p> <p>Luckily for me, Guido van Rossum quickly approved my PEP for Python 3.7!</p> </div> <div class="section" id="implementaton-of-the-pep-564"> <h2>Implementation of PEP 564</h2> <p>I implemented my PEP 564 in <a class="reference external" href="https://bugs.python.org/issue31784">bpo-31784</a> with commit <a class="reference external" href="https://github.com/python/cpython/commit/c29b585fd4b5a91d17fc5dd41d86edff28a30da3">c29b585f</a>.
I added 6 new time functions:</p> <ul class="simple"> <li><tt class="docutils literal">time.clock_gettime_ns()</tt></li> <li><tt class="docutils literal">time.clock_settime_ns()</tt></li> <li><tt class="docutils literal">time.monotonic_ns()</tt></li> <li><tt class="docutils literal">time.perf_counter_ns()</tt></li> <li><tt class="docutils literal">time.process_time_ns()</tt></li> <li><tt class="docutils literal">time.time_ns()</tt></li> </ul> <p>Example:</p> <pre class="literal-block">
$ python3.7
Python 3.7.0b2+ (heads/3.7:31e2b76f7b, Mar 6 2018, 15:31:29)
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
&gt;&gt;&gt; import time
&gt;&gt;&gt; time.time()
1520354387.7663522
&gt;&gt;&gt; time.time_ns()
1520354388319257562
</pre> <p>I also added tests on <tt class="docutils literal">os.times()</tt> in <tt class="docutils literal">test_os</tt>; previously, the function wasn't tested at all!</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>I added 6 new functions to get time with a nanosecond resolution, like <tt class="docutils literal">time.time_ns()</tt>, with my approved <a class="reference external" href="https://peps.python.org/pep-0564">PEP 564</a>. I also modified <tt class="docutils literal">time.clock()</tt> to emit a <tt class="docutils literal">DeprecationWarning</tt> and I removed the legacy <tt class="docutils literal">os.stat_float_times()</tt> function.</p> </div> Python 3.7 perf_counter() nanoseconds2018-03-06T15:00:00+01:002018-03-06T15:00:00+01:00Victor Stinnertag:vstinner.github.io,2018-03-06:/python37-perf-counter-nanoseconds.html<p>Since 2012, I have been trying to convert all Python clocks to use nanoseconds internally. The last clock which still used floating point internally was <tt class="docutils literal">time.perf_counter()</tt>.
INADA Naoki's new importtime tool was an opportunity for me to take a fresh look at a tricky integer overflow issue.</p> <div class="section" id="modify-importtime-to-use-time-perf-counter-clock"> <h2>Modify importtime …</h2></div><p>Since 2012, I have been trying to convert all Python clocks to use nanoseconds internally. The last clock which still used floating point internally was <tt class="docutils literal">time.perf_counter()</tt>. INADA Naoki's new importtime tool was an opportunity for me to take a fresh look at a tricky integer overflow issue.</p> <div class="section" id="modify-importtime-to-use-time-perf-counter-clock"> <h2>Modify importtime to use time.perf_counter() clock</h2> <p>INADA Naoki added to Python 3.7 a cool new <a class="reference external" href="https://docs.python.org/dev/using/cmdline.html#id5">-X importtime</a> command line option to analyze the Python import performance. This tool can be used to optimize the startup time of your application. Example:</p> <pre class="literal-block">
vstinner&#64;apu$ ./python -X importtime -c pass
import time: self [us] | cumulative | imported package
(...)
import time: 901 | 1902 | io
import time: 374 | 374 | _stat
import time: 663 | 1037 | stat
import time: 617 | 617 | genericpath
import time: 877 | 1493 | posixpath
import time: 3840 | 3840 | _collections_abc
import time: 2106 | 8474 | os
import time: 674 | 674 | _sitebuiltins
import time: 922 | 922 | sitecustomize
import time: 598 | 598 | usercustomize
import time: 1444 | 12110 | site
</pre> <p>Read Naoki's article <a class="reference external" href="https://dev.to/methane/how-to-speed-up-python-application-startup-time-nkf">How to speed up Python application startup time</a> (Jan 19, 2018) for a concrete analysis of <tt class="docutils literal">pipenv</tt> performance.</p> <p>Naoki chose to use the <tt class="docutils literal">time.monotonic()</tt> clock internally to measure elapsed time.
On Windows, this clock (<tt class="docutils literal">GetTickCount64()</tt> function) has a resolution around 15.6 ms, whereas most Python imports take less than 10 ms, and so most numbers are just zeros. Example:</p> <pre class="literal-block">
f:\dev\3x&gt;python -X importtime -c &quot;import idlelib.pyshell&quot;
Running Debug|Win32 interpreter...
import time: self [us] | cumulative | imported package
import time: 0 | 0 | _codecs
import time: 0 | 0 | codecs
import time: 0 | 0 | encodings.aliases
import time: 15000 | 15000 | encodings
import time: 0 | 0 | encodings.utf_8
import time: 0 | 0 | _signal
import time: 0 | 0 | encodings.latin_1
import time: 0 | 0 | _weakrefset
import time: 0 | 0 | abc
import time: 0 | 0 | io
import time: 0 | 0 | _stat
(...)
</pre> <p>In <a class="reference external" href="https://bugs.python.org/issue31415">bpo-31415</a>, I fixed the issue by adding a new C function <tt class="docutils literal">_PyTime_GetPerfCounter()</tt> to access the <tt class="docutils literal">time.perf_counter()</tt> clock at the C level and I modified &quot;importtime&quot; to use it.</p> <p>Problem solved! ...
almost...</p> </div> <div class="section" id="double-integer-float-conversions"> <h2>Double integer-float conversions</h2> <p>My commit <a class="reference external" href="https://github.com/python/cpython/commit/a997c7b434631f51e00191acea2ba6097691e859">a997c7b4</a> of <a class="reference external" href="https://bugs.python.org/issue31415">bpo-31415</a> adding <tt class="docutils literal">_PyTime_GetPerfCounter()</tt> moved the C code from <tt class="docutils literal">Modules/timemodule.c</tt> to <tt class="docutils literal">Python/pytime.c</tt>, but also changed the internal type storing time from a floating point number (C <tt class="docutils literal">double</tt>) to an integer number (<tt class="docutils literal">_PyTime_t</tt>, which is <tt class="docutils literal">int64_t</tt> in practice).</p> <p>The drawback of this change is that <tt class="docutils literal">time.perf_counter()</tt> now converts the <tt class="docutils literal">QueryPerformanceCounter() / QueryPerformanceFrequency()</tt> float into a <tt class="docutils literal">_PyTime_t</tt> (integer) and then back to a float, and these conversions cause a precision loss. I computed that the conversions start to lose precision after a single second with a <tt class="docutils literal">QueryPerformanceFrequency()</tt> equal to <tt class="docutils literal">3,579,545</tt> (3.6 MHz).</p> <p>To fix the precision loss, I again modified <tt class="docutils literal">time.clock()</tt> and <tt class="docutils literal">time.perf_counter()</tt> to not use <tt class="docutils literal">_PyTime_t</tt> anymore, only double.</p> </div> <div class="section" id="grumpy-victor"> <h2>Grumpy Victor</h2> <img alt="Grumpy" src="https://vstinner.github.io/images/grumpy.jpg" /> <p>My change to replace <tt class="docutils literal">_PyTime_t</tt> with <tt class="docutils literal">double</tt> made me grumpy.
I have been trying to convert all Python clocks to <tt class="docutils literal">_PyTime_t</tt> for 6 years (since 2012).</p> <p>Being blocked by a single clock made me grumpy, especially because the issue is specific to the Windows implementation. The Linux implementation of <tt class="docutils literal">time.perf_counter()</tt> uses <tt class="docutils literal">clock_gettime()</tt> which directly returns nanoseconds as integers, so no division is needed to get time as <tt class="docutils literal">_PyTime_t</tt>.</p> <p>I looked at the clock sources in the Linux kernel source code: <a class="reference external" href="https://github.com/torvalds/linux/blob/master/kernel/time/clocksource.c">kernel/time/clocksource.c</a>. Linux clocks only use integers and support nanosecond resolution. I'm always impressed by the quality of the Linux kernel source code; the code is straightforward C. If Linux is able to use integers for various kinds of clocks, I should be able to use integers for my specific Windows implementation of <tt class="docutils literal">time.perf_counter()</tt>, no?</p> <p>In practice, <tt class="docutils literal">_PyTime_t</tt> is a number of nanoseconds, so the computation is:</p> <pre class="literal-block">
(QueryPerformanceCounter() * 1_000_000_000) / QueryPerformanceFrequency()
</pre> <p>where <tt class="docutils literal">1_000_000_000</tt> is the number of nanoseconds in one second.
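</p> <p>A quick computation shows how fast that product overflows a 64-bit integer (a back-of-the-envelope sketch; 10 MHz is one of the known <tt class="docutils literal">QueryPerformanceFrequency()</tt> values):</p>

```python
SEC_TO_NS = 10**9
INT64_MAX = 2**63 - 1

# largest tick count whose product with SEC_TO_NS still fits in int64_t
max_ticks = INT64_MAX // SEC_TO_NS
assert max_ticks == 9_223_372_036

# at a 10 MHz frequency, the naive product overflows after about 15 minutes
seconds = max_ticks / 10**7
assert round(seconds) == 922
```

<p>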
<strong>The problem is preventing integer overflow</strong> on the first part, using <tt class="docutils literal">_PyTime_t</tt> which is <tt class="docutils literal">int64_t</tt> in practice:</p> <pre class="literal-block">
QueryPerformanceCounter() * 1_000_000_000
</pre> </div> <div class="section" id="some-maths-to-avoid-the-precision-loss"> <h2>Some maths to avoid the precision loss</h2> <p>Using a pencil, a sheet of paper and some maths, I found a solution!</p> <pre class="literal-block">
(a * b) / q == (a / q) * b + ((a % q) * b) / q
</pre> <img alt="Math rocks" src="https://vstinner.github.io/images/math_rocks.jpg" /> <p>This prevents the risk of integer overflow. C implementation:</p> <pre class="literal-block">
Py_LOCAL_INLINE(_PyTime_t)
_PyTime_MulDiv(_PyTime_t ticks, _PyTime_t mul, _PyTime_t div)
{
    _PyTime_t intpart, remaining;
    /* Compute (ticks * mul / div) in two parts to prevent integer overflow:
       compute integer part, and then the remaining part.

       (ticks * mul) / div == (ticks / div) * mul + (ticks % div) * mul / div

       The caller must ensure that &quot;(div - 1) * mul&quot; cannot overflow. */
    intpart = ticks / div;
    ticks %= div;
    remaining = ticks * mul;
    remaining /= div;
    return intpart * mul + remaining;
}
</pre> <p>Simplified Windows implementation of perf_counter():</p> <pre class="literal-block">
_PyTime_t
win_perf_counter(void)
{
    LARGE_INTEGER freq;
    LONGLONG frequency;
    LARGE_INTEGER now;
    LONGLONG ticksll;
    _PyTime_t ticks;

    (void)QueryPerformanceFrequency(&amp;freq);
    frequency = freq.QuadPart;

    QueryPerformanceCounter(&amp;now);
    ticksll = now.QuadPart;
    ticks = (_PyTime_t)ticksll;

    return _PyTime_MulDiv(ticks, SEC_TO_NS, (_PyTime_t)frequency);
}
</pre> <p>On Windows, I added the following sanity checks to make sure that integer overflows cannot occur:</p> <pre class="literal-block">
/* Check that frequency can be casted to _PyTime_t.

   Make also sure that (ticks * SEC_TO_NS) cannot overflow in
   _PyTime_MulDiv(), with ticks &lt; frequency.

   Known QueryPerformanceFrequency() values:

   * 10,000,000 (10 MHz): 100 ns resolution
   * 3,579,545 Hz (3.6 MHz): 279 ns resolution

   None of these frequencies can overflow with 64-bit _PyTime_t, but
   check for overflow, just in case. */
if (frequency &gt; _PyTime_MAX
    || frequency &gt; (LONGLONG)_PyTime_MAX / (LONGLONG)SEC_TO_NS)
{
    PyErr_SetString(PyExc_OverflowError,
                    &quot;QueryPerformanceFrequency is too large&quot;);
    return -1;
}
</pre> <p>Since I modified the macOS implementation of <tt class="docutils literal">time.monotonic()</tt> to use <tt class="docutils literal">_PyTime_MulDiv()</tt> as well, I also added this check for macOS:</p> <pre class="literal-block">
/* Make sure that (ticks * timebase.numer) cannot overflow in
   _PyTime_MulDiv(), with ticks &lt; timebase.denom.

   Known time bases:

   * always (1, 1) on Intel
   * (1000000000, 33333335) or (1000000000, 25000000) on PowerPC

   None of these time bases can overflow with 64-bit _PyTime_t, but
   check for overflow, just in case. */
if ((_PyTime_t)timebase.numer &gt; _PyTime_MAX / (_PyTime_t)timebase.denom) {
    PyErr_SetString(PyExc_OverflowError,
                    &quot;mach_timebase_info is too large&quot;);
    return -1;
}
</pre> </div> <div class="section" id="pytime-c-source-code"> <h2>pytime.c source code</h2> <p>If you are curious, the full code lives at <a class="reference external" href="https://github.com/python/cpython/blob/master/Python/pytime.c">Python/pytime.c</a> and is currently around 1,100 lines of C code.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>INADA Naoki's importtime tool was using the <tt class="docutils literal">time.monotonic()</tt> clock, which failed to measure short import times on Windows. I modified it to use <tt class="docutils literal">time.perf_counter()</tt> internally to get better precision on Windows. I identified a precision loss caused by my internal <tt class="docutils literal">_PyTime_t</tt> type used to store time as nanoseconds.
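</p> <p>The identity used by <tt class="docutils literal">_PyTime_MulDiv()</tt> can be sanity-checked in pure Python, where integers cannot overflow (a quick check, not CPython code):</p>

```python
SEC_TO_NS = 10**9

def muldiv(ticks, mul, div):
    # (ticks * mul) / div == (ticks / div) * mul + ((ticks % div) * mul) / div
    # the second product stays small because the caller ensures
    # that (div - 1) * mul cannot overflow
    intpart, ticks = divmod(ticks, div)
    return intpart * mul + ticks * mul // div

frequency = 3_579_545   # 3.6 MHz, one of the known QueryPerformanceFrequency() values
for ticks in (0, 1, frequency - 1, 10**12, 2**62):
    assert muldiv(ticks, SEC_TO_NS, frequency) == ticks * SEC_TO_NS // frequency
```

<p>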
Thanks to maths, I managed to use nanoseconds and prevent any risk of integer overflow.</p> </div> My contributions to CPython during 2017 Q3: Part 3 (funny bugs)2017-10-19T16:00:00+02:002017-10-19T16:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-10-19:/contrib-cpython-2017q3-part3.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3 (July, August, September), Part 3 (funny bugs).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling threads)</a>.</p> <p>Summary:</p> <ul class="simple"> <li>FreeBSD bug: minor() device regression</li> <li>regrtest snowball effect when hunting memory leaks</li> <li>Bugfixes</li> <li>Other Changes</li> </ul> <div class="section" id="freebsd-bug-minor-device-regression"> <h2>FreeBSD bug: minor() device regression</h2> <a class="reference external image-reference" href="https://www.freebsd.org/"> <img alt="Logo of the FreeBSD project" src="https://vstinner.github.io/images/freebsd.png" /> </a> <p><a class="reference external" href="https://bugs.python.org/issue31044">bpo-31044</a>: The …</p></div><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3 (July, August, September), Part 3 (funny bugs).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling threads)</a>.</p> <p>Summary:</p> <ul class="simple"> <li>FreeBSD bug: minor() device regression</li> <li>regrtest snowball effect when hunting memory leaks</li> <li>Bugfixes</li> <li>Other Changes</li> </ul> <div class="section" id="freebsd-bug-minor-device-regression"> <h2>FreeBSD bug: minor() device regression</h2> <a class="reference external image-reference" href="https://www.freebsd.org/"> <img alt="Logo of
the FreeBSD project" src="https://vstinner.github.io/images/freebsd.png" /> </a> <p><a class="reference external" href="https://bugs.python.org/issue31044">bpo-31044</a>: The test_makedev() of test_posix started to fail in build 632 (Wed Jul 26 10:47:01 2017) of AMD64 FreeBSD CURRENT. The test failed on Debug but also on non-Debug buildbots, in the master and 3.6 branches. It looked more like a change on the buildbot, maybe a FreeBSD upgrade?</p> <p>Thanks to <strong>koobs</strong>, I got SSH access to the buildbot. I was able to reproduce the bug manually. I noticed that minor() truncates the most significant bits.</p> <p>I continued my analysis and found that, on May 23, the FreeBSD <tt class="docutils literal">dev_t</tt> type changed from 32 bits to 64 bits in the kernel, but the <tt class="docutils literal">minor()</tt> userland function was not updated.</p> <p>I reported a bug to FreeBSD: <a class="reference external" href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221048">Bug 221048 - minor() truncates device number to 32 bits, whereas dev_t type was extended to 64 bits</a>.</p> <p>In the meantime, I skipped test_posix.test_makedev() on FreeBSD if <tt class="docutils literal">dev_t</tt> is larger than 32 bits.</p> <p>Fortunately, the FreeBSD bug was quickly fixed!</p> </div> <div class="section" id="regrtest-snowball-effect-when-hunting-memory-leaks"> <h2>regrtest snowball effect when hunting memory leaks</h2> <p>While trying to fix all reference leaks on the new Windows and Linux &quot;Refleaks&quot; buildbots, I reported the bug <a class="reference external" href="https://bugs.python.org/issue31217">bpo-31217</a>:</p> <pre class="literal-block">
test_code leaked [1, 1, 1] memory blocks, sum=3
</pre> <p>Two weeks after reporting the bug, I was able to reproduce it, but <strong>only with Python compiled in 32-bit mode</strong>. Strange.</p> <p>I spent one day understanding the bug.
I removed as much as possible while making sure that I could still reproduce the bug. At the end, I wrote <a class="reference external" href="https://bugs.python.org/file47114/leak2.py">leak2.py</a> which reproduces the bug with a single import: <tt class="docutils literal">import sys</tt>. Even though the script is only 86 lines long, I was still unable to understand the bug.</p> <p>My first hypothesis:</p> <blockquote> It seems like the &quot;leak&quot; is the call to <tt class="docutils literal">sys.getallocatedblocks()</tt> which creates a new integer, and the integer is kept alive between two loop iterations.</blockquote> <p><strong>Antoine Pitrou</strong> rejected it:</p> <blockquote> I doubt it. If that was the case, the reference count would increase as well.</blockquote> <p>It was Antoine Pitrou who understood the bug:</p> <pre class="literal-block">
Ahah. Actually, it's quite simple :-)

On 64-bit Python:
&gt;&gt;&gt; id(82914 - 82913) == id(1)
True

On 32-bit Python:
&gt;&gt;&gt; id(82914 - 82913) == id(1)
False

So the first non-zero alloc_delta really has a snowball effect,
as it creates new memory block which will produce a non-zero
alloc_delta on the next run, etc.
</pre> <p>I implemented Antoine's idea to fix the bug, <a class="reference external" href="https://github.com/python/cpython/commit/6c2feabc5dac2f3049b15134669e9ad5af573193">commit</a>:</p> <pre class="literal-block">
Use a pool of integer objects to prevent false alarm when checking
for memory block leaks. Fill the pool with values in -1000..1000
which are the most common (reference, memory block, file descriptor)
differences.
Co-Authored-By: Antoine Pitrou &lt;pitrou&#64;free.fr&gt; </pre> <p>The bug is probably as old as the code hunting memory leaks.</p> </div> <div class="section" id="bugfixes"> <h2>Bugfixes</h2> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue30891">bpo-30891</a>: Second fix for importlib <tt class="docutils literal">_find_and_load()</tt> to handle correctly parallelism with threads. Call <tt class="docutils literal">sys.modules.get()</tt> in the <tt class="docutils literal">with _ModuleLockManager(name):</tt> block to protect the dictionary key with the module lock and use an atomic get to prevent race conditions.</li> <li><a class="reference external" href="https://bugs.python.org/issue31019">bpo-31019</a>: <tt class="docutils literal">multiprocessing.Process.is_alive()</tt> now removes the process from the <tt class="docutils literal">_children set</tt> if the process completed. The change prevents leaking &quot;dangling&quot; processes.</li> <li><a class="reference external" href="https://bugs.python.org/issue31326">bpo-31326</a>, <tt class="docutils literal">concurrent.futures</tt>: <tt class="docutils literal">ProcessPoolExecutor.shutdown()</tt> now explicitly closes the call queue. Moreover, <tt class="docutils literal">shutdown(wait=True)</tt> now also joins the call queue thread, to prevent leaking a dangling thread.</li> <li><a class="reference external" href="https://bugs.python.org/issue31170">bpo-31170</a>: Update libexpat from 2.2.3 to 2.2.4: fix copying of partial characters for UTF-8 input (<a class="reference external" href="https://github.com/libexpat/libexpat/issues/115">libexpat bug 115</a>). 
Later, I also wrote non-regression tests for this bug (libexpat doesn't have any test for it).</li> <li><a class="reference external" href="https://bugs.python.org/issue31499">bpo-31499</a>, <tt class="docutils literal">xml.etree</tt>: <tt class="docutils literal">xmlparser_gc_clear()</tt> now sets self.parser to <tt class="docutils literal">NULL</tt> to prevent a crash in <tt class="docutils literal">xmlparser_dealloc()</tt> if <tt class="docutils literal">xmlparser_gc_clear()</tt> was called previously by the garbage collector, because the parser was part of a reference cycle. Fix co-written with <strong>Serhiy Storchaka</strong>.</li> <li><a class="reference external" href="https://bugs.python.org/issue30892">bpo-30892</a>: Fix <tt class="docutils literal">_elementtree</tt> module initialization (accelerator of <tt class="docutils literal">xml.etree</tt>): correctly handle a <tt class="docutils literal">getattr(copy, 'deepcopy')</tt> failure so it does not fail with an assertion error.</li> </ul> </div> <div class="section" id="other-changes"> <h2>Other Changes</h2> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue30866">bpo-30866</a>: Add _testcapi.stack_pointer(). I used it to write the &quot;Stack consumption&quot; section of a previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython during 2017 Q1</a></li> <li><tt class="docutils literal">_ssl</tt>: Fix a compiler warning. Cast Py_buffer.len (Py_ssize_t, signed) to size_t (unsigned) to prevent the &quot;comparison between signed and unsigned integer expressions&quot; warning.</li> <li><a class="reference external" href="https://bugs.python.org/issue30486">bpo-30486</a>: Make the cell_set_contents() symbol private.
Don't export the <tt class="docutils literal">cell_set_contents()</tt> symbol in the C API.</li> </ul> </div> My contributions to CPython during 2017 Q3: Part 2 (dangling threads)2017-10-19T15:00:00+02:002017-10-19T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-10-19:/contrib-cpython-2017q3-part2.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3 (july, august, september), Part 2: &quot;Dangling threads&quot;.</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</p> <p>Next reports:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</li> </ul> <p>Summary:</p> <ul class="simple"> <li>Bugfixes: Reference cycles</li> <li>socketserver leaking threads and processes<ul> <li>test_logging random bug …</li></ul></li></ul><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3 (july, august, september), Part 2: &quot;Dangling threads&quot;.</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</p> <p>Next reports:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</li> </ul> <p>Summary:</p> <ul class="simple"> <li>Bugfixes: Reference cycles</li> <li>socketserver leaking threads and processes<ul> <li>test_logging random bug</li> <li>Skip failing tests</li> <li>Fix socketserver for processes</li> <li>Fix socketserver for threads</li> <li>Issue not done yet</li> </ul> </li> <li>Environment altered and dangling threads<ul> 
<li>Environment changed</li> <li>test.support and regrtest enhancements</li> <li>multiprocessing bug fixes</li> <li>concurrent.futures bug fixes</li> <li>test_threading and test_thread</li> <li>Other fixes</li> </ul> </li> </ul> <div class="section" id="bugfixes-reference-cycles"> <h2>Bugfixes: Reference cycles</h2> <p>While fixing &quot;dangling threads&quot; (see below), I found and fixed 4 reference cycles which caused memory leaks and objects to live longer than expected. I was surprised that the bug in the common <tt class="docutils literal">socket.create_connection()</tt> function was not noticed before! So my work on dangling threads was useful!</p> <p>The typical pattern of such a reference cycle is:</p> <pre class="literal-block">
def func():
    err = None
    try:
        do_something()
    except Exception as exc:
        err = exc
    if err is not None:
        handle_error(err)  # the exception is stored in the 'err' variable

func()
# surprise, surprise, the exception is still alive at this point!
</pre> <p>Or the variant:</p> <pre class="literal-block">
def func():
    try:
        do_something()
    except Exception as exc:
        exc_info = sys.exc_info()
        handle_error(exc_info)  # the exception is stored in the 'exc_info' variable

func()
# surprise, surprise, the exception is still alive at this point!
</pre> <p>It's not easy to spot the bug: it is subtle. An exception object in Python 3 has a <tt class="docutils literal">__traceback__</tt> attribute which contains frames. If a frame stores the exception in a variable, like <tt class="docutils literal">err</tt> in the first example, or <tt class="docutils literal">exc_info</tt> in the second example, a cycle exists between the exception and frames.
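</p> <p>For illustration, here is a minimal runnable sketch of how such a cycle can be broken explicitly: clear the variable once the error has been handled. <tt class="docutils literal">MyError</tt>, <tt class="docutils literal">do_something()</tt> and <tt class="docutils literal">handle_error()</tt> are placeholders I made up; the weak reference shows that the exception is then freed by reference counting alone, without waiting for the garbage collector:</p>

```python
import weakref

class MyError(Exception):
    pass

def do_something():
    raise MyError("boom")  # placeholder which always fails

ref = None  # weak reference to the exception, to observe its lifetime

def handle_error(err):
    global ref
    ref = weakref.ref(err)

def func():
    err = None
    try:
        do_something()
    except Exception as exc:
        err = exc
    try:
        if err is not None:
            handle_error(err)
    finally:
        err = None  # break the frame -> exception -> traceback -> frame cycle

func()
print(ref() is None)  # True: the exception was freed by reference counting
```

<p>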
In this case, the exception, the traceback, the frames, <strong>and all variables of all frames are kept alive</strong> by the reference cycle, <strong>until the cycle is broken by the garbage collector</strong>.</p> <p>The problem is that the garbage collector is only called infrequently, so the cycle may stay alive for a long time.</p> <p>Sometimes, the reference cycle is even more subtle than the simple examples above.</p> <p>Fixed reference cycles:</p> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>, <tt class="docutils literal">socket.create_connection()</tt>: Fix a reference cycle.</li> <li><a class="reference external" href="https://bugs.python.org/issue31247">bpo-31247</a>: <tt class="docutils literal">xmlrpc.server</tt> now explicitly breaks reference cycles when using <tt class="docutils literal">sys.exc_info()</tt> in code handling exceptions.</li> <li><a class="reference external" href="https://bugs.python.org/issue31249">bpo-31249</a>, <tt class="docutils literal">concurrent.futures</tt>: <tt class="docutils literal">WorkItem.run()</tt> used by ThreadPoolExecutor now explicitly breaks a reference cycle between an exception object and the <tt class="docutils literal">WorkItem</tt> object. <tt class="docutils literal">ThreadPoolExecutor.shutdown()</tt> now also clears its threads set.</li> <li><a class="reference external" href="https://bugs.python.org/issue31238">bpo-31238</a>: <tt class="docutils literal">pydoc</tt>: <tt class="docutils literal">ServerThread.stop()</tt> now joins itself to wait until <tt class="docutils literal">DocServer.serve_until_quit()</tt> completes and then explicitly sets its docserver attribute to None to break a reference cycle. This change was made to fix <tt class="docutils literal">test_doc</tt>.</li> <li><a class="reference external" href="https://bugs.python.org/issue31323">bpo-31323</a>: Fix reference leak in test_ssl.
Store exceptions as strings rather than objects to prevent reference cycles, which caused dangling threads to be leaked.</li> </ul> <p>I also started a discussion on reference cycles caused by exceptions: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-September/149586.html">[Python-Dev] Evil reference cycles caused Exception.__traceback__</a>. Sadly, no action was taken and no obvious solution was found.</p> <p>I found the <tt class="docutils literal">socket.create_connection()</tt> reference cycle because of an unrelated change in test.support:</p> <pre class="literal-block">
bpo-29639: change test.support.HOST to &quot;localhost&quot;
</pre> <p>Read <a class="reference external" href="https://bugs.python.org/issue29639#msg302087">my message</a> on bpo-29639 for the full story. Extract:</p> <blockquote> Modifying support.HOST to &quot;localhost&quot; triggered a reference cycle!?</blockquote> </div> <div class="section" id="socketserver-leaking-threads-and-processes"> <h2>socketserver leaking threads and processes</h2> <div class="section" id="test-logging-random-bug"> <h3>test_logging random bug</h3> <p>This story starts on July 3, with test_logging failing randomly on FreeBSD, <a class="reference external" href="https://bugs.python.org/issue30830">bpo-30830</a>:</p> <pre class="literal-block">
test_output (test.test_logging.HTTPHandlerTest) ... ok
Warning -- threading_cleanup() failed to cleanup -1 threads after 3 sec (count: 0, dangling: 1)
</pre> <p>I failed to reproduce the bug on my FreeBSD VM or on Linux. The bug only occurred on one specific FreeBSD buildbot. I even got access to the buildbot... and I still failed to reproduce it! I tried to run test_logging multiple times in parallel, increase the system load, etc. I felt disappointed.
I used my <tt class="docutils literal">system_load.py</tt> script which spawns Python processes running <tt class="docutils literal">while 1: pass</tt> to stress the CPU.</p> <p>After one month, I succeeded in reproducing the bug by running two commands in parallel.</p> <p>Command 1 to trigger the bug:</p> <pre class="literal-block">
./python -m test -v test_logging \
    --fail-env-changed \
    --forever \
    -m test.test_logging.DatagramHandlerTest.test_output \
    -m test.test_logging.ConfigDictTest.test_listen_config_10_ok \
    -m test.test_logging.SocketHandlerTest.test_output
</pre> <p>Command 2 to stress the system:</p> <pre class="literal-block">
./python -m test -j4
</pre> <p>It seems like the Python test suite is a very good tool to stress a system and trigger a race condition!</p> <p>Finally, I was able to identify the bug:</p> <blockquote> The problem is that <tt class="docutils literal">socketserver.ThreadingMixIn</tt> spawns threads without waiting for their completion in server_close().</blockquote> </div> <div class="section" id="skip-failing-tests"> <h3>Skip failing tests</h3> <p>To stabilize the buildbots and to be able to work on other bugs, I decided to first skip all tests using <tt class="docutils literal">socketserver.ThreadingMixIn</tt> until this class was fixed to prevent &quot;dangling threads&quot;.</p> </div> <div class="section" id="fix-socketserver-for-processes"> <h3>Fix socketserver for processes</h3> <p>While trying to see how to fix <tt class="docutils literal">socketserver.ThreadingMixIn</tt>, I understood that <a class="reference external" href="https://bugs.python.org/issue31151">bpo-31151</a> was a similar bug in the <tt class="docutils literal">socketserver</tt> module but for processes:</p> <pre class="literal-block">
test_ForkingUDPServer (test.test_socketserver.SocketServerTest) ... creating server
(...)
Warning -- reap_children() reaped child process 18281
</pre> <p>My analysis:</p> <blockquote> The problem is that <tt class="docutils literal">socketserver.ForkingMixIn</tt> doesn't wait until all children complete. It only calls <tt class="docutils literal">os.waitpid()</tt> in non-blocking mode (using <tt class="docutils literal">os.WNOHANG</tt>) after each loop iteration. If a child process completes after the last call to <tt class="docutils literal">ForkingMixIn.collect_children()</tt>, the server leaks zombie processes.</blockquote> <p>I fixed <tt class="docutils literal">socketserver.ForkingMixIn</tt> by modifying the <tt class="docutils literal">server_close()</tt> method to <strong>block</strong> until all child processes complete: <a class="reference external" href="https://github.com/python/cpython/commit/aa8ec34ad52bb3b274ce91169e1bc4a598655049">commit</a>.</p> <p>Just after pushing my fix, I understood that it changed the <tt class="docutils literal">ForkingMixIn</tt> behaviour. I wrote an email to ask if it was the right behaviour or if a change was needed: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-August/148826.html">[Python-Dev] socketserver ForkingMixin waiting for child processes</a>. The answer is that not everybody wants this behaviour.
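</p> <p>As a sketch of what letting the user choose could look like, here is a hypothetical mixin with an opt-out <tt class="docutils literal">block_on_close</tt> attribute. This is my own illustration under assumptions, not the actual socketserver code: the class name and attribute are made up.</p>

```python
import os

class ForkingServerMixin:
    """Hypothetical sketch: let the user choose whether server_close()
    blocks until all forked children have completed."""

    block_on_close = True  # assumed opt-out flag, not a real socketserver API

    def __init__(self):
        self.active_children = set()  # PIDs of forked request handlers

    def collect_children(self, *, blocking=False):
        # Reap completed child processes. In non-blocking mode (WNOHANG),
        # children which have not exited yet are left behind, which is
        # exactly how zombie processes were leaked.
        for pid in sorted(self.active_children):
            flags = 0 if blocking else os.WNOHANG
            try:
                done_pid, _status = os.waitpid(pid, flags)
            except ChildProcessError:
                done_pid = pid  # child was already reaped
            if done_pid == pid:
                self.active_children.discard(pid)

    def server_close(self):
        self.collect_children(blocking=self.block_on_close)

# Usage (POSIX only): fork a short-lived child, then close the server.
server = ForkingServerMixin()
child_pid = os.fork()
if child_pid == 0:
    os._exit(0)  # child terminates immediately
server.active_children.add(child_pid)
server.server_close()  # blocks until the child has been reaped
```

<p>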
Sadly, I have not yet had time to let the user choose the behaviour.</p> </div> <div class="section" id="fix-socketserver-for-threads"> <h3>Fix socketserver for threads</h3> <p>Fixing <tt class="docutils literal">socketserver.ForkingMixIn</tt> was simple because the code already tracked the (identifiers of) child processes and already had code to wait for child completion.</p> <p>Fixing <tt class="docutils literal">socketserver.ThreadingMixIn</tt> (<a class="reference external" href="https://bugs.python.org/issue31233">bpo-31233</a>) was more complicated since it didn't keep track of spawned threads.</p> <p>I chose to keep a list of <tt class="docutils literal">threading.Thread</tt> objects, but only for non-daemonic threads. <tt class="docutils literal">socketserver.ThreadingMixIn.server_close()</tt> now joins all threads: <a class="reference external" href="https://github.com/python/cpython/commit/b8f4163da30e16c7cd58fe04f4b17e38d53cd57e">commit</a>.</p> </div> <div class="section" id="issue-not-done-yet"> <h3>Issue not done yet</h3> <p>As I wrote above, the <tt class="docutils literal">socketserver</tt> module still needs to be reworked to let the user decide whether the server must gracefully wait for child completion or not. Maybe also expose a method to explicitly wait for children, maybe with a timeout?</p> </div> </div> <div class="section" id="environment-altered-and-dangling-threads"> <h2>Environment altered and dangling threads</h2> <p>This part kept me busy for the whole quarter. While trying to fix &quot;all bugs&quot;, I looked at two specific &quot;environment changes&quot;: &quot;dangling threads&quot; and &quot;zombie processes&quot;. A dangling thread comes from a test which spawns a thread but doesn't properly &quot;clean up&quot; the thread.</p> <p>Leaking threads or processes is a very bad side effect since it is likely to cause random bugs in the following tests.</p> <p>At the beginning, I expected that only 2 or 3 bugs would need to be fixed.
At the end, it was closer to 100 bugs. I don't regret it: I'm now sure that I made the Python test suite more reliable, and this work allowed me to catch <strong>and fix</strong> old reference cycle bugs (see above).</p> <div class="section" id="environment-changed"> <h3>Environment changed</h3> <p>To detect bugs, I modified Travis CI jobs, AppVeyor and buildbots to run tests with <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt>. With this option, if a test alters the environment, the full test suite is marked as failed with &quot;ENV_CHANGED&quot;.</p> <p>I also fixed <tt class="docutils literal">python3 <span class="pre">-m</span> test <span class="pre">--fail-env-changed</span> <span class="pre">--forever</span></tt> in <a class="reference external" href="https://bugs.python.org/issue30764">bpo-30764</a>: --forever now stops if a test alters the environment.</p> </div> <div class="section" id="test-support-and-regrtest-enhancements"> <h3>test.support and regrtest enhancements</h3> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue30845">bpo-30845</a>: reap_children() now logs warnings.</li> <li><tt class="docutils literal">support.reap_children()</tt> now sets environment_altered to <tt class="docutils literal">True</tt> if a test leaked a zombie process, to detect bugs using <tt class="docutils literal">python3 <span class="pre">-m</span> test <span class="pre">--fail-env-changed</span></tt>.</li> <li>regrtest: also count &quot;env changed&quot; tests as failed tests in the test progress.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: <tt class="docutils literal">support.threading_cleanup()</tt> now emits a warning immediately if there are threads running in the background, to be able to catch bugs more easily.
Previously, the warning was only emitted if the function failed to clean up these threads after 1 second.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Add <tt class="docutils literal">test.support.wait_threads_exit()</tt>. Use <tt class="docutils literal">_thread.count()</tt> to wait until threads exit. The new context manager prevents the &quot;dangling thread&quot; warning. Also add the <tt class="docutils literal">support.join_thread()</tt> helper: it joins a thread but raises an AssertionError if the thread is still alive after <em>timeout</em> seconds.</li> </ul> </div> <div class="section" id="multiprocessing-bug-fixes"> <h3>multiprocessing bug fixes</h3> <p>The multiprocessing module is very complex. multiprocessing tests have been failing randomly for years, but nobody seems able to fix them. I can only hope that my following fixes will help to make these tests more reliable.</p> <ul class="simple"> <li>multiprocessing.Queue.join_thread() now waits until the thread completes, even if the thread was started by the same process which created the queue.</li> <li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: Avoid daemon processes in _test_multiprocessing. test_level() of _test_multiprocessing._TestLogging now uses regular processes rather than daemon processes to prevent zombie processes (to not &quot;leak&quot; processes).</li> <li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: Fix more dangling processes and threads in test_multiprocessing. Queue: call close() followed by join_thread().
Process: call join() or self.addCleanup(p.join).</li> <li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: test_multiprocessing now detects dangling processes and threads per test case class.</li> <li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: test_multiprocessing closes more queues. Explicitly close queues to make sure that we don't leave dangling threads. test_queue_in_process(): remove an unused queue. test_access() also joins the process to fix a random warning.</li> <li><a class="reference external" href="https://bugs.python.org/issue26762">bpo-26762</a>: _test_multiprocessing now marks the test as ENV_CHANGED on a dangling process or thread.</li> <li><a class="reference external" href="https://bugs.python.org/issue31069">bpo-31069</a>: Fix a warning about dangling processes in test_rapid_restart() of _test_multiprocessing: join the process.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>, test_multiprocessing: Give 30 seconds to join_process(), instead of 5 or 10 seconds, to wait until the process completes.</li> </ul> </div> <div class="section" id="concurrent-futures-bug-fixes"> <h3>concurrent.futures bug fixes</h3> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue30845">bpo-30845</a>: Enhance test_concurrent_futures cleanup. Make sure that tests leak neither threads nor processes. Explicitly clear the reference to the executor to make sure that it's destroyed.</li> <li><a class="reference external" href="https://bugs.python.org/issue31249">bpo-31249</a>: test_concurrent_futures checks dangling threads. Add a BaseTestCase class to test_concurrent_futures to check for dangling threads and processes on all tests, not only tests using ExecutorMixin.</li> <li><a class="reference external" href="https://bugs.python.org/issue31249">bpo-31249</a>: Fix a test_concurrent_futures dangling thread.
ProcessPoolShutdownTest.test_del_shutdown() now closes the call queue and joins its thread, to prevent leaking a dangling thread.</li> </ul> </div> <div class="section" id="test-threading-and-test-thread"> <h3>test_threading and test_thread</h3> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: test_threaded_import: fix test_side_effect_import(). Don't leak the module into sys.modules. Also avoid dangling threads.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: test_thread.test_forkinthread() now waits until the thread completes.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Try to fix the threading_cleanup() warning in test.lock_tests: wait a little bit longer to give the threads time to complete. Warning seen on test_thread and test_importlib.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in test_threading. Call thread.join() to prevent the &quot;dangling thread&quot; warning.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join timers in test_threading. Call the .join() method of threading.Timer timers to prevent the threading_cleanup() warning.</li> </ul> </div> <div class="section" id="other-fixes"> <h3>Other fixes</h3> <ul class="simple"> <li>test_urllib2_localnet: clear the server variable. Set the server attribute to None in cleanup to avoid dangling threads.</li> <li><a class="reference external" href="https://bugs.python.org/issue30818">bpo-30818</a>: test_ftplib calls asyncore.close_all(). Always clear the asyncore socket map using asyncore.close_all(ignore_all=True) in the tearDown() method.</li> <li><a class="reference external" href="https://bugs.python.org/issue30908">bpo-30908</a>: Fix a dangling thread in test_os.TestSendfile.
tearDown() now explicitly clears the self.server variable to make sure that the thread is completely cleared when tearDownClass() checks if all threads have been cleaned up.</li> <li><a class="reference external" href="https://bugs.python.org/issue31067">bpo-31067</a>: test_subprocess now also calls reap_children() in tearDown(), not only in setUp().</li> <li><a class="reference external" href="https://bugs.python.org/issue31160">bpo-31160</a>: Fix test_builtin for zombie processes. PtyTests.run_child() now calls os.waitpid() to read the exit status of the child process, to avoid creating a zombie process and leaking processes in the background.</li> <li><a class="reference external" href="https://bugs.python.org/issue31160">bpo-31160</a>: Fix test_random for zombie processes. TestModule.test_after_fork() now calls os.waitpid() to read the exit status of the child process to avoid creating a zombie process.</li> <li><a class="reference external" href="https://bugs.python.org/issue31160">bpo-31160</a>: test_tempfile: TestRandomNameSequence.test_process_awareness() now calls os.waitpid() to avoid leaking a zombie process.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: fork_wait.py tests now join threads, so as not to leak running threads in the background.</li> <li><a class="reference external" href="https://bugs.python.org/issue30830">bpo-30830</a>: test_logging uses threading_setup/cleanup. Replace &#64;support.reap_threads on some methods with support.threading_setup() in setUp() and support.threading_cleanup() in tearDown() in BaseTest.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: test_httpservers joins the server thread.</li> <li><a class="reference external" href="https://bugs.python.org/issue31250">bpo-31250</a>, test_asyncio: fix dangling threads. Explicitly call shutdown(wait=True) on executors to wait until all threads complete, to prevent side effects between tests.
Fix test_loop_self_reading_exception(): don't mock loop.close(). Previously, the original close() method was called rather than the mock, because of how set_event_loop() registered loop.close().</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Explicitly clear the server attribute in test_ftplib and test_poplib to prevent a dangling thread. Also clear the self.server_thread attribute in TestTimeouts.tearDown().</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in tests. Call thread.join() on threads to prevent the &quot;dangling threads&quot; warning.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in test_hashlib: use thread.join() to wait until the parallel hash tasks complete rather than using events. Calling thread.join() prevents &quot;dangling thread&quot; warnings.</li> <li><a class="reference external" href="https://bugs.python.org/issue31234">bpo-31234</a>: Join threads in test_queue.
Call thread.join() to prevent the &quot;dangling thread&quot; warning.</li> </ul> <p><strong>Next report:</strong> <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</p> </div> </div> My contributions to CPython during 2017 Q3: Part 12017-10-18T15:00:00+02:002017-10-18T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-10-18:/contrib-cpython-2017q3-part1.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3 (july, august, september), Part 1.</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part1)</a>.</p> <p>Next reports:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling threads)</a>.</li> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: Part 3 (funny bugs)</a>.</li> </ul> <p>Summary:</p> <ul class="simple"> <li>Statistics</li> <li>Security fixes …</li></ul><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q3 (july, august, september), Part 1.</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part1)</a>.</p> <p>Next reports:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling threads)</a>.</li> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part3.html">My contributions to CPython during 2017 Q3: 
Part 3 (funny bugs)</a>.</li> </ul> <p>Summary:</p> <ul class="simple"> <li>Statistics</li> <li>Security fixes</li> <li>Enhancement: socket.close() now ignores ECONNRESET</li> <li>Removal of the macOS job of Travis CI</li> <li>New test.pythoninfo utility</li> <li>Revert commits if buildbots are broken</li> <li>Fix the Python test suite</li> </ul> <div class="section" id="statistics"> <h2>Statistics</h2> <pre class="literal-block">
# All branches
$ git log --after=2017-06-30 --before=2017-10-01 --reverse --branches='*' --author=Stinner|grep '^commit ' -c
209

# Master branch only
$ git log --after=2017-06-30 --before=2017-10-01 --reverse --author=Stinner origin/master|grep '^commit ' -c
97
</pre> <p>Statistics: I pushed <strong>97</strong> commits to the master branch out of a <strong>total of 209 commits</strong>; the remaining 112 commits went to other branches (backports, fixes specific to Python 2.7, security fixes in Python 3.3 and 3.4, etc.)</p> </div> <div class="section" id="security-fixes"> <h2>Security fixes</h2> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue30947">bpo-30947</a>: Update libexpat from 2.2.1 to 2.2.3. Fix applied to the master, 3.6, 3.5, 3.4, 3.3 and 2.7 branches! Expat 2.2.2 and 2.2.3 fixed multiple security vulnerabilities. <a class="reference external" href="http://python-security.readthedocs.io/vuln/expat_2.2.3.html">http://python-security.readthedocs.io/vuln/expat_2.2.3.html</a></li> <li>Fix whichmodule() of _pickle: _PyUnicode_FromId() can return NULL, replace Py_INCREF() with Py_XINCREF(). Fix Coverity report: CID 1417269.</li> <li><a class="reference external" href="https://bugs.python.org/issue30860">bpo-30860</a>: <tt class="docutils literal">_PyMem_Initialize()</tt> contains code which is never executed. Replace the runtime check with a build assertion.
Fix Coverity CID 1417587.</li> </ul> <p>See also my <a class="reference external" href="http://python-security.readthedocs.io/">python-security website</a>.</p> </div> <div class="section" id="enhancement-socket-close-now-ignores-econnreset"> <h2>Enhancement: socket.close() now ignores ECONNRESET</h2> <p><a class="reference external" href="https://bugs.python.org/issue30319">bpo-30319</a>: socket.close() now ignores ECONNRESET. Previously, many network tests failed randomly with ConnectionResetError on socket.close().</p> <p>Patching all functions calling socket.close() would require a lot of work, and it was surprising to get a &quot;connection reset&quot; when closing a socket.</p> <p>Who cares that the peer closed the connection, since we are already closing it!?</p> <p>Note: socket.close() was modified in Python 3.6 to raise OSError on failure (<a class="reference external" href="https://bugs.python.org/issue26685">bpo-26685</a>).</p> </div> <div class="section" id="removal-of-the-macos-job-of-travis-ci"> <h2>Removal of the macOS job of Travis CI</h2> <a class="reference external image-reference" href="https://travis-ci.org/"> <img alt="call_method microbenchmark" class="align-right" src="https://vstinner.github.io/images/travis-ci.png" /> </a> <p>While the Linux jobs of Travis CI usually take 15 minutes, up to 30 minutes in the worst case, the macOS job of Travis CI regularly took longer than 30 minutes, sometimes longer than 1 hour.</p> <p>While the macOS job was optional, sometimes it went mad and prevented a PR from being merged. Cancelling the job marked Travis CI as failed on a PR, so it was still not possible to merge the PR, whereas, again, the job was marked as optional (&quot;Allowed Failure&quot;).</p> <p>Moreover, when the macOS job failed, the failure was not reported on the PR, since the job was marked as optional.
The only way to notice a failure was to go to Travis CI and wait at least 30 minutes (whereas the Linux jobs had already completed and it was already possible to merge the PR...).</p> <p>I sent a first mail in June: <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-June/004661.html">[python-committers] macOS Travis CI job became mandatory?</a></p> <p>In September, we decided to remove the macOS job during the CPython sprint at Instagram (see my previous <a class="reference external" href="https://vstinner.github.io/new-python-c-api.html">New C API</a> article), to not slow down our development speed (<a class="reference external" href="https://bugs.python.org/issue31355">bpo-31355</a>). I sent another email to announce the change: <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-September/004824.html">[python-committers] Travis CI: macOS is now blocking -- remove macOS from Travis CI?</a>.</p> <p>After the sprint, it was decided not to add the macOS job back, since we have 3 macOS buildbots: they are enough to detect regressions specific to macOS.</p> <p>After the removal of the macOS job, at the end of September, Travis CI published an article about the bad performance of their macOS fleet: <a class="reference external" href="https://blog.travis-ci.com/2017-09-22-macos-update">Updating Our macOS Open Source Offering</a>. Sadly, the article confirms that the situation is not going to evolve quickly.</p> </div> <div class="section" id="new-test-pythoninfo-utility"> <h2>New test.pythoninfo utility</h2> <p>To understand the &quot;Segfault when readline history is more then 2 * history size&quot; crash of <a class="reference external" href="https://bugs.python.org/issue29854">bpo-29854</a>, I modified <tt class="docutils literal">test_readline</tt> to log libreadline versions. I also added <tt class="docutils literal">readline._READLINE_LIBRARY_VERSION</tt>.
My colleague <strong>Nir Soffer</strong> wrote the final readline fix: skip the test on old readline versions.</p> <p>As a follow-up to this issue, I added a new <tt class="docutils literal">test.pythoninfo</tt> program to log a lot of information useful to debug Python tests (<a class="reference external" href="https://bugs.python.org/issue30871">bpo-30871</a>). pythoninfo is now run on Travis CI, AppVeyor and buildbots.</p> <p>Example of output:</p> <pre class="literal-block">
$ ./python -m test.pythoninfo
(...)
_decimal.__libmpdec_version__: 2.4.2
expat.EXPAT_VERSION: expat_2.2.4
gdb_version: GNU gdb (GDB) Fedora 8.0.1-26.fc26
locale.encoding: UTF-8
os.cpu_count: 4
(...)
time.timezone: -3600
time.tzname: ('CET', 'CEST')
tkinter.TCL_VERSION: 8.6
tkinter.TK_VERSION: 8.6
tkinter.info_patchlevel: 8.6.6
zlib.ZLIB_RUNTIME_VERSION: 1.2.11
zlib.ZLIB_VERSION: 1.2.11
</pre> <p><tt class="docutils literal">test.pythoninfo</tt> can be easily extended to log more information, without polluting the output of the Python test suite, which is already too verbose and very long.</p> </div> <div class="section" id="revert-commits-if-buildbots-are-broken"> <h2>Revert commits if buildbots are broken</h2> <p>Thanks to my work on the Python test suite in recent months, the buildbots are now very reliable. When a buildbot fails, it is now very likely a real regression, not a random failure caused by a bug in the Python test suite.</p> <p>I proposed a new rule: <strong>revert a change if it breaks buildbots and the bug cannot be fixed easily</strong>:</p> <blockquote> <p>So I would like to set a new rule: if I'm unable to fix buildbot failures caused by a recent change quickly (say, in less than 2 hours), I propose to revert the change.</p> <p>It doesn't mean that the commit is bad and must not be merged ever. No.
It would just mean that we need time to work on fixing the issue, and it shouldn't impact other pending changes, to keep a sane master branch.</p> </blockquote> <p><a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-June/004588.html">[python-committers] Revert changes which break too many buildbots</a>.</p> <div class="section" id="test-datetime"> <h3>test_datetime</h3> <p>The first revert was an enhancement of test_datetime, <a class="reference external" href="https://bugs.python.org/issue30822">bpo-30822</a>:</p> <pre class="literal-block">
commit 98b6bc3bf72532b784a1c1fa76eaa6026a663e44
Author: Utkarsh Upadhyay &lt;mail&#64;musicallyut.in&gt;
Date:   Sun Jul 2 14:46:04 2017 +0200

    bpo-30822: Fix testing of datetime module. (#2530)

    Only C implementation was tested.
</pre> <p>I wrote an email to announce the revert: <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2017-July/004673.html">[python-committers] Revert changes which break too many buildbots</a>.</p> <p>It took 15 days to decide how to fix the issue properly (exclude <tt class="docutils literal">tzdata</tt> from test resources). I don't regret my revert, since having broken buildbots for 15 days would have been very annoying.</p> </div> <div class="section" id="python-gdb-py-fix"> <h3>python-gdb.py fix</h3> <p>I also reverted this commit of <a class="reference external" href="https://bugs.python.org/issue30983">bpo-30983</a>:</p> <pre class="literal-block">
commit 2e0f4db114424a00354eab889ba8f7334a2ab8f0
Author: Bruno &quot;Polaco&quot; Penteado &lt;polaco&#64;gmail.com&gt;
Date:   Mon Aug 14 23:14:17 2017 +0100

    bpo-30983: eval frame rename in pep 0523 broke gdb's python extension (#2803)

    pep 0523 renames PyEval_EvalFrameEx to _PyEval_EvalFrameDefault
    while the gdb python extension only looks for PyEval_EvalFrameEx
    to understand if it is dealing with a frame.
Final effect is that attaching gdb to a python3.6 process doesnt resolve python objects. Eg. py-list and py-bt dont work properly. This patch fixes that. Tested locally on python3.6 </pre> <p>My comment on the issue:</p> <blockquote> <p>I chose to revert the change because I don't have the bandwidth right now to investigate why the change broke test_gdb.</p> <p>I'm surprised that a change affecting python-gdb.py wasn't properly tested manually using test_gdb.py :-( I understand that Travis CI doesn't have gdb and/or that the test pass in some cases?</p> <p>The revert only gives us more time to design the proper solution.</p> </blockquote> <p>Fortunately, a fixed commit was pushed 4 days later, and this one didn't break the buildbots!</p> </div> </div> <div class="section" id="fix-the-python-test-suite"> <h2>Fix the Python test suite</h2> <p>As usual, I spent a significant part of my time fixing bugs in the Python test suite to make it more reliable and more &quot;usable&quot;.</p> <ul> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue30822">bpo-30822</a>: Exclude <tt class="docutils literal">tzdata</tt> from <tt class="docutils literal">regrtest <span class="pre">--all</span></tt>. When running the test suite using <tt class="docutils literal"><span class="pre">--use=all</span></tt> / <tt class="docutils literal"><span class="pre">-u</span> all</tt>, exclude <tt class="docutils literal">tzdata</tt> since it makes test_datetime too slow (15-20 min on some buildbots, just for this single test file), which then times out on some buildbots. <tt class="docutils literal"><span class="pre">-u</span> tzdata</tt> must now be enabled explicitly.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue30188">bpo-30188</a>, test_nntplib: Also catch ssl.SSLEOFError in NetworkedNNTPTests.setUpClass(), not only EOFError.
(<em>Sadly, test_nntplib still fails randomly with EOFError or SSLEOFError...</em>)</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31009">bpo-31009</a>: Fix <tt class="docutils literal">support.fd_count()</tt> on Windows. Call <tt class="docutils literal">msvcrt.CrtSetReportMode()</tt> so that os.dup(fd) neither kills the process nor logs any error to stderr if the file descriptor is invalid.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31034">bpo-31034</a>: Reliable signal handler for test_asyncio. Don't rely on the current SIGHUP signal handler; make sure that it's set to the &quot;default&quot; signal handler: SIG_DFL. A colleague reported to me that the Python test suite hangs on running test_subprocess_send_signal() of test_asyncio. After analysing the issue, it turned out that the test hangs because the RPM package builder ignores SIGHUP.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31028">bpo-31028</a>: Fix test_pydoc when run directly. Fix <tt class="docutils literal">get_pydoc_link()</tt>: get the absolute path to <tt class="docutils literal">__file__</tt> to prevent relative directories.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31066">bpo-31066</a>: Fix <tt class="docutils literal">test_httpservers.test_last_modified()</tt>. Write the temporary file on disk and then get its modification time.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31173">bpo-31173</a>: Rewrite WSTOPSIG test of test_subprocess.</p> <p>The <tt class="docutils literal">test_child_terminated_in_stopped_state()</tt> test creates a child process which calls <tt class="docutils literal">ptrace(PTRACE_TRACEME, 0, 0)</tt> and then crashes (SIGSEGV).
The problem is that calling <tt class="docutils literal">os.waitpid()</tt> in the parent process is not enough to reap the child: the child process remains alive, and so the unit test leaks a child process in a strange state. Closing the child process requires non-trivial, maybe platform-specific code.</p> <p>I removed the functional test and replaced it with a unit test which mocks <tt class="docutils literal">os.waitpid()</tt> using a new <tt class="docutils literal">_testcapi.W_STOPCODE()</tt> function to test the <tt class="docutils literal">WIFSTOPPED()</tt> path.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31008">bpo-31008</a>: Fix asyncio test_wait_for_handle on Windows, tolerate a difference of 50 ms.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31235">bpo-31235</a>: Fix ResourceWarning in test_logging: always close all asyncore dispatchers (ignoring errors if any).</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue30121">bpo-30121</a>: Add test_subprocess.test_nonexisting_with_pipes(). Test the Popen failure when Popen was created with pipes. Also create a NONEXISTING_CMD variable in test_subprocess.py.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31250">bpo-31250</a>, test_asyncio: fix EventLoopTestsMixin.tearDown(). Call doCleanups() to close the loop after calling executor.shutdown(wait=True).</p> </li> <li><p class="first">test_ssl: Implement timeout in ssl_io_loop().
The timeout parameter was not used.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31448">bpo-31448</a>, test_poplib: Call POP3.close(), don't close the sock attribute directly, to fix a ResourceWarning.</p> </li> <li><p class="first">os.test_utime_current(): tolerate 50 ms delta.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31135">bpo-31135</a>: ttk: fix LabeledScale and OptionMenu destroy() method. Call the parent destroy() method even if the used attribute doesn't exist. The LabeledScale.destroy() method now also explicitly clears label and scale attributes to help the garbage collector to destroy all widgets.</p> </li> <li><p class="first"><a class="reference external" href="https://bugs.python.org/issue31479">bpo-31479</a>: Always reset the signal alarm in tests. Use the <tt class="docutils literal">try: ... finally: signal.alarm(0)</tt> pattern to make sure that tests don't &quot;leak&quot; a pending fatal signal alarm. Move some signal.alarm() calls into the try block.</p> </li> </ul> <p><strong>Next report:</strong> <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part2.html">My contributions to CPython during 2017 Q3: Part 2 (dangling threads)</a>.</p> </div> Python Security2017-09-15T22:00:00+02:002017-09-15T22:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-09-15:/python-security.html<p>I have been working on Python security for years, but I never wrote anything about it. Let's fix this!</p> <div class="section" id="psrt"> <h2>PSRT</h2> <p>I am part of the Python Security Response Team (PSRT): I get emails sent to <a class="reference external" href="mailto:security&#64;python.org">security&#64;python.org</a>. I try to analyze each report to validate that the bug is …</p></div><p>I have been working on Python security for years, but I never wrote anything about it.
Let's fix this!</p> <div class="section" id="psrt"> <h2>PSRT</h2> <p>I am part of the Python Security Response Team (PSRT): I get emails sent to <a class="reference external" href="mailto:security&#64;python.org">security&#64;python.org</a>. I try to analyze each report to validate that the bug is reproducible, find the impacted Python versions and start discussing how to fix the vulnerability. In some cases, the reported issue is not a security vulnerability, is not related to CPython, or sometimes is already fixed. We get reports not only about CPython, but also about the websites and other projects related to Python.</p> <p>Warning: I don't represent the PSRT, I only speak for myself!</p> </div> <div class="section" id="vulnerabilities-sent-to-psrt"> <h2>Vulnerabilities sent to PSRT</h2> <p>In this article, I will focus on vulnerabilities impacting CPython: the C and Python code of CPython core and the standard library.</p> <p>When vulnerabilities are obvious bugs, they are quickly fixed. Done.</p> <p>But it's not uncommon that fixing a vulnerability impacts backward compatibility, which is a major concern of CPython core developers. There is also a risk of rejecting legit input data because the added checks are too strict. We have to be very careful, and so fixing vulnerabilities can take weeks, if not months in the worst case.</p> <p>While CPython has few active core developers, the PSRT has even fewer active members to handle incoming reports. We are volunteers, so please be kind and patient...</p> </div> <div class="section" id="example-of-a-complex-fix"> <h2>Example of a complex fix</h2> <p>The <a class="reference external" href="https://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html">urllib FTP protocol stream injection</a> vulnerability was reported to the PSRT on 2016-01-15.
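The injection works by smuggling a CR or LF into an FTP command, so that the line-based FTP control connection sees two commands instead of one. A minimal sketch of this kind of check, in the spirit of the eventual ftplib fix but simplified and hypothetical here (the function and error message are illustrative, not the real patch):

```python
# Hedged sketch, not the actual ftplib patch: the FTP control connection is
# line-based, so a stray CR or LF in a command argument would let an attacker
# inject a second command into the stream.
def putline(line):
    if '\r' in line or '\n' in line:
        raise ValueError('newline characters are not allowed in FTP commands')
    return line + '\r\n'  # terminate the single, validated command

print(putline('USER anonymous'))    # accepted
try:
    putline('PASV\r\nDELE secret')  # injection attempt
except ValueError as exc:
    print('rejected:', exc)
```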
The fix was only merged on 2017-07-26.</p> <p>First, it was not obvious how the vulnerability could be exploited, nor if it should be fixed.</p> <p>Then it was not obvious if the vulnerability should be fixed in the urllib module or in the ftplib module.</p> <p>Even though the bug was public, it didn't get much attention. Since I don't know the urllib module well, I wrote an email to the python-dev mailing list: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-July/148699.html">Need help to fix urllib(.parse) vulnerabilities</a>.</p> <p>I proposed a fix for the urllib module: <a class="reference external" href="https://bugs.python.org/issue30713">Reject newline character (U+000A) in URLs in urllib.parse</a>. But it was rejected, since it was the wrong approach and my checks were too strict in many cases (rejected legit requests).</p> <p>The final fix rejects the <tt class="docutils literal">\r</tt> and <tt class="docutils literal">\n</tt> newline characters in the putline() method of the ftplib module.</p> </div> <div class="section" id="track-known-and-fixed-cpython-vulnerabilities"> <h2>Track known and fixed CPython vulnerabilities</h2> <p>Currently, no less than six branches still get security fixes!</p> <ul class="simple"> <li>Python 2.7</li> <li>Python 3.3</li> <li>Python 3.4</li> <li>Python 3.5</li> <li>Python 3.6</li> <li>master: the development branch</li> </ul> <p>Last year, I added a table to the Python developer guide to help me to track the status of each branch: see the <a class="reference external" href="https://devguide.python.org/#status-of-python-branches">Status of Python branches</a>.</p> <p>This year, I created a tool to help me to track known CPython vulnerabilities: the <a class="reference external" href="https://github.com/vstinner/python-security">python-security project</a> (hosted at GitHub).
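The core of such a tracking tool, inferring which release of each branch first shipped a fix from the Git tags that contain the fix commit, can be sketched roughly as follows (a hypothetical helper, not the project's actual code; the tag list would come from `git tag --contains <sha>`, and pre-release tags are ignored for simplicity):

```python
# Hedged sketch (hypothetical, not the actual python-security tool): given the
# final-release tags that contain a fix commit, report the first fixed release
# of each branch.
def first_fixed_per_branch(tags):
    def version(tag):                  # 'v3.5.2' -> (3, 5, 2)
        return tuple(int(part) for part in tag.lstrip('v').split('.'))
    first = {}
    for tag in tags:
        ver = version(tag)
        branch = ver[:2]               # (3, 5) is the 3.5 branch
        if branch not in first or ver < first[branch]:
            first[branch] = ver        # keep the oldest tag per branch
    return {'.'.join(map(str, branch)): '.'.join(map(str, ver))
            for branch, ver in sorted(first.items())}

print(first_fixed_per_branch(['v3.5.2', 'v3.5.3', 'v3.6.0', 'v2.7.13']))
```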
The <a class="reference external" href="https://github.com/vstinner/python-security/blob/master/vulnerabilities.yaml">vulnerabilities.yaml file</a> is a YAML file with one section per vulnerability. Each vulnerability has a title, a link to the Python bug, a disclosure date, a reported date, commits, etc.</p> <p>The tool gets the date of commits and the Git tags which contain each commit to infer, for each branch, the first Python version which contains the fix. It also builds a timeline to help understand how the vulnerability was handled.</p> <p>I also wanted to be more transparent about how we handle vulnerabilities and our velocity to fix them.</p> <p>Honestly, I was disappointed that it took so long to fix some vulnerabilities in the past. Fortunately, it seems like we are more reactive nowadays!</p> </div> <div class="section" id="example-of-a-fixed-vulnerability"> <h2>Example of a fixed vulnerability</h2> <p>Example: <a class="reference external" href="https://python-security.readthedocs.io/vuln/cve-2016-5699_http_header_injection.html">CVE-2016-5699: HTTP header injection</a>.</p> <p>Right now, Python 3.3 is still vulnerable (my fix was committed; I am now waiting for Python 3.3.7, which is coming at the end of September).</p> <p>Since the vulnerability was reported, it took 108 days to merge the fix, and 72 more days (180 days in total) for the first release including the fix (Python 2.7.10).</p> <p>Sadly, the PSRT doesn't assign a severity to vulnerabilities yet.</p> <p>Fortunately, for this vulnerability, web frameworks were able to work around it with input sanitization.</p> </div> <div class="section" id="backport-all-fixes"> <h2>Backport all fixes</h2> <p>In the last months, I backported fixes to the six branches which still accept security fixes, to respect the contract with our users: we are doing our best to protect you!</p> <p>The good news is that with the Python 2.7.14 and Python 3.3.7 releases scheduled this month, all major security vulnerabilities will be fixed in
all maintained Python branches!</p> <p>Some fixes were not backported on purpose. For example, the <a class="reference external" href="https://python-security.readthedocs.io/vuln/cve-2013-7040_hash_not_properly_randomized.html#cve-2013-7040-hash-not-properly-randomized">CVE-2013-7040: Hash not properly randomized</a> vulnerability requires changing the hash algorithm, and we decided not to touch Python 2.7 and 3.3 for backward compatibility reasons (don't break code relying on the exact hash function). The issue was fixed in Python 3.4 by using the SipHash hash algorithm, which uses a hash secret (generated randomly by Python at startup).</p> </div> <div class="section" id="python-security-documentation"> <h2>Python security documentation</h2> <p>In the last months, I also started to collect random notes about Python security.</p> <p>Explore my <a class="reference external" href="https://python-security.readthedocs.io/">python-security.readthedocs.io</a> documentation and send me feedback!</p> </div> A New C API for CPython2017-09-07T18:00:00+02:002017-09-07T18:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-09-07:/new-python-c-api.html<p>I am currently at a CPython sprint 2017 at Facebook. We are discussing my idea of writing a new C API for CPython hiding implementation details and replacing macros with function calls.</p> <img alt="CPython sprint at Facebook, september 2017" src="https://vstinner.github.io/images/cpython_sprint_sept2017.jpg" /> <p>This article tries to explain why the CPython C API needs to <strong>evolve</strong>.</p> <div class="section" id="c-api-prevents-further-optimizations"> <h2>C API prevents further optimizations …</h2></div><p>I am currently at a CPython sprint 2017 at Facebook.
We are discussing my idea of writing a new C API for CPython hiding implementation details and replacing macros with function calls.</p> <img alt="CPython sprint at Facebook, september 2017" src="https://vstinner.github.io/images/cpython_sprint_sept2017.jpg" /> <p>This article tries to explain why the CPython C API needs to <strong>evolve</strong>.</p> <div class="section" id="c-api-prevents-further-optimizations"> <h2>C API prevents further optimizations</h2> <p>The CPython <tt class="docutils literal">PyListObject</tt> type uses an array of <tt class="docutils literal">PyObject*</tt> objects. PyPy is able to use a C array of integers if the list only contains small integers. CPython cannot because PyList_GET_ITEM(list, index) is implemented as a macro:</p> <pre class="literal-block"> #define PyList_GET_ITEM(op, i) ((PyListObject *)op)-&gt;ob_item[i] </pre> <p>The macro relies on the <tt class="docutils literal">PyListObject</tt> structure:</p> <pre class="literal-block"> typedef struct { PyVarObject ob_base; PyObject **ob_item; // &lt;-- pointer to real data Py_ssize_t allocated; } PyListObject; typedef struct { PyObject ob_base; Py_ssize_t ob_size; /* Number of items in variable part */ } PyVarObject; typedef struct _object { Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject; </pre> </div> <div class="section" id="api-and-abi"> <h2>API and ABI</h2> <p>Compiling C extension code using <tt class="docutils literal">PyList_GET_ITEM()</tt> produces machine code accessing <tt class="docutils literal">PyListObject</tt> members. 
Something like (C pseudo code):</p> <pre class="literal-block"> PyObject **items; PyObject *item; items = (PyObject **)(((char*)list) + 24); item = items[i]; </pre> <p>The offset 24 is hardcoded in the C extension object file: the <strong>API</strong> (<strong>programming</strong> interface) becomes the <strong>ABI</strong> (<strong>binary</strong> interface).</p> <p>But debug builds use a different memory layout:</p> <pre class="literal-block"> typedef struct _object { struct _object *_ob_next; // &lt;--- two new fields are added struct _object *_ob_prev; // &lt;--- for debug purpose Py_ssize_t ob_refcnt; struct _typeobject *ob_type; } PyObject; </pre> <p>The machine code becomes something like:</p> <pre class="literal-block"> items = (PyObject **)(((char*)op) + 40); item = items[i]; </pre> <p>The offset changes from 24 to 40 (+16, two pointers of 8 bytes each).</p> <p>C extensions have to be recompiled to work on Python compiled in debug mode.</p> <p>Another example is Python 2.7, which uses a different ABI for UTF-16 and UCS-4 Unicode strings: the <tt class="docutils literal"><span class="pre">--with-wide-unicode</span></tt> configure option.</p> </div> <div class="section" id="stable-abi"> <h2>Stable ABI</h2> <p>If the machine code didn't use the offset, C extensions would only need to be compiled once.</p> <p>A solution is to replace the PyList_GET_ITEM() <strong>macro</strong> with a <strong>function</strong>:</p> <pre class="literal-block"> PyObject* PyList_GET_ITEM(PyObject *list, Py_ssize_t index); </pre> <p>defined as:</p> <pre class="literal-block"> PyObject* PyList_GET_ITEM(PyObject *list, Py_ssize_t index) { return ((PyListObject *)list)-&gt;ob_item[index]; } </pre> <p>The machine code becomes a <strong>function call</strong>:</p> <pre class="literal-block"> PyObject *item; item = PyList_GET_ITEM(list, index); </pre> </div> <div class="section" id="specialized-list-for-small-integers"> <h2>Specialized list for small integers</h2> <p>If C extension objects don't
access structure members anymore, it becomes possible to modify the memory layout.</p> <p>For example, it's possible to design a specialized implementation of <tt class="docutils literal">PyListObject</tt> for small integers:</p> <pre class="literal-block"> typedef struct { PyVarObject ob_base; int use_small_int; PyObject **pyobject_array; int32_t *small_int_array; // &lt;-- new compact C array for integers Py_ssize_t allocated; } PyListObject; PyObject* PyList_GET_ITEM(PyObject *op, Py_ssize_t index) { PyListObject *list = (PyListObject *)op; if (list-&gt;use_small_int) { int32_t item = list-&gt;small_int_array[index]; /* create a new object at each call */ return PyLong_FromLong(item); } else { return list-&gt;pyobject_array[index]; } } </pre> <p>It's just an example to show that it becomes possible to modify PyObject structures. I'm not sure that it's useful in practice.</p> </div> <div class="section" id="multiple-python-runtimes"> <h2>Multiple Python &quot;runtimes&quot;</h2> <p>Assuming that all used C extensions use the new stable ABI, we can now imagine multiple specialized Python runtimes installed in parallel, instead of a single runtime:</p> <ul class="simple"> <li>python3.7: regular/legacy CPython, backward compatible</li> <li>python3.7-dbg: runtime checks to ease debug</li> <li>fasterpython3.7: use specialized list</li> <li>etc.</li> </ul> <p>The <tt class="docutils literal">python3</tt> runtime would remain <strong>fully</strong> compatible since it would use the old C API with macros and full structures. So by default, everything will continue to work.</p> <p>But the other runtimes require that all imported C extensions were compiled with the new C API.</p> <p><tt class="docutils literal"><span class="pre">python3.7-dbg</span></tt> adds more checks tested at runtime. 
Example:</p> <pre class="literal-block"> PyObject* PyList_GET_ITEM(PyObject *list, Py_ssize_t index) { assert(PyList_Check(list)); assert(0 &lt;= index &amp;&amp; index &lt; Py_SIZE(list)); return ((PyListObject *)list)-&gt;ob_item[index]; } </pre> <p>Currently, some Linux distributions provide a <tt class="docutils literal"><span class="pre">python3-dbg</span></tt> binary, but may not provide <tt class="docutils literal"><span class="pre">-dbg</span></tt> binary packages of all C extensions. So all C extensions have to be recompiled manually, which is quite painful (need to install build dependencies, wait until everything is recompiled, etc.).</p> </div> <div class="section" id="experiment-optimizations"> <h2>Experimental optimizations</h2> <p>With the new C API, it becomes possible to implement a new class of optimizations.</p> <div class="section" id="tagged-pointer"> <h3>Tagged pointer</h3> <p>Store small integers directly in the pointer value, to reduce the memory usage and avoid expensive boxing/unboxing.</p> <p>See <a class="reference external" href="https://en.wikipedia.org/wiki/Tagged_pointer">Wikipedia: Tagged pointer</a>.</p> </div> <div class="section" id="no-garbage-collector-gc-at-all"> <h3>No garbage collector (GC) at all</h3> <p>Python runtime without GC at all.
Remove the following header from objects tracked by the GC:</p> <pre class="literal-block"> struct { union _gc_head *gc_next; union _gc_head *gc_prev; Py_ssize_t gc_refs; } PyGC_Head; </pre> <p>It would remove 24 bytes per object tracked by the GC.</p> <p>For comparison, the smallest Python object is &quot;object()&quot;, which only takes 16 bytes.</p> </div> <div class="section" id="tracing-garbage-collector-without-reference-counting"> <h3>Tracing garbage collector without reference counting</h3> <p>This is really the most complex and most experimental idea, but IMHO it's required to &quot;unlock&quot; Python performance.</p> <ul class="simple"> <li>Write a new API to keep track of pointers:<ul> <li>Declare a variable storing a <tt class="docutils literal">PyObject*</tt> object</li> <li>Set a pointer</li> <li>Maybe also read a pointer?</li> </ul> </li> <li>Modify C extensions to use this new API</li> <li>Implement a tracing garbage collector which can move objects in memory to compact memory</li> <li>Remove reference counting</li> </ul> <p>It even seems possible to implement a tracing garbage collector <strong>and</strong> use reference counting. But I'm not an expert in this area; I need to dig into the topic.</p> <p>Questions:</p> <ul class="simple"> <li>Is it possible to fix all C extensions to use the new API?
It should be an opt-in option in a first stage.</li> <li>Is it possible to emulate the Py_INCREF/DECREF API, for backward compatibility, using a hash table which maintains a reference counter outside <tt class="docutils literal">PyObject</tt>?</li> <li>Do we need to fix all C extensions?</li> </ul> <p>Read also <a class="reference external" href="https://en.wikipedia.org/wiki/Tracing_garbage_collection">Wikipedia: Tracing garbage collection</a>.</p> </div> <div class="section" id="gilectomy"> <h3>Gilectomy</h3> <p>Abstracting the ABI allows customizing the runtime for Gilectomy's needs, making it possible to remove the GIL.</p> <p>Removing reference counting would make Gilectomy much simpler.</p> </div> </div> My contributions to CPython during 2017 Q2 (part 3)2017-07-13T17:00:00+02:002017-07-13T17:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/contrib-cpython-2017q2-part3.html<p>This is the third part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (April, May, June):</p> <ul class="simple"> <li>Security</li> <li>Tricky bug: Clang 4.0, dtoa and strict aliasing</li> <li>sigwaitinfo() race condition in test_eintr</li> <li>FreeBSD test_subprocess core dump</li> </ul> <p>Previous reports:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</li> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part2.html">My contributions to CPython …</a></li></ul><p>This is the third part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (April, May, June):</p> <ul class="simple"> <li>Security</li> <li>Tricky bug: Clang 4.0, dtoa and strict aliasing</li> <li>sigwaitinfo() race condition in test_eintr</li> <li>FreeBSD test_subprocess core dump</li> </ul> <p>Previous reports:</p> <ul class="simple">
<li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</li> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part2.html">My contributions to CPython during 2017 Q2 (part 2)</a>.</li> </ul> <p>Next report:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</li> </ul> <div class="section" id="security"> <h2>Security</h2> <div class="section" id="backport-fixes"> <h3>Backport fixes</h3> <p>I am trying to apply all known security fixes to the 6 maintained Python branches: 2.7, 3.3, 3.4, 3.5, 3.6 and master.</p> <p>I created the <a class="reference external" href="http://python-security.readthedocs.io/">python-security.readthedocs.io</a> website to track these vulnerabilities, especially which Python versions are fixed, to identify missing backports.</p> <p>Python 2.7, 3.5, 3.6 and master are quite good; I am still working on backporting fixes into 3.4 and 3.3. Larry Hastings merged my 3.4 backports and other security fixes, and scheduled a new 3.4.7 release in the coming weeks. Later, I will try to fix Python 3.3 as well, before its end-of-life, scheduled for the end of September.</p> <p>See the <a class="reference external" href="https://docs.python.org/devguide/#status-of-python-branches">Status of Python branches</a> in the devguide.</p> </div> <div class="section" id="libexpat-2-2"> <h3>libexpat 2.2</h3> <p>Python embeds a copy of libexpat to ease Python compilation on Windows and macOS. It means that we have to remember to upgrade it at each libexpat release.
It is especially important when security vulnerabilities are fixed in libexpat.</p> <p>libexpat 2.2 was released on 2016-06-21 and contains such vulnerability fixes; see: <a class="reference external" href="http://python-security.readthedocs.io/vuln/cve-2016-0718_expat_2.2_bug_537.html">CVE-2016-0718: expat 2.2, bug #537</a>.</p> <p>Sadly, it took us a few months to upgrade libexpat. I wrote a short shell script to easily upgrade libexpat: recreate the <tt class="docutils literal">Modules/expat/</tt> directory from a libexpat tarball.</p> <p>My commit:</p> <blockquote> <p>bpo-29591: Upgrade Modules/expat to libexpat 2.2 (#2164)</p> <p>Remove the configuration (<tt class="docutils literal"><span class="pre">Modules/expat/*config.h</span></tt>) of unsupported platforms: Amiga, MacOS Classic on PPC32, Open Watcom.</p> <p>Remove XML_HAS_SET_HASH_SALT define: it became useless since our local expat copy was upgrade to expat 2.1 (it's now expat 2.2.0).</p> </blockquote> <p>I upgraded libexpat to 2.2 in the Python 2.7, 3.4, 3.5, 3.6 and master branches. I still have a pending pull request for 3.3.</p> </div> <div class="section" id="libexpat-2-2-1"> <h3>libexpat 2.2.1</h3> <p>Just after I finally upgraded our libexpat copy to 2.2.0... libexpat 2.2.1 was released with new security fixes! See <a class="reference external" href="http://python-security.readthedocs.io/vuln/cve-2017-9233_expat_2.2.1.html">CVE-2017-9233: Expat 2.2.1</a>.</p> <p>Again, I upgraded libexpat to 2.2.1 in all branches (pending: 3.3), see bpo-30694.
My commit:</p> <blockquote> <p>Upgrade expat copy from 2.2.0 to 2.2.1 to get fixes of multiple security vulnerabilities including:</p> <ul class="simple"> <li>CVE-2017-9233 (External entity infinite loop DoS),</li> <li>CVE-2016-9063 (Integer overflow, re-fix),</li> <li>CVE-2016-0718 (Fix regression bugs from 2.2.0's fix to CVE-2016-0718)</li> <li>CVE-2012-0876 (Counter hash flooding with SipHash).</li> </ul> <p>Note: the CVE-2016-5300 (Use os-specific entropy sources like getrandom) doesn't impact Python, since Python already gets entropy from the OS to set the expat secret using <tt class="docutils literal">XML_SetHashSalt()</tt>.</p> </blockquote> </div> <div class="section" id="urllib-splithost-vulnerability"> <h3>urllib splithost() vulnerability</h3> <p>Vulnerability: <a class="reference external" href="http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html">bpo-30500: urllib connects to a wrong host</a>.</p> <p>While it was quick to confirm the vulnerability, it was tricky to decide how to properly <strong>fix it without breaking backward compatibility</strong>. We had too few unit tests, and no obvious definition of the <em>expected</em> behaviour. I contributed to the discussion and helped polish the fix:</p> <p>bpo-30500 commit:</p> <blockquote> Fix urllib.parse.splithost() to correctly parse fragments. For example, <tt class="docutils literal"><span class="pre">splithost('//127.0.0.1#&#64;evil.com/')</span></tt> now correctly returns the <tt class="docutils literal">127.0.0.1</tt> host, instead of treating <tt class="docutils literal">&#64;evil.com</tt> as the host in an authentification (<tt class="docutils literal">login&#64;host</tt>).</blockquote> <p>Fix applied to master, 3.6, 3.5, 3.4 and 2.7; pending pull request for 3.3.</p> </div> <div class="section" id="travis-ci"> <h3>Travis CI</h3> <p>I also wrote a pull request to enable Travis CI and AppVeyor CI on the Python 3.3 and 3.4 branches, to test security fixes on the CI.
These changes are complex and not merged yet, but I am now confident that the CI will be enabled on 3.4!</p> <p>My PR for Python 3.4: <a class="reference external" href="https://github.com/python/cpython/pull/2475">[3.4] Backport CI config from master</a>.</p> </div> </div> <div class="section" id="tricky-bug-clang-4-0-dtoa-and-strict-aliasing"> <h2>Tricky bug: Clang 4.0, dtoa and strict aliasing</h2> <p>Aha, another funny story about compilers: bpo-30104.</p> <p>I noticed that the following tests started to fail on the &quot;AMD64 FreeBSD CURRENT Debug 3.x&quot; buildbot:</p> <ul class="simple"> <li>test_cmath</li> <li>test_float</li> <li>test_json</li> <li>test_marshal</li> <li>test_math</li> <li>test_statistics</li> <li>test_strtod</li> </ul> <p>First, I bet on a libc change on FreeBSD. Then, I found that test_strtod fails on FreeBSD using clang 4.0, but pass on FreeBSD using clang 3.8.</p> <p>I started to bisect the code on Linux using a subset of <tt class="docutils literal">Python/dtoa.c</tt>:</p> <ul class="simple"> <li>Start (integrated in CPython code base): 2,876 lines</li> <li>dtoa2.c (standalone): 2,865 lines</li> <li>dtoa5.c: 50 lines</li> </ul> <p>Extract of dtoa5.c:</p> <pre class="literal-block"> typedef union { double d; uint32_t L[2]; } U; struct Bigint { int wds; }; static double ratio(struct Bigint *a) { U da, db; int k, ka, kb; double r; da.d = 1.682; ka = 6; db.d = 1.0; kb = 5; k = ka - kb + 32 * (a-&gt;wds - 12); printf(&quot;k=%i\n&quot;, k); if (k &gt; 0) da.L[1] += k * 0x100000; else { k = -k; db.L[1] += k * 0x100000; } r = da.d / db.d; /* r == 3.364 */ return r; } </pre> <p>Even if I had a very short C code (50 lines) reproducing the bug, I was still unable to understand the bug. I read many articles about aliasing, and I still don't understand fully the bug... 
I recommend these two good articles:</p> <ul class="simple"> <li><a class="reference external" href="http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html">Understanding Strict Aliasing</a> (Mike Acton, June 1, 2006)</li> <li><a class="reference external" href="http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html">Demystifying The Restrict Keyword</a> (Mike Acton, May 29, 2006)</li> </ul> <p>Anyway, I wanted to report the bug to clang (LLVM), but the LLVM bug tracker was migrating and I was unable to subscribe to get an account!</p> <p>In the meantime, <strong>Dimitry Andric</strong>, a FreeBSD developer, told me that he got <em>exactly</em> the same clang 4.0 issue with &quot;dtoa.c&quot; in the <em>julia</em> programming language. Two months before I hit the bug, he had already reported it to FreeBSD: <a class="reference external" href="https://bugs.freebsd.org/216770">lang/julia: fails to build with clang 4.0</a>, and to clang: <a class="reference external" href="https://bugs.llvm.org//show_bug.cgi?id=31928">After r280351: if/else blocks incorrectly optimized away?</a>.</p> <p>The &quot;problem&quot; is that clang developers disagree that it's a bug. In short, the discussion was about the C standard: does clang respect the C aliasing rules or not? In the end, the clang developers consider that they are right to optimize.
To summarize:</p> <blockquote> It's a bug in the code, not in the compiler</blockquote> <p>So I made a first change to use the <tt class="docutils literal"><span class="pre">-fno-strict-aliasing</span></tt> flag when Python is compiled with clang:</p> <blockquote> Python/dtoa.c is not compiled correctly with clang 4.0 and optimization level -O2 or higher, because of an aliasing issue on the double/ULong[2] union.</blockquote> <p>But this change can make Python slower when compiled with clang, so I was asked to only compile <tt class="docutils literal">Python/dtoa.c</tt> with this flag:</p> <blockquote> On clang, only compile dtoa.c with -fno-strict-aliasing, use strict aliasing to compile all other C files.</blockquote> </div> <div class="section" id="sigwaitinfo-race-condition-in-test-eintr"> <h2>sigwaitinfo() race condition in test_eintr</h2> <div class="section" id="the-tricky-test-eintr"> <h3>The tricky test_eintr</h3> <p>When I wrote and implemented the <a class="reference external" href="https://www.python.org/dev/peps/pep-0475/">PEP 475, Retry system calls failing with EINTR</a>, I didn't expect so many annoying bugs in the newly written <tt class="docutils literal">test_eintr</tt> unit test. This test calls system calls while sending signals every 100 ms. Usually the test tries to block on a system call for at least 200 ms, to make sure that the syscall was interrupted at least once by a signal, to check that Python correctly retries the interrupted system call.</p> <p>Since the PEP was implemented, I had already fixed many race conditions in <tt class="docutils literal">test_eintr</tt>, but there was still a race condition in the <tt class="docutils literal">sigwaitinfo()</tt> unit test.
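</p> <p>The pattern that eventually fixed it (see the third attempt below) is to block the signal with <tt class="docutils literal">pthread_sigmask()</tt> before waiting: a blocked signal sent &quot;too early&quot; stays <em>pending</em>, and <tt class="docutils literal">sigwaitinfo()</tt> consumes it whenever it runs. A minimal POSIX-only sketch of the idea (a simplified model, not the actual test_eintr code):</p>

```python
# Minimal sketch of the pthread_sigmask() + sigwaitinfo() pattern (POSIX only).
import os
import signal

# Block SIGUSR1 *before* forking: if the child signals us before we reach
# sigwaitinfo(), the signal stays pending instead of being delivered (and,
# with the default action, killing the process).
signal.pthread_sigmask(signal.SIG_BLOCK, [signal.SIGUSR1])

pid = os.fork()
if pid == 0:
    # Child: signal the parent immediately, no sleep() needed.
    os.kill(os.getppid(), signal.SIGUSR1)
    os._exit(0)

# Parent: no race, the pending signal is consumed whenever we get here.
info = signal.sigwaitinfo([signal.SIGUSR1])
os.waitpid(pid, 0)
print("received SIGUSR1:", info.si_signo == signal.SIGUSR1)
```

<p>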
<em>Sometimes</em> on a <em>few specific buildbots</em> (FreeBSD), the test fails randomly.</p> </div> <div class="section" id="first-attempt"> <h3>First attempt</h3> <p>My first attempt was <a class="reference external" href="http://bugs.python.org/issue25277">bpo-25277</a>, opened on 2015-09-30. I added faulthandler to dump tracebacks if a test hangs longer than 10 minutes. Then I changed the sleep from 200 ms to 2 seconds in the <tt class="docutils literal">sigwaitinfo()</tt> test... just to make the bug less likely, but using a longer sleep doesn't fix the root issue.</p> </div> <div class="section" id="second-attempt"> <h3>Second attempt</h3> <p>My second attempt was <a class="reference external" href="http://bugs.python.org/issue25868">bpo-25868</a>, opened on 2015-12-15. I added a pipe to &quot;synchronize the parent and the child processes&quot;, to try to make the sigwaitinfo() test a little bit more reliable. I also reduced the sleep from 2 seconds to 100 ms.</p> <p>7 minutes after my fix, <strong>Martin Panter</strong> wrote:</p> <blockquote> <p>With the pipe, there is still a potential race after the parent writes to the pipe and before sigwaitinfo() is invoked, versus the child sleep() call.</p> <p>What do you think of my suggestion to block the signal? Then (in theory) it should be robust, rather than relying on timing.</p> </blockquote> <p>I replied that I wasn't sure that the sigwaitinfo() EINTR error would still be tested if we made his proposed change.</p> <p>One month later, Martin wrote a patch, but I was unable to make a decision on his change. In September 2016, Martin noticed a new test failure on the FreeBSD 9 buildbot.</p> </div> <div class="section" id="third-attempt"> <h3>Third attempt</h3> <p>My third attempt was bpo-30320, opened on 2017-05-09. This time, I really wanted to fix <em>all</em> buildbot random failures.
Since I was now able to reproduce the bug on my FreeBSD VM, I was able to write a fix but also to check that:</p> <ul class="simple"> <li>sigwaitinfo() and sigtimedwait() fail with EINTR and Python automatically restarts the interrupted syscall</li> <li>I hacked the test file to only run the sigwaitinfo() and sigtimedwait() unit tests. Running the test in a loop doesn't fail: I ran the test for 5 minutes in 10 shells (tests running 10 times in parallel) =&gt; no failure, the race condition seems to be gone.</li> </ul> <p>So I <a class="reference external" href="https://github.com/python/cpython/commit/211a392cc15f9a7b1b8ce65d8f6c9f8237d1b77f">pushed my fix</a>:</p> <blockquote> <p>bpo-30320: test_eintr now uses pthread_sigmask()</p> <p>Rewrite sigwaitinfo() and sigtimedwait() unit tests for EINTR using pthread_sigmask() to fix a race condition between the child and the parent process.</p> <p>Remove the pipe which was used as a weak workaround against the race condition.</p> <p>sigtimedwait() is now tested with a child process sending a signal instead of testing the timeout feature which is more unstable (especially regarding to clock resolution depending on the platform).</p> </blockquote> <p>To be honest, I wasn't really confident, when I pushed my fix, that blocking the waited signal was the proper fix.</p> <p>So it took <strong>1 year and 8 months</strong> to really find and fix the root bug.</p> <p>Sadly, while I was working on dozens of other bugs, I completely lost track of Martin's patch, even though I opened bpo-25868. Sorry Martin for forgetting to review your patch!
But when you wrote it, I was unable to test that sigwaitinfo() was still failing with EINTR.</p> </div> </div> <div class="section" id="freebsd-test-subprocess-core-dump"> <h2>FreeBSD test_subprocess core dump</h2> <p>bpo-30448: For one month, some FreeBSD buildbots were emitting this warning, which started to annoy me since I was trying to fix <em>all</em> buildbot warnings:</p> <pre class="literal-block">
Warning -- files was modified by test_subprocess
Before: []
After: ['python.core']
</pre> <p>I tried and failed to reproduce the warning on my FreeBSD 11 VM. I also asked a friend to reproduce the bug, but he also failed. I was developing my <tt class="docutils literal">test.bisect</tt> tool and I wanted to get access to a machine to reproduce the bug!</p> <p>Later, <strong>Kubilay Kocak</strong> aka <em>koobs</em> gave me access to his FreeBSD buildbots and in a few seconds with my new test.bisect tool, I identified that the <tt class="docutils literal">test_child_terminated_in_stopped_state()</tt> test triggers a deliberate crash, but doesn't disable core dump creation. The fix was simple: use the <tt class="docutils literal">test.support.SuppressCrashReport</tt> context manager. Thanks <em>koobs</em> for the access!</p> <p>Maybe only FreeBSD 10 and older dump a core on this specific test, not FreeBSD 11. I don't know why.
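</p> <p>As a hedged sketch (not the actual test code), the pattern is to wrap the deliberately-crashing child in <tt class="docutils literal">test.support.SuppressCrashReport</tt>, which on Unix sets <tt class="docutils literal">RLIMIT_CORE</tt> to 0 so no <tt class="docutils literal">python.core</tt> file is left behind:</p>

```python
# Hedged sketch: suppress core dumps around a deliberate crash.
# SuppressCrashReport sets RLIMIT_CORE to 0 on Unix (and silences the
# crash-report dialogs on Windows/macOS); child processes inherit the limit.
import subprocess
import sys
from test.support import SuppressCrashReport

crashing_code = "import ctypes; ctypes.string_at(0)"  # deliberate segfault

with SuppressCrashReport():
    proc = subprocess.run([sys.executable, "-c", crashing_code])

# The child crashed (non-zero return code), but no core file was written.
assert proc.returncode != 0
```

<p>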
The test is special: it tests a process which crashes while being traced with <tt class="docutils literal">ptrace()</tt>.</p> </div> </div> My contributions to CPython during 2017 Q2 (part 2)2017-07-13T16:30:00+02:002017-07-13T16:30:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/contrib-cpython-2017q2-part2.html<p>This is the second part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (April, May, June):</p> <ul class="simple"> <li>Mentoring</li> <li>Reference and memory leaks</li> <li>Contributions</li> <li>Enhancements</li> <li>Bugfixes</li> <li>Stars of the CPython GitHub project</li> </ul> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</p> <p>Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part3.html">My contributions to CPython during 2017 Q2 …</a></p><p>This is the second part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (April, May, June):</p> <ul class="simple"> <li>Mentoring</li> <li>Reference and memory leaks</li> <li>Contributions</li> <li>Enhancements</li> <li>Bugfixes</li> <li>Stars of the CPython GitHub project</li> </ul> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</p> <p>Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part3.html">My contributions to CPython during 2017 Q2 (part 3)</a>.</p> <div class="section" id="mentoring"> <h2>Mentoring</h2> <p>During this quarter, I tried to mark &quot;easy&quot; issues using a &quot;[EASY]&quot; tag in their title and the &quot;easy&quot; or &quot;easy C&quot; keyword.
I announced these issues on the <a class="reference external" href="https://www.python.org/dev/core-mentorship/">core-mentorship mailing list</a>. I asked core developers to not fix these easy issues, but rather explain how to fix them. In each issue, I described how to fix it.</p> <p>It was a success since all easy issues were fixed quickly: usually the PR was merged in less than 24 hours after I created the issue!</p> <p>I mentored <strong>Stéphane Wirtel</strong> and <strong>Louie Lu</strong> to fix issues (easy or not). During this quarter, Stéphane Wirtel got <strong>5 commits</strong> merged into master (on a <strong>total of 11 commits</strong>), and Louie Lu got <strong>6 commits</strong> merged into master (on a <strong>total of 10 commits</strong>).</p> <p>They helped me to fix reference leaks spotted by the new Refleaks buildbots.</p> </div> <div class="section" id="reference-and-memory-leaks"> <h2>Reference and memory leaks</h2> <p>Zachary Ware installed Gentoo and Windows buildbots running the Python test suite with <tt class="docutils literal"><span class="pre">--huntrleaks</span></tt> to detect reference and memory leaks.</p> <p>I worked hard with others, especially Stéphane Wirtel and Louie Lu, to fix <em>all</em> reference leaks and memory leaks in Python 2.7, 3.5, 3.6 and master. Right now, there are no more leaks on Windows! For Gentoo, the buildbot is currently offline, but I am confident that all leaks are also fixed.</p> <ul class="simple"> <li>bpo-30598: _PySys_EndInit() now duplicates warnoptions. Fix a reference leak in subinterpreters, like test_callbacks_leak() of test_atexit. warnoptions is a list used to pass options from the command line to the sys module constructor. Before this change, the list was shared by multiple interpreters, which is not the expected behaviour. Each interpreter should have its own independent mutable world. This change duplicates the list in each interpreter.
So each interpreter owns its own list and can clear it independently.</li> <li>bpo-30601: Fix a refleak in WindowsConsoleIO. Fix a reference leak in _io._WindowsConsoleIO: PyUnicode_FSDecoder() always initializes decodedname when it succeeds, but the input decodedname object was not cleared.</li> <li>bpo-30599: Fix test_threaded_import reference leak. Mock os.register_at_fork() when importing the random module, since this function doesn't allow unregistering callbacks and so leaked memory.</li> <li>2.7: _tkinter: Fix refleak in getint(). PyNumber_Int() creates a new reference: need to decrement the result reference counter.</li> <li>bpo-30635: Fix refleak in test_c_locale_coercion. When checking for reference leaks, test_c_locale_coercion is run multiple times and so _LocaleCoercionTargetsTestCase.setUpClass() is called multiple times. setUpClass() appends new values at each call, so it looks like a reference leak. Moving the setup from setUpClass() to setUpModule() avoids this, eliminating the false alarm.</li> <li>bpo-30602: Fix refleak in os.spawnve(). When os.spawnve() fails while handling arguments, correctly free argvlist: pass lastarg+1 rather than lastarg to free_string_array() to also free the first item.</li> <li>bpo-30602: Fix refleak in os.spawnv(). When os.spawnv() fails while handling arguments, correctly free argvlist: pass lastarg+1 rather than lastarg to free_string_array() to also free the first item.</li> <li>Fix ref cycles in TestCase.assertRaises(). bpo-23890: unittest.TestCase.assertRaises() now manually breaks a reference cycle to not keep objects alive longer than expected.</li> <li>Python 2.7: bpo-30675: Fix refleak hunting in regrtest. regrtest now warms up caches: create explicitly all internal singletons which are created on demand to prevent false positives when checking for reference leaks.</li> <li>_winconsoleio: Fix memory leak.
Fix a memory leak when _winconsoleio tries to open a non-console file: free the name buffer.</li> <li>bpo-30813: Fix unittest when hunting refleaks. bpo-11798, bpo-16662, bpo-16935, bpo-30813: Skip test_discover_with_module_that_raises_SkipTest_on_import() and test_discover_with_init_module_that_raises_SkipTest_on_import() of test_unittest when hunting reference leaks using regrtest.</li> <li>bpo-30704, bpo-30604: Fix memleak in code_dealloc(): also free co_extra-&gt;ce_extras, not only co_extra. Note: Serhiy later rewrote the structure in master to use a single memory block, implementing my idea.</li> </ul> <div class="section" id="python-3-5-regrtest-fix"> <h3>Python 3.5 regrtest fix</h3> <p>bpo-30675, Fix the multiprocessing code in regrtest:</p> <ul class="simple"> <li>Rewrite code to pass <tt class="docutils literal">slaveargs</tt> from the master process to worker processes: reuse the same code as the Python master branch.</li> <li>Move code to initialize tests into a new <tt class="docutils literal">setup_tests()</tt> function; a similar change was done in the master branch.</li> <li>In a worker process, call <tt class="docutils literal">setup_tests()</tt> with the namespace built from <tt class="docutils literal">slaveargs</tt> to initialize tests correctly.</li> </ul> <p>Before this change, <tt class="docutils literal">warm_caches()</tt> was not called in worker processes because the setup was done before rebuilding the namespace from <tt class="docutils literal">slaveargs</tt>. As a consequence, the <tt class="docutils literal">huntrleaks</tt> feature was unstable. For example, <tt class="docutils literal">test_zipfile</tt> randomly reported false positives on reference leaks.</p> </div> <div class="section" id="false-positives"> <h3>False positives</h3> <p>bpo-30776: reduce regrtest -R false positives (#2422)</p> <ul class="simple"> <li>Change the regrtest --huntrleaks checker to decide if a test file leaks or not.
Require that each run leaks at least 1 reference.</li> <li>Warmup runs are now completely ignored: ignored in the checker test and not used anymore to compute the sum.</li> <li>Add a unit test for a reference leak.</li> </ul> <p>Example of reference differences previously considered a failure (leak) and now considered a success (no leak):</p> <pre class="literal-block">
[3, 0, 0]
[0, 1, 0]
[8, -8, 1]
</pre> <p>The same change was done to check for memory leaks.</p> </div> </div> <div class="section" id="contributions"> <h2>Contributions</h2> <p>This quarter, I helped to merge two contributions:</p> <ul class="simple"> <li>bpo-9850: Deprecate the macpath module. Co-Authored-By: <strong>Chi Hsuan Yen</strong>.</li> <li>bpo-30595: Fix multiprocessing.Queue.get(timeout). multiprocessing.Queue.get() with a timeout now polls its reader in non-blocking mode if it succeeded in acquiring the lock but the acquire took longer than the timeout. Co-Authored-By: <strong>Grzegorz Grzywacz</strong>.</li> </ul> </div> <div class="section" id="enhancements"> <h2>Enhancements</h2> <ul class="simple"> <li>bpo-30265: support.unlink() now only ignores ENOENT and ENOTDIR, instead of ignoring all OSError exceptions.</li> <li>bpo-30054: Expose the tracemalloc C API: make the PyTraceMalloc_Track() and PyTraceMalloc_Untrack() functions public. numpy is able to use tracemalloc since numpy 1.13.</li> </ul> </div> <div class="section" id="bugfixes"> <h2>Bugfixes</h2> <ul class="simple"> <li>bpo-30125: On Windows, faulthandler.disable() now removes the exception handler installed by faulthandler.enable().</li> <li>bpo-30284: Fix regrtest for out-of-tree builds. Use a build/ directory in the build directory, not in the source directory, since the source directory may be read-only and must not be modified.
Fall back to the source directory if the build directory is not available (missing &quot;abs_builddir&quot; sysconfig variable).</li> <li>test_locale now ignores the DeprecationWarning and doesn't fail anymore if tests are run with <tt class="docutils literal">python3 <span class="pre">-Werror</span></tt>. Also fix the deprecation message: add a space.</li> <li>Fix compiler warnings on AIX: only define get_zone() and get_gmtoff() if needed.</li> <li>Fix a compiler warning in tmtotuple(): use the <tt class="docutils literal">time_t</tt> type for the <tt class="docutils literal">gmtoff</tt> parameter.</li> <li>bpo-30264: ExpatParser closes the source on error. ExpatParser.parse() of xml.sax.xmlreader now always closes the source: close the file object or the urllib object if source is a string (not an open file-like object). The change fixes a ResourceWarning on parsing error. Add a test_parse_close_source() unit test.</li> <li>Fix a SyntaxWarning on importing test_inspect. Fix the following warning when test_inspect.py is compiled to test_inspect.pyc: <tt class="docutils literal">SyntaxWarning: tuple parameter unpacking has been removed in 3.x</tt></li> <li>bpo-30418: On Windows, subprocess.Popen.communicate() now also ignores EINVAL on stdin.write() if the child process is still running but closed the pipe.</li> <li>bpo-30257: _bsddb: Fix newDBObject(). Don't set cursorSetReturnsNone to DEFAULT_CURSOR_SET_RETURNS_NONE anymore if self-&gt;myenvobj is set. Fix a GCC warning on the strange indentation.</li> <li>bpo-30231: Remove skipped test_imaplib tests. The public cyrus.andrew.cmu.edu IMAP server (port 993) doesn't accept TLS connections using our self-signed x509 certificate. Remove the two tests which are already skipped.
Write a new test_certfile_arg_warn() unit test for the certfile deprecation warning.</li> </ul> </div> <div class="section" id="stars-of-the-cpython-github-project"> <h2>Stars of the CPython GitHub project</h2> <p>On June 30, I wrote <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148523.html">an email to python-dev</a> about the <a class="reference external" href="https://github.com/showcases/programming-languages">GitHub showcase of hosted programming languages</a>: Python is only #11 with 8,539 stars, behind PHP and Ruby! I suggested to &quot;like&quot; (&quot;star&quot;?) the <a class="reference external" href="https://github.com/python/cpython/">CPython project on GitHub</a> if you like the Python programming language!</p> <p>Four days later, <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-July/148548.html">we got +2,389 new stars (8,539 =&gt; 10,928)</a>, thank you! Python moved from 11th place to 9th, ahead of Elixir and Julia.</p> <p>Ben Hoyt <a class="reference external" href="https://www.reddit.com/r/Python/comments/6kg4w0/cpython_recently_moved_to_github_star_the_project/">posted it on reddit.com/r/Python</a>, where it got a bit of traction.
Terry Jan Reedy also <a class="reference external" href="https://mail.python.org/pipermail/python-list/2017-July/723476.html">posted it on python-list</a>.</p> <p>Screenshot taken on 2017-07-13 showing Ruby, PHP and CPython:</p> <a class="reference external image-reference" href="https://github.com/showcases/programming-languages"> <img alt="GitHub showcase: Programming languages" src="https://vstinner.github.io/images/github_cpython_stars.png" /> </a> <p>CPython now has 11,512 stars, only 861 stars behind PHP ;-)</p> </div> My contributions to CPython during 2017 Q2 (part 1)2017-07-13T16:00:00+02:002017-07-13T16:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/contrib-cpython-2017q2-part1.html<p>This is the first part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (April, May, June):</p> <ul class="simple"> <li>Statistics</li> <li>Buildbots and test.bisect</li> <li>Python 3.6.0 regression</li> <li>struct.Struct.format type</li> <li>Optimization: one less syscall per open() call</li> <li>make regen-all</li> </ul> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython during 2017 Q1</a>.</p> <p>Next reports …</p><p>This is the first part of my contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q2 (April, May, June):</p> <ul class="simple"> <li>Statistics</li> <li>Buildbots and test.bisect</li> <li>Python 3.6.0 regression</li> <li>struct.Struct.format type</li> <li>Optimization: one less syscall per open() call</li> <li>make regen-all</li> </ul> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython during 2017 Q1</a>.</p> <p>Next reports:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part2.html">My contributions to
CPython during 2017 Q2 (part 2)</a>.</li> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part3.html">My contributions to CPython during 2017 Q2 (part 3)</a>.</li> <li><a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q3-part1.html">My contributions to CPython during 2017 Q3: Part 1</a>.</li> </ul> <div class="section" id="statistics"> <h2>Statistics</h2> <pre class="literal-block">
# All branches
$ git log --after=2017-03-31 --before=2017-06-30 --reverse --branches='*' --author=Stinner &gt; 2017Q2
$ grep '^commit ' 2017Q2|wc -l
222

# Master branch only
$ git log --after=2017-03-31 --before=2017-06-30 --reverse --author=Stinner origin/master|grep '^commit '|wc -l
85
</pre> <p>Statistics: <strong>85</strong> commits in the master branch, a <strong>total of 222 commits</strong>: most (but not all) of the remaining 137 commits are cherry-picked backports to the 2.7, 3.5 and 3.6 branches.</p> <p>Note: I didn't use <tt class="docutils literal"><span class="pre">--no-merges</span></tt> since we don't use merge anymore, but <tt class="docutils literal">git <span class="pre">cherry-pick</span> <span class="pre">-x</span></tt>, to <em>backport</em> fixes.
Before GitHub, we used <strong>forwardport</strong> with Mercurial merges (ex: commit into 3.6, then merge into master).</p> </div> <div class="section" id="buildbots-and-test-bisect"> <h2>Buildbots and test.bisect</h2> <p>Since this article became way too long, I split it into sub-articles:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/python-test-bisect.html">New Python test.bisect tool</a></li> <li><a class="reference external" href="https://vstinner.github.io/python-buildbots-2017q2.html">Work on Python buildbots, 2017 Q2</a></li> </ul> </div> <div class="section" id="python-3-6-0-regression"> <h2>Python 3.6.0 regression</h2> <p>I am ashamed: I introduced a tricky regression in Python 3.6.0 with my work on FASTCALL optimizations :-( A special way to call C builtin functions was broken:</p> <pre class="literal-block">
from datetime import datetime
next(iter(datetime.now, None))
</pre> <p>This code raises a <tt class="docutils literal">StopIteration</tt> exception instead of formatting the current date and time.</p> <p>It's even worse. I was aware of the bug, it was already fixed in master, but I just forgot to backport my fix: bpo-30524, fix _PyStack_UnpackDict().</p> <p>To prevent regressions, I wrote exhaustive unit tests on the 3 FASTCALL functions, commit: <a class="reference external" href="https://github.com/python/cpython/commit/3b5cf85edc188345668f987c824a2acb338a7816">bpo-30524: Write unit tests for FASTCALL</a></p> </div> <div class="section" id="struct-struct-format-type"> <h2>struct.Struct.format type</h2> <p>Sometimes, fixing a bug can take longer than expected.
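</p> <p>The bug itself is easy to demonstrate (output shown for Python 3.7+, where the fix landed; on 3.6 and earlier the attribute was <tt class="docutils literal">bytes</tt>):</p>

```python
# On Python 3.7+, Struct.format is a str; before the fix it was bytes.
import struct

s = struct.Struct('>HH')
print(type(s.format).__name__, repr(s.format))
assert isinstance(s.format, str)
# Round-tripping works: the constructor accepts both bytes and str.
assert struct.Struct(s.format).size == s.size
```

<p>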
In March 2014, <strong>Zbyszek Jędrzejewski-Szmek</strong> reported a bug on the <tt class="docutils literal">format</tt> attribute of the <tt class="docutils literal">struct.Struct</tt> class: this attribute type is bytes, whereas a Unicode string (str) was expected.</p> <p>I proposed to &quot;just&quot; change the attribute type in December 2014, but it was an incompatible change which would break backward compatibility. <strong>Martin Panter</strong> agreed and wrote a patch. <strong>Serhiy Storchaka</strong> asked to discuss such an incompatible change on python-dev, but then nothing happened for more than... 2 years!</p> <p>In March 2017, I converted Martin's old patch into a new GitHub pull request. <strong>Serhiy</strong> asked again to write to python-dev, so I wrote: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-March/147688.html">Issue #21071: change struct.Struct.format type from bytes to str</a>. And... I got zero answers.</p> <p>Well, I didn't expect any, since it's a trivial change, and I don't expect that anyone relies on the exact <tt class="docutils literal">format</tt> attribute type. Moreover, the <tt class="docutils literal">struct.Struct</tt> constructor already accepts bytes and str types. If the attribute is passed to the constructor: it just works.</p> <p>In June 2017, Serhiy Storchaka replied to my email: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148360.html">If nobody opposed to this change it will be made in short time.</a></p> <p>Since nobody replied, again, I just merged my pull request. So it took <strong>3 years and 3 months</strong> to change the type of an uncommon attribute :-)</p> <p>Note: I never used this attribute...
Before reading this issue, I didn't even know that the <tt class="docutils literal">struct</tt> module has a <tt class="docutils literal">struct.Struct</tt> type...</p> </div> <div class="section" id="optimization-one-less-syscall-per-open-call"> <h2>Optimization: one less syscall per open() call</h2> <p>In bpo-30228, I modified the FileIO.seek() and FileIO.tell() methods to now set the internal seekable attribute, to avoid one <tt class="docutils literal">fstat()</tt> syscall per Python open() call in buffered or text mode.</p> <p>The seekable property is now also more reliable since its value is set correctly on memory allocation failure.</p> <p>I still have a second pending pull request to remove one more <tt class="docutils literal">fstat()</tt> syscall: <a class="reference external" href="https://github.com/python/cpython/pull/1385">bpo-30228: TextIOWrapper uses abs_pos, not tell()</a>.</p> </div> <div class="section" id="make-regen-all"> <h2>make regen-all</h2> <p>I started to look at bpo-23404, because the Python compilation failed on the &quot;AMD64 FreeBSD 9.x 3.x&quot; buildbot when trying to regenerate the <tt class="docutils literal">Include/opcode.h</tt> file.</p> <div class="section" id="old-broken-make-touch"> <h3>Old broken make touch</h3> <p>We had a <tt class="docutils literal">make touch</tt> command to work around this file timestamp issue, but the command used Mercurial, whereas Python migrated to Git last February.
The buildbot &quot;touch&quot; step was removed because <tt class="docutils literal">make touch</tt> was broken.</p> <p>I was always annoyed by the Makefile which wanted to regenerate generated files because of wrong file modification times, even though the generated files were already up to date.</p> <p>The bug annoyed me on OpenIndiana where &quot;make touch&quot; didn't work because the operating system only provided Python 2.6, and Mercurial didn't work on this version.</p> <p>The bug also annoyed me on FreeBSD which has no &quot;python&quot; command, only &quot;python2.7&quot;, and so required manual steps.</p> <p>The bug was also a pain point when trying to cross-compile Python.</p> </div> <div class="section" id="new-shiny-make-regen-all"> <h3>New shiny make regen-all</h3> <p>I decided to rewrite the Makefile to not regenerate generated files based on the file modification time anymore. Instead, I added a new <tt class="docutils literal">make <span class="pre">regen-all</span></tt> command to explicitly regenerate all generated files.
Basically, I replaced <tt class="docutils literal">make touch</tt> with <tt class="docutils literal">make <span class="pre">regen-all</span></tt>.</p> <p>Changes:</p> <ul class="simple"> <li>Add a new <tt class="docutils literal">make <span class="pre">regen-all</span></tt> command to rebuild all generated files</li> <li>Add subcommands to only generate specific files:<ul> <li><tt class="docutils literal"><span class="pre">regen-ast</span></tt>: Include/Python-ast.h and Python/Python-ast.c</li> <li><tt class="docutils literal"><span class="pre">regen-grammar</span></tt>: Include/graminit.h and Python/graminit.c</li> <li><tt class="docutils literal"><span class="pre">regen-importlib</span></tt>: Python/importlib_external.h and Python/importlib.h</li> <li><tt class="docutils literal"><span class="pre">regen-opcode</span></tt>: Include/opcode.h</li> <li><tt class="docutils literal"><span class="pre">regen-opcode-targets</span></tt>: Python/opcode_targets.h</li> <li><tt class="docutils literal"><span class="pre">regen-typeslots</span></tt>: Objects/typeslots.inc</li> </ul> </li> <li>Rename <tt class="docutils literal">PYTHON_FOR_GEN</tt> to <tt class="docutils literal">PYTHON_FOR_REGEN</tt></li> <li>pgen is now only built by <tt class="docutils literal">make <span class="pre">regen-grammar</span></tt></li> <li>Add the <tt class="docutils literal">$(srcdir)/</tt> prefix to paths to source files to correctly handle compilation outside the source directory</li> <li>Remove <tt class="docutils literal">make touch</tt>, <tt class="docutils literal">Tools/hg/hgtouch.py</tt> and <tt class="docutils literal">.hgtouch</tt></li> </ul> <p>Note: By default, <tt class="docutils literal">$(PYTHON_FOR_REGEN)</tt> is no longer used nor needed by &quot;make&quot;.</p> </div> </div> Work on Python buildbots, 2017 Q22017-07-13T09:00:00+02:002017-07-13T09:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-13:/python-buildbots-2017q2.html<p>I spent the last 6 months working on buildbots:
reduce the failure rate, send email notifications on failure, fix random bugs, detect more bugs using warnings, backport fixes to older branches, etc. I decided to fix <em>all</em> buildbot issues: fix all warnings and all unstable tests!</p> <p>The good news …</p><p>I spent the last 6 months working on buildbots: reduce the failure rate, send email notifications on failure, fix random bugs, detect more bugs using warnings, backport fixes to older branches, etc. I decided to fix <em>all</em> buildbot issues: fix all warnings and all unstable tests!</p> <p>The good news is that I made great progress: I fixed most random failures. A random failure is now the exception rather than the norm. Some issues were not bugs in tests, but real race conditions in the code. It's always good to fix unlikely race conditions before users hit them in production!</p> <ul class="simple"> <li>Introduction: Python Buildbots</li> <li>Orange Is The New Color</li> <li>New buildbot-status Mailing List</li> <li>Hardware issues<ul> <li>The vacuum cleaner</li> <li>The memory stick</li> </ul> </li> <li>Warnings</li> <li>regrtest</li> <li>Bug fixes</li> <li>Python 2.7</li> <li>Buildbot reports to python-dev</li> </ul> <div class="section" id="introduction-python-buildbots"> <h2>Introduction: Python Buildbots</h2> <p>CPython is running a <a class="reference external" href="https://buildbot.net/">Buildbot</a> server for continuous integration, but tests are run post-commit: see <a class="reference external" href="https://www.python.org/dev/buildbot/">Python buildbots</a>.
CPython is tested by a wide range of buildbot slaves:</p> <ul class="simple"> <li>6 operating systems:<ul> <li>Linux (Debian, Ubuntu, Gentoo, RHEL, SLES)</li> <li>Windows (7, 8, 8.1 and 10)</li> <li>macOS (Tiger, El Capitan, Sierra)</li> <li>FreeBSD (9, 10, CURRENT)</li> <li>AIX</li> <li>OpenIndiana (currently offline)</li> </ul> </li> <li>5 CPU architectures:<ul> <li>ARMv7</li> <li>x86 (Intel 32 bit)</li> <li>x86-64 aka &quot;AMD64&quot; (Intel 64-bit)</li> <li>PPC64, PPC64LE</li> <li>s390x</li> </ul> </li> <li>3 C compilers:<ul> <li>GCC</li> <li>Clang (FreeBSD, macOS)</li> <li>Visual Studio (Windows)</li> </ul> </li> </ul> <p>There are different kinds of tests:</p> <ul class="simple"> <li>Python test suite: the most common check</li> <li>Docs: check that the documentation can be built and doesn't contain warnings</li> <li>Refleaks: check for reference leaks and memory leaks, run the Python test suite with the <tt class="docutils literal"><span class="pre">--huntrleaks</span></tt> option</li> <li>DMG: Build the macOS installer with the <tt class="docutils literal"><span class="pre">Mac/BuildScript/build-installer.py</span></tt> script</li> </ul> <p>Python is tested in different configurations:</p> <ul class="simple"> <li>Debug: <tt class="docutils literal">./configure <span class="pre">--with-pydebug</span></tt>, the most common configuration</li> <li>Non-debug: release mode, with compiler optimizations</li> <li>PGO: Profiled Guided Optimization, <tt class="docutils literal">./configure <span class="pre">--enable-optimizations</span></tt></li> <li>Installed: <tt class="docutils literal">./configure <span class="pre">--prefix=XXX</span> &amp;&amp; make install</tt></li> <li>Shared library (libpython): <tt class="docutils literal">./configure <span class="pre">--enable-shared</span></tt></li> </ul> <p>Currently, 4 branches are tested:</p> <ul class="simple"> <li><tt class="docutils literal">master</tt>: called &quot;3.x&quot; on buildbots</li> <li><tt 
class="docutils literal">3.6</tt></li> <li><tt class="docutils literal">3.5</tt></li> <li><tt class="docutils literal">2.7</tt></li> </ul> <p>There is also <tt class="docutils literal">custom</tt>, a special branch used by core developers for testing patches.</p> <p>The buildbot configuration can be found in the <a class="reference external" href="https://github.com/python/buildmaster-config/">buildmaster-config project</a> (start with the <tt class="docutils literal">master/master.cfg</tt> file).</p> <p>Note: Thanks to the migration to GitHub, Pull Requests are now tested on Linux, Windows and macOS by Travis CI and AppVeyor. It's the first time in the CPython development history that we have automated pre-commit tests!</p> </div> <div class="section" id="orange-is-the-new-color"> <h2>Orange Is The New Color</h2> <p>A buildbot now becomes orange when tests contain warnings.</p> <p>My first change was to modify the buildbot configuration to extract warnings from the raw test output into a new &quot;warnings&quot; report, to more easily detect warnings and tests failing randomly (a test fails, then passes when re-run).</p> <p>Example of an orange build, x86-64 El Capitan 3.x:</p> <img alt="Buildbot: orange build" src="https://vstinner.github.io/images/buildbot_orange.png" /> <p>Extract of the current <tt class="docutils literal">master/custom/steps.py</tt>:</p> <pre class="literal-block"> class Test(BaseTest): # Regular expression used to catch warnings, errors and bugs warningPattern = ( # regrtest saved_test_environment warning: # Warning -- files was modified by test_distutils # test.support &#64;reap_threads: # Warning -- threading_cleanup() failed to cleanup ... 
r&quot;Warning -- &quot;, # Py_FatalError() call r&quot;Fatal Python error:&quot;, # PyErr_WriteUnraisable() exception: usually, error in # garbage collector or destructor r&quot;Exception ignored in:&quot;, # faulthandler_exc_handler(): Windows exception handler installed with # AddVectoredExceptionHandler() by faulthandler.enable() r&quot;Windows fatal exception:&quot;, # Resource warning: unclosed file, socket, etc. # NOTE: match the &quot;ResourceWarning&quot; anywhere, not only at the start r&quot;ResourceWarning&quot;, # regrtest: At least one test failed. Log a warning even if the test # passed on the second try, to notify that a test is unstable. r'Re-running failed tests in verbose mode', # Re-running test 'test_multiprocessing_fork' in verbose mode r'Re-running test .* in verbose mode', # Thread last resort exception handler in t_bootstrap() r'Unhandled exception in thread started by ', # test_os leaked [6, 6, 6] memory blocks, sum=18, r'test_[^ ]+ leaked ', ) # Use &quot;.*&quot; prefix to search the regex anywhere since stdout is mixed # with stderr, so warnings are not always written at the start # of a line. 
# The log consumer calls warningPattern.match(line) warningPattern = r&quot;.*(?:%s)&quot; % &quot;|&quot;.join(warningPattern) warningPattern = re.compile(warningPattern) # if tests have warnings, mark the overall build as WARNINGS (orange) warnOnWarnings = True </pre> </div> <div class="section" id="new-buildbot-status-mailing-list"> <h2>New buildbot-status Mailing List</h2> <p>To check buildbots, I previously had to manually analyze the huge &quot;waterfall&quot; view of four Python branches: 2.7, 3.5, 3.6 and master (&quot;3.x&quot;).</p> <ul class="simple"> <li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=3.x.stable&amp;category=3.x.unstable">Python master (&quot;3.x&quot;)</a></li> <li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=3.6.stable&amp;category=3.6.unstable">Python 3.6</a></li> <li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=3.5.stable&amp;category=3.5.unstable">Python 3.5</a></li> <li><a class="reference external" href="http://buildbot.python.org/all/waterfall?category=2.7.stable&amp;category=2.7.unstable">Python 2.7</a></li> </ul> <p>Example of a typical buildbot waterfall:</p> <a class="reference external image-reference" href="http://buildbot.python.org/all/waterfall?category=3.x.stable&amp;category=3.x.unstable"> <img alt="Buildbot waterfall" src="https://vstinner.github.io/images/buildbot_waterfall.png" /> </a> <p>The screenshot is obviously truncated since the webpage is giant: I have to scroll in all directions... It's not convenient to check the status of all builds, detect random failures, etc.</p> <p>We also have an IRC bot reporting buildbot failures: when a green (success) or orange (warning) buildbot becomes red (failure). I wanted to have the same thing, but by email. 
Technically, it's trivial to enable email notification, but I never did it because buildbots were simply too unstable: most failures were not related to the newly tested changes.</p> <p>But I decided to fix <em>all</em> buildbot issues, so I enabled email notification (<a class="reference external" href="https://bugs.python.org/issue30325">bpo-30325</a>). Since May 2017, buildbots have been sending notifications to a new <a class="reference external" href="https://mail.python.org/mm3/mailman3/lists/buildbot-status.python.org/">buildbot-status mailing list</a>.</p> <p>I use the mailing list to check if a failure is known or not: I try to answer all failure notification emails. If the failure is known, I copy the link to the issue. Otherwise, I create a new issue and then copy the link to the new issue.</p> </div> <div class="section" id="hardware-issues"> <h2>Hardware issues</h2> <p>Unit tests versus real life :-) (or &quot;software versus hardware&quot;)</p> <div class="section" id="the-vacuum-cleaner"> <h3>The vacuum cleaner</h3> <p>Fixing buildbot issues can sometimes be boring, so let's start with a funny bug. On June 25, Nick Coghlan wrote to the <a class="reference external" href="https://mail.python.org/mailman/listinfo/python-buildbots">python-buildbots</a> mailing list:</p> <blockquote> It looks like the FreeBSD buildbots had an outage a little while ago, and the FreeBSD 10 one may need a nudge to get back online (the FreeBSD Current one looks like it came back automatically).</blockquote> <p>The reason is unexpected :-) <a class="reference external" href="https://mail.python.org/pipermail/python-buildbots/2017-June/000122.html">Kubilay Kocak, owner of the buildbot, answered</a>:</p> <blockquote> Vacuum cleaner tripped RCD pulling too much current from the same circuit as heater was running on. 
Buildbot worker host on same circuit.</blockquote> </div> <div class="section" id="the-memory-stick"> <h3>The memory stick</h3> <p>I opened at least 50 issues to report random buildbot failures. In the middle of these issues, you can find <a class="reference external" href="http://bugs.python.org/issue30371">bpo-30371</a>:</p> <pre class="literal-block"> http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/436/steps/test/logs/stdio ====================================================================== FAIL: test_long_lines (test.test_email.test_email.TestFeedParsers) ---------------------------------------------------------------------- Traceback (most recent call last): File &quot;C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_email\test_email.py&quot;, line 3526, in test_long_lines self.assertEqual(m.get_payload(), 'x'*M*N) AssertionError: 'xxxx[17103482 chars]xxxxxzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[2896464 chars]xxxx' != 'xxxx[17103482 chars]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[2896464 chars]xxxx' Notice the &quot;z&quot; in &quot;...xxxxxz...&quot;. 
</pre> <p>and:</p> <pre class="literal-block"> New fail, same buildbot: ====================================================================== FAIL: test_long_lines (test.test_email.test_email.TestFeedParsers) ---------------------------------------------------------------------- Traceback (most recent call last): File &quot;C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_email\test_email.py&quot;, line 3534, in test_long_lines self.assertEqual(m.items(), [('a', ''), ('b', 'x'*M*N)]) AssertionError: Lists differ: [('a'[1845894 chars]xxxxxzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154072 chars]xx')] != [('a'[1845894 chars]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154072 chars]xx')] First differing element 1: ('b',[1845882 chars]xxxxxzxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154071 chars]xxx') ('b',[1845882 chars]xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx[18154071 chars]xxx') [('a', ''), ('b', Don't click on http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/439/steps/test/logs/stdio : the log contains lines of 2 MB which make my Firefox super slow :-) </pre> <p>Jeremy Kloth, owner of the buildbot, answered:</p> <blockquote> Watch this space, but I'm pretty sure that it is (was) bad memory.</blockquote> <p>He fixed the issue:</p> <blockquote> That's the real problem, I'm not <em>sure</em> it's the memory, but it does have the symptoms. And that is why my buildbot was down earlier, I was attempting to determine the bad stick and replace it.</blockquote> </div> </div> <div class="section" id="warnings"> <h2>Warnings</h2> <p>To fix test warnings, I enhanced the test suite to report more information when a warning is emitted and to ease detection of failures.</p> <p>A major change is the new <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt> option I added to regrtest (bpo-30764): make tests fail if the &quot;environment&quot; is changed. 
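</p> <p>The idea behind this environment check can be sketched in a few lines of Python. This is a simplified, hypothetical illustration (the run_with_env_check() helper and the two test classes are my own invention): the real regrtest implementation tracks many more resources, such as threads, open files and sys.argv.</p>

```python
import os
import sys
import unittest

def run_with_env_check(test):
    # Snapshot a few pieces of the "environment" before running the test
    before = (os.getcwd(), list(sys.path), dict(os.environ))
    result = unittest.TestResult()
    test.run(result)
    after = (os.getcwd(), list(sys.path), dict(os.environ))
    # Report the test result, and whether the environment was altered
    return result.wasSuccessful(), before != after

class CleanTest(unittest.TestCase):
    def test_ok(self):
        self.assertEqual(1 + 1, 2)

class DirtyTest(unittest.TestCase):
    def test_pollutes(self):
        # Modifies os.environ without restoring it: the test itself passes,
        # but it alters the environment of the following tests
        os.environ["_HYPOTHETICAL_FLAG"] = os.environ.get("_HYPOTHETICAL_FLAG", "") + "x"

print(run_with_env_check(CleanTest("test_ok")))        # (True, False)
print(run_with_env_check(DirtyTest("test_pollutes")))  # (True, True)
```

<p>A passing test which nevertheless altered the environment is exactly what the option turns into a test failure (ENV_CHANGED).</p> <p>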
This option is now used on buildbots, Travis CI and AppVeyor, though only for the <em>master</em> branch so far.</p> <p>Other changes:</p> <ul class="simple"> <li>The &#64;reap_threads decorator and the threading_cleanup() function of test.support now log a warning if they fail to cleanup threads. The log may help to debug warnings such as this one seen on the AMD64 FreeBSD CURRENT Non-Debug 3.x buildbot: &quot;Warning -- threading._dangling was modified by test_logging&quot;.</li> <li>threading_cleanup() failure marks the test as ENV_CHANGED. If threading_cleanup() fails to cleanup threads, set a new support.environment_altered flag to true; this flag is used by save_env, which is used by regrtest to check if a test altered the environment. At the end, the test file fails with ENV_CHANGED instead of SUCCESS, to report that it altered the environment.</li> <li>regrtest: always show before/after values of the modified environment.</li> </ul> <p>I backported all these changes to the 2.7, 3.5 and 3.6 branches to make sure that warnings are fixed in all maintained branches.</p> </div> <div class="section" id="regrtest"> <h2>regrtest</h2> <p>As usual, I spent time on our specialized test runner, regrtest:</p> <ul class="simple"> <li>bpo-30263: regrtest: log the system load and the number of CPUs. I tried to find a relationship between race conditions and the system load. I have failed to find any obvious correlation yet, but I still consider that the system load is useful.</li> <li>bpo-27103: regrtest disables -W if -R (reference hunting) is used. Workaround for a regrtest bug.</li> </ul> <p>But the most complex task was to backport <em>all</em> regrtest features and enhancements from master to the regrtest of the 3.6, 3.5 and then 2.7 branches.</p> <p>In Python 3.6, I rewrote the regrtest.py file to split it into smaller files in a new Lib/test/libregrtest/ library, so it was painful to backport changes to 3.5 (bpo-30383), which still uses the single regrtest.py file.</p> <p>In Python 2.7 (bpo-30283), it is even worse. 
Lib/test/regrtest.py uses the old <tt class="docutils literal">getopt</tt> module to parse the command line instead of the new <tt class="docutils literal">argparse</tt> used in 3.5 and newer. But I succeeded in backporting all features and enhancements from master!</p> <p>Python 2.7, 3.5, 3.6 and master now have almost the same CLI for <tt class="docutils literal">python <span class="pre">-m</span> test</tt>, almost the same features (except for one or two missing features), and should provide the same level of information on failures and warnings.</p> <p>By the way, the new <tt class="docutils literal">test.bisect</tt> tool is now also available in all these branches. See my <a class="reference external" href="https://vstinner.github.io/python-test-bisect.html">New Python test.bisect tool</a> article.</p> </div> <div class="section" id="bug-fixes"> <h2>Bug fixes</h2> <p>As expected, the longest section here is the list of changes I wrote to fix all buildbot failures and warnings:</p> <ul class="simple"> <li>bpo-29972: Skip tests known to fail on AIX. See the <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-April/147748.html">[Python-Dev] Fix or drop AIX buildbot?</a> email.</li> <li>bpo-29925: Skip test_uuid1_safe() on OS X Tiger.</li> <li>Fix and optimize test_asyncore.test_quick_connect(). Don't use addCleanup() in test_quick_connect() because it keeps the Thread object alive and so &#64;reap_threads times out after 1 second. &quot;./python -m test -v test_asyncore -m test_quick_connect&quot; now takes 185 ms, instead of 11 seconds.</li> <li>bpo-30106: Fix test_asyncore.test_quick_connect(). test_quick_connect() runs a thread for up to 50 seconds, whereas the socket is connected in 0.2 seconds and then the thread is expected to end in less than 3 seconds. On Linux, the thread ends quickly because select() seems to always return quickly. On FreeBSD, select() sometimes fails with a timeout and so the thread runs much longer than expected. 
Fix the thread timeout to remove a race condition in the test.</li> <li>bpo-30106: Fix tearDown() of test_asyncore. Call asyncore.close_all() with ignore_all=True in the tearDown() method of the test_asyncore base test case. It prevents keeping sockets alive in asyncore.socket_map if close() fails with an unexpected error.</li> <li>bpo-30108: Restore sys.path in test_site. Add setUpModule() and tearDownModule() functions to test_site to save/restore sys.path at the module level, to prevent a warning if the user site directory is created, since site.addsitedir() modifies sys.path.</li> <li>bpo-30107: test_io doesn't dump a core file on an expected crash anymore. test_io has two unit tests which trigger a deadlock: test_daemon_threads_shutdown_stdout_deadlock() and test_daemon_threads_shutdown_stderr_deadlock(). These tests call Py_FatalError() if the expected bug is triggered, which calls abort(). Use test.support.SuppressCrashReport to prevent the creation of a core dump, to fix the warning: <tt class="docutils literal">Warning <span class="pre">--</span> files was modified by test_io <span class="pre">(...)</span> After: ['python.core']</tt></li> <li>bpo-30125: Disable faulthandler to run test_SEH() of test_ctypes to prevent the following log with a traceback: <tt class="docutils literal">Windows fatal exception: access violation</tt></li> <li>bpo-30131: test_logging cleans up threads using &#64;support.reap_threads.</li> <li>bpo-30132: BuildExtTestCase of test_distutils now uses support.temp_cwd() in setUp() to remove files created in the current working directory by BuildExtTestCase unit tests.</li> <li>bpo-30107: On macOS, test.support.SuppressCrashReport now redirects the /usr/bin/defaults command stderr into a pipe to not pollute stderr. It fixes a test_io.test_daemon_threads_shutdown_stderr_deadlock() failure when the CrashReporter domain doesn't exist.</li> <li>bpo-30175: Skip client cert tests of test_imaplib. 
The IMAP server cyrus.andrew.cmu.edu doesn't accept our randomly generated client x509 certificate anymore.</li> <li>bpo-30175: test_nntplib fails randomly with EOFError in NetworkedNNTPTests.setUpClass(): catch EOFError to skip tests in that case.</li> <li>bpo-30199: AsyncoreEchoServer of test_ssl now calls asyncore.close_all(ignore_all=True) to ensure that asyncore.socket_map is cleared once the test completes, even if ConnectionHandler was not correctly unregistered. Fix the following warning: <tt class="docutils literal">Warning <span class="pre">--</span> asyncore.socket_map was modified by test_ssl</tt>.</li> <li>Fix a test_ftplib warning if IPv6 is not available. DummyFTPServer now calls del_channel() on bind() error to prevent the following warning in TestIPv6Environment.setUpClass(): <tt class="docutils literal">Warning <span class="pre">--</span> asyncore.socket_map was modified by test_ftplib</tt></li> <li>bpo-30329: Catch Windows error 10022 on shutdown(). Catch the Windows socket WSAEINVAL error (code 10022) in imaplib and poplib on shutdown(SHUT_RDWR): An invalid operation was attempted. This error sometimes occurs on SSL connections.</li> <li>bpo-30357: test_thread now uses threading_cleanup(). test_thread: setUp() now uses support.threading_setup() and support.threading_cleanup() to wait until threads complete, to avoid random side effects on following tests. Co-Authored-By: <strong>Grzegorz Grzywacz</strong>.</li> <li>bpo-30339: test_multiprocessing_main_handling timeout. test_multiprocessing_main_handling: increase the test_source timeout from 10 seconds to 60 seconds, since the test fails randomly on busy buildbots. Sadly, this change wasn't enough to fix the buildbots.</li> <li>bpo-30387: Fix a warning in test_threading. test_is_alive_after_fork() now joins the thread directly to avoid the following warning added by bpo-30357: &quot;Warning -- threading_cleanup() failed to cleanup 0 threads after 2 sec (count: 0, dangling: 21)&quot;. 
Also use a different exit code to distinguish it from the generic exit code 1.</li> <li>bpo-30649: On Windows, test_os now tolerates a delta of 50 ms instead of 20 ms in test_utime_current() and test_utime_current_old(). On other platforms, reduce the delta from 20 ms to 10 ms. The PPC64 Fedora 3.x buildbot requires at least a delta of 14 ms.</li> <li>bpo-30595: test_queue_feeder_donot_stop_onexc() of _test_multiprocessing now uses a timeout of 1 second on Queue.get(), instead of 0.1 second, for slow buildbots.</li> <li>bpo-30764, bpo-29335: test_child_terminated_in_stopped_state() of test_subprocess now uses support.SuppressCrashReport() to prevent the creation of a core dump on FreeBSD.</li> <li>bpo-30280: TestBaseSelectorEventLoop of test.test_asyncio.test_selector_events now correctly closes the event loop and cleans up its executor to not leak threads: don't override the close() method of the event loop, only override the _close_self_pipe() method. The asyncio base TestCase now uses threading_setup() and threading_cleanup() of test.support to cleanup threads.</li> <li>bpo-26568, bpo-30812: Fix test_showwarnmsg_missing(): restore the attribute after removing it.</li> </ul> </div> <div class="section" id="python-2-7-1"> <h2>Python 2.7</h2> <p>I wanted to fix <em>all</em> buildbot issues of <em>all</em> branches including 2.7, whereas I had barely touched the Python 2.7 code base in the last months (last years?). During the first six months of 2017, I backported dozens of commits from master to 2.7!</p> <p>For example, I added AppVeyor to 2.7: a Windows CI for GitHub!</p> <p>On Windows, we support multiple versions of Visual Studio. I use Visual Studio 2008, whereas most 2.7 Windows buildbots use Visual Studio 2010 or newer. 
I fixed sysconfig.is_python_build() if Python is built with Visual Studio 2008 (VS 9.0) (bpo-30342).</p> <p>Other Python 2.7 changes:</p> <ul class="simple"> <li>Fix the &quot;make tags&quot; command.</li> <li>bpo-30764: support.SuppressCrashReport backported to 2.7 and &quot;ported&quot; to Windows. Add Windows support to test.support.SuppressCrashReport: call SetErrorMode() and CrtSetReportMode(). _testcapi: add CrtSetReportMode() and CrtSetReportFile() functions and CRT_xxx and CRTDBG_xxx constants needed by SuppressCrashReport.</li> <li>bpo-30705: Fix test_regrtest.test_crashed(). Add test.support._crash_python() which triggers a crash but uses test.support.SuppressCrashReport() to prevent a crash report from popping up. Modify test_child_terminated_in_stopped_state() of test_subprocess and test_crashed() of test_regrtest to use _crash_python().</li> </ul> <p>I also backported many fixes written by other developers, including old fixes up to 8 years old!</p> <p>Usually, <strong>finding</strong> the proper fix takes much more time than the cherry-pick itself, which is usually straightforward (no conflict, nothing to do). I am always impressed that Git is able to detect that a file was renamed between Python 2 and Python 3, and applies the change cleanly!</p> <p>Examples of backports from master to 2.7:</p> <ul class="simple"> <li>bpo-6393: Fix locale.getpreferredencoding() on macOS. Python crashes on OSX when <tt class="docutils literal">$LANG</tt> is set to some (but not all) invalid values due to an invalid result from nl_langinfo(). Fix written in <strong>September 2009</strong> (8 years ago)!</li> <li>bpo-15526: test_startfile changes the cwd. Try to fix test_startfile's inability to clean up after itself in time. Patch by <strong>Jeremy Kloth</strong>. 
Fix the following support.rmtree() error while trying to remove the temporary working directory used by Python tests: &quot;WindowsError: [Error 32] The process cannot access the file because it is being used by another process: ...&quot;. Original commit written in <strong>September 2012</strong>!</li> <li>bpo-11790: Fix sporadic failures in test_multiprocessing.WithProcessesTestCondition. Fix written in <strong>April 2011</strong>. This backported commit was tricky to identify!</li> <li>bpo-8799, fix test_threading: Reduce the timing sensitivity of the condition test by explicitly delaying the main thread so that it doesn't race ahead of the workers. Fix written in <strong>Nov 2013</strong>.</li> <li>test_distutils: Use EnvironGuard on InstallTestCase, UtilTestCase, and BuildExtTestCase to prevent the following warning: <tt class="docutils literal">Warning <span class="pre">--</span> os.environ was modified by test_distutils</tt></li> <li>Fix test_multiprocessing: Relax test timing (bpo-29861) to avoid sporadic failures.</li> </ul> </div> <div class="section" id="buildbot-reports-to-python-dev"> <h2>Buildbot reports to python-dev</h2> <p>I also wrote 3 reports to the Python-Dev mailing list:</p> <ul class="simple"> <li>May 3: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-May/147838.html">Status of Python buildbots</a></li> <li>June 8: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148271.html">Buildbot report, june 2017</a></li> <li>June 29: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-June/148511.html">Buildbot report (almost July)</a></li> </ul> </div> New Python test.bisect tool2017-07-12T15:00:00+02:002017-07-12T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-12:/python-test-bisect.html<p>This article tells the story of the new CPython <tt class="docutils literal">test.bisect</tt> tool to identify failing tests in the CPython test 
suite.</p> <div class="section" id="modify-manually-a-test-file"> <h2>Modifying a test file manually</h2> <p>I have been fixing reference leaks for many years. When the test file contains more than 200 tests and is longer than 5,000 lines …</p></div><p>This article tells the story of the new CPython <tt class="docutils literal">test.bisect</tt> tool to identify failing tests in the CPython test suite.</p> <div class="section" id="modify-manually-a-test-file"> <h2>Modifying a test file manually</h2> <p>I have been fixing reference leaks for many years. When the test file contains more than 200 tests and is longer than 5,000 lines, it's just not possible to spot a reference leak. Each time, I modified the long test file and actually <em>removed</em> enough code until the file became short enough that I could read it.</p> <p>This method <em>works</em>, but it usually took me 20 to 30 minutes, and so it was common that I made mistakes... and usually had to restart from scratch...</p> </div> <div class="section" id="first-failed-attempt"> <h2>First failed attempt</h2> <p>In October 2014, while fixing <a class="reference external" href="http://bugs.python.org/issue22588#msg228905">yet another reference leak in test_capi</a>, <strong>Xavier de Gaye</strong> was surprised that I identified the leak quickly and wanted to know how I proceeded. I explained my method of removing code, but I also asked for a tool.</p> <p>Xavier created bpo-22607 on 2014-10-11 and wrote a patch based on an integer range to run a subset of tests and did something special on the <tt class="docutils literal">subTest()</tt> context manager. But <strong>Georg Brandl</strong> wasn't convinced by this approach and... 
I forgot this issue.</p> </div> <div class="section" id="new-design-list-tests-run-a-subset"> <h2>New design: list tests, run a subset</h2> <p>During this quarter, I had to fix dozens of reference leaks, but also tests failing with &quot;environment changed&quot;: one test method modified &quot;something&quot;. It was really painful to identify the failing test each time.</p> <p>So I created bpo-29512 on 2017-02-09 to ask again for the same tool. Technically, I just wanted to run a subset of tests.</p> <p>While working on OpenStack, I enjoyed the <tt class="docutils literal">testr</tt> tool, a test runner able to list tests and to run a subset of tests. <tt class="docutils literal">testr</tt> also provides a bisection tool to identify a subset of tests sufficient to reproduce a bug. The subset can contain more than a single test. Sometimes you need to run two tests sequentially to trigger a specific bug, and it's usually long and boring to identify these two tests manually.</p> <p>I proposed a similar design for my bisection tool. Start by listing all tests, and then:</p> <ul class="simple"> <li>Create a pure <em>random</em> sample of tests: a subset with half the size of the current test set.</li> <li>If tests still fail, use the subset as the new set. Otherwise, throw the subset away.</li> <li>Loop until the subset is small enough or the process runs for more than 100 iterations.</li> </ul> </div> <div class="section" id="regrtest-list-cases"> <h2>regrtest --list-cases</h2> <p>To list tests, I created bpo-30523 and wrote a patch for the unittest module. Modifying unittest didn't work well with doctests, and the command line interface (CLI) didn't work as I wanted. I proposed to modify regrtest instead of unittest.</p> <p>I proposed to <strong>Louie Lu</strong> that he implement my new idea. I was impressed that he implemented it so quickly and that it worked so well! I just asked him to not exclude doctest test cases, since these test cases were working as expected! 
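</p> <p>The random-halving bisection strategy described above can be sketched in a few lines of Python. This is a hypothetical simplification (the bisect_tests() function and its fails() callback are my own names), not the actual test.bisect implementation:</p>

```python
import random

def bisect_tests(tests, fails, max_tests=1, max_iter=100):
    # 'fails' runs a subset of tests and returns True if the failure
    # (reference leak, environment changed, ...) is still reproduced
    tests = list(tests)
    iteration = 0
    while len(tests) > max_tests and iteration != max_iter:
        iteration += 1
        # Pure random sample with half the size of the current test set
        subset = random.sample(tests, max(max_tests, len(tests) // 2))
        if fails(subset):
            tests = subset  # failure still reproduced: keep the subset
        # otherwise, throw the subset away and draw a new random sample
    return tests

# Hypothetical demo: find the one "leaking" test among 33 tests
tests = ["test_%d" % i for i in range(32)] + ["test_leak"]
print(bisect_tests(tests, lambda subset: "test_leak" in subset))  # ['test_leak']
```

<p>In the real tool, the callback would run the subset with regrtest's --matchfile option and check the result.</p> <p>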
I quickly merged his modified patch which adds the <tt class="docutils literal"><span class="pre">--list-cases</span></tt> option to regrtest.</p> <p>Note: regrtest already had a <tt class="docutils literal"><span class="pre">--list-tests</span></tt> option which lists test <em>files</em>, whereas <tt class="docutils literal"><span class="pre">--list-cases</span></tt> lists test <em>methods</em> and doctests.</p> </div> <div class="section" id="regrtest-matchfile"> <h2>regrtest --matchfile</h2> <p>I created bpo-30540 to add a --matchfile option to regrtest. regrtest already had a --match option, but it was only possible to use the option once, and I wanted to use a text file for my list of tests.</p> <p>Again, I was surprised that it was so simple to implement the feature. By the way, I modified regrtest --match to allow specifying the option multiple times, to run multiple tests instead of a single one.</p> </div> <div class="section" id="new-test-bisect-tool"> <h2>New test.bisect tool</h2> <p>Since I had the two key features, <tt class="docutils literal">regrtest <span class="pre">--list-cases</span></tt> and <tt class="docutils literal">regrtest <span class="pre">--matchfile</span></tt>, it became trivial to implement the bisection tool. I wrote a first prototype. The &quot;prototype&quot; worked much better than expected.</p> <p>My first version required a text file listing test cases. I modified it to run the new <tt class="docutils literal"><span class="pre">--list-cases</span></tt> command automatically.</p> <p>I extended the tool to not only track reference leaks, but also &quot;environment changed&quot; failures, like finding a test which creates a file but doesn't remove it.</p> <p>I was asked to add this tool to the Python stdlib, so I added it as <tt class="docutils literal">Lib/test/bisect.py</tt>, to be used with:</p> <pre class="literal-block"> python3 -m test.bisect ... 
</pre> <p>The test.bisect CLI is similar to the test CLI on purpose.</p> </div> <div class="section" id="reference-leak-example"> <h2>Reference leak example</h2> <p>I modified <tt class="docutils literal">test_access()</tt> of test_os to manually add a reference leak:</p> <pre class="literal-block"> $ ./python -m test -R 3:3 test_os (...) test_os leaked [1, 1, 1] references, sum=3 test_os leaked [1, 1, 1] memory blocks, sum=3 test_os failed in 33 sec (...) </pre> <p>Just replace <tt class="docutils literal"><span class="pre">-m</span> test</tt> with <tt class="docutils literal"><span class="pre">-m</span> test.bisect</tt> in the command, and you get the guilty method:</p> <pre class="literal-block"> $ ./python -m test.bisect -R 3:3 test_os Start bisection with 257 tests Test arguments: -R 3:3 test_os Bisection will stop when getting 1 or less tests (-n/--max-tests option), or after 100 iterations (-N/--max-iter option) [+] Iteration 1: run 128 tests/257 + /home/haypo/prog/python/master/python -m test --matchfile /tmp/tmpvbraed7h -R 3:3 test_os (...) Tests succeeded: skip this subtest, try a new subbset [+] Iteration 2: run 128 tests/257 + /home/haypo/prog/python/master/python -m test --matchfile /tmp/tmpcjqtzgfe -R 3:3 test_os (...) Tests failed: use this new subtest [+] Iteration 3: run 64 tests/128 (...) [+] Iteration 15: run 1 tests/2 (...) Tests (1): * test.test_os.FileTests.test_access Bisection completed in 16 iterations and 0:03:10 </pre> <p>The <tt class="docutils literal">test.bisect</tt> command found the bug I introduced: <tt class="docutils literal">test.test_os.FileTests.test_access</tt>.</p> <p>The command takes a few minutes, but I don't care about its performance as long as it's fully automated! If you use the <tt class="docutils literal"><span class="pre">-o</span> file</tt> option, each time the tool is able to reduce the size of the test set, it writes the new list of tests to disk. 
So even if the tool crashes or fails to find a single failing test, it already helps!</p> <p>I am now very happy that <tt class="docutils literal">test.bisect</tt> works better than I expected. So I backported it to the 2.7, 3.5, 3.6 and master branches, since I want to fix <em>all</em> buildbot failures on <em>all</em> maintained branches.</p> </div> <div class="section" id="environment-changed-example"> <h2>Environment changed example</h2> <p>While running the previous example, I noticed the following warning:</p> <pre class="literal-block"> Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) </pre> <p>Using the new <tt class="docutils literal"><span class="pre">--fail-env-changed</span></tt> option, it is now possible to check which test of test_os emits such a warning:</p> <pre class="literal-block"> haypo&#64;selma$ ./python -m test.bisect --fail-env-changed -R 3:3 test_os (...) Tests (1): * test.test_os.TestSendfile.test_keywords Bisection completed in 14 iterations and 0:03:27 </pre> <p>I never trust anything, so let's confirm the bug:</p> <pre class="literal-block"> haypo&#64;selma$ ./python -m test --fail-env-changed -R 3:3 test_os -m test.test_os.TestSendfile.test_keywords Run tests sequentially 0:00:00 load avg: 0.33 [1/1] test_os Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) beginning 6 repetitions 123456 Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) . 
Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) .Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) .Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) .Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) .Warning -- threading_cleanup() failed to cleanup 0 threads after 3 sec (count: 0, dangling: 2) . test_os failed (env changed) 1 test altered the execution environment: test_os Total duration: 21 sec Tests result: ENV CHANGED </pre> <p>OK right, there is something wrong with test_keywords(). I just opened <a class="reference external" href="http://bugs.python.org/issue30908">bpo-30908</a>.</p> </div> My contributions to CPython during 2017 Q12017-07-05T12:00:00+02:002017-07-05T12:00:00+02:00Victor Stinnertag:vstinner.github.io,2017-07-05:/contrib-cpython-2017q1.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q1 (january, february, march):</p> <ul class="simple"> <li>Statistics</li> <li>Optimization</li> <li>Tricky bug</li> <li>FASTCALL optimizations</li> <li>Stack consumption</li> <li>Contributions</li> <li>os.urandom() and getrandom()</li> <li>Migration to GitHub</li> <li>Enhancements</li> <li>Security</li> <li>regrtest</li> <li>Bugfixes</li> </ul> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to CPython during 2016 Q4</a>.
Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1 …</a></p><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2017 Q1 (january, february, march):</p> <ul class="simple"> <li>Statistics</li> <li>Optimization</li> <li>Tricky bug</li> <li>FASTCALL optimizations</li> <li>Stack consumption</li> <li>Contributions</li> <li>os.urandom() and getrandom()</li> <li>Migration to GitHub</li> <li>Enhancements</li> <li>Security</li> <li>regrtest</li> <li>Bugfixes</li> </ul> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to CPython during 2016 Q4</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q2-part1.html">My contributions to CPython during 2017 Q2 (part 1)</a>.</p> <div class="section" id="statistics"> <h2>Statistics</h2> <pre class="literal-block"> # All commits $ git log --after=2016-12-31 --before=2017-04-01 --reverse --branches='*' --author=Stinner &gt; 2017Q1 $ grep '^commit ' 2017Q1|wc -l 121 # Exclude merges $ git log --no-merges --after=2016-12-31 --before=2017-04-01 --reverse --branches='*' --author=Stinner|grep '^commit '|wc -l 105 # master branch (excluding merges) $ git log --no-merges --after=2016-12-31 --before=2017-04-01 --reverse --author=Stinner origin/master|grep '^commit '|wc -l 98 # Only merges $ git log --merges --after=2016-12-31 --before=2017-04-01 --reverse --branches='*' --author=Stinner|grep '^commit '|wc -l 16 </pre> <p>Statistics: <strong>98</strong> commits in the master branch, 16 merge commits (done using Mercurial before the migration to GitHub, and then converted to Git), and 7 other commits (likely backports), total: <strong>121</strong> commits.</p> </div> <div class="section" id="optimization"> <h2>Optimization</h2> <p>With the work done in 2016 on
FASTCALL, it became much easier to optimize code by using the new FASTCALL API.</p> <div class="section" id="python-slots"> <h3>Python slots</h3> <p>Issue #29507: I worked with <strong>INADA Naoki</strong> to continue the work he did with <strong>Yury Selivanov</strong> on optimizing method calls. We optimized &quot;slots&quot; implemented in Python. Slots are an internal optimization to call &quot;dunder&quot; methods like <tt class="docutils literal">__getitem__()</tt>.</p> <p>For Python methods, get the unbound Python function and prepend the arguments with <em>self</em>, rather than calling the descriptor which creates a temporary PyMethodObject.</p> <p>Add a new _PyObject_FastCall_Prepend() function used to call the unbound Python method with <em>self</em>. It avoids the creation of a temporary tuple to pass positional arguments.</p> <p>Avoiding a temporary PyMethodObject and a temporary tuple makes Python slots up to <strong>1.46x faster</strong>. Microbenchmark on a <tt class="docutils literal">__getitem__()</tt> method implemented in Python:</p> <pre class="literal-block"> Median +- std dev: 121 ns +- 5 ns -&gt; 82.8 ns +- 1.0 ns: 1.46x faster (-31%) </pre> </div> <div class="section" id="struct-module"> <h3>struct module</h3> <p>In issue #29300, <strong>Serhiy Storchaka</strong> and I converted most methods of the C <tt class="docutils literal">_struct</tt> module to Argument Clinic to make them use the FASTCALL calling convention. Using METH_FASTCALL avoids the creation of a temporary tuple to pass positional arguments and so is faster.
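The difference between the two calling conventions can be pictured with a pure-Python analogy. This is only an illustration of the idea, not the real C API: METH_VARARGS materializes a tuple of positional arguments for every call, while METH_FASTCALL hands the callee an array of arguments plus a count, with no per-call packing.

```python
def call_varargs_style(func, args_list):
    """METH_VARARGS analogy: pack positional arguments into a temporary tuple."""
    argtuple = tuple(args_list)   # one temporary tuple built per call
    return func(argtuple)

def call_fastcall_style(func, args, nargs):
    """METH_FASTCALL analogy: pass an array + count, nothing is packed."""
    return func(args, nargs)

# Both conventions compute the same result; FASTCALL skips the packing step.
print(call_varargs_style(sum, [1, 2, 3]))
print(call_fastcall_style(lambda a, n: sum(a[:n]), [1, 2, 3], 3))
```

In C, skipping that per-call tuple allocation is where the nanoseconds are saved.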
For example, <tt class="docutils literal"><span class="pre">struct.pack(&quot;i&quot;,</span> 1)</tt> becomes <strong>1.56x faster</strong> (-36%):</p> <pre class="literal-block"> $ ./python -m perf timeit \ -s 'import struct; pack=struct.pack' 'pack(&quot;i&quot;, 1)' \ --compare-to=../default-ref/python Median +- std dev: 119 ns +- 1 ns -&gt; 76.8 ns +- 0.4 ns: 1.56x faster (-36%) Significant (t=295.91) </pre> <p>The difference is only <tt class="docutils literal">42.2 ns</tt>, but since the function only takes <tt class="docutils literal">76.8 ns</tt>, the difference is significant. The speedup can also be explained by the more efficient functions used to parse arguments. The new functions now use a cache on the format string.</p> </div> <div class="section" id="deque-module"> <h3>deque module</h3> <p>I made a similar change in the deque module: the index(), insert() and rotate() methods now use METH_FASTCALL. Speedup:</p> <ul class="simple"> <li>d.index(): <strong>1.24x faster</strong></li> <li>d.rotate(1): 1.24x faster</li> <li>d.insert(): 1.18x faster</li> <li>d.rotate(): 1.10x faster</li> </ul> </div> </div> <div class="section" id="tricky-bug"> <h2>Tricky bug</h2> <div class="section" id="test-exceptions-test-unraisable"> <h3>test_exceptions.test_unraisable()</h3> <p>The optimization on Python slots (issue #29507) caused a regression in the test_unraisable() unit test of test_exceptions.</p> <p>The <tt class="docutils literal">test_unraisable()</tt> method expects that <tt class="docutils literal">PyErr_WriteUnraisable(method)</tt> fails on <tt class="docutils literal">repr(method)</tt>.</p> <p>Before the change, <tt class="docutils literal">slot_tp_finalize()</tt> called <tt class="docutils literal">PyErr_WriteUnraisable()</tt> with a PyMethodObject.
In this case, <tt class="docutils literal">repr(method)</tt> calls <tt class="docutils literal">repr(self)</tt> which is <tt class="docutils literal">BrokenRepr.__repr__()</tt>, and the call raises a new exception.</p> <p>After the change, <tt class="docutils literal">slot_tp_finalize()</tt> uses an unbound method: <tt class="docutils literal">repr()</tt> is called on a regular <tt class="docutils literal">__del__()</tt> method which doesn't call <tt class="docutils literal">repr(self)</tt>, and so <tt class="docutils literal">repr()</tt> doesn't fail anymore.</p> <p>The fix is to remove the BrokenRepr unit test, since <tt class="docutils literal">PyErr_WriteUnraisable()</tt> doesn't call <tt class="docutils literal">__repr__()</tt> anymore.</p> <p>The removed test was really implementation specific, and my optimization &quot;fixed&quot; the bug or &quot;broke&quot; the test. It's hard to say :-)</p> </div> <div class="section" id="unittest-assertraises-reference-cycle"> <h3>unittest assertRaises() reference cycle</h3> <p>In April 2015, <strong>Vjacheslav Fyodorov</strong> reported a reference cycle in the assertRaises() method of the unittest module: bpo-23890.</p> <p>When the context manager API of the <tt class="docutils literal">assertRaises()</tt> method is used, the context manager returns an object which contains the exception. So the exception is kept alive longer than usual.</p> <p>Python 3 exceptions now store traceback objects which contain local variables. If a function stores the current exception in a local variable and the frame of this function is part of the traceback, we get a reference cycle:</p> <blockquote> exception -&gt; traceback -&gt; frame -&gt; variable -&gt; exception</blockquote> <p>I fixed the reference cycle by manually clearing local variables.
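Such a cycle can be demonstrated in pure Python (hypothetical names; the garbage collector is disabled to show that reference counting alone cannot free the cycle, and that clearing the local variable fixes it):

```python
import gc
import weakref

class Witness:
    """Its lifetime tells us whether the frame was freed."""

def leaky():
    w = Witness()               # referenced by this frame's locals
    try:
        raise ValueError("boom")
    except ValueError as exc:
        err = exc               # err -> exception -> traceback -> frame -> err
    return weakref.ref(w)

def fixed():
    w = Witness()
    try:
        raise ValueError("boom")
    except ValueError as exc:
        err = exc
        err = None              # manually break the reference cycle
    return weakref.ref(w)

gc.disable()                            # rely on reference counting only
ref_leaky = leaky()
ref_fixed = fixed()
leak_alive = ref_leaky() is not None    # the cycle keeps the frame alive
fixed_freed = ref_fixed() is None       # clearing the local freed everything
print(leak_alive, fixed_freed)
gc.enable()
gc.collect()                            # the cyclic GC can free the cycle later
```

Reference counting alone never frees the leaky frame: only a (possibly much later) garbage collection pass does, which is why breaking the cycle by hand is worthwhile.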
Example of change of my commit:</p> <pre class="literal-block"> try: return context.handle('assertRaises', args, kwargs) finally: # bpo-23890: manually break a reference cycle context = None </pre> <p>It's not the first time that I fixed such a reference cycle in the unittest module. My previous fix was issue #19880: fix a reference leak in unittest.TestCase by explicitly breaking reference cycles between frames and the <tt class="docutils literal">_Outcome</tt> instance: commit <a class="reference external" href="https://github.com/python/cpython/commit/031bd532c48cf20a9cbf438bdae75dde49e36c51">031bd532</a>.</p> </div> </div> <div class="section" id="fastcall-optimizations"> <h2>FASTCALL optimizations</h2> <p>FASTCALL is my project to avoid a temporary tuple to pass positional arguments and a temporary dictionary to pass keyword arguments when calling a function. It optimizes function calls in general.</p> <p>I continued the work on FASTCALL to optimize code further and use FASTCALL in more cases.</p> <div class="section" id="recursion-depth"> <h3>Recursion depth</h3> <p>In issue #29306, I fixed the usage of Py_EnterRecursiveCall() to correctly account for the recursion depth, fixing the code responsible for preventing C stack overflows:</p> <ul class="simple"> <li><tt class="docutils literal"><span class="pre">*PyCFunction_*Call*()</span></tt> functions now call <tt class="docutils literal">Py_EnterRecursiveCall()</tt>.</li> <li><tt class="docutils literal">PyObject_Call()</tt> now directly calls <tt class="docutils literal">_PyFunction_FastCallDict()</tt> and <tt class="docutils literal">PyCFunction_Call()</tt> to avoid calling <tt class="docutils literal">Py_EnterRecursiveCall()</tt> twice per function call.</li> </ul> </div> <div class="section" id="support-position-arguments"> <h3>Support positional arguments</h3> <p>Issue #29286 enhanced Argument Clinic to use FASTCALL for functions which only accept positional arguments:</p> <ul class="simple"> <li>Rename
_PyArg_ParseStack to _PyArg_ParseStackAndKeywords</li> <li>Add _PyArg_ParseStack() helper function</li> <li>Add _PyArg_NoStackKeywords() helper function.</li> <li>Add _PyArg_UnpackStack() helper function</li> <li>Argument Clinic: Use the METH_FASTCALL calling convention instead of METH_VARARGS to parse positional arguments, including &quot;boring&quot; positional arguments.</li> </ul> </div> <div class="section" id="functions-converted-to-fastcall"> <h3>Functions converted to FASTCALL</h3> <ul class="simple"> <li>_hashopenssl module</li> <li>collections.OrderedDict methods (some of them, not all)</li> <li>__build_class__(), getattr(), next() and sorted() builtin functions</li> <li>type_prepare() C function, used in the type constructor</li> <li>dict.get() and dict.setdefault() now use Argument Clinic. The docstring signatures are also enhanced. For example, <tt class="docutils literal"><span class="pre">get(...)</span></tt> becomes <tt class="docutils literal">get(self, key, default=None, /)</tt>. Also add a note explaining why dict_update() doesn't use METH_FASTCALL.</li> </ul> </div> <div class="section" id="optimizations"> <h3>Optimizations</h3> <ul class="simple"> <li>Issue #28839: Optimize function_call(): it now simply calls _PyFunction_FastCallDict() which is more efficient (fast paths for the common case, optimized code object and no keyword argument).</li> <li>Issue #28839: Optimize _PyFunction_FastCallDict() when kwargs is an empty dictionary, avoiding the creation of a useless empty tuple.</li> <li>Issue #29259: Write a fast path in _PyCFunction_FastCallKeywords() for METH_FASTCALL, avoiding the creation of a temporary dictionary for keyword arguments.</li> <li>Issues #29259, #29263: methoddescr_call() creates a PyCFunction object, calls it and then destroys it. Add a new _PyMethodDef_RawFastCallDict() method to avoid the temporary PyCFunction object.</li> <li>PyCFunction_Call() now calls _PyCFunction_FastCallDict().</li> <li>bpo-29735: Optimize partial_call(): avoid a temporary tuple.
Add _PyObject_HasFastCall(). Also fix a performance regression in partial_call() if the callable doesn't support FASTCALL.</li> </ul> </div> <div class="section" id="bugfixes"> <h3>Bugfixes</h3> <ul class="simple"> <li>Issue #29286: _PyStack_UnpackDict() now returns -1 on error. Change the _PyStack_UnpackDict() prototype to be able to notify of failure when args is NULL.</li> <li>Fix a PyCFunction_Call() performance issue. Issues #29259, #29465: PyCFunction_Call() no longer creates a redundant tuple to pass positional arguments for METH_VARARGS. Add a new cfunction_call() subfunction.</li> </ul> </div> <div class="section" id="objects-call-c-file"> <h3>Objects/call.c file</h3> <p>Issue #29465 moved all C functions implementing &quot;function calls&quot; to a new Objects/call.c file. Moving all these functions to the same place should help to keep the code consistent. It might also help the compiler to inline code more easily, or maybe help to cache more machine code in the CPU instruction cache.</p> <p>This change was made during the GitHub migration. Since the change is big (it modifies many <tt class="docutils literal">.c</tt> files), I got many conflicts and it was annoying to rebase it. I am now happy to have this <tt class="docutils literal">call.c</tt> file; it already helped me :-)</p> <p>Having <tt class="docutils literal">call.c</tt> also helps to keep helper functions near their callers, and prevents exposing them in the C API, even as private functions.</p> </div> <div class="section" id="don-t-optimize-keywords"> <h3>Don't optimize keywords</h3> <ul class="simple"> <li>Document that _PyFunction_FastCallDict() must copy kwargs. Issue #29318: Caller and callee functions must not share the dictionary: kwargs must be copied.</li> <li>Document why functools.partial() must copy kwargs.
Add a comment to prevent further attempts to avoid a copy for optimization.</li> </ul> </div> </div> <div class="section" id="stack-consumption"> <h2>Stack consumption</h2> <p>A FASTCALL micro-optimization was blocked by Serhiy Storchaka because it increased the C stack consumption. In the past, I never analyzed the C stack consumption. Since I wanted to get this micro-optimization merged, I tried to reduce the consumption.</p> <p>At the beginning, I wrote a function to <strong>measure</strong> the C stack consumption in a reliable way. It took me a few iterations.</p> <p>Table showing the C stack consumption in bytes, and the difference compared to Python 3.5 (last release before I started working on FASTCALL):</p> <table border="1" class="docutils"> <colgroup> <col width="27%" /> <col width="22%" /> <col width="7%" /> <col width="22%" /> <col width="22%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Function</th> <th class="head">2.7</th> <th class="head">3.5</th> <th class="head">3.6</th> <th class="head">3.7</th> </tr> </thead> <tbody valign="top"> <tr><td>test_python_call</td> <td>1,360 (<strong>+352</strong>)</td> <td>1,008</td> <td>1,120 (<strong>+112</strong>)</td> <td>960 (<strong>-48</strong>)</td> </tr> <tr><td>test_python_getitem</td> <td>1,408 (<strong>+288</strong>)</td> <td>1,120</td> <td>1,168 (<strong>+48</strong>)</td> <td>880 (<strong>-240</strong>)</td> </tr> <tr><td>test_python_iterator</td> <td>1,424 (<strong>+192</strong>)</td> <td>1,232</td> <td>1,200 (<strong>-32</strong>)</td> <td>1,024 (<strong>-208</strong>)</td> </tr> <tr><td>Total</td> <td>4,192 (<strong>+832</strong>)</td> <td>3,360</td> <td>3,488 (<strong>+128</strong>)</td> <td>2,864 (<strong>-496</strong>)</td> </tr> </tbody> </table> <p>Table showing the number of function calls before a stack overflow, and the difference compared to Python 3.5:</p> <table border="1" class="docutils"> <colgroup> <col width="24%" /> <col width="23%" /> <col width="7%" /> <col 
width="23%" /> <col width="23%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Function</th> <th class="head">2.7</th> <th class="head">3.5</th> <th class="head">3.6</th> <th class="head">3.7</th> </tr> </thead> <tbody valign="top"> <tr><td>test_python_call</td> <td>6,161 (<strong>-2,153</strong>)</td> <td>8,314</td> <td>7,482 (<strong>-832</strong>)</td> <td>8,729 (<strong>+415</strong>)</td> </tr> <tr><td>test_python_getitem</td> <td>5,951 (<strong>-1,531</strong>)</td> <td>7,482</td> <td>7,174 (<strong>-308</strong>)</td> <td>9,522 (<strong>+2,040</strong>)</td> </tr> <tr><td>test_python_iterator</td> <td>5,885 (<strong>-916</strong>)</td> <td>6,801</td> <td>6,983 (<strong>+182</strong>)</td> <td>8,184 (<strong>+1,383</strong>)</td> </tr> <tr><td>Total</td> <td>17,997 (<strong>-4,600</strong>)</td> <td>22,597</td> <td>21,639 (<strong>-958</strong>)</td> <td>26,435 (<strong>+3,838</strong>)</td> </tr> </tbody> </table> <p>Python 3.7 is the best of 2.7, 3.5, 3.6 and 3.7: lowest stack consumption and maximum number of calls (before a stack overflow) ;-)</p> <p>Changes:</p> <ul class="simple"> <li>call_method() now uses _PyObject_FastCall(). Issue #29233: Replace the inefficient _PyObject_VaCallFunctionObjArgs() with _PyObject_FastCall() in call_method() and call_maybe().</li> <li>Issue #29227: Inline call_function() into _PyEval_EvalFrameDefault() using Py_LOCAL_INLINE to reduce the stack consumption.</li> <li>Issue #29234: Inlining _PyStack_AsTuple() into callers increases their stack consumption; disable inlining to optimize the stack consumption. Add _Py_NO_INLINE: use __attribute__((noinline)) of GCC and Clang.</li> </ul> </div> <div class="section" id="contributions"> <h2>Contributions</h2> <ul class="simple"> <li>Issue #28961: Fix the unittest.mock._Call helper: don't ignore the name parameter anymore. Patch written by <strong>Jiajun Huang</strong>.</li> <li>Prohibit implicit C function declarations.
Issue #27659: use -Werror=implicit-function-declaration when possible (GCC and Clang, but it depends on the compiler version). Patch written by <strong>Chi Hsuan Yen</strong>.</li> </ul> </div> <div class="section" id="os-urandom-and-getrandom"> <h2>os.urandom() and getrandom()</h2> <p>As usual, I had fun with os.urandom() in this quarter (see my previous article on urandom: <a class="reference external" href="https://vstinner.github.io/pep-524-os-urandom-blocking.html">PEP 524: os.urandom() now blocks on Linux in Python 3.6</a>).</p> <p>The glibc developers succeeded in implementing a getrandom() function in glibc 2.25 (February 2017) to expose the &quot;new&quot; Linux getrandom() syscall which was introduced in Linux 3.17 (August 2014). Read the LWN article: <a class="reference external" href="https://lwn.net/Articles/711013/">The long road to getrandom() in glibc</a>.</p> <p>I created issue #29157 because my os.urandom() implementation wasn't ready for the addition of a getrandom() function on Linux. My implementation using the getrandom() function didn't handle the ENOSYS error (syscall not supported), when Python is compiled on a recent kernel and glibc, but run on an older kernel and glibc.</p> <p>I rewrote the code to prefer getrandom() over getentropy():</p> <ul class="simple"> <li>dev_urandom() now calls py_getentropy(). Prepare the fallback to support getentropy() failure and fall back on reading from /dev/urandom.</li> <li>Simplify dev_urandom(). pyurandom() is now responsible for calling getentropy() or getrandom(). Also enhance the dev_urandom() and pyurandom() documentation.</li> <li>getrandom() is now preferred over getentropy(). glibc 2.24 now implements getentropy() on Linux using the getrandom() syscall. But getentropy() doesn't support non-blocking mode. Since getrandom() is tried first, it is no longer needed to explicitly exclude getentropy() on Solaris.
Replace: &quot;if defined(HAVE_GETENTROPY) &amp;&amp; !defined(sun)&quot; with &quot;if defined(HAVE_GETENTROPY)&quot;</li> <li>Enhance the py_getrandom() documentation. py_getentropy() now supports ENOSYS, EPERM &amp; EINTR</li> </ul> <p>IMHO the main enhancement was the documentation (comments) of the code. The main function pyurandom() now has this long comment:</p> <blockquote> <p>Read random bytes:</p> <ul class="simple"> <li>Return 0 on success</li> <li>Raise an exception (if raise is non-zero) and return -1 on error</li> </ul> <p>Used sources of entropy ordered by preference, preferred source first:</p> <ul class="simple"> <li>CryptGenRandom() on Windows</li> <li>getrandom() function (ex: Linux and Solaris): call py_getrandom()</li> <li>getentropy() function (ex: OpenBSD): call py_getentropy()</li> <li>/dev/urandom device</li> </ul> <p>Read from the /dev/urandom device if getrandom() or getentropy() function is not available or does not work.</p> <p>Prefer getrandom() over getentropy() because getrandom() supports blocking and non-blocking mode: see the PEP 524. Python requires non-blocking RNG at startup to initialize its hash secret, but os.urandom() must block until the system urandom is initialized (at least on Linux 3.17 and newer).</p> <p>Prefer getrandom() and getentropy() over reading directly /dev/urandom because these functions don't need file descriptors and so avoid ENFILE or EMFILE errors (too many open files): see the issue #18756.</p> <p>Only the getrandom() function supports non-blocking mode.</p> <p>Only use RNG running in the kernel. They are more secure because it is harder to get the internal state of a RNG running in the kernel land than a RNG running in the user land.
The kernel has a direct access to the hardware and has access to hardware RNG, they are used as entropy sources.</p> <p>Note: the OpenSSL RAND_pseudo_bytes() function does not automatically reseed its RNG on fork(), two child processes (with the same pid) generate the same random numbers: see issue #18747. Kernel RNGs don't have this issue, they have access to good quality entropy sources.</p> <p>If raise is zero:</p> <ul class="simple"> <li>Don't raise an exception on error</li> <li>Don't call the Python signal handler (don't call PyErr_CheckSignals()) if a function fails with EINTR: retry directly the interrupted function</li> <li>Don't release the GIL to call functions.</li> </ul> </blockquote> </div> <div class="section" id="migration-to-github"> <h2>Migration to GitHub</h2> <p>In February 2017, the Mercurial repository was converted to Git and the development of CPython moved to GitHub at <a class="reference external" href="https://github.com/python/cpython/">https://github.com/python/cpython/</a>. I helped to polish the migration in its early days:</p> <ul class="simple"> <li>Rename README to README.rst and enhance its formatting</li> <li>bpo-29527: Don't treat warnings as errors in the Travis docs job</li> <li>Travis CI: run rstlint.py in the docs job. Currently, the <a class="reference external" href="http://buildbot.python.org/all/buildslaves/ware-docs">http://buildbot.python.org/all/buildslaves/ware-docs</a> buildbot is only run as post-commit. For example, bpo-29521 (PR#41) introduced two warnings, not noticed by the Travis CI docs job. Modify the docs job to run tools/rstlint.py. Also fix the two minor warnings which cause the buildbot slave to fail. Doc/Makefile: set PYTHON to python3.</li> <li>Add Travis CI and Codecov badges to README.</li> <li>Exclude myself from mention-bot. I made changes in almost all CPython files over the last 5 years, so mention-bot asks me to review basically all pull requests. I simply don't have the bandwidth to review everything, sorry!
I prefer to select which PRs I want to follow myself.</li> <li>bpo-27425: Add .gitattributes, fix Windows tests. Mark binary files as binary in .gitattributes to not translate newline characters in Git repositories on Windows.</li> </ul> </div> <div class="section" id="enhancements"> <h2>Enhancements</h2> <ul class="simple"> <li>Issue #29259: python-gdb.py now also looks for PyCFunction in the current frame, not only in the older frame. python-gdb.py now also supports method-wrapper (wrapperobject) objects (Issue #29367).</li> <li>Issue #26273: Document the new TCP_USER_TIMEOUT and TCP_CONGESTION constants.</li> <li>bpo-29919: Remove unused imports found by pyflakes. Also make minor PEP 8 coding style fixes on the modified imports.</li> <li>bpo-29887: Test normalization now fails if the download fails; also fix a ResourceWarning.</li> </ul> </div> <div class="section" id="security"> <h2>Security</h2> <ul class="simple"> <li>Backport for Python 3.4. Issues #27850 and #27766: Remove 3DES from the ssl default cipher list and add ChaCha20 Poly1305. See the <a class="reference external" href="http://python-security.readthedocs.io/vuln/cve-2016-2183_sweet32_attack_des_3des.html">CVE-2016-2183: Sweet32 attack (DES, 3DES)</a> vulnerability.</li> </ul> </div> <div class="section" id="regrtest"> <h2>regrtest</h2> <p>regrtest is the runner of the Python test suite. Changes:</p> <ul class="simple"> <li>regrtest: don't fail immediately if a child process crashes. Issue #29362: Catch a crash of a worker process as a normal failure and continue to run the next tests. This allows getting the usual test summary: single line result (OK/FAIL), total duration, etc.</li> <li>Fix regrtest -j0 -R output: also write dots to stderr, instead of stdout.</li> </ul> </div> <div class="section" id="bugfixes-1"> <h2>Bugfixes</h2> <ul class="simple"> <li>Issue #29140: Fix hash(datetime.time). Fix the time_hash() function: replace DATE_xxx() macros with TIME_xxx() macros.
Before, the hash function used a wrong value for microseconds if fold is set (equal to 1).</li> <li>Issues #29174, #26741: Fix subprocess.Popen.__del__() on Python shutdown. subprocess.Popen.__del__() now keeps a strong reference to the warnings.warn() function. The change allows logging the warning late, at Python finalization. Before, the warning was either ignored, or an error was logged instead of the warning.</li> <li>Issue #25591: Fix test_imaplib if the ssl module is missing.</li> <li>Fix script_helper.run_python_until_end(): copy the <tt class="docutils literal">SYSTEMROOT</tt> environment variable. Windows requires at least the SYSTEMROOT environment variable to start Python. If run_python_until_end() doesn't copy SYSTEMROOT, the function always fails on Windows.</li> <li>Fix datetime.fromtimestamp(): check bounds. Issue #29100: Fix a datetime.fromtimestamp() regression introduced in Python 3.6.0: check minimum and maximum years.</li> <li>Fix test_datetime on systems with 32-bit time_t. Issue #29100: Catch OverflowError in the new test_timestamp_limits() test.</li> <li>Fix test_datetime on Windows. Issue #29100: On Windows, datetime.datetime.fromtimestamp(min_ts) fails with an OSError in test_timestamp_limits().</li> <li>bpo-29176: Fix the name of the _curses.window class.
Set the name to <tt class="docutils literal">_curses.window</tt> instead of <tt class="docutils literal">_curses.curses window</tt> (with a space!).</li> <li>bpo-29619: os.stat() and os.DirEntry.inode() now convert the inode (st_ino) using unsigned integers to support very large inodes (larger than 2^31).</li> </ul> </div> speed.python.org results: March 20172017-03-29T00:40:00+02:002017-03-29T00:40:00+02:00Victor Stinnertag:vstinner.github.io,2017-03-29:/speed-python-org-march-2017.html<p>In February 2017, CPython moved from Bitbucket with Mercurial to GitHub with Git: read <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-February/147381.html">[Python-Dev] CPython is now on GitHub</a> by Brett Cannon.</p> <p>In 2016, I worked on speed.python.org to automate running benchmarks and make benchmarks more stable. At the end, I had a single command to:</p> <ul class="simple"> <li>tune …</li></ul><p>In February 2017, CPython moved from Bitbucket with Mercurial to GitHub with Git: read <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2017-February/147381.html">[Python-Dev] CPython is now on GitHub</a> by Brett Cannon.</p> <p>In 2016, I worked on speed.python.org to automate running benchmarks and make benchmarks more stable. At the end, I had a single command to:</p> <ul class="simple"> <li>tune the system for benchmarks</li> <li>compile CPython using LTO+PGO</li> <li>install CPython</li> <li>install performance</li> <li>run performance</li> <li>upload results</li> </ul> <p>But my tools were written for Mercurial, and speed.python.org uses Mercurial revisions as keys for changes. Since the CPython repository was converted to Git, I had to remove all old results and run the old benchmarks again. But before removing everything, I took screenshots of the most interesting pages.
I would prefer to keep a copy of all data, but it would require writing new tools and I am not motivated to do that.</p> <div class="section" id="python-3-7-compared-to-python-2-7"> <h2>Python 3.7 compared to Python 2.7</h2> <p>Benchmarks where Python 3.7 is <strong>faster</strong> than Python 2.7:</p> <img alt="python37_faster_py27" src="https://vstinner.github.io/images/speed2017/python37_faster_py27.png" /> <p>Benchmarks where Python 3.7 is <strong>slower</strong> than Python 2.7:</p> <img alt="python37_slower_py27" src="https://vstinner.github.io/images/speed2017/python37_slower_py27.png" /> </div> <div class="section" id="significant-optimizations"> <h2>Significant optimizations</h2> <p>CPython became regularly faster in 2016 on the following benchmarks.</p> <p>call_method; the main optimization was <a class="reference external" href="https://bugs.python.org/issue26110">Speedup method calls 1.2x</a>:</p> <img alt="call_method" src="https://vstinner.github.io/images/speed2017/call_method.png" /> <p>float:</p> <img alt="float" src="https://vstinner.github.io/images/speed2017/float.png" /> <p>hexiom:</p> <img alt="hexiom" src="https://vstinner.github.io/images/speed2017/hexiom.png" /> <p>nqueens:</p> <img alt="nqueens" src="https://vstinner.github.io/images/speed2017/nqueens.png" /> <p>pickle_list, something happened near September 2016:</p> <img alt="pickle_list" src="https://vstinner.github.io/images/speed2017/pickle_list.png" /> <p>richards:</p> <img alt="richards" src="https://vstinner.github.io/images/speed2017/richards.png" /> <p>scimark_lu, I like the latest dot!</p> <img alt="scimark_lu" src="https://vstinner.github.io/images/speed2017/scimark_lu.png" /> <p>scimark_sor:</p> <img alt="scimark_sor" src="https://vstinner.github.io/images/speed2017/scimark_sor.png" /> <p>sympy_sum:</p> <img alt="sympy_sum" src="https://vstinner.github.io/images/speed2017/sympy_sum.png" /> <p>telco is one of the most impressive: it became regularly faster:</p> <img alt="telco"
src="https://vstinner.github.io/images/speed2017/telco.png" /> <p>unpickle_list, something happened between March and May 2016:</p> <img alt="unpickle_list" src="https://vstinner.github.io/images/speed2017/unpickle_list.png" /> </div> <div class="section" id="the-enum-change"> <h2>The enum change</h2> <p>One change related to the <tt class="docutils literal">enum</tt> module had significant impact on the two following benchmarks.</p> <p>python_startup:</p> <img alt="python_startup" src="https://vstinner.github.io/images/speed2017/python_startup.png" /> <p>See &quot;Python startup performance regression&quot; section of <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to CPython during 2016 Q4</a> for the explanation on changes around September 2016.</p> <p>regex_compile became 1.2x slower (312 ms =&gt; 376 ms: +20%) because constants of the <tt class="docutils literal">re</tt> module became <tt class="docutils literal">enum</tt> objects: see <a class="reference external" href="http://bugs.python.org/issue28082">convert re flags to (much friendlier) IntFlag constants (issue #28082)</a>.</p> <img alt="regex_compile" src="https://vstinner.github.io/images/speed2017/regex_compile.png" /> </div> <div class="section" id="benchmarks-became-stable"> <h2>Benchmarks became stable</h2> <p>The following benchmarks are microbenchmarks which are impacted by many external factors. It's hard to get stable results. I'm happy to see that results are stable. I would say very stable compared to results when I started to work on the project!</p> <p>call_simple:</p> <img alt="call_simple" src="https://vstinner.github.io/images/speed2017/call_simple.png" /> <p>spectral_norm:</p> <img alt="spectral_norm" src="https://vstinner.github.io/images/speed2017/spectral_norm.png" /> </div> <div class="section" id="straight-line"> <h2>Straight line</h2> <p>It seems like no optimization had a significant impact on the following benchmarks. 
You can also see that benchmarks became stable, so it's easier to detect a performance regression or a significant optimization.</p> <p>dulwich_log:</p> <img alt="dulwich_log" src="https://vstinner.github.io/images/speed2017/dulwich_log.png" /> <p>pidigits:</p> <img alt="pidigits" src="https://vstinner.github.io/images/speed2017/pidigits.png" /> <p>sqlite_synth:</p> <img alt="sqlite_synth" src="https://vstinner.github.io/images/speed2017/sqlite_synth.png" /> <p>Apart from something around April 2016, the tornado_http result is stable:</p> <img alt="tornado_http" src="https://vstinner.github.io/images/speed2017/tornado_http.png" /> </div> <div class="section" id="unstable-benchmarks"> <h2>Unstable benchmarks</h2> <p>After months of effort to make everything stable, some benchmarks are still unstable, even if temporary spikes are lower than before. See <a class="reference external" href="https://vstinner.github.io/analysis-python-performance-issue.html">Analysis of a Python performance issue</a> to see the size of previous temporary performance spikes.</p> <p>regex_v8:</p> <img alt="regex_v8" src="https://vstinner.github.io/images/speed2017/regex_v8.png" /> <p>scimark_sparse_mat_mult:</p> <img alt="scimark_sparse_mat_mult" src="https://vstinner.github.io/images/speed2017/scimark_sparse_mat_mult.png" /> <p>unpickle_pure_python:</p> <img alt="unpickle_pure_python" src="https://vstinner.github.io/images/speed2017/unpickle_pure_python.png" /> </div> <div class="section" id="boring-results"> <h2>Boring results</h2> <p>There is nothing interesting to say about the following benchmark results.</p> <p>2to3:</p> <img alt="2to3" src="https://vstinner.github.io/images/speed2017/2to3.png" /> <p>crypto_pyaes:</p> <img alt="crypto_pyaes" src="https://vstinner.github.io/images/speed2017/crypto_pyaes.png" /> <p>deltablue:</p> <img alt="deltablue" src="https://vstinner.github.io/images/speed2017/deltablue.png" /> <p>logging_silent:</p> <img alt="logging_silent" 
src="https://vstinner.github.io/images/speed2017/logging_silent.png" /> <p>mako:</p> <img alt="mako" src="https://vstinner.github.io/images/speed2017/mako.png" /> <p>xml_etree_process:</p> <img alt="xml_etree_process" src="https://vstinner.github.io/images/speed2017/xml_etree_process.png" /> <p>xml_etree_iterparse:</p> <img alt="xml_etree_iterparse" src="https://vstinner.github.io/images/speed2017/xml_etre_iterparse.png" /> </div> FASTCALL issues2017-02-25T00:00:00+01:002017-02-25T00:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-25:/fastcall-issues.html<p>Here is the raw list of the 46 CPython issues I opened between 2016-04-21 and 2017-02-10 to implement my FASTCALL optimization. Most issues created in 2016 are already part of Python 3.6.0, some are already merged into the future Python 3.7, the few remaining issues are still …</p><p>Here is the raw list of the 46 CPython issues I opened between 2016-04-21 and 2017-02-10 to implement my FASTCALL optimization. Most issues created in 2016 are already part of Python 3.6.0, some are already merged into the future Python 3.7, the few remaining issues are still open.</p> <div class="section" id="fastcall-issues-1"> <h2>27 FASTCALL issues</h2> <ul class="simple"> <li>2016-04-21: <a class="reference external" href="http://bugs.python.org/issue26814">[WIP] Add a new _PyObject_FastCall() function which avoids the creation of a tuple or dict for arguments</a></li> <li>2016-05-26: <a class="reference external" href="http://bugs.python.org/issue27128">Add _PyObject_FastCall()</a></li> <li>2016-08-20: <a class="reference external" href="http://bugs.python.org/issue27809">Add _PyFunction_FastCallDict(): fast call with keyword arguments as a dict</a></li> <li>2016-08-20: <a class="reference external" href="http://bugs.python.org/issue27810">Add METH_FASTCALL: new calling convention for C functions</a></li> <li>2016-08-22: <a class="reference external" href="http://bugs.python.org/issue27830">Add 
_PyObject_FastCallKeywords(): avoid the creation of a temporary dictionary for keyword arguments</a></li> <li>2016-08-23: <a class="reference external" href="http://bugs.python.org/issue27840">functools.partial: don't copy keywoard arguments in partial_call()?</a> [<strong>REJECTED</strong>]</li> <li>2016-08-23: <a class="reference external" href="http://bugs.python.org/issue27841">Use fast call in method_call() and slot_tp_new()</a></li> <li>2016-08-23: <a class="reference external" href="http://bugs.python.org/issue27845">Optimize update_keyword_args() function</a></li> <li>2016-11-22: <a class="reference external" href="http://bugs.python.org/issue28770">Update python-gdb.py for fastcalls</a></li> <li>2016-11-30: <a class="reference external" href="http://bugs.python.org/issue28839">_PyFunction_FastCallDict(): replace PyTuple_New() with PyMem_Malloc()</a> [<strong>REJECTED</strong>]</li> <li>2016-12-02: <a class="reference external" href="http://bugs.python.org/issue28855">Compiler warnings in _PyObject_CallArg1()</a></li> <li>2016-12-02: <a class="reference external" href="http://bugs.python.org/issue28858">Fastcall uses more C stack</a></li> <li>2016-12-09: <a class="reference external" href="http://bugs.python.org/issue28915">Modify PyObject_CallFunction() to use fast call internally</a></li> <li>2017-01-10: <a class="reference external" href="http://bugs.python.org/issue29227">Reduce C stack consumption in function calls</a></li> <li>2017-01-10: <a class="reference external" href="http://bugs.python.org/issue29233">call_method(): call _PyObject_FastCall() rather than _PyObject_VaCallFunctionObjArgs()</a></li> <li>2017-01-11: <a class="reference external" href="http://bugs.python.org/issue29234">Disable inlining of _PyStack_AsTuple() to reduce the stack consumption</a></li> <li>2017-01-13: <a class="reference external" href="http://bugs.python.org/issue29259">Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects</a> 
[<strong>REJECTED</strong>]</li> <li>2017-01-13: <a class="reference external" href="http://bugs.python.org/issue29263">Implement LOAD_METHOD/CALL_METHOD for C functions</a></li> <li>2017-01-18: <a class="reference external" href="http://bugs.python.org/issue29306">Check usage of Py_EnterRecursiveCall() and Py_LeaveRecursiveCall() in new FASTCALL functions</a></li> <li>2017-01-19: <a class="reference external" href="http://bugs.python.org/issue29318">Optimize _PyFunction_FastCallDict() for **kwargs</a> [<strong>REJECTED</strong>]</li> <li>2017-01-24: <a class="reference external" href="http://bugs.python.org/issue29358">Add tp_fastnew and tp_fastinit to PyTypeObject, 15-20% faster object instanciation</a> [<strong>REJECTED</strong>]</li> <li>2017-01-24: <a class="reference external" href="http://bugs.python.org/issue29360">_PyStack_AsDict(): Don't check if all keys are strings nor if keys are unique</a></li> <li>2017-01-25: <a class="reference external" href="http://bugs.python.org/issue29367">python-gdb: display wrapper_call()</a></li> <li>2017-02-05: <a class="reference external" href="http://bugs.python.org/issue29451">Use _PyArg_Parser for _PyArg_ParseStack(): support positional only arguments</a></li> <li>2017-02-06: <a class="reference external" href="http://bugs.python.org/issue29465">Modify _PyObject_FastCall() to reduce stack consumption</a></li> <li>2017-02-09: <a class="reference external" href="http://bugs.python.org/issue29507">Use FASTCALL in call_method() to avoid temporary tuple</a></li> <li>2017-02-10: <a class="reference external" href="http://bugs.python.org/issue29524">Move functions to call objects into a new Objects/call.c file</a></li> </ul> </div> <div class="section" id="issues-converting-functions-to-fastcall"> <h2>3 issues converting functions to FASTCALL</h2> <ul class="simple"> <li>2017-01-16: <a class="reference external" href="http://bugs.python.org/issue29286">Use METH_FASTCALL in str methods</a></li> <li>2017-01-18: <a 
class="reference external" href="http://bugs.python.org/issue29312">Use FASTCALL in dict.update()</a> [<strong>REJECTED</strong>]</li> <li>2017-02-05: <a class="reference external" href="http://bugs.python.org/issue29452">Use FASTCALL for collections.deque methods: index, insert, rotate</a></li> </ul> </div> <div class="section" id="argument-clinic-issues"> <h2>6 Argument Clinic issues</h2> <p>Converting code to Argument Clinic converts METH_VARARGS methods to METH_FASTCALL.</p> <ul class="simple"> <li>2017-01-16: <a class="reference external" href="http://bugs.python.org/issue29289">Convert OrderedDict methods to Argument Clinic</a></li> <li>2017-01-17: <a class="reference external" href="http://bugs.python.org/issue29299">Argument Clinic: Fix signature of optional positional-only arguments</a></li> <li>2017-01-17: <a class="reference external" href="http://bugs.python.org/issue29300">Modify the _struct module to use FASTCALL and Argument Clinic</a></li> <li>2017-01-17: <a class="reference external" href="http://bugs.python.org/issue29301">decimal: Use FASTCALL and/or Argument Clinic</a></li> <li>2017-01-18: <a class="reference external" href="http://bugs.python.org/issue29311">Argument Clinic: convert dict methods</a></li> <li>2017-02-02: <a class="reference external" href="http://bugs.python.org/issue29419">Argument Clinic: inline PyArg_UnpackTuple and PyArg_ParseStack(AndKeyword)?</a></li> </ul> </div> <div class="section" id="other-optimization-issues"> <h2>10 other optimization issues</h2> <ul class="simple"> <li>2016-08-24: <a class="reference external" href="http://bugs.python.org/issue27848">C function calls: use Py_ssize_t rather than C int for number of arguments</a></li> <li>2016-09-07: <a class="reference external" href="http://bugs.python.org/issue28004">Optimize bytes.join(sequence)</a> [<strong>REJECTED</strong>]</li> <li>2016-11-05: <a class="reference external" href="http://bugs.python.org/issue28618">Decorate hot functions using 
__attribute__((hot)) to optimize Python</a></li> <li>2016-11-07: <a class="reference external" href="http://bugs.python.org/issue28637">Python startup performance regression</a></li> <li>2016-11-25: <a class="reference external" href="http://bugs.python.org/issue28800">Add RETURN_NONE bytecode instruction</a> [<strong>REJECTED</strong>]</li> <li>2016-11-25: <a class="reference external" href="http://bugs.python.org/issue28799">Drop CALL_PROFILE special build?</a></li> <li>2016-12-09: <a class="reference external" href="http://bugs.python.org/issue28924">Inline PyEval_EvalFrameEx() in callers</a> [<strong>REJECTED</strong>]</li> <li>2016-12-15: <a class="reference external" href="http://bugs.python.org/issue28977">Document PyObject_CallFunction() special case more explicitly</a></li> <li>2017-02-06: <a class="reference external" href="http://bugs.python.org/issue29461">Experiment usage of likely/unlikely in CPython core</a></li> <li>2017-02-08: <a class="reference external" href="http://bugs.python.org/issue29502">Should PyObject_Call() call the profiler on C functions, use C_TRACE() macro?</a></li> </ul> </div> FASTCALL microbenchmarks2017-02-24T22:00:00+01:002017-02-24T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-24:/fastcall-microbenchmarks.html<p>For my FASTCALL project (CPython optimization avoiding temporary tuples and dictionaries to pass arguments), I wrote many short microbenchmarks. I grouped them into a new Git repository: <a class="reference external" href="https://github.com/vstinner/pymicrobench">pymicrobench</a>. Benchmark results are required by CPython developers to prove that an optimization is worth it. It's not uncommon that I abandon a …</p><p>For my FASTCALL project (CPython optimization avoiding temporary tuples and dictionaries to pass arguments), I wrote many short microbenchmarks. I grouped them into a new Git repository: <a class="reference external" href="https://github.com/vstinner/pymicrobench">pymicrobench</a>. 
Benchmark results are required by CPython developers to prove that an optimization is worth it. It's not uncommon that I abandon a change because the speedup is not significant, because it makes CPython slower, or because the change is too complex. Over the last 12 months, I counted that I abandoned 9 optimization issues, rejected for different reasons, out of a total of 46 optimization issues.</p> <p>This article gives Python 3.7 results of these microbenchmarks compared to Python 3.5 (before FASTCALL). I ignored 3 microbenchmarks which are between 2% and 5% slower: the code was not optimized and the result is not significant (less than 10% on a <em>microbenchmark</em> is not significant).</p> <p>In the results below, the speedup is between 1.11x faster (-10%) and 1.92x faster (-48%). It's not easy to isolate the speedup of FASTCALL alone: since Python 3.5, Python 3.7 got many other optimizations.</p> <p>Using FASTCALL gives a speedup of around 20 ns, measured on a patch to use FASTCALL. It's not a lot, but many builtin functions take less than 100 ns, so 20 ns is significant in practice! 
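</p>
<p>To give a concrete idea of how such per-call timings can be measured with only the standard library, here is a minimal sketch (<tt class="docutils literal">bench_ns</tt> is a hypothetical helper; the numbers in the tables below were produced with more careful tooling on dedicated hardware):</p>

```python
import timeit

def bench_ns(stmt, number=100_000, repeat=5):
    """Return the best per-call timing of *stmt* in nanoseconds.

    Takes the minimum of several repetitions to reduce the impact of
    system noise, as is usual for microbenchmarks.
    """
    best = min(timeit.repeat(stmt, number=number, repeat=repeat))
    return best / number * 1e9

# Time one of the calls from the tables; results depend heavily on
# the machine and the Python version.
print('getattr(1, "real"): %.1f ns per call' % bench_ns('getattr(1, "real")'))
```

<p>Run on an idle machine, this kind of snippet is enough to see whether a call sits in the "less than 100 ns" range where a 20 ns difference matters.</p>
<p>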
Avoiding a tuple to pass positional arguments is interesting, but FASTCALL also allows further internal optimizations.</p> <p>Microbenchmark on calling builtin functions:</p> <table border="1" class="docutils"> <colgroup> <col width="53%" /> <col width="11%" /> <col width="36%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Benchmark</th> <th class="head">3.5</th> <th class="head">3.7</th> </tr> </thead> <tbody valign="top"> <tr><td>struct.pack(&quot;i&quot;, 1)</td> <td>105 ns</td> <td>77.6 ns: 1.36x faster (-26%)</td> </tr> <tr><td>getattr(1, &quot;real&quot;)</td> <td>79.4 ns</td> <td>64.4 ns: 1.23x faster (-19%)</td> </tr> </tbody> </table> <p>Microbenchmark on calling methods of builtin types:</p> <table border="1" class="docutils"> <colgroup> <col width="53%" /> <col width="11%" /> <col width="36%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Benchmark</th> <th class="head">3.5</th> <th class="head">3.7</th> </tr> </thead> <tbody valign="top"> <tr><td>{1: 2}.get(7, None)</td> <td>84.9 ns</td> <td>61.6 ns: 1.38x faster (-27%)</td> </tr> <tr><td>collections.deque([None]).index(None)</td> <td>116 ns</td> <td>87.0 ns: 1.33x faster (-25%)</td> </tr> <tr><td>{1: 2}.get(1)</td> <td>79.4 ns</td> <td>59.6 ns: 1.33x faster (-25%)</td> </tr> <tr><td>&quot;a&quot;.replace(&quot;x&quot;, &quot;y&quot;)</td> <td>134 ns</td> <td>101 ns: 1.33x faster (-25%)</td> </tr> <tr><td>b&quot;&quot;.decode()</td> <td>71.5 ns</td> <td>54.5 ns: 1.31x faster (-24%)</td> </tr> <tr><td>b&quot;&quot;.decode(&quot;ascii&quot;)</td> <td>99.1 ns</td> <td>75.7 ns: 1.31x faster (-24%)</td> </tr> <tr><td>collections.deque.rotate(1)</td> <td>106 ns</td> <td>82.8 ns: 1.28x faster (-22%)</td> </tr> <tr><td>collections.deque.insert()</td> <td>778 ns</td> <td>608 ns: 1.28x faster (-22%)</td> </tr> <tr><td>b&quot;&quot;.join((b&quot;hello&quot;, b&quot;world&quot;) * 100)</td> <td>4.02 us</td> <td>3.32 us: 1.21x faster (-17%)</td> </tr> <tr><td>[0].count(0)</td> <td>53.9 
ns</td> <td>46.3 ns: 1.16x faster (-14%)</td> </tr> <tr><td>collections.deque.rotate()</td> <td>72.6 ns</td> <td>63.1 ns: 1.15x faster (-13%)</td> </tr> <tr><td>b&quot;&quot;.join((b&quot;hello&quot;, b&quot;world&quot;))</td> <td>102 ns</td> <td>89.8 ns: 1.13x faster (-12%)</td> </tr> </tbody> </table> <p>Microbenchmark on builtin functions calling Python functions (callbacks):</p> <table border="1" class="docutils"> <colgroup> <col width="53%" /> <col width="11%" /> <col width="36%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Benchmark</th> <th class="head">3.5</th> <th class="head">3.7</th> </tr> </thead> <tbody valign="top"> <tr><td>map(lambda x: x, list(range(1000)))</td> <td>76.1 us</td> <td>61.1 us: 1.25x faster (-20%)</td> </tr> <tr><td>sorted(list(range(1000)), key=lambda x: x)</td> <td>90.2 us</td> <td>78.2 us: 1.15x faster (-13%)</td> </tr> <tr><td>filter(lambda x: x, list(range(1000)))</td> <td>81.8 us</td> <td>73.4 us: 1.11x faster (-10%)</td> </tr> </tbody> </table> <p>Microbenchmark on calling slots (<tt class="docutils literal">__getitem__</tt>, <tt class="docutils literal">__init__</tt>, <tt class="docutils literal">__int__</tt>) implemented in Python:</p> <table border="1" class="docutils"> <colgroup> <col width="53%" /> <col width="11%" /> <col width="36%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Benchmark</th> <th class="head">3.5</th> <th class="head">3.7</th> </tr> </thead> <tbody valign="top"> <tr><td>Python __getitem__: obj[0]</td> <td>167 ns</td> <td>87.0 ns: 1.92x faster (-48%)</td> </tr> <tr><td>call_pyinit_kw1</td> <td>348 ns</td> <td>240 ns: 1.45x faster (-31%)</td> </tr> <tr><td>call_pyinit_kw5</td> <td>564 ns</td> <td>401 ns: 1.41x faster (-29%)</td> </tr> <tr><td>call_pyinit_kw10</td> <td>960 ns</td> <td>734 ns: 1.31x faster (-24%)</td> </tr> <tr><td>Python __int__: int(obj)</td> <td>241 ns</td> <td>207 ns: 1.16x faster (-14%)</td> </tr> </tbody> </table> <p>Microbenchmark on calling a method 
descriptor (static method):</p> <table border="1" class="docutils"> <colgroup> <col width="53%" /> <col width="11%" /> <col width="36%" /> </colgroup> <thead valign="bottom"> <tr><th class="head">Benchmark</th> <th class="head">3.5</th> <th class="head">3.7</th> </tr> </thead> <tbody valign="top"> <tr><td>int.to_bytes(1, 4, &quot;little&quot;)</td> <td>177 ns</td> <td>103 ns: 1.72x faster (-42%)</td> </tr> </tbody> </table> <p>Benchmarks were run on <tt class="docutils literal"><span class="pre">speed-python</span></tt>, the server used to run CPython benchmarks.</p> The start of the FASTCALL project2017-02-16T17:00:00+01:002017-02-16T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-16:/start-fastcall-project.html<div class="section" id="false-start"> <h2>False start</h2> <p>In April 2016, I experimented with a Python change to avoid a temporary tuple when calling functions. Builtin functions were between 20 and 50% faster!</p> <p>Sadly, some benchmarks were randomly slower. It would take me four months to understand why!</p> </div> <div class="section" id="work-on-benchmarks"> <h2>Work on benchmarks</h2> <p>During four months, I worked on making …</p></div><div class="section" id="false-start"> <h2>False start</h2> <p>In April 2016, I experimented with a Python change to avoid a temporary tuple when calling functions. Builtin functions were between 20 and 50% faster!</p> <p>Sadly, some benchmarks were randomly slower. It would take me four months to understand why!</p> </div> <div class="section" id="work-on-benchmarks"> <h2>Work on benchmarks</h2> <p>During four months, I worked on making benchmarks more stable. 
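</p>
<p>A quick way to quantify the kind of (in)stability discussed in these posts is the coefficient of variation of the run timings. This is only an illustrative sketch (<tt class="docutils literal">is_stable</tt> and the 5% threshold are hypothetical, not what speed.python.org uses):</p>

```python
import statistics

def is_stable(timings, max_cv=0.05):
    """Consider a benchmark stable if its standard deviation is small
    relative to its mean (coefficient of variation below *max_cv*)."""
    cv = statistics.stdev(timings) / statistics.mean(timings)
    return cv <= max_cv

flat_runs = [22.1, 22.3, 22.0, 22.2, 22.1]    # ms: low jitter
spiky_runs = [22.1, 27.9, 22.3, 31.4, 22.0]   # ms: temporary spikes
print(is_stable(flat_runs), is_stable(spiky_runs))   # True False
```

<p>Temporary spikes blow up the standard deviation, which is exactly why averaging a few runs was not enough to get trustworthy results.</p>
<p>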
See my previous blog posts:</p> <ul class="simple"> <li><a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-system.html">My journey to stable benchmark, part 1 (system)</a> (May 21, 2016)</li> <li><a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html">My journey to stable benchmark, part 2 (deadcode)</a> (May 22, 2016)</li> <li><a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-average.html">My journey to stable benchmark, part 3 (average)</a> (May 23, 2016)</li> <li><a class="reference external" href="https://vstinner.github.io/perf-visualize-system-noise-with-cpu-isolation.html">Visualize the system noise using perf and CPU isolation</a> (June 16, 2016)</li> <li><a class="reference external" href="https://vstinner.github.io/intel-cpus.html">Intel CPUs: P-state, C-state, Turbo Boost, CPU frequency, etc.</a> (July 15, 2015)</li> <li><a class="reference external" href="https://vstinner.github.io/intel-cpus-part2.html">Intel CPUs (part 2): Turbo Boost, temperature, frequency and Pstate C0 bug</a> (September 23, 2016)</li> <li><a class="reference external" href="https://vstinner.github.io/analysis-python-performance-issue.html">Analysis of a Python performance issue</a> (November 19, 2016)</li> <li>...</li> </ul> <p>See my talk <a class="reference external" href="https://fosdem.org/2017/schedule/event/python_stable_benchmark/">How to run a stable benchmark</a> that I gave at FOSDEM 2017 (Brussels, Belgium): slides + video. I listed all the issues I had to solve to get reliable benchmarks.</p> </div> <div class="section" id="ask-for-permission"> <h2>Ask for permission</h2> <p>In August 2016, I confirmed that my change didn't introduce any slowdown. 
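</p>
<p>As an aside, the benchmark reports quoted in these articles use the &quot;N.NNx faster (-P%)&quot; notation. It can be derived from two timings as follows (a sketch; <tt class="docutils literal">format_speedup</tt> is a hypothetical helper, and the real reports were produced by dedicated benchmark tooling whose rounding may differ slightly):</p>

```python
def format_speedup(old, new):
    """Render an old/new timing pair as 'N.NNx faster (-P%)'."""
    ratio = old / new                    # how many times faster
    percent = (new - old) / old * 100    # relative change, negative = faster
    return f"{ratio:.2f}x faster ({percent:+.0f}%)"

# struct.pack("i", 1): 105 ns on Python 3.5 vs 77.6 ns on Python 3.7
print(format_speedup(105, 77.6))   # 1.35x faster (-26%)
```

<p>The ratio and the percentage are two views of the same measurement: the quotient of the timings, and their relative change.</p>
<p>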
So I asked for permission on the python-dev mailing list to start pushing changes: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-August/145793.html">New calling convention to avoid temporarily tuples when calling functions</a>.</p> <p>Guido van Rossum asked me for benchmark results:</p> <blockquote> But is there a performance improvement?</blockquote> </div> <div class="section" id="benchmark-results"> <h2>Benchmark results</h2> <p>On micro-benchmarks, FASTCALL is much faster:</p> <ul class="simple"> <li><tt class="docutils literal">getattr(1, &quot;real&quot;)</tt> becomes <strong>44%</strong> faster</li> <li><tt class="docutils literal">list(filter(lambda x: x, <span class="pre">list(range(1000))))</span></tt> becomes <strong>31%</strong> faster</li> <li><tt class="docutils literal">namedtuple.attr</tt> (read the attribute) becomes <strong>23%</strong> faster</li> <li>...</li> </ul> <p>Full results:</p> <ul class="simple"> <li><a class="reference external" href="https://bugs.python.org/issue26814#msg263999">FASTCALL compared to Python 3.6 (default branch)</a></li> <li><a class="reference external" href="https://bugs.python.org/issue26814#msg264003">2.7 / 3.4 / 3.5 / 3.6 / 3.6 FASTCALL comparison</a></li> </ul> <p>On the <a class="reference external" href="https://bugs.python.org/issue26814#msg266359">CPython benchmark suite</a>, I also saw many faster benchmarks:</p> <ul class="simple"> <li>pickle_list: <strong>1.29x faster</strong></li> <li>etree_generate: <strong>1.22x faster</strong></li> <li>pickle_dict: <strong>1.19x faster</strong></li> <li>etree_process: <strong>1.16x faster</strong></li> <li>mako_v2: <strong>1.13x faster</strong></li> <li>telco: <strong>1.09x faster</strong></li> <li>...</li> </ul> </div> <div class="section" id="replies-to-my-email"> <h2>Replies to my email</h2> <p>I got two very positive replies, so I understood that it was ok.</p> <p>Brett Cannon:</p> <blockquote> I just wanted to say I'm 
excited about this and I'm glad someone is taking advantage of what Argument Clinic allows for and what I know Larry had initially hoped AC would make happen!</blockquote> <p>Yury Selivanov:</p> <blockquote> Exceptional results, congrats Victor. Will be happy to help with code review.</blockquote> </div> <div class="section" id="real-start"> <h2>Real start</h2> <p>That's how the FASTCALL project began for real! I started to push a long series of patches adding new private functions and then modifying code to call these new functions.</p> </div> My contributions to CPython during 2016 Q42017-02-16T11:00:00+01:002017-02-16T11:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-16:/contrib-cpython-2016q4.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q4 (October, November, December):</p> <pre class="literal-block"> hg log -r 'date(&quot;2016-10-01&quot;):date(&quot;2016-12-31&quot;)' --no-merges -u Stinner </pre> <p>Statistics: 105 non-merge commits + 31 merge commits (total: 136 commits).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to CPython during 2016 Q3</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython during 2017 Q1</a>.</p> <p>Table of …</p><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q4 (October, November, December):</p> <pre class="literal-block"> hg log -r 'date(&quot;2016-10-01&quot;):date(&quot;2016-12-31&quot;)' --no-merges -u Stinner </pre> <p>Statistics: 105 non-merge commits + 31 merge commits (total: 136 commits).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to CPython during 2016 Q3</a>. 
Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2017q1.html">My contributions to CPython during 2017 Q1</a>.</p> <p>Table of Contents:</p> <ul class="simple"> <li>Python startup performance regression</li> <li>Optimizations</li> <li>Code placement and __attribute__((hot))</li> <li>Interesting bug: duplicated filters when tests reload the warnings module</li> <li>Contributions</li> <li>regrtest</li> <li>Other changes</li> </ul> <div class="section" id="python-startup-performance-regression"> <h2>Python startup performance regression</h2> <div class="section" id="regresion"> <h3>Regression</h3> <p>My work on tracking Python performance started to become useful :-) I identified a performance slowdown on the <tt class="docutils literal">bm_python_startup</tt> benchmark (average time to start Python).</p> <p>Before September 2016, startup took around <strong>17.9 ms</strong>. On September 15, after the <a class="reference external" href="https://vstinner.github.io/cpython-sprint-2016.html">CPython sprint</a>, it was better: <strong>13.4 ms</strong>. But suddenly, on September 19, it became much worse: <strong>22.8 ms</strong>. What happened?</p> <p>Timeline of Python startup performance on speed.python.org:</p> <a class="reference external image-reference" href="https://speed.python.org/timeline/#/?exe=5&amp;ben=python_startup&amp;env=1&amp;revs=50&amp;equid=off&amp;quarts=on&amp;extr=on"> <img alt="Timeline of Python startup performance" src="https://vstinner.github.io/images/python_startup_regression.png" /> </a> <p>I looked at commits between September 15 and September 19, and I quickly identified the commit of <a class="reference external" href="http://bugs.python.org/issue28082">convert re flags to (much friendlier) IntFlag constants (issue #28082)</a>. The <tt class="docutils literal">re</tt> module now imports the <tt class="docutils literal">enum</tt> module to get a better representation for its flags. 
Example:</p> <pre class="literal-block">
$ ./python
Python 3.7.0a0
&gt;&gt;&gt; import re; re.M
&lt;RegexFlag.MULTILINE: 8&gt;
</pre> </div> <div class="section" id="revert"> <h3>Revert</h3> <p>On November 7, I opened issue #28637 proposing to revert the commit to get back better Python startup performance. The revert was approved by Guido van Rossum, so I pushed it.</p> </div> <div class="section" id="better-fix"> <h3>Better fix</h3> <p>I also noticed that the <tt class="docutils literal">re</tt> module is not imported by default if Python is installed or if Python is run from its source code directory. The <tt class="docutils literal">re</tt> module is only imported by default if Python is installed in a virtual environment.</p> <p><strong>Serhiy Storchaka</strong> proposed a change to not import <tt class="docutils literal">re</tt> anymore in the <tt class="docutils literal">site</tt> module when Python runs in a virtual environment. Since the change was simple and the benefit was obvious (avoid an import at startup), it was quickly merged.</p> </div> <div class="section" id="restore-reverted-enum-change"> <h3>Restore reverted enum change</h3> <p>Since using <tt class="docutils literal">enum</tt> in <tt class="docutils literal">re</tt> no longer has an impact on Python startup performance by default, the <tt class="docutils literal">enum</tt> change was restored on November 14.</p> <p>Sadly, the <tt class="docutils literal">enum</tt> change still has an impact on performance: <tt class="docutils literal">re.compile()</tt> became 1.2x slower (312 ms =&gt; 376 ms: +20%).</p> <a class="reference external image-reference" href="https://speed.python.org/timeline/#/?exe=5&amp;ben=regex_compile&amp;env=1&amp;revs=50&amp;equid=off&amp;quarts=on&amp;extr=on"> <img alt="Timeline of re.compile() performance" src="https://vstinner.github.io/images/regex_compile_perf.png" /> </a> <p>I think that it's ok since it is very easy to use precompiled regular expressions in an application: 
store and reuse the result of <tt class="docutils literal">re.compile()</tt>, instead of calling <tt class="docutils literal">re.match()</tt> directly, for example.</p> </div> </div> <div class="section" id="optimizations"> <h2>Optimizations</h2> <div class="section" id="fastcall"> <h3>FASTCALL</h3> <p>Same as in 2016 Q3: I pushed a <em>lot</em> of changes for FASTCALL optimizations, but I will write a dedicated article later.</p> </div> <div class="section" id="no-int-int-micro-optimization-thank-you"> <h3>No int+int micro-optimization, thank you</h3> <p>After 2 years of benchmarking and a huge effort to make Python benchmarks more reliable and stable, I decided to close the issue #21955 &quot;ceval.c: implement fast path for integers with a single digit&quot; as REJECTED. It became clear to me that such a micro-optimization has no effect on non-trivial code, but only on specially crafted micro-benchmarks. I added a comment in the C code to prevent further optimization attempts:</p> <pre class="literal-block">
/* NOTE(haypo): Please don't try to micro-optimize int+int on CPython using
   bytecode, it is simply worthless. See http://bugs.python.org/issue21955
   and http://bugs.python.org/issue10044 for the discussion. In short, no
   patch shown any impact on a realistic benchmark, only a minor speedup on
   microbenchmarks. */
</pre> </div> <div class="section" id="timeit"> <h3>timeit</h3> <p>I enhanced the <tt class="docutils literal">timeit</tt> benchmark module to make it more reliable (issue #28240):</p> <ul class="simple"> <li>Autorange now starts with a single loop iteration instead of 10. 
For example, <tt class="docutils literal">python3 <span class="pre">-m</span> timeit <span class="pre">-s</span> 'import time' 'time.sleep(1)'</tt> now only takes 4 seconds instead of 40 seconds.</li> <li>Repeat the benchmarks 5 times by default, instead of only 3, to make benchmarks more reliable.</li> <li>Remove the <tt class="docutils literal"><span class="pre">-c/--clock</span></tt> and <tt class="docutils literal"><span class="pre">-t/--time</span></tt> command line options which had been deprecated since Python 3.3.</li> <li>Add a <tt class="docutils literal">nsec</tt> (nanosecond) unit to format timings.</li> <li>Enhance formatting of raw timings in verbose mode. Add newlines to the output for readability.</li> </ul> </div> <div class="section" id="micro-optimizations"> <h3>Micro-optimizations</h3> <p>I also pushed two minor micro-optimizations:</p> <ul class="simple"> <li>Use the <tt class="docutils literal">PyThreadState_GET()</tt> macro in performance-critical code. <tt class="docutils literal">_PyThreadState_UncheckedGet()</tt> calls are not inlined as expected, even when using <tt class="docutils literal">gcc <span class="pre">-O3</span></tt>.</li> <li>Modify <tt class="docutils literal">type_setattro()</tt> to call <tt class="docutils literal">_PyObject_GenericSetAttrWithDict()</tt> directly instead of <tt class="docutils literal">PyObject_GenericSetAttr()</tt>, which is a thin wrapper around <tt class="docutils literal">_PyObject_GenericSetAttrWithDict()</tt>.</li> </ul> </div> </div> <div class="section" id="code-placement-and-attribute-hot"> <h2>Code placement and __attribute__((hot))</h2> <p>On <a class="reference external" href="https://speed.python.org/">speed.python.org</a>, I still noticed random performance slowdowns on the evil <tt class="docutils literal">call_simple</tt> benchmark. 
This benchmark is a <em>micro</em>-benchmark measuring the performance of a single Python function call; it is CPU-bound, very small, and so heavily impacted by CPU caches. I was bitten again by significant performance slowdowns caused only by code placement.</p> <p>It wasn't possible to use <em>Profile Guided Optimization</em> (PGO) on the benchmark runner, since it used Ubuntu 14.04 and GCC crashed with an &quot;internal error&quot;.</p> <p>So I tried something different: mark &quot;hot functions&quot; with <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt>. It's a GCC and Clang attribute helping code placement: &quot;hot functions&quot; are moved to a dedicated ELF section and so are closer in memory, and the compiler tries to optimize these functions even more.</p> <p>The following functions are considered hot according to statistics collected by the Linux <tt class="docutils literal">perf record</tt> and <tt class="docutils literal">perf report</tt> commands:</p> <ul class="simple"> <li>_PyEval_EvalFrameDefault()</li> <li>call_function()</li> <li>_PyFunction_FastCall()</li> <li>PyFrame_New()</li> <li>frame_dealloc()</li> <li>PyErr_Occurred()</li> </ul> <p>I added a <tt class="docutils literal">_Py_HOT_FUNCTION</tt> macro which uses <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> and used <tt class="docutils literal">_Py_HOT_FUNCTION</tt> on these functions (issue #28618).</p> <p>Read also my previous blog article <a class="reference external" href="https://vstinner.github.io/analysis-python-performance-issue.html">Analysis of a Python performance issue</a> for a deeper analysis.</p> <p>Sadly, after I wrote this blog post and after more analysis of <tt class="docutils literal">call_simple</tt> benchmark results, I saw that <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> wasn't enough. 
I still had random major performance slowdowns.</p> <p>I decided to upgrade the performance runner to Ubuntu 16.04. It was risky because nobody had access to the physical server, so it might have taken weeks to repair if I made a mistake. Fortunately, the upgrade went smoothly and I was able to run all benchmarks again using PGO. As expected, with PGO+LTO, benchmark results are more stable!</p> </div> <div class="section" id="interesting-bug-duplicated-filters-when-tests-reload-the-warnings-module"> <h2>Interesting bug: duplicated filters when tests reload the warnings module</h2> <p>The Python test suite has an old bug: issue #18383, opened in July 2013. Sometimes, the test suite emits the following warning:</p> <pre class="literal-block">
[247/375] test_warnings
Warning -- warnings.filters was modified by test_warnings
</pre> <p>Since it's only a warning and it only occurs in the Python test suite, it was low priority and took 3 years to fix! It also took time to find the right design to fix the root cause.</p> <div class="section" id="duplicated-filters"> <h3>Duplicated filters</h3> <p>test_warnings imports the <tt class="docutils literal">warnings</tt> module 3 times:</p> <pre class="literal-block">
import warnings as original_warnings  # Python
py_warnings = support.import_fresh_module('warnings', blocked=['_warnings'])  # Python
c_warnings = support.import_fresh_module('warnings', fresh=['_warnings'])  # C
</pre> <p>The Python <tt class="docutils literal">warnings</tt> module (<tt class="docutils literal">Lib/warnings.py</tt>) installs warning filters when the module is loaded:</p> <pre class="literal-block">
_processoptions(sys.warnoptions)
</pre> <p>where <tt class="docutils literal">sys.warnoptions</tt> contains the value of the <tt class="docutils literal"><span class="pre">-W</span></tt> command line option.</p> <p>If the Python module is loaded more than once, filters are duplicated.</p> </div> <div class="section" id="first-fix-use-the-right-module">
<h3>First fix: use the right module</h3> <p>I pushed a first fix in September 2015.</p> <p>Fix test_warnings: don't modify warnings.filters. BaseTest now ensures that unittest.TestCase.assertWarns() uses the same warnings module as warnings.catch_warnings(). Otherwise, warnings.catch_warnings() will be unable to remove the added filter.</p> </div> <div class="section" id="second-fix-don-t-add-duplicated-filters"> <h3>Second fix: don't add duplicated filters</h3> <p>Issue #18383: the first patch was proposed by <strong>Florent Xicluna</strong> in 2013: save the length of filters, and remove newly added filters after the <tt class="docutils literal">warnings</tt> modules are reloaded by <tt class="docutils literal">test_warnings</tt>. In December 2014, <strong>Serhiy Storchaka</strong> reviewed the patch: he didn't like this <em>workaround</em>, he wanted to fix the <em>root cause</em>.</p> <p>In March 2015, <strong>Alex Shkop</strong> proposed a patch which avoids adding duplicated filters.</p> <p>In September 2015, <strong>Martin Panter</strong> proposed trying to save/restore filters on the C warnings module. I proposed something similar in issue #26742. But this solution has the same flaw as Florent's idea: it's only a workaround.</p> <p>Martin also proposed adding a private flag saying that filters were already set, so the same filters would not be added again.</p> <p>Finally, in May 2016, Martin updated Alex's patch avoiding duplicated filters and pushed it.</p> </div> <div class="section" id="third-fix"> <h3>Third fix</h3> <p>The filter comparison wasn't perfect. A filter can contain a precompiled regular expression, whereas these objects don't implement comparison.</p> <p>In November 2016, I opened issue #28727 proposing to implement rich comparison for <tt class="docutils literal">_sre.SRE_Pattern</tt>.</p> <p>My first patch didn't implement <tt class="docutils literal">hash()</tt> and had different bugs.
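</p> <p>Why does <tt class="docutils literal">hash()</tt> matter here? A toy Python class (purely illustrative, not the <tt class="docutils literal">_sre</tt> code) shows the contract such a patch must respect: objects that compare equal must also hash equal, otherwise sets and dictionary lookups misbehave:</p> <pre class="literal-block">
class FakePattern:
    """Toy stand-in for a compiled pattern: equality is defined by the
    pattern string and the flags."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags

    def __eq__(self, other):
        if not isinstance(other, FakePattern):
            return NotImplemented
        return (self.pattern, self.flags) == (other.pattern, other.flags)

    def __hash__(self):
        # Must agree with __eq__: equal objects get equal hashes.
        return hash((self.pattern, self.flags))

assert FakePattern("abc") == FakePattern("abc")
assert FakePattern("abc") != FakePattern("abc", flags=2)
# Without a consistent __hash__, this set would keep both duplicates:
assert len({FakePattern("abc"), FakePattern("abc")}) == 1
</pre> <p>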
It took me almost one week and 6 versions to write complete unit tests and handle all cases: bytes and Unicode patterns, and regular expression flags.</p> <p><strong>Serhiy Storchaka</strong> found bugs and helped me write the implementation.</p> </div> </div> <div class="section" id="contributions"> <h2>Contributions</h2> <p>As usual, I reviewed and pushed changes written by other contributors:</p> <ul> <li><p class="first">Issue #27896: Allow passing Sphinx options to Doc/Makefile. Patch written by <strong>Julien Palard</strong>.</p> </li> <li><p class="first">Issue #28476: Reuse math.factorial() in test_random. Patch written by <strong>Francisco Couzo</strong>.</p> </li> <li><p class="first">Issue #28479: Fix reST syntax in windows.rst. Patch written by <strong>Julien Palard</strong>.</p> </li> <li><p class="first">Issue #26273: Add new constants: <tt class="docutils literal">socket.TCP_CONGESTION</tt> (Linux 2.6.13) and <tt class="docutils literal">socket.TCP_USER_TIMEOUT</tt> (Linux 2.6.37). Patch written by <strong>Omar Sandoval</strong>.</p> </li> <li><p class="first">Issue #28979: Fix What's New in Python 3.6: compact dict is not faster, but only more compact. Patch written by <strong>Brendan Donegan</strong>.</p> </li> <li><p class="first">Issue #28147: Fix a memory leak in split-table dictionaries: <tt class="docutils literal">setattr()</tt> must not convert a combined table into a split table. Patch written by <strong>INADA Naoki</strong>.</p> </li> <li><p class="first">Issue #29109: Enhance tracemalloc documentation:</p> <ul class="simple"> <li>Fix a wrong parameter name: 'group_by' instead of 'key_type'</li> <li>Don't round up numbers when explaining the examples.
If they exactly match what can be read in the script output, it is easier to understand (4.8 MiB vs 4855 KiB)</li> <li>Fix an incorrect method link that was pointing to another module</li> </ul> <p>Patch written by <strong>Loic Pefferkorn</strong>.</p> </li> </ul> </div> <div class="section" id="regrtest"> <h2>regrtest</h2> <ul class="simple"> <li>regrtest <tt class="docutils literal"><span class="pre">--fromfile</span></tt> now accepts a list of filenames, not only a list of <em>test</em> names.</li> <li>Issue #28409: regrtest: fix the parser of command line arguments.</li> </ul> </div> <div class="section" id="other-changes"> <h2>Other changes</h2> <ul class="simple"> <li>Fix the <tt class="docutils literal">_Py_normalize_encoding()</tt> function: it was not exactly the same as the Python <tt class="docutils literal">encodings.normalize_encoding()</tt> function: the C function now also converts to lowercase.</li> <li>Issue #28256: Clean up <tt class="docutils literal">_math.c</tt>: only define fallback implementations when needed. This avoids producing dead code when the system provides the required math functions, and so enhances code coverage.</li> <li>_csv: use <tt class="docutils literal">_PyLong_AsInt()</tt> to simplify the code; the function checks the limits of the C <tt class="docutils literal">int</tt> type.</li> <li>Issue #28544: Fix <tt class="docutils literal">_asynciomodule.c</tt> on Windows. <tt class="docutils literal">PyType_Ready()</tt> sets the reference to <tt class="docutils literal">&amp;PyType_Type</tt>. The <tt class="docutils literal">&amp;PyType_Type</tt> address cannot be resolved at compile time (at least not on Windows).</li> <li>Issue #28082: Add basic unit tests on the new <tt class="docutils literal">re</tt> enums.</li> <li>Issue #28691: Fix <tt class="docutils literal">warn_invalid_escape_sequence()</tt>: correctly handle <tt class="docutils literal">DeprecationWarning</tt> raised as an exception.
First clear the current exception to replace the <tt class="docutils literal">DeprecationWarning</tt> exception with a <tt class="docutils literal">SyntaxError</tt> exception. Unit test written by <strong>Serhiy Storchaka</strong>.</li> <li>Issue #28023: Fix python-gdb.py on old GDB versions. Replace <tt class="docutils literal"><span class="pre">int(value.address)+offset</span></tt> with <tt class="docutils literal">value.cast(unsigned <span class="pre">char*)+offset</span></tt>. It seems like <tt class="docutils literal">int(value.address)</tt> fails on old GDB versions.</li> <li>Issue #28765: <tt class="docutils literal">_sre.compile()</tt> now checks the type of the <tt class="docutils literal">groupindex</tt> and <tt class="docutils literal">indexgroup</tt> arguments. <tt class="docutils literal">groupindex</tt> must be a dictionary and <tt class="docutils literal">indexgroup</tt> must be a tuple. Previously, <tt class="docutils literal">indexgroup</tt> was a list. Use a tuple to reduce the memory usage.</li> <li>Issue #28782: Fix a bug in the implementation of <tt class="docutils literal">yield from</tt> (fix the <tt class="docutils literal">_PyGen_yf()</tt> function). Fix the test checking if the next instruction is <tt class="docutils literal">YIELD_FROM</tt>. Regression introduced by the new &quot;WordCode&quot; bytecode (issue #26647). Fix reviewed by <strong>Serhiy Storchaka</strong> and <strong>Yury Selivanov</strong>.</li> <li>Issue #28792: Remove aliases from <tt class="docutils literal">_bisect</tt>. Remove aliases from the C module. Always implement the <tt class="docutils literal">bisect()</tt> and <tt class="docutils literal">insort()</tt> aliases in <tt class="docutils literal">bisect.py</tt>. Also remove the <tt class="docutils literal"># backward compatibility</tt> comment: there is no plan to deprecate nor remove these aliases.
When keys are equal, it makes sense to use <tt class="docutils literal">bisect.bisect()</tt> and <tt class="docutils literal">bisect.insort()</tt>.</li> <li>Fix a <tt class="docutils literal">ResourceWarning</tt> in <tt class="docutils literal">generate_opcode_h.py</tt>. Use a context manager to close the Python file. Also replace <tt class="docutils literal">open()</tt> with <tt class="docutils literal">tokenize.open()</tt> to handle the coding cookie of <tt class="docutils literal">Lib/opcode.py</tt>.</li> <li>Issue #28740: Add a <tt class="docutils literal">sys.getandroidapilevel()</tt> function: return the build-time API version of Android as an integer. The function is only available on Android. Its availability can be tested to check if Python is running on Android.</li> <li>Issue #28152: Fix <tt class="docutils literal"><span class="pre">-Wunreachable-code</span></tt> warnings on Clang.<ul> <li>Don't declare dead code when the code is compiled with Clang.</li> <li>Replace a C <tt class="docutils literal">if()</tt> with a preprocessor <tt class="docutils literal">#if</tt> to fix a warning about dead code when using Clang.</li> <li>Replace <tt class="docutils literal">0</tt> with <tt class="docutils literal">(0)</tt> to silence a compiler warning about dead code on <tt class="docutils literal"><span class="pre">((int)(SEM_VALUE_MAX)</span> &lt; 0)</tt>: <tt class="docutils literal">SEM_VALUE_MAX</tt> is not negative on Linux.</li> </ul> </li> <li>Issue #28835: Fix a regression introduced in <tt class="docutils literal">warnings.catch_warnings()</tt>: call <tt class="docutils literal">warnings.showwarning()</tt> if it was overridden inside the context manager.</li> <li>Issue #28915: Replace <tt class="docutils literal">int</tt> with <tt class="docutils literal">Py_ssize_t</tt> in <tt class="docutils literal">modsupport</tt>. The <tt class="docutils literal">Py_ssize_t</tt> type is better for indexes.
The compiler might emit more efficient code for <tt class="docutils literal">i++</tt>. <tt class="docutils literal">Py_ssize_t</tt> is the type of a PyTuple index, for example. Also replace <tt class="docutils literal">int endchar</tt> with <tt class="docutils literal">char endchar</tt>.</li> <li>Initialize variables to fix compiler warnings. The warnings were seen on the &quot;AMD64 Debian PGO 3.x&quot; buildbot. They are false positives, but variable initialization should not harm performance.</li> <li>Remove useless variable initialization. Don't initialize variables which are not used before they are assigned.</li> <li>Issue #28838: Clean up <tt class="docutils literal">abstract.h</tt>. Rewrite all comments to use the same style as other Python header files: comment functions <em>before</em> their declaration, no newline between the comment and the declaration. Reformat some comments, adding newlines, to make them easier to read. Quote arguments like 'arg' when mentioning an argument in a comment.</li> <li>Issue #28838: <tt class="docutils literal">abstract.h</tt>: remove a long-outdated comment. The documentation of the Python C API is more complete and more up to date than this old comment. Removal suggested by <strong>Antoine Pitrou</strong>.</li> <li>python-gdb.py: catch <tt class="docutils literal">gdb.error</tt> on <tt class="docutils literal">gdb.selected_frame()</tt>.</li> <li>Issue #28383: the <tt class="docutils literal">__hash__</tt> documentation recommends a naive XOR to combine hash values, which is suboptimal.
Update the documentation to suggest reusing the <tt class="docutils literal">hash()</tt> function on a tuple, with an example.</li> </ul> </div> My contributions to CPython during 2016 Q32017-02-14T19:00:00+01:002017-02-14T19:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-14:/contrib-cpython-2016q3.html<p class="first last">My contributions to CPython during 2016 Q3</p> <p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q3 (July, August, September):</p> <pre class="literal-block"> hg log -r 'date(&quot;2016-07-01&quot;):date(&quot;2016-09-30&quot;)' --no-merges -u Stinner </pre> <p>Statistics: 161 non-merge commits + 29 merge commits (total: 190 commits).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q2.html">My contributions to CPython during 2016 Q2</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q4.html">My contributions to CPython during 2016 Q4</a>.</p> <p>Table of Contents:</p> <ul class="simple"> <li>Two new core developers</li> <li>CPython sprint, September, in California</li> <li>PEP 524: Make os.urandom() blocking on Linux</li> <li>PEP 509: private dictionary version</li> <li>FASTCALL: optimization avoiding temporary tuple to call functions</li> <li>More efficient CALL_FUNCTION bytecode</li> <li>Work on optimization</li> <li>Interesting bug: hidden resource warnings</li> <li>Contributions</li> <li>Bugfixes</li> <li>regrtest changes</li> <li>Tests changes</li> <li>Other changes</li> </ul> <div class="section" id="two-new-core-developers"> <h2>Two new core developers</h2> <p>Two new core developers are the result of a productive third quarter of 2016.</p> <p>On September 25, 2016, Yury Selivanov proposed to give <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2016-September/004013.html">commit privileges for INADA Naoki</a>.
Naoki became a core developer the day after!</p> <p>On November 14, 2016, I proposed to <a class="reference external" href="https://mail.python.org/pipermail/python-committers/2016-November/004045.html">promote Xiang Zhang as a core developer</a>. One week later, he also became a core developer! I mentored him for one month, and later let him push changes directly.</p> <p>Most Python core developers are men coming from North America and Europe. INADA Naoki comes from Japan and Xiang Zhang comes from China: with more core developers from Asia, we increased the diversity of Python core developers!</p> </div> <div class="section" id="cpython-sprint-september-in-california"> <h2>CPython sprint, September, in California</h2> <p>I was invited to my first CPython sprint in September! Five days, September 5-9, at the Instagram office in California, USA. I reviewed a lot of changes and pushed many new features! Read my previous blog post: <a class="reference external" href="https://vstinner.github.io/cpython-sprint-2016.html">CPython sprint, September 2016</a>.</p> </div> <div class="section" id="pep-524-make-os-urandom-blocking-on-linux"> <h2>PEP 524: Make os.urandom() blocking on Linux</h2> <p>I pushed the implementation of my PEP 524: read my previous blog post: <a class="reference external" href="https://vstinner.github.io/pep-524-os-urandom-blocking.html">PEP 524: os.urandom() now blocks on Linux in Python 3.6</a>.</p> </div> <div class="section" id="pep-509-private-dictionary-version"> <h2>PEP 509: private dictionary version</h2> <p>Another enhancement from my <a class="reference external" href="http://faster-cpython.readthedocs.io/fat_python.html">FAT Python</a> project: my <a class="reference external" href="https://www.python.org/dev/peps/pep-0509/">PEP 509: Add a private version to dict</a> was approved at the CPython sprint by Guido van Rossum.</p> <p>The dictionary version is used by FAT Python to quickly check whether a variable was modified in a Python namespace.
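</p> <p>The guard mechanism can be sketched in pure Python with a toy class (purely illustrative: the real version tag is a C-level field of the dict itself, not a subclass, and is not exposed to Python):</p> <pre class="literal-block">
class VersionedDict(dict):
    """Toy dict that bumps a version counter on each mutation,
    mimicking the PEP 509 idea at the Python level."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

ns = VersionedDict(x=1)
# Cache a lookup together with the current version:
cached_version, cached_value = ns.version, ns["x"]
# Fast path: one integer comparison instead of a dict lookup.
assert ns.version == cached_version
ns["x"] = 2  # any mutation invalidates the cache
assert ns.version != cached_version
</pre> <p>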
Technically, a Python namespace is a regular dictionary.</p> <p>Using the feedback from the python-ideas mailing list on the first version of my PEP, I made further changes:</p> <ul class="simple"> <li>Use 64-bit unsigned integers on 32-bit systems: &quot;A risk of an integer overflow every 584 years is acceptable.&quot; Using 32 bits, an overflow occurs every 4 seconds!</li> <li>Don't expose the version at the Python level, to prevent users from writing optimizations based on it in Python. Reading the dictionary version in Python is as slow as a dictionary lookup, whereas the version is usually used to avoid a &quot;slow&quot; dictionary lookup. The version is only accessible at the C level.</li> </ul> <p>While my experimental FAT Python static optimizer didn't convince Guido, Yury Selivanov wrote yet another cache for global variables using the dictionary version: <a class="reference external" href="http://bugs.python.org/issue28158">Implement LOAD_GLOBAL opcode cache</a> (sadly, not merged yet).</p> <p>I added the private version to the builtin dict type with issue #26058. The global dictionary version is incremented at each dictionary creation and at each dictionary change, and each dictionary has its own version as well.</p> </div> <div class="section" id="fastcall-optimization-avoiding-temporary-tuple-to-call-functions"> <h2>FASTCALL: optimization avoiding temporary tuple to call functions</h2> <p>Thanks to my work on making Python benchmarks more stable, I confirmed that my FASTCALL patches don't introduce performance regressions, and make Python faster in some specific cases.</p> <p>I started to push FASTCALL changes. It will take me 6 months to push most changes, to fully enable FASTCALL &quot;everywhere&quot; in the code base and to finish the implementation.</p> <p>Upcoming blog posts will describe the FASTCALL changes, their history and the performance enhancements.
Spoiler: Python 3.6 is fast!</p> </div> <div class="section" id="more-efficient-call-function-bytecode"> <h2>More efficient CALL_FUNCTION bytecode</h2> <p>I reviewed and merged Demur Rumed's patch to make the CALL_FUNCTION opcodes more efficient. Demur implemented the design proposed by Serhiy Storchaka. Serhiy Storchaka also reviewed the implementation with me.</p> <p>Issue #27213: Rework CALL_FUNCTION* opcodes to produce shorter and more efficient bytecode:</p> <ul class="simple"> <li><tt class="docutils literal">CALL_FUNCTION</tt> now only accepts positional arguments</li> <li><tt class="docutils literal">CALL_FUNCTION_KW</tt> accepts positional arguments and keyword arguments; the keys of keyword arguments are packed into a constant tuple.</li> <li><tt class="docutils literal">CALL_FUNCTION_EX</tt> is the most generic opcode: it expects a tuple and a dict for positional and keyword arguments.</li> </ul> <p>The <tt class="docutils literal">CALL_FUNCTION_VAR</tt> and <tt class="docutils literal">CALL_FUNCTION_VAR_KW</tt> opcodes have been removed.</p> <p>Demur Rumed also implemented &quot;Wordcode&quot;, a new bytecode format using fixed 16-bit units: an 8-bit opcode with an 8-bit argument. Wordcode was merged in May 2016, see <a class="reference external" href="http://bugs.python.org/issue26647">issue #26647: ceval: use Wordcode, 16-bit bytecode</a>.</p> <p>All instructions have an argument: opcodes without an argument use the argument <tt class="docutils literal">0</tt>.
This allowed removing the following conditional code from the very hot code of <tt class="docutils literal">Python/ceval.c</tt>:</p> <pre class="literal-block">
if (HAS_ARG(opcode))
    oparg = NEXTARG();
</pre> <p>The bytecode is now fetched using 16-bit words, instead of loading one or two 8-bit units per instruction.</p> </div> <div class="section" id="work-on-optimization"> <h2>Work on optimization</h2> <p>I continued my work on the <a class="reference external" href="https://github.com/python/performance">performance</a> Python benchmark suite. The suite works on CPython and PyPy, but it may not be fully tuned for PyPy yet.</p> <ul class="simple"> <li>Issue #27938: Add a fast path for the us-ascii encoding</li> <li>Issue #15369: Remove the (old version of the) pybench microbenchmark. Please use the new &quot;performance&quot; benchmark suite which includes a more recent version of pybench.</li> <li>Issue #15369: Remove the old and unreliable pystone microbenchmark. Please use the new &quot;performance&quot; benchmark suite which is much more reliable.</li> </ul> </div> <div class="section" id="interesting-bug-hidden-resource-warnings"> <h2>Interesting bug: hidden resource warnings</h2> <p>On 2016-08-22, I started to investigate why &quot;Warning -- xxx was modified by test_xxx&quot; warnings were not logged on some buildbots (issue #27829).</p> <p>I modified the code logging the warning to flush stderr immediately: <tt class="docutils literal"><span class="pre">print(...,</span> flush=True)</tt>.</p> <p>19 days later, I tried to remove a quiet flag <tt class="docutils literal"><span class="pre">-q</span></tt> on the Windows build...
but it was a mistake: this flag doesn't mean quiet in the modified batch script :-)</p> <p>13 days later, I finally understood that the <tt class="docutils literal"><span class="pre">-W</span></tt> option of regrtest was eating stderr if a test passed but the environment was modified.</p> <p>I fixed regrtest to log stderr in all cases, except when the test passes! It should now be easier to fix &quot;environment changed&quot; warnings emitted by regrtest.</p> </div> <div class="section" id="contributions"> <h2>Contributions</h2> <p>As usual, I reviewed and pushed changes written by other contributors:</p> <ul class="simple"> <li>Issue #27350: I reviewed and pushed the implementation of compact dictionaries preserving insertion order. This resulted in dictionaries using 20% to 25% less memory when compared to Python 3.5. The implementation was written by <strong>INADA Naoki</strong>, based on the PyPy implementation, with a design by Raymond Hettinger.</li> <li>&quot;make tags&quot;: remove the <tt class="docutils literal"><span class="pre">-t</span></tt> option of <tt class="docutils literal">ctags</tt>. The option was kept for backward compatibility, but it was completely removed recently. Patch written by <strong>Stéphane Wirtel</strong>.</li> <li>Issue #27558: Fix a <tt class="docutils literal">SystemError</tt> in the implementation of the &quot;raise&quot; statement. In a brand new thread, raise a RuntimeError since there is no active exception to reraise. Patch written by <strong>Xiang Zhang</strong>.</li> <li>Issue #28120: Fix <tt class="docutils literal">dict.pop()</tt> for split-table dictionaries when trying to remove a &quot;pending key&quot;: a key not yet inserted in the split table. Patch by <strong>Xiang Zhang</strong>.</li> </ul> </div> <div class="section" id="bugfixes"> <h2>Bugfixes</h2> <ul> <li><p class="first">socket: Fix the <tt class="docutils literal">internal_select()</tt> function.
Bug found by <strong>Pavel Belikov</strong> (&quot;Fragment N1&quot;): <a class="reference external" href="http://www.viva64.com/en/b/0414/#ID0ECDAE">http://www.viva64.com/en/b/0414/#ID0ECDAE</a></p> </li> <li><p class="first">socket: use INVALID_SOCKET.</p> <ul class="simple"> <li>Replace <tt class="docutils literal">fd = <span class="pre">-1</span></tt> with <tt class="docutils literal">fd = INVALID_SOCKET</tt></li> <li>Replace <tt class="docutils literal">fd &lt; 0</tt> with <tt class="docutils literal">fd == INVALID_SOCKET</tt>: SOCKET_T is unsigned on Windows</li> </ul> <p>Bug found by Pavel Belikov (&quot;Fragment N1&quot;): <a class="reference external" href="http://www.viva64.com/en/b/0414/#ID0ECDAE">http://www.viva64.com/en/b/0414/#ID0ECDAE</a></p> </li> <li><p class="first">Issue #11048: ctypes, fix <tt class="docutils literal">CThunkObject_new()</tt></p> <ul class="simple"> <li>Initialize the restype and flags fields to fix a crash when Python runs on a read-only file system</li> <li>Use the <tt class="docutils literal">Py_ssize_t</tt> type rather than <tt class="docutils literal">int</tt> for the <tt class="docutils literal">i</tt> iterator variable</li> <li>Reorder assignments to make it easier to check that all fields are initialized</li> </ul> <p>Initial patch written by <strong>Marcin Bachry</strong>.</p> </li> <li><p class="first">Issue #27744: socket: Fix a memory leak in <tt class="docutils literal">sendmsg()</tt> and <tt class="docutils literal">sendmsg_afalg()</tt>. Release the <tt class="docutils literal">msg.msg_iov</tt> memory block.
Release memory on <tt class="docutils literal">PyMem_Malloc(controllen)</tt> failure</p> </li> <li><p class="first">Issue #27866: ssl: Fix a refleak in <tt class="docutils literal">cipher_to_dict()</tt>.</p> </li> <li><p class="first">Issue #28077: Fix the dict type: <tt class="docutils literal">find_empty_slot()</tt> only supports combined dictionaries.</p> </li> <li><p class="first">Issue #28200: Fix a memory leak in <tt class="docutils literal">path_converter()</tt>. Replace <tt class="docutils literal">PyUnicode_AsWideCharString()</tt> with <tt class="docutils literal">PyUnicode_AsUnicodeAndSize()</tt>.</p> </li> <li><p class="first">Issue #27955: Catch permission error (<tt class="docutils literal">EPERM</tt>) in <tt class="docutils literal">py_getrandom()</tt>. Fall back to reading from the <tt class="docutils literal">/dev/urandom</tt> device when the <tt class="docutils literal">getrandom()</tt> syscall fails with <tt class="docutils literal">EPERM</tt>, for example if blocked by SECCOMP.</p> </li> <li><p class="first">Issue #27778: Fix a memory leak in <tt class="docutils literal">os.getrandom()</tt> when the <tt class="docutils literal">getrandom()</tt> syscall is interrupted by a signal and a signal handler raises a Python exception.</p> </li> <li><p class="first">Issue #28233: Fix <tt class="docutils literal">PyUnicode_FromFormatV()</tt> error handling. Fix a memory leak if the format string contains a non-ASCII character: destroy the unicode writer.</p> </li> </ul> </div> <div class="section" id="regrtest-changes"> <h2>regrtest changes</h2> <ul class="simple"> <li>regrtest: rename the <tt class="docutils literal"><span class="pre">--slow</span></tt> option to <tt class="docutils literal"><span class="pre">--slowest</span></tt> (to use the same option name as the <tt class="docutils literal">testr</tt> tool). Thanks to optparse, the --slow syntax still works ;-) Add the --slowest option to buildbots. Display the top 10 slowest tests.</li> <li>regrtest: nicer output for durations.
Use millisecond and minute units, not only seconds.</li> <li>regrtest: Add a summary of the tests at the end of the test output: &quot;Tests result: xxx&quot;. It was sometimes hard to check quickly if tests succeeded, failed or something bad happened.</li> <li>regrtest: accept options after test names. For example, <tt class="docutils literal">./python <span class="pre">-m</span> test test_os <span class="pre">-v</span></tt> runs <tt class="docutils literal">test_os</tt> in verbose mode. Before, regrtest tried to run a test called &quot;-v&quot;!</li> <li>Issue #28195: Fix <tt class="docutils literal">test_huntrleaks_fd_leak()</tt> of test_regrtest. Don't expect the fd leak message to be on a specific line number, just make sure that the line is present in the output.</li> </ul> <p>Example of a recent (2017-02-15) successful test run, truncated output:</p> <pre class="literal-block">
...
0:08:20 [403/404] test_codecs passed
0:08:21 [404/404] test_threading passed
391 tests OK.
10 slowest tests:
- test_multiprocessing_spawn: 1 min 24 sec
- test_concurrent_futures: 1 min 3 sec
- test_multiprocessing_forkserver: 60 sec
...
13 tests skipped:
    test_devpoll test_ioctl test_kqueue ...
Total duration: 8 min 22 sec
Tests result: SUCCESS
</pre> </div> <div class="section" id="tests-changes"> <h2>Tests changes</h2> <ul> <li><p class="first">script_helper: kill the subprocess on error. If Popen.communicate() raises an exception, kill the child process so as not to leave a running child process in the background and maybe create a zombie process. This change fixes a ResourceWarning in Python 3.6 when unit tests are interrupted by CTRL+C.</p> </li> <li><p class="first">Issue #27181: Skip test_statistics tests known to fail until a fix is found.</p> </li> <li><p class="first">Issue #18401: Fix test_pdb if $HOME is not set.
HOME is not set on Windows, for example.</p> </li> <li><p class="first">test_eintr: Fix <tt class="docutils literal">ResourceWarning</tt> warnings</p> </li> <li><p class="first">Buildbot: give 20 minutes per test file. It seems like at least 2 buildbots need more than 15 minutes per test file. Example with &quot;AMD64 Snow Leop 3.x&quot;:</p> <pre class="literal-block">
10 slowest tests:
- test_tools: 14 min 40 sec
- test_tokenize: 11 min 57 sec
- test_datetime: 11 min 25 sec
- ...
</pre> </li> <li><p class="first">Issue #28176: test_asyncio: fix test_sock_connect_sock_write_race(): increase the timeout from 10 seconds to 60 seconds.</p> </li> </ul> </div> <div class="section" id="other-changes"> <h2>Other changes</h2> <ul class="simple"> <li>Issue #22624: Python 3 now requires the <tt class="docutils literal">clock()</tt> function to build, to simplify the C code.</li> <li>Issue #27404: tag security-related changes with the &quot;[Security]&quot; prefix in the changelog Misc/NEWS.</li> <li>Issue #27776: <tt class="docutils literal">dev_urandom(raise=0)</tt> now closes the file descriptor on error</li> <li>Issue #27128, #18295: Use <tt class="docutils literal">Py_ssize_t</tt> in <tt class="docutils literal">_PyEval_EvalCodeWithName()</tt>. Replace the <tt class="docutils literal">int</tt> type with <tt class="docutils literal">Py_ssize_t</tt> for index variables used for positional arguments. It should help to avoid integer overflow and help to emit better machine code for <tt class="docutils literal">i++</tt> (no trap needed for overflow). Also make the <tt class="docutils literal">total_args</tt> variable constant.</li> <li>Fix &quot;make tags&quot;: set the locale to C to call sort. vim expects the tags file to be sorted using English collation, so it fails if the locale is French, for example. Use LC_ALL=C to force English sorting order.
Issue #27726.</li> <li>Issue #27698: Add the <tt class="docutils literal">socketpair</tt> function to <tt class="docutils literal">socket.__all__</tt> on Windows</li> <li>Issue #27786: Simplify (optimize?) the PyLongObject private function <tt class="docutils literal">x_sub()</tt>: the <tt class="docutils literal">z</tt> variable is known to be a new object which cannot be shared, so <tt class="docutils literal">Py_SIZE()</tt> can be used directly to negate the number.</li> <li>Fix a clang warning in grammar.c. Clang is smarter than GCC and emits a warning about dead code on a function declared with <tt class="docutils literal"><span class="pre">__attribute__((__noreturn__))</span></tt> (the <tt class="docutils literal">Py_FatalError()</tt> function in this case).</li> <li>Issue #28114: Add unit tests on <tt class="docutils literal"><span class="pre">os.spawn*()</span></tt> to prepare to fix a crash with a bytes environment.</li> <li>Issue #28127: Add <tt class="docutils literal">_PyDict_CheckConsistency()</tt>: a function checking that a dictionary remains consistent after any change. By default, only basic attributes are tested; the table content is not checked because the impact on Python performance would be too high. <tt class="docutils literal">DEBUG_PYDICT</tt> must be defined (ex: <tt class="docutils literal">gcc <span class="pre">-D</span> DEBUG_PYDICT</tt>) to also check the dictionary content.</li> </ul> </div> CPython sprint, september 20162017-02-14T18:00:00+01:002017-02-14T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-14:/cpython-sprint-2016.html<p>I was invited to my first CPython sprint in September! Five days, September 5-9, at the Instagram office in California, USA.
The sprint was sponsored by Instagram, Microsoft, and the PSF.</p> <p><strong>First little game:</strong> Many happy faces, but <em>Where is Victor?</em></p> <a class="reference external image-reference" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html"> <img alt="CPython developers at the Facebook sprint" src="https://vstinner.github.io/images/cpython_sprint_2016_photo.jpg" /> </a> <p>IMHO it was the most productive CPython week ever :-) Having …</p><p>I was invited to my first CPython sprint in September! Five days, September 5-9, at the Instagram office in California, USA. The sprint was sponsored by Instagram, Microsoft, and the PSF.</p> <p><strong>First little game:</strong> Many happy faces, but <em>Where is Victor?</em></p> <a class="reference external image-reference" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html"> <img alt="CPython developers at the Facebook sprint" src="https://vstinner.github.io/images/cpython_sprint_2016_photo.jpg" /> </a> <p>IMHO it was the most productive CPython week ever :-) Having Guido van Rossum in the room helped to get many PEPs accepted. Having a lot of highly skilled reviewers in the same room helped to get many new features and many PEP implementations merged much faster than usual.</p> <p><strong>Second little game:</strong> try to spot the sprint in the CPython commit statistics of the last 12 months (Feb 2016-Feb 2017) ;-)</p> <a class="reference external image-reference" href="https://github.com/python/cpython/graphs/commit-activity"> <img alt="CPython commits statistics" src="https://vstinner.github.io/images/cpython_sprint_2016_commits.png" /> </a> <div class="section" id="compact-dict"> <h2>Compact dict</h2> <p>Issue #27350: I reviewed and pushed the &quot;compact dict&quot; implementation which makes Python dictionaries ordered (by insertion order) by default.
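A minimal Python illustration of the new behavior (in CPython 3.6 the ordering was an implementation detail of the compact dict; it only became a language guarantee in Python 3.7):

```python
# Insertion order is preserved by the "compact dict" implementation:
d = {}
d["banana"] = 3
d["apple"] = 1
d["cherry"] = 2
assert list(d) == ["banana", "apple", "cherry"]  # keys come back in insertion order
```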
It reduces the memory usage of dictionaries between 20% and 25%.</p> <p>The implementation was written by INADA Naoki, based on the PyPy implementation, with a design by Raymond Hettinger.</p> </div> <div class="section" id="fastcall"> <h2>FASTCALL</h2> <p>&quot;Fast calls&quot;: Python 3.6 has a new private C API and a new METH_FASTCALL calling convention which avoids a temporary tuple for positional arguments and a temporary dictionary for keyword arguments. Changes:</p> <ul class="simple"> <li>Add a new C calling convention: METH_FASTCALL</li> <li>Add _PyArg_ParseStack() function</li> <li>Add _PyCFunction_FastCallKeywords() function: issue #27810</li> <li>Add _PyObject_FastCallKeywords() function: issue #27830</li> </ul> </div> <div class="section" id="more-efficient-call-function-bytecode"> <h2>More efficient CALL_FUNCTION bytecode</h2> <p>I reviewed and pushed: &quot;Rework CALL_FUNCTION* opcodes to produce shorter and more efficient bytecode&quot; (issue #27213).</p> <p>Patch written by Demur Rumed, design by Serhiy Storchaka, reviewed by Serhiy Storchaka and me.</p> </div> <div class="section" id="pep-509-add-a-private-version-to-dict"> <h2>PEP 509: Add a private version to dict</h2> <p>Guido approved my PEP 509 &quot;Add a new private version to the builtin dict type&quot;.</p> <p>I pushed the implementation.</p> </div> <div class="section" id="pep-524-make-os-urandom-blocking-on-linux"> <h2>PEP 524: Make os.urandom() blocking on Linux</h2> <p>I pushed the implementation of my PEP 524: &quot;Make os.urandom() blocking on Linux&quot;.</p> <p>Issue #27776: The os.urandom() function now blocks on Linux 3.17 and newer until the system urandom entropy pool is initialized, to increase security.</p> <p>Read my previous blog post for the painful story behind the PEP: <a class="reference external" href="https://vstinner.github.io/pep-524-os-urandom-blocking.html">PEP 524: os.urandom() now blocks on Linux</a>.</p> </div> <div class="section"
id="asynchronous-pep-525-and-530"> <h2>Asynchronous PEP 525 and 530</h2> <p>Guido van Rossum approved two PEPs by Yury Selivanov:</p> <ul class="simple"> <li>PEP 525: Asynchronous Generators</li> <li>PEP 530: Asynchronous Comprehensions</li> </ul> <p>I reviewed the huge C implementation with Yury at my side :-)</p> </div> <div class="section" id="unicode-escape-codec-optimization"> <h2>unicode_escape codec optimization</h2> <p>I reviewed and pushed &quot;Optimize unicode_escape and raw_unicode_escape&quot; (issue #16334), patch written by Serhiy Storchaka.</p> </div> <div class="section" id="python-3-6-bugfixes"> <h2>Python 3.6 bugfixes</h2> <p>I happily found many issues including a major one: regular list comprehensions were completely broken :-)</p> <p>Another minor issue: SyntaxError didn't report the correct line number in a specific case.</p> <p>Don't worry, Yury fixed both ;-)</p> </div> <div class="section" id="official-sprint-report"> <h2>Official sprint report</h2> <p>Read also the official report: <a class="reference external" href="http://blog.python.org/2016/09/python-core-development-sprint-2016-36.html">Python Core Development Sprint 2016: 3.6 and beyond!</a>.</p> </div> PEP 524: os.urandom() now blocks on Linux in Python 3.62017-02-14T12:00:00+01:002017-02-14T12:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-14:/pep-524-os-urandom-blocking.html<div class="section" id="getrandom-avoids-file-descriptors"> <h2>getrandom() avoids file descriptors</h2> <p>In recent years, I have sometimes made enhancements in the Python code used to generate random numbers, the C implementation of <tt class="docutils literal">os.urandom()</tt>.
My two main changes were to use the new <tt class="docutils literal">getentropy()</tt> and <tt class="docutils literal">getrandom()</tt> functions when available on Linux, Solaris, OpenBSD, etc.</p> <p>In 2013, <tt class="docutils literal">os.urandom()</tt> opened …</p></div><div class="section" id="getrandom-avoids-file-descriptors"> <h2>getrandom() avoids file descriptors</h2> <p>In recent years, I have sometimes made enhancements in the Python code used to generate random numbers, the C implementation of <tt class="docutils literal">os.urandom()</tt>. My two main changes were to use the new <tt class="docutils literal">getentropy()</tt> and <tt class="docutils literal">getrandom()</tt> functions when available on Linux, Solaris, OpenBSD, etc.</p> <p>In 2013, <tt class="docutils literal">os.urandom()</tt> opened a file descriptor to read from <tt class="docutils literal">/dev/urandom</tt> and then closed it. It was decided to use a single private file descriptor and keep it open to prevent <tt class="docutils literal">EMFILE</tt> or <tt class="docutils literal">ENFILE</tt> errors (too many open files) under high system loads with many threads: see issue #18756.</p> <p>The private file descriptor introduced a backward incompatible change in badly written programs. The code was modified to call <tt class="docutils literal">fstat()</tt> to check if the file descriptor was closed and then replaced with a different file descriptor (but same number): checking if the <tt class="docutils literal">st_dev</tt> or <tt class="docutils literal">st_ino</tt> attributes changed.</p> <p>In 2014, Linux kernel 3.17 added a new <tt class="docutils literal">getrandom()</tt> syscall which gives access to random bytes without having to handle a file descriptor.
I modified <tt class="docutils literal">os.urandom()</tt> to call <tt class="docutils literal">getrandom()</tt> to avoid file descriptors, but a different issue appeared.</p> </div> <div class="section" id="getrandom-hangs-at-system-startup"> <h2>getrandom() hangs at system startup</h2> <p>On embedded devices and virtual machines, Python 3.5 started to hang at startup.</p> <p>On Debian, a systemd script used Python to compute an MD5 checksum, but Python was blocked during its initialization. Other users reported that Python blocked on importing the <tt class="docutils literal">random</tt> module, sometimes imported indirectly by a different module.</p> <p>Python was blocked on the <tt class="docutils literal">getrandom(0)</tt> syscall, waiting until the system collected enough entropy to initialize the urandom pool. It took longer than 90 seconds, so systemd killed the service with a timeout. As a consequence, the system boot took longer than 90 seconds or could even fail!</p> </div> <div class="section" id="fix-python-startup"> <h2>Fix Python startup</h2> <p>The fix was obvious: call <tt class="docutils literal">getrandom(GRND_NONBLOCK)</tt> which fails immediately if the call would block, and fall back on reading from <tt class="docutils literal">/dev/urandom</tt> which doesn't block even if the entropy pool is not initialized yet.</p> <p>Security experts quickly complained that falling back on <tt class="docutils literal">/dev/urandom</tt> makes Python less secure.
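The logic of that fix can be sketched in Python using the os.getrandom() wrapper added later in Python 3.6 (a hypothetical rendition with a made-up name; the real fix lives in the C implementation of os.urandom()):

```python
import os

def urandom_noblock(n):
    # Try the non-blocking getrandom() syscall first (Linux 3.17+).
    if hasattr(os, "getrandom"):
        try:
            return os.getrandom(n, os.GRND_NONBLOCK)
        except BlockingIOError:
            # Entropy pool not initialized yet: take the fallback path.
            pass
    # Fall back on /dev/urandom, which never blocks.
    with open("/dev/urandom", "rb") as fp:
        return fp.read(n)
```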
When the fallback path is taken, <tt class="docutils literal">/dev/urandom</tt> returns random numbers not suitable for security purposes (initialized with low entropy), whereas the <a class="reference external" href="https://docs.python.org/dev/library/os.html#os.urandom">os.urandom() documentation</a> says: &quot;The returned data should be unpredictable enough for cryptographic applications&quot; (and &quot;though its exact quality depends on the OS implementation.&quot;).</p> <p>Calling <tt class="docutils literal">getrandom()</tt> in blocking mode for <tt class="docutils literal">os.urandom()</tt> makes Python more secure, but it doesn't fix the startup bug.</p> </div> <div class="section" id="discussion-storm"> <h2>Discussion storm</h2> <p>The proposed change started a huge rain of messages. More than 200 messages, maybe even more than 500 messages, on the bug tracker and the python-dev mailing list. Everyone became a security expert and wanted to give his/her very important opinion, without listening to other arguments.</p> <p>Two Python security experts left the discussion.</p> <p>I also ignored new messages. I simply didn't have enough time to read all of them, and the discussion tone made me angry.</p> </div> <div class="section" id="new-mailing-list-and-two-new-peps"> <h2>New mailing list and two new PEPs</h2> <p>A new <tt class="docutils literal"><span class="pre">security-sig</span></tt> mailing list, subtitled &quot;os.urandom rehab clinic&quot;, was created just to take a decision on <tt class="docutils literal">os.urandom()</tt>!</p> <p>Nick Coghlan wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0522/">PEP 522: Allow BlockingIOError in security sensitive APIs</a>.
Basically: he considers that there is no good default behaviour when <tt class="docutils literal">os.urandom()</tt> would block, so raise an exception to let users decide.</p> <p>I wrote <a class="reference external" href="https://www.python.org/dev/peps/pep-0524/">PEP 524: Make os.urandom() blocking on Linux</a>. My PEP proposes to make <tt class="docutils literal">os.urandom()</tt> blocking, <em>but</em> also modify Python startup to fall back on a non-blocking RNG to initialize the secret hash seed and the <tt class="docutils literal">random</tt> module (which is <em>not</em> security sensitive, except for <tt class="docutils literal">random.SystemRandom</tt>).</p> <p>Nick's PEP describes an important use case: be able to check if <tt class="docutils literal">os.urandom()</tt> would block. Instead of adding a flag to <tt class="docutils literal">os.urandom()</tt>, I chose to expose the low-level C <tt class="docutils literal">getrandom()</tt> function as a new Python <tt class="docutils literal">os.getrandom()</tt> function. Calling <tt class="docutils literal">os.getrandom(1, os.GRND_NONBLOCK)</tt> raises a <tt class="docutils literal">BlockingIOError</tt> exception, as Nick proposed for <tt class="docutils literal">os.urandom()</tt>, so it's possible to decide what to do in this case.</p> <p>While both PEPs are valid, IMHO my PEP was <em>less</em> backward incompatible, simpler and maybe closer to what users <em>expect</em>. The &quot;os.urandom() would block&quot; case is a special case with my PEP, but my PEP makes it possible to decide what to do in that case (thanks to <tt class="docutils literal">os.getrandom()</tt>).</p> <p>Guido van Rossum approved my PEP and rejected Nick's PEP. I worked with Nick to implement my PEP.</p> </div> <div class="section" id="python-3-6-changes"> <h2>Python 3.6 changes</h2> <p>I added a new os.getrandom() function: expose the Linux getrandom() syscall (issue #27778).
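Nick's use case can then be covered with a few lines (a sketch with a hypothetical helper name, assuming Linux where os.getrandom() and os.GRND_NONBLOCK exist):

```python
import os

def urandom_would_block():
    # Probe the entropy pool without blocking: a BlockingIOError means
    # that os.urandom() would block right now.
    if not hasattr(os, "getrandom"):
        return False  # non-Linux: os.urandom() doesn't block there
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return True
    return False
```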
I also added the two getrandom() flags: <tt class="docutils literal">os.GRND_NONBLOCK</tt> and <tt class="docutils literal">os.GRND_RANDOM</tt>.</p> <p>I modified <tt class="docutils literal">os.urandom()</tt> to block on Linux: call <tt class="docutils literal">getrandom(0)</tt> instead of <tt class="docutils literal">getrandom(GRND_NONBLOCK)</tt> (issue #27776).</p> <p>I also added a private <tt class="docutils literal">_PyOS_URandomNonblock()</tt> function used to initialize the hash secret and used by <tt class="docutils literal">random.Random.seed()</tt> (which initializes the <tt class="docutils literal">random</tt> module).</p> <p>The <tt class="docutils literal">os.urandom()</tt> function now blocks in Python 3.6 on Linux 3.17 and newer until the system urandom entropy pool is initialized, to increase security.</p> </div> <div class="section" id="read-also-lwn-articles"> <h2>Read also LWN articles</h2> <ul class="simple"> <li><a class="reference external" href="https://lwn.net/Articles/606141/">A system call for random numbers: getrandom()</a> (July 2014)</li> <li><a class="reference external" href="https://lwn.net/Articles/693189/">Python's os.urandom() in the absence of entropy</a> (July 2016) -- this story</li> <li><a class="reference external" href="https://lwn.net/Articles/711013/">The long road to getrandom() in glibc</a> (January 2017)</li> </ul> </div> My contributions to CPython during 2016 Q22017-02-12T18:00:00+01:002017-02-12T18:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-12:/contrib-cpython-2016q2.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q2 (april, may, june):</p> <pre class="literal-block"> hg log -r 'date(&quot;2016-04-01&quot;):date(&quot;2016-06-30&quot;)' --no-merges -u Stinner </pre> <p>Statistics: 52 non-merge commits + 22 merge commits (total: 74 commits).</p> <p>Previous report: <a class="reference external"
href="https://vstinner.github.io/contrib-cpython-2016q1.html">My contributions to CPython during 2016 Q1</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to CPython during 2016 Q3</a>.</p> <div class="section" id="start-of-my-work-on-optimization"> <h2>Start of …</h2></div><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q2 (april, may, june):</p> <pre class="literal-block"> hg log -r 'date(&quot;2016-04-01&quot;):date(&quot;2016-06-30&quot;)' --no-merges -u Stinner </pre> <p>Statistics: 52 non-merge commits + 22 merge commits (total: 74 commits).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q1.html">My contributions to CPython during 2016 Q1</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q3.html">My contributions to CPython during 2016 Q3</a>.</p> <div class="section" id="start-of-my-work-on-optimization"> <h2>Start of my work on optimization</h2> <p>During 2016 Q2, I started to spend more time on optimizing CPython.</p> <p>I experimented with a change in CPython: a new FASTCALL calling convention to avoid the creation of a temporary tuple to pass positional arguments: <a class="reference external" href="http://bugs.python.org/issue26814">issue26814</a>. Early results were really good: calling builtin functions became between 20% and 50% faster!</p> <p>Quickly, my optimization work was blocked by unreliable benchmarks. I spent the rest of the year 2016 analyzing benchmarks and making benchmarks more stable.</p> </div> <div class="section" id="subprocess-now-emits-resourcewarning"> <h2>subprocess now emits ResourceWarning</h2> <p>The subprocess.Popen destructor now emits a ResourceWarning warning if the child process is still running (issue #26741). The warning helps to track and fix zombie processes.
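The new warning can be observed from Python (a self-contained sketch; the long sleep and the explicit kill are only there so the demo cleans up after itself):

```python
import os
import signal
import subprocess
import sys
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    pid = proc.pid
    del proc  # destructor runs while the child is still alive: ResourceWarning

assert any(issubclass(w.category, ResourceWarning) for w in caught)

# Clean up the child so the demo doesn't leave a zombie behind.
os.kill(pid, signal.SIGKILL)
os.waitpid(pid, 0)
```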
I updated asyncio to prevent a false ResourceWarning (a warning even though the child process had completed): asyncio now copies the child process exit status to the internal Popen object.</p> <p>I also fixed the POSIX implementation of subprocess.Popen._execute_child(): it now sets the returncode attribute from the child process exit status when exec failed.</p> </div> <div class="section" id="security-fix-potential-shell-injections-in-ctypes-util"> <h2>Security: fix potential shell injections in ctypes.util</h2> <p>I rewrote methods of the ctypes.util module using <tt class="docutils literal">os.popen()</tt>. I replaced <tt class="docutils literal">os.popen()</tt> with <tt class="docutils literal">subprocess.Popen</tt> without a shell (issue #22636) to fix a class of security vulnerability, &quot;shell injection&quot; (injecting arbitrary shell commands to take control of a computer).</p> <p>The <tt class="docutils literal">os.popen()</tt> function uses a shell, so there is a risk if the command line arguments are not properly escaped for the shell. Using <tt class="docutils literal">subprocess.Popen</tt> without a shell completely removes the risk.</p> <p>Note: the <tt class="docutils literal">ctypes</tt> module is generally not considered &quot;safe&quot;, but it doesn't hurt to make it more secure ;-)</p> </div> <div class="section" id="optimization-pymem-malloc-now-uses-pymalloc"> <h2>Optimization: PyMem_Malloc() now uses pymalloc</h2> <p>PyMem_Malloc() now uses the fast Python &quot;pymalloc&quot; memory allocator which is optimized for small objects with a short lifetime (issue #26249).
The change makes some benchmarks up to 4% faster.</p> <p>This change was possible thanks to the whole preparation work I did in 2016 Q1, especially the new GIL check in the memory allocator debug hooks and the new <tt class="docutils literal">PYTHONMALLOC=debug</tt> environment variable enabling these hooks on a Python compiled in release mode.</p> <p>I tested lxml, Pillow, cryptography and numpy before pushing the change, as asked by Marc-Andre Lemburg. All these projects work with the change, except for numpy. I wrote a fix for numpy: <a class="reference external" href="https://github.com/numpy/numpy/pull/7404">Use PyMem_RawMalloc on Python 3.4 and newer</a>, merged one month later (my first contribution to numpy!).</p> <p>The change indirectly helped to identify and fix a memory leak in the <tt class="docutils literal">formatfloat()</tt> function used to format bytes strings: <tt class="docutils literal"><span class="pre">b&quot;%f&quot;</span> % 1.2</tt> (issue #25349, #26249).</p> </div> <div class="section" id="optimization"> <h2>Optimization</h2> <p>Issue #27056: Optimize pickle.load() and pickle.loads(), up to 10% faster to deserialize a lot of small objects. I found this optimization using Linux perf on Python compiled with PGO. My change manually implements the optimization if Python is not compiled with PGO.</p> <p>Issue #26770: When <tt class="docutils literal">set_inheritable()</tt> is implemented with <tt class="docutils literal">fcntl()</tt>, don't call <tt class="docutils literal">fcntl()</tt> twice if the <tt class="docutils literal">FD_CLOEXEC</tt> flag is already set to the requested value.
Linux uses <tt class="docutils literal">ioctl()</tt> and so always needs only a single syscall.</p> </div> <div class="section" id="changes"> <h2>Changes</h2> <ul> <li><p class="first">Issue #26716: Replace IOError with OSError in the fcntl documentation; IOError is a deprecated alias of OSError since Python 3.3.</p> </li> <li><p class="first">Issue #26639: Replace the deprecated <tt class="docutils literal">imp</tt> module with the <tt class="docutils literal">importlib</tt> module in <tt class="docutils literal">Tools/i18n/pygettext.py</tt>. Remove <tt class="docutils literal">_get_modpkg_path()</tt>, replaced with <tt class="docutils literal">importlib.util.find_spec()</tt>.</p> </li> <li><p class="first">Issue #26735: Fix os.urandom() on Solaris 11.3 and newer when reading more than 1024 bytes: call getrandom() multiple times with a limit of 1024 bytes per call.</p> </li> <li><p class="first">configure: fix the <tt class="docutils literal">HAVE_GETRANDOM_SYSCALL</tt> check; the syscall() function requires <tt class="docutils literal">#include &lt;unistd.h&gt;</tt>.</p> </li> <li><p class="first">Issue #26766: Fix _PyBytesWriter_Finish(). Return a bytearray object when bytearray is requested and when the small buffer is used.
Also fix test_bytes: bytearray%args must return a bytearray type.</p> </li> <li><p class="first">Issue #26777: Fix a random failure of test_asyncio.test_timeout_disable() on the &quot;AMD64 FreeBSD 9.x 3.5&quot; buildbot:</p> <pre class="literal-block"> File &quot;.../Lib/test/test_asyncio/test_tasks.py&quot;, line 2398, in go self.assertTrue(0.09 &lt; dt &lt; 0.11, dt) AssertionError: False is not true : 0.11902812402695417 </pre> <p>Replace <tt class="docutils literal">&lt; 0.11</tt> with <tt class="docutils literal">&lt; 0.15</tt>.</p> </li> <li><p class="first">Backport the test_gdb fix for s390x buildbots to Python 3.5.</p> </li> <li><p class="first">Cleanup import.c: replace <tt class="docutils literal">PyUnicode_RPartition()</tt> with <tt class="docutils literal">PyUnicode_FindChar()</tt> and <tt class="docutils literal">PyUnicode_Substring()</tt> to avoid the creation of a temporary tuple. Use <tt class="docutils literal">PyUnicode_FromFormat()</tt> to build a string and avoid the single_dot ('.') singleton.</p> </li> <li><p class="first">regrtest now uses subprocesses when the <tt class="docutils literal"><span class="pre">-j1</span></tt> command line option is used: each test file runs in a fresh child process. Before, the -j1 option was ignored. The <tt class="docutils literal">Tools/buildbot/test.bat</tt> script now uses -j1 by default to run each test file in a fresh child process.</p> </li> <li><p class="first">regrtest: display the test result (passed, failed, ...) after each test completion. In multiprocessing mode: always display the result. In sequential mode: only display the result if the test did not pass.</p> </li> <li><p class="first">Issue #27278: Fix the <tt class="docutils literal">os.urandom()</tt> implementation using <tt class="docutils literal">getrandom()</tt> on Linux.
Truncate the size to <tt class="docutils literal">INT_MAX</tt> and loop until we have collected enough random bytes, instead of directly casting a <tt class="docutils literal">Py_ssize_t</tt> to <tt class="docutils literal">int</tt>.</p> </li> </ul> </div> <div class="section" id="contributions"> <h2>Contributions</h2> <p>I also pushed a few changes written by other contributors.</p> <p>Issue #26839: <tt class="docutils literal">os.urandom()</tt> doesn't block on Linux anymore. On Linux, <tt class="docutils literal">os.urandom()</tt> now calls getrandom() with <tt class="docutils literal">GRND_NONBLOCK</tt> to fall back on reading <tt class="docutils literal">/dev/urandom</tt> if the urandom entropy pool is not initialized yet. Patch written by <strong>Colm Buckley</strong>. This issue started a huge annoying discussion around random number generation on the bug tracker and the python-dev mailing list. I later wrote the <a class="reference external" href="https://www.python.org/dev/peps/pep-0524/">PEP 524: Make os.urandom() blocking on Linux</a> to fix the issue!</p> <p>Other changes:</p> <ul class="simple"> <li>Issue #26647: Cleanup opcode: simplify the code to build <tt class="docutils literal">opcode.opname</tt>. Patch written by <strong>Demur Rumed</strong>.</li> <li>Issue #26647: Cleanup modulefinder: use <tt class="docutils literal">dis.opmap[name]</tt> rather than <tt class="docutils literal">dis.opname.index(name)</tt>. Patch written by <strong>Demur Rumed</strong>.</li> <li>Issue #26801: Fix error handling in <tt class="docutils literal">shutil.get_terminal_size()</tt>: catch AttributeError instead of NameError. Skip the functional test of test_shutil using the <tt class="docutils literal">stty size</tt> command if the <tt class="docutils literal">os.get_terminal_size()</tt> function is missing.
Patch written by <strong>Emanuel Barry</strong>.</li> <li>Issue #26802: Optimize function calls only using unpacking like <tt class="docutils literal"><span class="pre">func(*tuple)</span></tt> (no other positional argument, no keyword argument): avoid copying the tuple. Patch written by <strong>Joe Jevnik</strong>.</li> <li>Issue #21668: Add the missing libm dependency in setup.py: link the audioop, _datetime, _ctypes_test modules to libm, except on Mac OS X. Patch written by <strong>Chi Hsuan Yen</strong>.</li> <li>Issue #26799: Fix python-gdb.py: don't get C types at startup, only on demand. The C types can change if python-gdb.py is loaded before loading the Python executable in gdb. Patch written by <strong>Thomas Ilsche</strong>.</li> <li>Issue #27057: Fix os.set_inheritable() on Android: ioctl() is blocked by SELinux and fails with EACCES. The function now falls back to fcntl(). Patch written by <strong>Michał Bednarski</strong>.</li> <li>Issue #26647: Fix a typo in test_grammar. Patch written by <strong>Demur Rumed</strong>.</li> </ul> </div> My contributions to CPython during 2016 Q12017-02-09T17:00:00+01:002017-02-09T17:00:00+01:00Victor Stinnertag:vstinner.github.io,2017-02-09:/contrib-cpython-2016q1.html<p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q1 (january, february, march):</p> <pre class="literal-block"> hg log -r 'date(&quot;2016-01-01&quot;):date(&quot;2016-03-31&quot;)' --no-merges -u Stinner </pre> <p>Statistics: 196 non-merge commits + 33 merge commits (total: 229 commits).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q4.html">My contributions to CPython during 2015 Q4</a>.
Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q2.html">My contributions to CPython during 2016 Q2</a>.</p> <div class="section" id="summary"> <h2>Summary</h2> <p>Since …</p></div><p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2016 Q1 (january, february, march):</p> <pre class="literal-block"> hg log -r 'date(&quot;2016-01-01&quot;):date(&quot;2016-03-31&quot;)' --no-merges -u Stinner </pre> <p>Statistics: 196 non-merge commits + 33 merge commits (total: 229 commits).</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q4.html">My contributions to CPython during 2015 Q4</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q2.html">My contributions to CPython during 2016 Q2</a>.</p> <div class="section" id="summary"> <h2>Summary</h2> <p>Since this report is much longer than I expected, here are the highlights:</p> <ul class="simple"> <li>Python 8: no pep8, no chocolate!</li> <li>AST enhancements coming from FAT Python</li> <li>faulthandler now catches Windows fatal exceptions</li> <li>New PYTHONMALLOC environment variable</li> <li>tracemalloc: new C API and support multiple address spaces</li> <li>ResourceWarning warnings now come with a traceback</li> <li>PyMem_Malloc() now fails if the GIL is not held</li> <li>Interesting bug: reentrant flag in tracemalloc</li> </ul> </div> <div class="section" id="python-8-no-pep8-no-chocolate"> <h2>Python 8: no pep8, no chocolate!</h2> <p>I prepared an April Fools' joke: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-March/143603.html">[Python-Dev] The next major Python version will be Python 8</a> :-)</p> <p>I increased the Python version to 8, added the <tt class="docutils literal">pep8</tt> module and modified <tt class="docutils literal">importlib</tt> to raise an <tt class="docutils
literal">ImportError</tt> if a module is not PEP8-compliant!</p> </div> <div class="section" id="ast-enhancements-coming-from-fat-python"> <h2>AST enhancements coming from FAT Python</h2> <p>Changes coming from my <a class="reference external" href="http://faster-cpython.readthedocs.io/fat_python.html">FAT Python</a> (AST optimizer, run ahead of time):</p> <p>The compiler now ignores constant statements like <tt class="docutils literal">b'bytes'</tt> (issue #26204). I had to replace constant statements with expressions to prepare the change (ex: replace <tt class="docutils literal">b'bytes'</tt> with <tt class="docutils literal">x = b'bytes'</tt>). First, the compiler emitted a <tt class="docutils literal">SyntaxWarning</tt>, but it was quickly decided to let linters emit such warnings, so as not to annoy users: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-February/143163.html">read the thread on python-dev</a>.</p> <p>Example, Python 3.5:</p> <pre class="literal-block"> &gt;&gt;&gt; def f(): ... b'bytes' ... &gt;&gt;&gt; import dis; dis.dis(f) 2 0 LOAD_CONST 1 (b'bytes') 3 POP_TOP 4 LOAD_CONST 0 (None) 7 RETURN_VALUE </pre> <p>Python 3.6:</p> <pre class="literal-block"> &gt;&gt;&gt; def f(): ... b'bytes' ... &gt;&gt;&gt; import dis; dis.dis(f) 1 0 LOAD_CONST 0 (None) 2 RETURN_VALUE </pre> <p>Other changes:</p> <ul class="simple"> <li>Issue #26107: The format of the co_lnotab attribute of code objects changes to support negative line number deltas. It allows AST optimizers to move instructions without breaking Python tracebacks. Change needed by the loop unrolling optimization of FAT Python.</li> <li>Issue #26146: Add a new kind of AST node: <tt class="docutils literal">ast.Constant</tt>. It can be used by external AST optimizers like FAT Python, but the compiler does not directly emit such a node.
Update code to accept ast.Constant instead of ast.Num and/or ast.Str.</li> <li>Issue #26146: <tt class="docutils literal">marshal.loads()</tt> now uses the empty frozenset singleton. It fixes a test failure in FAT Python and reduces the memory footprint.</li> </ul> </div> <div class="section" id="faulthandler-now-catchs-windows-fatal-exceptions"> <h2>faulthandler now catches Windows fatal exceptions</h2> <p>I enhanced the faulthandler.enable() function on Windows to set a handler for Windows fatal exceptions using <tt class="docutils literal">AddVectoredExceptionHandler()</tt> (issue #23848).</p> <p>Windows exceptions are the native way to handle fatal errors on Windows, whereas the UNIX signals SIGSEGV, SIGFPE and SIGABRT are &quot;emulated&quot; on top of that.</p> </div> <div class="section" id="new-pythonmalloc-environment-variable"> <h2>New PYTHONMALLOC environment variable</h2> <p>I added a new <tt class="docutils literal">PYTHONMALLOC</tt> environment variable (issue #26516) to set the Python memory allocators.</p> <p><tt class="docutils literal">PYTHONMALLOC=debug</tt> enables debug hooks on a Python compiled in release mode, whereas Python 3.5 requires recompiling Python in debug mode. These hooks implement various checks:</p> <ul class="simple"> <li>Detect <strong>buffer underflow</strong>: write before the start of the buffer</li> <li>Detect <strong>buffer overflow</strong>: write after the end of the buffer</li> <li>Detect API violations, ex: <tt class="docutils literal">PyObject_Free()</tt> called on a buffer allocated by <tt class="docutils literal">PyMem_Malloc()</tt></li> <li>Check if the GIL is held when allocator functions of the PYMEM_DOMAIN_OBJ (ex: <tt class="docutils literal">PyObject_Malloc()</tt>) and PYMEM_DOMAIN_MEM (ex: <tt class="docutils literal">PyMem_Malloc()</tt>) domains are called</li> </ul> <p>Moreover, logging a fatal memory error now uses the tracemalloc module to get the traceback where a memory block was allocated.
Example of a buffer overflow using <tt class="docutils literal">python3.6 <span class="pre">-X</span> tracemalloc=5</tt> (store 5 frames in traces):</p> <pre class="literal-block">
Debug memory block at address p=0x7fbcd41666f8: API 'o'
    4 bytes originally requested
    The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0x7fbcd41666fc are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x02 *** OUCH
        at tail+1: 0xfb
        at tail+2: 0xfb
        ...
    The block was made by call #1233329 to debug malloc/realloc.
    Data at p: 1a 2b 30 00

Memory block allocated at (most recent call first):
  File &quot;test/test_bytes.py&quot;, line 323
  File &quot;unittest/case.py&quot;, line 600
  ...

Fatal Python error: bad trailing pad byte

Current thread 0x00007fbcdbd32700 (most recent call first):
  File &quot;test/test_bytes.py&quot;, line 323 in test_hex
  File &quot;unittest/case.py&quot;, line 600 in run
  ...
</pre> <p><tt class="docutils literal">PYTHONMALLOC=malloc</tt> forces the usage of the system <tt class="docutils literal">malloc()</tt> allocator. This option can be used with Valgrind. Without this option, Valgrind emits tons of false alarms in the Python <tt class="docutils literal">pymalloc</tt> memory allocator.</p> </div> <div class="section" id="tracemalloc-new-c-api-and-support-multiple-address-spaces"> <h2>tracemalloc: new C API and support multiple address spaces</h2> <p>Antoine Pitrou and Nathaniel Smith asked me to enhance the tracemalloc module:</p> <ul class="simple"> <li>Add a C API to be able to manually track/untrack memory blocks, to track the memory allocated by custom memory allocators. For example, numpy uses allocators with a specific memory alignment for SIMD instructions.</li> <li>Support tracking memory of different address spaces.
For example, central (CPU) memory and GPU memory for numpy.</li> </ul> <div class="section" id="support-multiple-address-spaces"> <h3>Support multiple address spaces</h3> <p>I made deep changes in the <tt class="docutils literal">hashtable.c</tt> code (a simple C implementation of a hash table used by <tt class="docutils literal">_tracemalloc</tt>) to support keys of a variable size (issue #26588), instead of using a hardcoded <tt class="docutils literal">void *</tt> size. This allows supporting keys larger than <tt class="docutils literal">sizeof(void*)</tt>, but also using <em>less</em> memory for keys smaller than <tt class="docutils literal">sizeof(void*)</tt> (ex: <tt class="docutils literal">int</tt> keys).</p> <p>Then I extended the C <tt class="docutils literal">_tracemalloc</tt> module and the Python <tt class="docutils literal">tracemalloc</tt> module to add a new <tt class="docutils literal">domain</tt> attribute to traces: add the <tt class="docutils literal">Trace.domain</tt> attribute and the <tt class="docutils literal">tracemalloc.DomainFilter</tt> class.</p> <p>The final step was to optimize the memory footprint of _tracemalloc. It starts with compact keys (<tt class="docutils literal">Py_uintptr_t</tt> type) and only switches to <tt class="docutils literal">pointer_t</tt> keys when the first memory block with a non-zero domain is tracked (when more than one address space is used).
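</p>
<p>From Python code, the new API can be exercised directly: regular Python allocations all end up in domain 0, so a <tt class="docutils literal">DomainFilter</tt> on that domain keeps them (a small sketch):</p>

```python
import tracemalloc

tracemalloc.start()
data = [bytearray(1000) for _ in range(100)]  # traced allocations
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Pure Python allocations use domain 0; other domains are reserved for
# memory tracked manually through the C API (numpy-style allocators).
filtered = snapshot.filter_traces(
    [tracemalloc.DomainFilter(inclusive=True, domain=0)]
)
stats = filtered.statistics('lineno')
```

<p>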
So the <tt class="docutils literal">_tracemalloc</tt> memory usage doesn't change by default in Python 3.6!</p> </div> <div class="section" id="c-api"> <h3>C API</h3> <p>I added a private C API (issue #26530):</p> <pre class="literal-block">
int _PyTraceMalloc_Track(_PyTraceMalloc_domain_t domain,
                         Py_uintptr_t ptr, size_t size);
int _PyTraceMalloc_Untrack(_PyTraceMalloc_domain_t domain,
                           Py_uintptr_t ptr);
</pre> <p>I waited for feedback from Antoine and Nathaniel on this API, but the API remains private in Python 3.6 since no one reviewed it.</p> </div> </div> <div class="section" id="resourcewarning-warnings-now-come-with-a-traceback"> <h2>ResourceWarning warnings now come with a traceback</h2> <div class="section" id="final-result"> <h3>Final result</h3> <p>Before explaining the long development of the feature, let's see an example of the final result! Example with the script <tt class="docutils literal">example.py</tt>:</p> <pre class="literal-block">
import warnings

def func():
    return open(__file__)

f = func()
f = None
</pre> <p>Output of the command <tt class="docutils literal">python3.6 <span class="pre">-Wd</span> <span class="pre">-X</span> tracemalloc=5 example.py</tt>:</p> <pre class="literal-block">
example.py:7: ResourceWarning: unclosed file &lt;_io.TextIOWrapper name='example.py' mode='r' encoding='UTF-8'&gt;
  f = None
Object allocated at (most recent call first):
  File &quot;example.py&quot;, lineno 4
    return open(__file__)
  File &quot;example.py&quot;, lineno 6
    f = func()
</pre> <p>The <tt class="docutils literal">Object allocated at <span class="pre">(...)</span></tt> part is the new feature ;-)</p> </div> <div class="section" id="add-source-parameter-to-warnings"> <h3>Add source parameter to warnings</h3> <p>Python 3 logs <tt class="docutils literal">ResourceWarning</tt> warnings when a resource is not closed properly to help developers handle resources correctly.
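</p>
<p>For instance, dropping the last reference to an open file without closing it triggers such a warning. A minimal sketch (using <tt class="docutils literal">os.devnull</tt> rather than a real file; in CPython the warning is emitted when the file object is finalized):</p>

```python
import gc
import os
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    f = open(os.devnull)
    del f          # drop the last reference without calling close()
    gc.collect()   # make sure the file object is finalized

resource_warnings = [w for w in caught
                     if issubclass(w.category, ResourceWarning)]
print(len(resource_warnings))
```

<p>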
The problem is that the warning is only logged when the object is destroyed, which can occur far from the object creation, on a line unrelated to the object, because of the garbage collector.</p> <p>I added the <tt class="docutils literal">tracemalloc</tt> module to Python 3.4, which has an interesting <tt class="docutils literal">tracemalloc.get_object_traceback()</tt> function. If tracemalloc traced the allocation of an object, it can later provide the traceback where the object was allocated.</p> <p>I wanted to modify the <tt class="docutils literal">warnings</tt> module to call <tt class="docutils literal">get_object_traceback()</tt>, but I noticed that it wasn't possible to easily extend the <tt class="docutils literal">warnings</tt> API because this module allows overriding the <tt class="docutils literal">showwarning()</tt> and <tt class="docutils literal">formatwarning()</tt> functions, and these functions have a fixed number of parameters. Example:</p> <pre class="literal-block">
def showwarning(message, category, filename, lineno,
                file=None, line=None):
    ...
</pre> <p>In issue #26568, I added new <tt class="docutils literal">_showwarnmsg()</tt> and <tt class="docutils literal">_formatwarnmsg()</tt> functions to the warnings module which get a <tt class="docutils literal">warnings.WarningMessage</tt> object instead of a list of parameters:</p> <pre class="literal-block">
def _showwarnmsg(msg):
    ...
</pre> <p>I added a <tt class="docutils literal">source</tt> attribute to <tt class="docutils literal">warnings.WarningMessage</tt> (issue #26567) and a new optional <tt class="docutils literal">source</tt> parameter to <tt class="docutils literal">warnings.warn()</tt> (issue #26604): the leaked resource object.
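</p>
<p>The building block behind this feature can be tried on its own: if <tt class="docutils literal">tracemalloc</tt> is tracing when an object is allocated, <tt class="docutils literal">get_object_traceback()</tt> can recover the allocation site afterwards (a small sketch):</p>

```python
import os
import tracemalloc

tracemalloc.start(5)       # store up to 5 frames per allocation

f = open(os.devnull)       # allocated while tracing is active
tb = tracemalloc.get_object_traceback(f)
f.close()
tracemalloc.stop()

# tb formats like the "Object allocated at" part of the warning output
for line in tb.format():
    print(line)
```

<p>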
I modified <tt class="docutils literal">_formatwarnmsg()</tt> to log the traceback where the resource was allocated, if available.</p> <p>The tricky part was to fix corner cases when the following functions of the <tt class="docutils literal">warnings</tt> module are overridden:</p> <ul class="simple"> <li><tt class="docutils literal">formatwarning()</tt>, <tt class="docutils literal">showwarning()</tt></li> <li><tt class="docutils literal">_formatwarnmsg()</tt>, <tt class="docutils literal">_showwarnmsg()</tt></li> </ul> </div> <div class="section" id="set-the-source-parameter"> <h3>Set the source parameter</h3> <p>I started to modify modules to set the source parameter when logging <tt class="docutils literal">ResourceWarning</tt> warnings.</p> <p>The easy part was to modify the <tt class="docutils literal">asyncore</tt>, <tt class="docutils literal">asyncio</tt> and <tt class="docutils literal">_pyio</tt> modules to set the <tt class="docutils literal">source</tt> parameter. These modules are implemented in Python, so the change was just to add <tt class="docutils literal">source=self</tt>. Example of an <tt class="docutils literal">asyncio</tt> destructor:</p> <pre class="literal-block">
def __del__(self):
    if not self.is_closed():
        warnings.warn(&quot;unclosed event loop %r&quot; % self,
                      ResourceWarning, source=self)
        if not self.is_running():
            self.close()
</pre> <p>Note: The warning is logged before the resource is closed to provide more information in <tt class="docutils literal">repr()</tt>. Many objects clear most information in their <tt class="docutils literal">close()</tt> method.</p> <p>Modifying C modules was trickier than expected.
I had to implement &quot;finalizers&quot; (<a class="reference external" href="https://www.python.org/dev/peps/pep-0442/">PEP 442: Safe object finalization</a>) for the <tt class="docutils literal">_socket.socket</tt> type (issue #26590) and for the <tt class="docutils literal">os.scandir()</tt> iterator (issue #26603).</p> </div> <div class="section" id="more-reliable-warnings"> <h3>More reliable warnings</h3> <p>The Python shutdown process is complex, and some Python functions are broken during the shutdown. I enhanced the warnings module to handle these failures nicely and try to log warnings anyway.</p> <p>I modified <tt class="docutils literal">warnings.formatwarning()</tt> to catch <tt class="docutils literal">linecache.getline()</tt> failures when formatting the traceback.</p> <p>Logging the resource traceback is complex, so I only implemented it in Python. Python tries to use the Python <tt class="docutils literal">warnings</tt> module if it was imported, or falls back on the C <tt class="docutils literal">_warnings</tt> module. To get the resource traceback at Python shutdown, I modified the C module to try to import the Python warnings module: <tt class="docutils literal">_warnings.warn_explicit()</tt> now tries to import the Python warnings module if the source parameter is set, to be able to log the traceback where the source was allocated (issue #26592).</p> </div> <div class="section" id="fix-resourcewarning-warnings"> <h3>Fix ResourceWarning warnings</h3> <p>Since it became easy to debug these warnings, I fixed some of them in the Python test suite:</p> <ul class="simple"> <li>Issue #26620: Fix ResourceWarning in test_urllib2_localnet. Use context managers on urllib objects and use self.addCleanup() to clean up resources even if a test is interrupted with CTRL+C</li> <li>Issue #25654: multiprocessing: open file with <tt class="docutils literal">closefd=False</tt> to avoid ResourceWarning.
_test_multiprocessing: open file with <tt class="docutils literal">O_EXCL</tt> to detect bugs in tests (if a previous test forgot to remove TESTFN). <tt class="docutils literal">test_sys_exit()</tt>: remove TESTFN after each loop iteration</li> <li>Fix <tt class="docutils literal">ResourceWarning</tt> in test_unittest when interrupted</li> </ul> </div> </div> <div class="section" id="pymem-malloc-now-fails-if-the-gil-is-not-held"> <h2>PyMem_Malloc() now fails if the GIL is not held</h2> <p>Since using the small object allocator (<tt class="docutils literal">pymalloc</tt>) for dictionary key storage showed a speedup for the dict type (issue #23601), I proposed to generalize the change and use <tt class="docutils literal">pymalloc</tt> for <tt class="docutils literal">PyMem_Malloc()</tt>: <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2016-February/143084.html">[Python-Dev] Modify PyMem_Malloc to use pymalloc for performance</a>.</p> <p>The main issue was that the change means that <tt class="docutils literal">PyMem_Malloc()</tt> now requires the GIL to be held, whereas it didn't before since it called <tt class="docutils literal">malloc()</tt> directly.</p> <div class="section" id="check-if-the-gil-is-held"> <h3>Check if the GIL is held</h3> <p>CPython has a <tt class="docutils literal">PyGILState_Check()</tt> function to check if the GIL is held. Problem: the function doesn't work with subinterpreters: see issues #10915 and #15751.</p> <p>I added an internal flag to <tt class="docutils literal">PyGILState_Check()</tt> (issue #26558) to skip the test. The flag value is false at startup, set to true once the GIL is fully initialized (Python initialization), and set to false again when the GIL is destroyed (Python finalization).
The flag is also set to false when the first subinterpreter is created.</p> <p>This hack works around the <tt class="docutils literal">PyGILState_Check()</tt> limitations and allows calling <tt class="docutils literal">PyGILState_Check()</tt> at any time, to catch more bugs earlier.</p> <p><tt class="docutils literal">_Py_dup()</tt>, <tt class="docutils literal">_Py_fstat()</tt>, <tt class="docutils literal">_Py_read()</tt> and <tt class="docutils literal">_Py_write()</tt> are low-level helper functions for system functions, but these functions require the GIL to be held. Thanks to the <tt class="docutils literal">PyGILState_Check()</tt> enhancement, it became possible to check the GIL using an assertion.</p> </div> <div class="section" id="pymem-malloc-and-gil"> <h3>PyMem_Malloc() and GIL</h3> <p>Issue #26563: Debug hooks on Python memory allocators now raise a fatal error if memory allocator functions like PyMem_Malloc() and PyObject_Malloc() are called without holding the GIL.</p> <p>The change spotted two bugs which I fixed:</p> <ul class="simple"> <li>Issue #26563: Replace PyMem_Malloc() with PyMem_RawMalloc() in the Windows implementation of os.stat(): the code is called without holding the GIL.</li> <li>Issue #26563: Fix the usage of the PyMem API in overlapped.c. Replace PyMem_Malloc() with PyMem_RawMalloc() and PyMem_Free() with PyMem_RawFree(), since PostToQueueCallback() frees the buffer in a new C thread which doesn't hold the GIL.</li> </ul> <p>I wasn't able to switch <tt class="docutils literal">PyMem_Malloc()</tt> to <tt class="docutils literal">pymalloc</tt> in this quarter, since it took a lot of time to implement the requested checks and test third-party modules.</p> </div> <div class="section" id="fatal-error-and-faulthandler"> <h3>Fatal error and faulthandler</h3> <p>I enhanced the faulthandler module to work in non-Python threads (issue #26563).
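</p>
<p>The module's traceback dump can be exercised without any crash: <tt class="docutils literal">faulthandler.dump_traceback()</tt> writes the traceback of Python threads to any file with a real file descriptor (a sketch):</p>

```python
import faulthandler
import tempfile

# dump_traceback() writes directly to a file descriptor, so use a
# temporary file rather than io.StringIO (which has no fileno()).
with tempfile.TemporaryFile(mode='w+') as f:
    faulthandler.dump_traceback(file=f, all_threads=True)
    f.seek(0)
    output = f.read()

print(output)
```

<p>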
I fixed <tt class="docutils literal">Py_FatalError()</tt> if called without holding the GIL: don't try to print the current exception, nor try to flush stdout and stderr: only dump the traceback of Python threads.</p> </div> </div> <div class="section" id="interesting-bug-reentrant-flag-in-tracemalloc"> <h2>Interesting bug: reentrant flag in tracemalloc</h2> <p>A bug annoyed me a lot: a random assertion error related to a reentrant flag in the _tracemalloc module.</p> <p>The story starts in the <a class="reference external" href="http://bugs.python.org/issue26588#msg262125">middle of the issue #26588 (2016-03-21)</a>. While working on issue #26588, &quot;_tracemalloc: add support for multiple address spaces (domains)&quot;, I noticed an assertion failure in set_reentrant(), a helper function to set a <em>Thread Local Storage</em> (TLS) variable, on a buildbot:</p> <pre class="literal-block">
python: ./Modules/_tracemalloc.c:195: set_reentrant: Assertion `PyThread_get_key_value(tracemalloc_reentrant_key) == ((PyObject *) &amp;_Py_TrueStruct)' failed.
</pre> <p>I was unable to reproduce the bug on my Fedora 23 (AMD64). After changes to my patch, I pushed it the next day, but the assertion failed again. I added assertions and debug information. More failures followed, including an interesting one on Windows, which uses a single process.</p> <p>I added an assertion in tracemalloc_init() to ensure that the reentrant flag is set at the end of the function. The reentrant flag was no longer set at tracemalloc_start() entry for an unknown reason.
I changed the module initialization to not call tracemalloc_init() anymore; it's only called on tracemalloc.start().</p> <p>&quot;The bug was seen on 5 buildbots so far: PPC Fedora, AMD64 Debian, s390x RHEL, AMD64 Windows, x86 Ubuntu.&quot;</p> <p>I finally understood and fixed the bug with the <a class="reference external" href="https://hg.python.org/cpython/rev/af1c1149784a">change af1c1149784a</a>: tracemalloc_start() and tracemalloc_stop() don't clear/set the reentrant flag anymore.</p> <p>The problem was that I expected the tracemalloc_init() and tracemalloc_start() functions to always be called in the same thread, whereas in practice tracemalloc_init() was called in thread A when the tracemalloc module was imported, whereas tracemalloc_start() was called in thread B.</p> </div> <div class="section" id="other-commits"> <h2>Other commits</h2> <div class="section" id="enhancements"> <h3>Enhancements</h3> <p>The developers of the <tt class="docutils literal">vmprof</tt> profiler asked me to expose the atomic variable <tt class="docutils literal">_PyThreadState_Current</tt>. The private variable was removed from the Python 3.5.1 API because the implementation of atomic variables depends on the compiler, compiler options, etc. and so caused compilation issues. I added a new private <tt class="docutils literal">_PyThreadState_UncheckedGet()</tt> function (issue #26154) which gets the value of the variable without exposing its implementation.</p> <p>Other enhancements:</p> <ul class="simple"> <li>Issue #26099: The site module now writes an error into stderr if the sitecustomize module can be imported but executing it raises an ImportError. Same change for usercustomize.</li> <li>Issue #26516: Enhance the Python memory allocators documentation. Add a link to the PYTHONMALLOCSTATS environment variable.
Add parameters to PyMem macros like PyMem_MALLOC().</li> <li>Issue #26569: Fix pyclbr.readmodule() and pyclbr.readmodule_ex() to support importing packages.</li> <li>Issue #26564, #26516, #26563: Enhance documentation on memory allocator debug hooks.</li> <li>doctest now supports packages. Issue #26641: doctest.DocFileTest and doctest.testfile() now support packages (a module split across multiple directories) for the package parameter.</li> </ul> </div> <div class="section" id="bugfixes"> <h3>Bugfixes</h3> <p>Issue #25843: When compiling code, don't merge constants if they are equal but have different types. For example, <tt class="docutils literal">f1, f2 = lambda: 1, lambda: 1.0</tt> is now correctly compiled to two different functions: <tt class="docutils literal">f1()</tt> returns <tt class="docutils literal">1</tt> (int) and <tt class="docutils literal">f2()</tt> returns <tt class="docutils literal">1.0</tt> (float), even if 1 and 1.0 are equal.</p> <p>Other fixes:</p> <ul class="simple"> <li>Issue #26101: Fix test_compilepath() of test_compileall. Exclude Lib/test/ from sys.path in test_compilepath(). The directory contains invalid Python files like Lib/test/badsyntax_pep3120.py, whereas the test ensures that all files can be compiled.</li> <li>Issue #24520: Replace fpgetmask() with fedisableexcept(). On FreeBSD, fpgetmask() was deprecated a long time ago. fedisableexcept() is now preferred.</li> <li>Issue #26161: Use Py_uintptr_t instead of void* for atomic pointers in pyatomic.h. Use atomic_uintptr_t when &lt;stdatomic.h&gt; is used. Using void* causes compilation warnings depending on which implementation of atomic types is used.</li> <li>Issue #26637: The importlib module now emits an ImportError rather than a TypeError if __import__() is tried during the Python shutdown process but sys.path is already cleared (set to None).</li> <li>doctest: fix _module_relative_path() error message.
Write the module name rather than &lt;module&gt; in the error message, if the module has no __file__ attribute (ex: package).</li> </ul> </div> <div class="section" id="fix-type-downcasts-on-windows-64-bit"> <h3>Fix type downcasts on Windows 64-bit</h3> <p>In my spare time, I'm trying to fix a few compiler warnings on Windows 64-bit where the C <tt class="docutils literal">long</tt> type is only 32-bit, whereas pointers are <tt class="docutils literal"><span class="pre">64-bit</span></tt> long:</p> <ul class="simple"> <li>posix_getcwd(): limit to INT_MAX on Windows. It's mostly to fix a compiler warning: I don't think that Windows supports current working directories larger than 2 GB :-)</li> <li>_pickle: Fix load_counted_tuple(), use Py_ssize_t for size. Fix a warning on Windows 64-bit.</li> <li>getpathp.c: fix compiler warning, wcsnlen_s() result type is size_t.</li> <li>compiler.c: fix compiler warnings on Windows</li> <li>_msi.c: try to fix compiler warnings</li> <li>longobject.c: fix compilation warning on Windows 64-bit. We know that Py_SIZE(b) is -1 or 1 and so fits into the sdigit type.</li> <li>On Windows, socket.setsockopt() now raises an OverflowError if the socket option is larger than INT_MAX bytes.</li> </ul> <div class="section" id="unicode-bugfixes"> <h3>Unicode bugfixes</h3> <ul class="simple"> <li>Issue #26227: On Windows, the getnameinfo(), gethostbyaddr() and gethostbyname_ex() functions of the socket module now decode the hostname from the ANSI code page rather than UTF-8.</li> <li>Issue #26217: Unicode resize_compact() must set wstr_length to 0 after freeing the wstr string. Otherwise, an assertion fails in _PyUnicode_CheckConsistency().</li> <li>Issue #26464: Fix str.translate() when the string is ASCII, the first replacements remove characters, but the next replacements use a non-ASCII character or a string longer than 1 character.
Regression introduced in Python 3.5.0.</li> </ul> </div> <div class="section" id="buildbot-tests"> <h3>Buildbot, tests</h3> <p>Just to give you an idea of the work required to keep a working CI, here is the list of changes I made in a single quarter to make tests and Python buildbots more reliable.</p> <ul class="simple"> <li>Issue #26610: Skip test_venv.test_with_pip() if ctypes is missing</li> <li>test_asyncio: fix test_timeout_time(). Accept a time delta of up to 0.12 second, instead of 0.11, for the &quot;AMD64 FreeBSD 9.x&quot; buildbot slave.</li> <li>Issue #13305: Always test datetime.datetime.strftime(&quot;%4Y&quot;) for years &lt; 1900. The change was quickly reverted: strftime(&quot;%4Y&quot;) fails on most platforms.</li> <li>Issue #17758: Skip test_site if the site.USER_SITE directory doesn't exist and cannot be created.</li> <li>Fix test_venv on the FreeBSD buildbot. Ignore the pip warning in test_venv.test_with_venv().</li> <li>Issue #26566: Rewrite test_signal.InterProcessSignalTests. Don't use os.fork() with a subprocess, to not inherit existing signal handlers or threads: start from a fresh process. Use a timeout of 10 seconds to wait for the signal instead of 1 second</li> <li>Issue #26538: regrtest: Fix module.__path__. libregrtest: Fix setup_tests() to keep the module.__path__ type (_NamespacePath), don't convert it to a list. Add a _NamespacePath.__setitem__() method to importlib._bootstrap_external.</li> <li>regrtest: add time to output. Timestamps should help to debug slow buildbots, and timeouts and hangs on buildbots.</li> <li>regrtest: add a timeout to the main process when using -jN. libregrtest: add a watchdog to run_tests_multiprocess() using faulthandler.dump_traceback_later().</li> <li>Makefile: change the default value of TESTTIMEOUT from 1 hour to 15 min. The whole test suite takes 6 minutes on my laptop. It takes less than 30 minutes on most buildbots.
The TESTTIMEOUT is the timeout for a single test file.</li> <li>Buildbots: also change the Windows timeout from 1 hour to 15 min</li> <li>regrtest: display test duration in sequential mode. Only display the duration if a test takes more than 30 seconds.</li> <li>Issue #18787: Try to fix test_spwd on OpenIndiana. Try to get the &quot;root&quot; entry, which should exist on all UNIX systems, instead of &quot;bin&quot;, which doesn't exist on OpenIndiana.</li> <li>regrtest: fix the --fromfile feature. Update the code for the new regrtest output format. Also enhance the test_regrtest test of --fromfile</li> <li>regrtest: mention if tests run sequentially or in parallel</li> <li>regrtest: when parallel tests are interrupted, display progress</li> <li>support.temp_dir(): call support.rmtree() instead of shutil.rmtree(). Try harder to remove directories on Windows.</li> <li>rt.bat: use -m test instead of Lib\test\regrtest.py</li> <li>Refactor regrtest.</li> <li>Fix test_warnings.test_improper_option(). test_warnings: only run test_improper_option() and test_warnings_bootstrap() once. The unit test doesn't depend on self.module.</li> <li>Fix test_os.test_symlink(): remove the created symlink.</li> <li>Issue #26643: Add missing shutil resources to regrtest.py</li> <li>test_urllibnet: set a timeout on test_fileno(). Use the default timeout of 30 seconds to avoid blocking forever.</li> <li>Issue #26295: When using &quot;python3 -m test --testdir=TESTDIR&quot;, regrtest doesn't add the &quot;test.&quot; prefix to test module names. regrtest also prepends testdir to sys.path.</li> <li>Issue #26295: test_regrtest now uses a temporary directory</li> </ul> </div> <div class="section" id="contributions"> <h3>Contributions</h3> <p>I also pushed a few changes written by other contributors:</p> <ul class="simple"> <li>Issue #25907: Use {% trans %} tags in HTML templates to ease the translation of the documentation. The tag comes from the Jinja templating system, used by Sphinx.
Patch written by <strong>Julien Palard</strong>.</li> <li>Issue #26248: Enhance the os.scandir() doc. Patch written by <strong>Ben Hoyt</strong>.</li> <li>Fix error message in asyncio.selector_events. Patch written by <strong>Carlo Beccarini</strong>.</li> <li>Issue #16851: Fix inspect.ismethod() doc: also return True if the object is an unbound method. Patch written by <strong>Anna Koroliuk</strong>.</li> <li>Issue #26574: Optimize bytes.replace(b'', b'.') and bytearray.replace(b'', b'.'): up to 80% faster. Patch written by <strong>Josh Snider</strong>.</li> </ul> </div> </div> Analysis of a Python performance issue2016-11-19T00:30:00+01:002016-11-19T00:30:00+01:00Victor Stinnertag:vstinner.github.io,2016-11-19:/analysis-python-performance-issue.html<p>I am working on the CPython benchmark suite (<a class="reference external" href="https://github.com/python/performance">performance</a>) and I run the benchmark suite to upload results to <a class="reference external" href="http://speed.python.org/">speed.python.org</a>. While analyzing results, I noticed a temporary peak on the <tt class="docutils literal">call_method</tt> benchmark on October 19th:</p> <img alt="call_method microbenchmark" src="https://vstinner.github.io/images/call_method.png" /> <p>The graphic shows the performance of the <tt class="docutils literal">call_method</tt> microbenchmark between Feb 29, 2016 …</p><p>I am working on the CPython benchmark suite (<a class="reference external" href="https://github.com/python/performance">performance</a>) and I run the benchmark suite to upload results to <a class="reference external" href="http://speed.python.org/">speed.python.org</a>.
While analyzing results, I noticed a temporary peak on the <tt class="docutils literal">call_method</tt> benchmark on October 19th:</p> <img alt="call_method microbenchmark" src="https://vstinner.github.io/images/call_method.png" /> <p>The graphic shows the performance of the <tt class="docutils literal">call_method</tt> microbenchmark between Feb 29, 2016 and November 17, 2016 on the <tt class="docutils literal">default</tt> branch of CPython. The average is around 17.2 ms, whereas the peak is at 29.0 ms: <strong>68% slower</strong>!</p> <p>The server has two &quot;Intel(R) Xeon(R) CPU X5680 &#64; 3.33GHz&quot; CPUs, total: 24 logical cores (12 physical cores with HyperThreading). This CPU was launched in 2010 and is based on the <a class="reference external" href="https://en.wikipedia.org/wiki/Gulftown">Westmere-EP microarchitecture</a>. Westmere-EP is based on Westmere, which is the 32 nm shrink of the Nehalem microarchitecture.</p> <div class="section" id="reproduce-results"> <h2>Reproduce results</h2> <p>Before going too far, the first step is to validate that the results are reproducible: reboot the computer, recompile Python, run the benchmark again.</p> <p>Instead of running the full benchmark suite (install Python, ...), we will run the benchmark directly and manually, using the Python freshly built in its source code directory.</p> <p>Interesting dots on the graphic (can be seen at speed.python.org, not on the screenshot):</p> <ul class="simple"> <li>678fe178da0d, Oct 09, 17.0 ms: &quot;Fast&quot;</li> <li>1ce50f7027c1, Oct 19, 28.9 ms: &quot;Slow&quot;</li> <li>36af3566b67a, Nov 3, 16.9 ms: Fast again</li> </ul> <p>I use the following directories:</p> <ul class="simple"> <li>~/perf: GitHub haypo/perf project</li> <li>~/performance: GitHub python/performance project</li> <li>~/cpython: Mercurial CPython repository</li> </ul> <p>Tune the system for benchmarks:</p> <pre class="literal-block">
sudo python3 -m perf system tune
</pre> <p>Note: all <tt class="docutils literal">system</tt> commands in this article are optional. They help to reduce the operating system jitter (make benchmarks more reliable).</p> <p>Fast:</p> <pre class="literal-block">
$ hg up -C -r 678fe178da0d
$ ./configure --with-lto -C &amp;&amp; make clean &amp;&amp; make
$ mv python python-fast
$ PYTHONPATH=~/perf ./python-fast ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 17.0 ms +- 0.1 ms
</pre> <p>Slow:</p> <pre class="literal-block">
$ hg up -C -r 1ce50f7027c1
$ ./configure --with-lto -C &amp;&amp; make clean &amp;&amp; make
$ mv python python-slow
$ PYTHONPATH=~/perf ./python-slow ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast
call_method: Median +- std dev: 29.3 ms +- 0.9 ms
</pre> <p>We reproduced the significant benchmark result: 17 ms =&gt; 29 ms.</p> <p>I use <tt class="docutils literal">./configure</tt> and <tt class="docutils literal">make clean</tt> instead of an incremental compilation (the <tt class="docutils literal">make</tt> command alone) to avoid compilation errors, and to avoid potential side effects only caused by the incremental compilation.</p> </div> <div class="section" id="analysis-with-the-linux-perf-tool"> <h2>Analysis with the Linux perf tool</h2> <p>To collect perf events, we will run the benchmark with <tt class="docutils literal"><span class="pre">--worker</span></tt> to run a single process and with <tt class="docutils literal"><span class="pre">-w0</span> <span class="pre">-n100</span></tt> to run the benchmark long enough: 100 samples means at least 10 seconds (a single sample takes at least 100 ms).</p> <p>First, reset the system configuration to reset the Linux perf configuration:</p> <pre class="literal-block">
sudo python3 -m perf system reset
</pre> <p>Note: <tt class="docutils literal">python3 <span class="pre">-m</span> perf system tune</tt> reduces the sampling rate of Linux perf to reduce operating system jitter.</p> </div> <div class="section" id="perf-stat"> <h2>perf stat</h2> <p>Command to get general statistics on the benchmark:</p> <pre class="literal-block">
$ perf stat ./python-slow ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -v -w0 -n100
</pre> <p>&quot;Fast&quot; results:</p> <pre class="literal-block">
Performance counter stats for ./python-fast:

       3773.585194 task-clock (msec)        #  0.998 CPUs utilized
               369 context-switches         #  0.098 K/sec
                 0 cpu-migrations           #  0.000 K/sec
             8,300 page-faults              #  0.002 M/sec
    12,981,234,867 cycles                   #  3.440 GHz                     [83.27%]
     1,460,980,720 stalled-cycles-frontend  # 11.25% frontend cycles idle    [83.36%]
       435,806,788 stalled-cycles-backend   #  3.36% backend cycles idle     [66.72%]
    29,982,530,201 instructions             #  2.31  insns per cycle
                                            #  0.05  stalled cycles per insn [83.40%]
     5,613,631,616 branches                 # 1487.612 M/sec                 [83.40%]
        16,006,564 branch-misses            #  0.29% of all branches         [83.27%]

       3.780064486 seconds time elapsed
</pre> <p>&quot;Slow&quot; results:</p> <pre class="literal-block">
Performance counter stats for ./python-slow:

       5906.239860 task-clock (msec)        #  0.998 CPUs utilized
               556 context-switches         #  0.094 K/sec
                 0 cpu-migrations           #  0.000 K/sec
             8,393 page-faults              #  0.001 M/sec
    20,651,474,102 cycles                   #  3.497 GHz                     [83.36%]
     8,480,803,345 stalled-cycles-frontend  # 41.07% frontend cycles idle    [83.37%]
     4,247,826,420 stalled-cycles-backend   # 20.57% backend cycles idle     [66.64%]
    30,011,465,614 instructions             #  1.45  insns per cycle
                                            #  0.28  stalled cycles per insn [83.32%]
     5,612,485,730 branches                 #  950.264 M/sec                 [83.36%]
        13,584,136 branch-misses            #  0.24% of all branches         [83.29%]

       5.915402403 seconds time elapsed
</pre> <p>Significant differences, Fast =&gt; Slow:</p> <ul class="simple"> <li>Instructions per cycle: 2.31 =&gt; 1.45</li> <li>stalled-cycles-frontend: <strong>11.25% =&gt; 41.07%</strong></li> <li>stalled-cycles-backend: <strong>3.36% =&gt; 20.57%</strong></li> </ul> <p>The increase of stalled cycles is interesting.
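</p>
<p>The counters above can be cross-checked with a little arithmetic: both binaries execute almost exactly the same number of instructions (about 30 billion), only the number of cycles differs:</p>

```python
# Figures copied from the two "perf stat" runs above
fast_cycles = 12_981_234_867
fast_insns = 29_982_530_201
slow_cycles = 20_651_474_102
slow_insns = 30_011_465_614

fast_ipc = fast_insns / fast_cycles   # ~2.31 instructions per cycle
slow_ipc = slow_insns / slow_cycles   # ~1.45 instructions per cycle
print(round(fast_ipc, 2), round(slow_ipc, 2))
```

<p>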
Since the code is supposed to be identical, it probably means that fetching instructions is slower. It sounds like an issue with CPU caches.</p> </div> <div class="section" id="statistics-on-the-cpu-l1-instruction-cache"> <h2>Statistics on the CPU L1 instruction cache</h2> <p>The <tt class="docutils literal">perf list</tt> command can be used to get the name of events collecting statistics on the CPU L1 instruction cache:</p> <pre class="literal-block"> $ perf list | grep L1 L1-icache-loads [Hardware cache event] L1-icache-load-misses [Hardware cache event] (...) </pre> <p>Collect statistics on the CPU L1 instruction cache:</p> <pre class="literal-block"> PYTHONPATH=~/perf perf stat -e L1-icache-loads,L1-icache-load-misses ./python-slow ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -w0 -n10 </pre> <p>&quot;Fast&quot; statistics:</p> <pre class="literal-block"> Performance counter stats for './python-fast (...)': 10,134,106,571 L1-icache-loads 10,917,606 L1-icache-load-misses # 0.11% of all L1-icache hits 3.775067668 seconds time elapsed </pre> <p>&quot;Slow&quot; statistics:</p> <pre class="literal-block"> Performance counter stats for './python-slow (...)': 10,753,371,258 L1-icache-loads 848,511,308 L1-icache-load-misses # 7.89% of all L1-icache hits 6.020490449 seconds time elapsed </pre> <p>Cache misses on the L1 instruction cache: <strong>0.1%</strong> (Fast) =&gt; <strong>8.0%</strong> (Slow).</p> <p>The slow Python has a <strong>71.7x higher L1 instruction cache miss rate</strong> than the fast Python! This alone can explain the significant performance drop.</p> <div class="section" id="perf-report"> <h3>perf report</h3> <p>The <tt class="docutils literal">perf record</tt> command can be used to collect statistics on the functions where the benchmark spends most of its time.
Commands:</p> <pre class="literal-block"> PYTHONPATH=~/perf perf record ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -v -w0 -n100 perf report </pre> <p>Output:</p> <pre class="literal-block"> 40.27% python python [.] _PyEval_EvalFrameDefault 10.30% python python [.] call_function 10.21% python python [.] PyFrame_New 8.56% python python [.] frame_dealloc 5.51% python python [.] PyObject_GenericGetAttr (...) </pre> <p>More than 74% of the time is spent in these 5 functions.</p> </div> <div class="section" id="system-tune"> <h3>system tune</h3> <p>Before running more benchmarks, tune the system for benchmarks again:</p> <pre class="literal-block"> sudo python3 -m perf system tune </pre> </div> </div> <div class="section" id="hg-bisect"> <h2>hg bisect</h2> <p>To find the revision which introduced the performance slowdown, we use a shell script to automate the bisection of the Mercurial history.</p> <p><tt class="docutils literal">cmd.sh</tt> script checking if a revision is fast or slow:</p> <pre class="literal-block"> set -e -x ./configure --with-lto -C &amp;&amp; make clean &amp;&amp; make rm -f json PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --worker -o json -v PYTHONPATH=~/perf python3 cmd.py json </pre> <p><tt class="docutils literal">cmd.sh</tt> uses the following <tt class="docutils literal">cmd.py</tt> script which checks if the benchmark is slow: if it takes longer than 23 ms (the average of 17 and 29 ms):</p> <pre class="literal-block"> import perf, sys bench = perf.Benchmark.load('json') bad = (29 + 17) / 2.0 ms = bench.median() * 1e3 if ms &gt;= bad: print(&quot;BAD! %.1f ms &gt;= %.1f ms&quot; % (ms, bad)) sys.exit(1) else: print(&quot;good: %.1f ms &lt; %.1f ms&quot; % (ms, bad)) </pre> <p>In the bisection, &quot;good&quot; means &quot;fast&quot; (17 ms), whereas &quot;bad&quot; means &quot;slow&quot; (29 ms).
The peak, revision 1ce50f7027c1, is used as the first &quot;bad&quot; revision. The previous fast revision before the peak is 678fe178da0d, our first &quot;good&quot; revision.</p> <p>Commands to identify the first revision which introduced the slowdown:</p> <pre class="literal-block"> hg bisect --reset hg bisect -b 1ce50f7027c1 hg bisect -g 678fe178da0d time hg bisect -c ./cmd.sh </pre> <p>3 min 52 sec later:</p> <pre class="literal-block"> The first bad revision is: changeset: 104531:83877018ef97 parent: 104528:ce85a1f129e3 parent: 104530:2d352bf2b228 user: Serhiy Storchaka &lt;storchaka&#64;gmail.com&gt; date: Tue Oct 18 13:27:54 2016 +0300 files: Misc/NEWS description: Issue #23782: Fixed possible memory leak in _PyTraceback_Add() and exception loss in PyTraceBack_Here(). </pre> <p>Thank you <tt class="docutils literal">hg bisect</tt>! I love this tool.</p> <p>Even if I trust <tt class="docutils literal">hg bisect</tt>, I don't trust benchmarks, so I recheck manually:</p> <p>Slow:</p> <pre class="literal-block"> $ hg up -C -r 83877018ef97 $ ./configure --with-lto -C &amp;&amp; make clean &amp;&amp; make $ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast call_method: Median +- std dev: 29.4 ms +- 1.8 ms </pre> <p>Use <tt class="docutils literal">hg parents</tt> to get the latest fast revision:</p> <pre class="literal-block"> $ hg parents -r 83877018ef97 changeset: 104528:ce85a1f129e3 (...) changeset: 104530:2d352bf2b228 branch: 3.6 (...) </pre> <p>Check the parent:</p> <pre class="literal-block"> $ hg up -C -r ce85a1f129e3 $ ./configure --with-lto -C &amp;&amp; make clean &amp;&amp; make $ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast call_method: Median +- std dev: 17.1 ms +- 0.1 ms </pre> <p>The revision ce85a1f129e3 is fast and the following revision 83877018ef97 is slow. 
<strong>The revision 83877018ef97 introduced the slowdown</strong>. We found it!</p> </div> <div class="section" id="analysis-of-the-revision-introducing-the-slowdown"> <h2>Analysis of the revision introducing the slowdown</h2> <p>The <a class="reference external" href="https://hg.python.org/cpython/rev/83877018ef97/">revision 83877018ef97</a> changes two files: Misc/NEWS and Python/traceback.c. The NEWS file is only documentation and so cannot impact performance. Python/traceback.c is part of the C code and so is more interesting.</p> <p>The commit only changes two C functions: <tt class="docutils literal">PyTraceBack_Here()</tt> and <tt class="docutils literal">_PyTraceback_Add()</tt>, but <tt class="docutils literal">perf report</tt> didn't show these functions as &quot;hot&quot;. In fact, these functions are never called by the benchmark.</p> <p><strong>The commit doesn't touch the C code used in the benchmark.</strong></p> <p>An unrelated C change impacting performance reminds me of my previous <a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html">deadcode horror story</a>. The performance difference is probably caused by <strong>&quot;code placement&quot;</strong>: <tt class="docutils literal">perf stat</tt> showed a significant increase of the cache miss rate on the L1 instruction cache.</p> </div> <div class="section" id="use-gcc-attribute-hot"> <h2>Use GCC __attribute__((hot))</h2> <p>Using PGO compilation was the solution for deadcode, but PGO doesn't work on Ubuntu 14.04 (the OS used by the benchmark server, speed-python) and PGO seems to make benchmarks less reliable.</p> <p>I wanted to try something else: mark hot functions using the GCC <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> attribute. PGO compilation does this automatically.</p> <p>This attribute only has an impact on the code placement: where functions are loaded in memory.
The attribute declares functions in the <tt class="docutils literal">.text.hot</tt> ELF section rather than the <tt class="docutils literal">.text</tt> ELF section. Grouping hot functions in the same section reduces the distance between them and so enhances the usage of CPU caches.</p> <p>I wrote and then pushed a patch in the <a class="reference external" href="http://bugs.python.org/issue28618">issue #28618</a>: &quot;Decorate hot functions using __attribute__((hot)) to optimize Python&quot;.</p> <p>The patch marks 6 functions as hot:</p> <ul class="simple"> <li><tt class="docutils literal">_PyEval_EvalFrameDefault()</tt></li> <li><tt class="docutils literal">call_function()</tt></li> <li><tt class="docutils literal">_PyFunction_FastCall()</tt></li> <li><tt class="docutils literal">PyFrame_New()</tt></li> <li><tt class="docutils literal">frame_dealloc()</tt></li> <li><tt class="docutils literal">PyErr_Occurred()</tt></li> </ul> <p>Let's try the patch:</p> <pre class="literal-block"> $ hg up -C -r 83877018ef97 $ wget https://hg.python.org/cpython/raw-rev/59b91b4e9506 -O patch $ patch -p1 &lt; patch $ ./configure --with-lto -C &amp;&amp; make clean &amp;&amp; make $ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast call_method: Median +- std dev: 16.7 ms +- 0.3 ms </pre> <p>It's easy to make mistakes and benchmarks are always surprising, so let's retry without the patch:</p> <pre class="literal-block"> $ hg up -C -r 83877018ef97 $ ./configure --with-lto -C &amp;&amp; make clean &amp;&amp; make $ PYTHONPATH=~/perf ./python ~/performance/performance/benchmarks/bm_call_method.py --inherit-environ=PYTHONPATH --fast call_method: Median +- std dev: 29.3 ms +- 0.6 ms </pre> <p>The check confirms that the GCC attribute fixed the issue!</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>On modern Intel CPUs, the code placement can have a major impact on the performance of
microbenchmarks.</p> <p>The GCC <tt class="docutils literal"><span class="pre">__attribute__((hot))</span></tt> attribute can be used manually to place &quot;hot functions&quot; close together in memory and so enhance the usage of CPU caches.</p> <p>To know more about the impact of code placement, see the very good talk by Zia Ansari (Intel) at the LLVM Developers' Meeting 2016: <a class="reference external" href="https://llvmdevelopersmeetingbay2016.sched.org/event/8YzY/causes-of-performance-instability-due-to-code-placement-in-x86">Causes of Performance Swings Due to Code Placement in IA</a>. He gives a good description of &quot;performance swings&quot; like the one analyzed in this article, and explains how CPUs work internally and how code placement impacts their performance.</p> </div> Intel CPUs (part 2): Turbo Boost, temperature, frequency and Pstate C0 bug2016-09-23T23:00:00+02:002016-09-23T23:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-09-23:/intel-cpus-part2.html<p class="first last">Intel CPUs (part 2): Turbo Boost, temperature, frequency and Pstate C0 bug</p> <p>My first article <a class="reference external" href="https://vstinner.github.io/intel-cpus.html">Intel CPUs</a> is a general introduction to modern CPU technologies having an impact on benchmarks.</p> <p>This second article is much more concrete, with real numbers and an actual bug having a major impact on benchmarks: a benchmark suddenly becomes 2x faster!</p> <p>I will tell you how I first noticed the bug, which tests I ran to analyze the issue, how I found commands to reproduce the bug, and finally how I identified the bug.</p> <div class="section" id="glitch-in-benchmarks"> <h2>&quot;Glitch&quot; in benchmarks</h2> <p>Last week I ran a benchmark to check if enabling Profile Guided Optimization (PGO) when compiling Python makes benchmark results less stable. I recompiled Python 5 times, and after each compilation I ran a benchmark. I tested different commands and options to compile Python.
Everything was fine until the last benchmark of the last compilation. <strong>The benchmark suddenly became 2 times faster.</strong></p> <p>Fortunately, my perf module collects a lot of metadata. I was able to analyze in depth what happened.</p> <p>The &quot;glitch&quot; occurred in a benchmark having 400 runs (the benchmark was run in 400 different processes), between the run 105 (20.3 ms) and the run 106 (11.0 ms).</p> <p>I noticed that the CPU temperature was between 69°C and 72°C until the run 105, and then decreased from 69°C to 58°C.</p> <p>The system load slowly increased from 1.25 up to 1.62 around the run 108 and then slowly decreased to 1.00.</p> <p>The system was not idle while the benchmark was running. I was working on the PC too! But according to timestamps, it seems like the glitch occurred close to when I stopped working. When I stopped working, I closed all applications (except the benchmark running in the background) and turned off my two monitors.</p> <p>Well, at this point, it's hard to correlate for sure an event with the major performance change.</p> <p>So I started to analyze different factors affecting CPUs and benchmarks: Turbo Boost, CPU temperature and CPU frequency.</p> </div> <div class="section" id="impact-of-turbo-boost-on-benchmarks"> <h2>Impact of Turbo Boost on benchmarks</h2> <p>Without Turbo Boost, the maximum frequency of the &quot;Intel(R) Core(TM) i7-3520M CPU &#64; 2.90GHz&quot; of my laptop is 2.9 GHz. With Turbo Boost, the maximum frequency is 3.6 GHz if only one core is active, or 3.4 GHz otherwise:</p> <pre class="literal-block"> $ sudo cpupower frequency-info ... 
boost state support: Supported: yes Active: yes 3400 MHz max turbo 4 active cores 3400 MHz max turbo 3 active cores 3400 MHz max turbo 2 active cores 3600 MHz max turbo 1 active cores </pre> <p>I ran the bm_call_simple.py microbenchmark (CPU-bound) of performance 0.2.2.</p> <p>Turbo Boost disabled:</p> <ul class="simple"> <li>1 physical CPU active: 2.9 GHz, Median +- std dev: 14.6 ms +- 0.3 ms</li> <li>2 physical CPUs active: 2.9 GHz, Median +- std dev: 14.7 ms +- 0.5 ms</li> </ul> <p>Turbo Boost enabled:</p> <ul class="simple"> <li>1 physical CPU active: 3.6 GHz, Median +- std dev: 11.8 ms +- 0.3 ms</li> <li>2 physical CPUs active: 3.4 GHz, Median +- std dev: 12.4 ms +- 0.1 ms</li> </ul> <p><strong>The maximum performance boost is 19% faster</strong> (14.6 ms =&gt; 11.8 ms), and the minimum boost is 15% faster (14.6 ms =&gt; 12.4 ms).</p> <p>Hum, I don't think that Turbo Boost can explain the bug.</p> </div> <div class="section" id="impact-of-the-cpu-temperature-on-benchmarks"> <h2>Impact of the CPU temperature on benchmarks</h2> <p>The CPU temperature is mentioned in the Intel Turbo Boost documentation as a factor used to decide which P-state will be used. I always wanted to check how the CPU temperature impacts its performance.</p> <div class="section" id="burn-the-cpu-of-my-desktop-pc"> <h3>Burn the CPU of my desktop PC</h3> <p>CPU of my desktop PC: &quot;Intel(R) Core(TM) i7-2600 CPU &#64; 3.40GHz&quot;.</p> <p>I used my <a class="reference external" href="https://github.com/vstinner/misc/blob/master/bin/system_load.py">system_load.py script</a> to generate a system load higher than 10.</p> <p>When the fan is cooling the CPU correctly, all cores run at 3.4 GHz (Turbo Boost was disabled) and the CPU temperature is 66°C.</p> <p>I used a simple sheet of paper to block the fan of my CPU. Yeah, I really wanted to <a class="reference external" href="https://www.youtube.com/watch?v=Xf0VuRG7MN4">burn my CPU</a>! 
More seriously, I checked the CPU temperature every second using the <tt class="docutils literal">sensors</tt> command and was prepared to unblock the fan if something went wrong.</p> <img alt="Sheet of paper blocking the CPU fan" src="https://vstinner.github.io/images/paper_blocks_cpu_fan.jpg" /> <p>After one minute, the CPU reached 97°C. I expected a system crash, smoke or something worse, but I was disappointed. <strong>At 97°C, I was still able to use my computer as if everything were fine. The CPU automatically slowed down to the minimum CPU frequency: 1533 MHz</strong> according to turbostat (the minimum frequency of this CPU is 1.6 GHz).</p> <p>When I unblocked the fan, the temperature decreased quickly to go back to its previous state (62°C) and the CPU frequency quickly increased back to 3.4 GHz as well.</p> <p>My Intel CPU is really impressive! I didn't expect such an efficient protection against overheating!</p> </div> <div class="section" id="burn-my-laptop-cpu"> <h3>Burn my laptop CPU</h3> <p>I used my system_load.py script to get a system load over 200. I also opened 4 tabs in Firefox playing Youtube videos to also stress the GPU which is integrated into the CPU (IGP) on such a laptop.</p> <img alt="Stress test playing Youtube videos in Firefox, CPU at 102°" src="https://vstinner.github.io/images/burn_cpu_firefox.jpg" /> <p>With such a crazy stress test, the CPU temperature was &quot;only&quot; 83°C.</p> <p>Using a simple tissue, I closed the air hole used by the CPU fan. <strong>When the CPU temperature increased from 100°C to 101°C, the CPU frequency slowly started to decrease from 3391 MHz to 3077 MHz</strong> (with steps between 10 MHz and 50 MHz every second, or something like that).</p> <p>When pressing the tissue hard and waiting longer than 5 minutes, the CPU temperature increased up to 102°C, but the CPU frequency was only decreased from 3.4 GHz (Turbo Mode with 4 active logical CPUs) to 3.1 GHz.</p> <p>The base frequency is 2.9 GHz. 
Frequencies higher than 2.9 GHz mean that Turbo Mode was enabled! It means that <strong>even with overheating, the CPU is still fine and able to &quot;overclock&quot; itself!</strong></p> <p>Again, I was disappointed. With a CPU at 102°C, my laptop was still super fast and responsive. It seems like mobile CPUs handle overheating even better than desktop CPUs (which is not surprising at all).</p> </div> </div> <div class="section" id="impact-of-the-cpu-frequency-on-benchmarks"> <h2>Impact of the CPU frequency on benchmarks</h2> <p>I ran the bm_call_simple.py microbenchmark (CPU-bound) of performance 0.2.2 on my desktop PC.</p> <p>Command to set the frequency of CPU 0 to the minimum frequency (1.6 GHz):</p> <pre class="literal-block"> $ cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq|sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 1600000 </pre> <p>Command to set the frequency of CPU 0 to the maximum frequency (3.4 GHz):</p> <pre class="literal-block"> $ cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq|sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 3400000 </pre> <ul class="simple"> <li>CPU running at 1.6 GHz (min freq): Median +- std dev: 27.7 ms +- 0.7 ms</li> <li>CPU running at 3.4 GHz (max freq): Median +- std dev: 12.9 ms +- 0.2 ms</li> </ul> <p>The impact of the CPU frequency is quite obvious: <strong>when the CPU frequency is doubled, the performance is also doubled</strong>. The benchmark is 53% faster (27.7 ms =&gt; 12.9 ms).</p> </div> <div class="section" id="bug-reproduced-and-then-identified-in-the-linux-cpu-driver"> <h2>Bug reproduced and then identified in the Linux CPU driver</h2> <p>Two days ago, I ran a very simple &quot;timeit&quot; microbenchmark to try to bisect a performance regression in Python 3.6 on <tt class="docutils literal">functools.partial</tt>. 
Again, suddenly, the microbenchmark became 2x faster!</p> <p>But this time, I found something: I noticed that running or stopping <tt class="docutils literal">cpupower monitor</tt> and/or <tt class="docutils literal">turbostat</tt> can &quot;enable&quot; or &quot;disable&quot; the bug.</p> <p>After a lot of tests, I understood that running the benchmark with turbostat &quot;disables&quot; the bug, whereas running &quot;cpupower monitor&quot; while running a benchmark enables the bug.</p> <p>I reported the bug in the Fedora bug tracker, on the kernel component: <a class="reference external" href="https://bugzilla.redhat.com/show_bug.cgi?id=1378529">intel_pstate C0 bug on isolated CPUs with the performance governor and NOHZ_FULL</a>.</p> <p>It seems like the bug is related to CPU isolation and NOHZ_FULL. The NOHZ_FULL option is able to fully disable the scheduler clock interrupt on isolated CPUs. I understood that the <tt class="docutils literal">intel_pstate</tt> driver uses a callback on the scheduler to update the P-state of the CPU. 
According to an Intel engineer, the <tt class="docutils literal">intel_pstate</tt> driver was never tested with CPU isolation.</p> <p>The issue is not fully analyzed yet, but at least I succeeded in writing a list of commands to reproduce it with a success rate of 100% :-) Moreover, the Intel engineer suggested adding an extra parameter to the Linux kernel command line (<tt class="docutils literal">rcu_nocbs=3,7</tt>) which works around the issue.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>This article describes how I found and then identified a bug in the Linux driver of my CPU.</p> <p>Summary:</p> <ul class="simple"> <li>The maximum speedup of Turbo Boost is around 20%</li> <li>Overheating on a desktop PC can decrease the CPU frequency to its minimum (half of the maximum in my case), which implies a slowdown of 50%</li> <li>A bug in the Linux CPU driver suddenly changes the CPU frequency from its minimum to its maximum (or the opposite), which means a speedup of 50% (or a slowdown of 50%)</li> </ul> <p><strong>To get stable benchmarks, the safest fix for all these issues is probably to set the frequency of the CPUs used by benchmarks to the minimum.</strong> It seems like nothing can reduce the frequency of a CPU below its minimum.</p> <p><strong>When running benchmarks, raw timings and CPU performance don't matter. Only comparisons between benchmark results and stable performance matter.</strong></p> </div> Intel CPUs: P-state, C-state, Turbo Boost, CPU frequency, etc.2016-07-15T12:00:00+02:002016-07-15T12:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-07-15:/intel-cpus.html<p class="first last">Intel CPUs: Hyper-threading, Turbo Boost, CPU frequency, etc.</p> <p>Ten years ago, most computers were desktop computers designed for best performance and their CPU frequency was fixed. 
Nowadays, most devices are embedded and use <a class="reference external" href="https://en.wikipedia.org/wiki/Low-power_electronics">low power consumption</a> processors like ARM CPUs. The power consumption now matters more than performance peaks.</p> <p>Intel CPUs evolved from a single core to multiple physical cores in the same <a class="reference external" href="https://en.wikipedia.org/wiki/CPU_socket">package</a> and got new features: <a class="reference external" href="https://en.wikipedia.org/wiki/Hyper-threading">Hyper-threading</a> to run two threads on the same physical core and <a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Turbo_Boost">Turbo Boost</a> to maximize performance. CPU cores can be completely turned off (CPU HALT, frequency of 0) temporarily to reduce the power consumption, and the frequency of cores changes regularly depending on many factors like the workload and temperature. The power consumption is now an important part of the design of modern CPUs.</p> <p>Warning! This article is a summary of what I learnt over the last few weeks from random articles. It may be full of mistakes; don't hesitate to report them, so I can enhance the article! It's hard to find simple articles explaining the performance of modern Intel CPUs, so I tried to write mine.</p> <div class="section" id="tools-used-in-this-article"> <h2>Tools used in this article</h2> <p>This article mentions various tools. 
Commands to install them on Fedora 24:</p> <p><tt class="docutils literal">dnf install <span class="pre">-y</span> <span class="pre">util-linux</span></tt>:</p> <ul class="simple"> <li>lscpu</li> </ul> <p><tt class="docutils literal">dnf install <span class="pre">-y</span> <span class="pre">kernel-tools</span></tt>:</p> <ul class="simple"> <li><a class="reference external" href="http://linux.die.net/man/1/cpupower">cpupower</a></li> <li>turbostat</li> </ul> <p><tt class="docutils literal">sudo dnf install <span class="pre">-y</span> <span class="pre">msr-tools</span></tt>:</p> <ul class="simple"> <li>rdmsr</li> <li>wrmsr</li> </ul> <p>Other interesting tools, not used in this article: i7z (sadly no longer maintained), lshw, dmidecode, sensors.</p> <p>The sensors tool is supposed to report the current CPU voltage, but it doesn't provide this information on my computers. It does give the temperature of different components, as well as the speed of fans.</p> </div> <div class="section" id="example-of-intel-cpus"> <h2>Example of Intel CPUs</h2> <div class="section" id="my-laptop-cpu-proc-cpuinfo"> <h3>My laptop CPU: /proc/cpuinfo</h3> <p>On Linux, the most common way to retrieve information on the CPU is to read <tt class="docutils literal">/proc/cpuinfo</tt>. Example on my laptop:</p> <pre class="literal-block"> selma$ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel model name : Intel(R) Core(TM) i7-3520M CPU &#64; 2.90GHz cpu MHz : 1200.214 ... processor : 1 vendor_id : GenuineIntel model name : Intel(R) Core(TM) i7-3520M CPU &#64; 2.90GHz cpu MHz : 3299.882 ... </pre> <p>The &quot;i7-3520M&quot; CPU is a model designed for mobile platforms (hence the &quot;M&quot; suffix). 
It was built in 2012 and belongs to the third generation of Intel Core i7 processors: <a class="reference external" href="https://en.wikipedia.org/wiki/Ivy_Bridge_(microarchitecture)">Ivy Bridge</a>.</p> <p>The CPU has two physical cores; I disabled Hyper-threading in the BIOS.</p> <p>The first strange thing is that the CPU announces &quot;2.90 GHz&quot; but Linux reports 1.2 GHz on the first core, and 3.3 GHz on the second core. 3.3 GHz is greater than 2.9 GHz!</p> </div> <div class="section" id="my-desktop-cpu-cpu-topology-with-lscpu"> <h3>My desktop CPU: CPU topology with lscpu</h3> <p>cpuinfo:</p> <pre class="literal-block"> smithers$ cat /proc/cpuinfo processor : 0 physical id : 0 core id : 0 ... model name : Intel(R) Core(TM) i7-2600 CPU &#64; 3.40GHz cpu cores : 4 ... processor : 1 physical id : 0 core id : 1 ... (...) processor : 7 physical id : 0 core id : 3 ... </pre> <p>The CPU i7-2600 is from the 2nd generation: the <a class="reference external" href="https://en.wikipedia.org/wiki/Sandy_Bridge">Sandy Bridge microarchitecture</a>. There are 8 logical cores and 4 physical cores (so Hyper-threading is enabled).</p> <p>The <tt class="docutils literal">lscpu</tt> command renders a short table which helps to understand the CPU topology:</p> <pre class="literal-block"> smithers$ lscpu -a -e CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ 0 0 0 0 0:0:0:0 yes 3800.0000 1600.0000 1 0 0 1 1:1:1:0 yes 3800.0000 1600.0000 2 0 0 2 2:2:2:0 yes 3800.0000 1600.0000 3 0 0 3 3:3:3:0 yes 3800.0000 1600.0000 4 0 0 0 0:0:0:0 yes 3800.0000 1600.0000 5 0 0 1 1:1:1:0 yes 3800.0000 1600.0000 6 0 0 2 2:2:2:0 yes 3800.0000 1600.0000 7 0 0 3 3:3:3:0 yes 3800.0000 1600.0000 </pre> <p>There are 8 logical CPUs (<tt class="docutils literal">CPU <span class="pre">0..7</span></tt>), all on the same node (<tt class="docutils literal">NODE 0</tt>) and the same socket (<tt class="docutils literal">SOCKET 0</tt>). 
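</p> <p>The logical-to-physical mapping can be computed from this table. A minimal sketch (the <tt class="docutils literal">(CPU, CORE)</tt> pairs are copied from the <tt class="docutils literal">lscpu</tt> output above; the helper name is mine):</p>

```python
# Group logical CPUs by physical core, using the CPU and CORE
# columns of the "lscpu -a -e" output above.
lscpu_rows = [(0, 0), (1, 1), (2, 2), (3, 3),
              (4, 0), (5, 1), (6, 2), (7, 3)]

def cpus_per_core(rows):
    cores = {}
    for cpu, core in rows:
        cores.setdefault(core, []).append(cpu)
    return cores

# With Hyper-threading, each physical core hosts two logical CPUs.
print(cpus_per_core(lscpu_rows))
```

<p>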
There are only 4 physical cores (<tt class="docutils literal">CORE <span class="pre">0..3</span></tt>). For example, the physical core <tt class="docutils literal">2</tt> is made of the two logical CPUs: <tt class="docutils literal">2</tt> and <tt class="docutils literal">6</tt>.</p> <p>Using the <tt class="docutils literal">L1d:L1i:L2:L3</tt> column, we can see that each pair of logical CPUs shares the caches of its physical core for levels 1 (L1 data, L1 instruction) and 2 (L2). All physical cores share the same level 3 cache (L3).</p> </div> </div> <div class="section" id="p-states"> <h2>P-states</h2> <p>A new CPU driver, <tt class="docutils literal">intel_pstate</tt>, was added to the Linux kernel 3.9 (April 2013). First, it only supported Sandy Bridge CPUs (2nd generation), then Linux 3.10 extended it to Ivy Bridge generation CPUs (3rd gen), and so on and so forth.</p> <p>This driver supports recent features and thermal control of modern Intel CPUs. Its name comes from P-states.</p> <p>The processor P-state is the capability of running the processor at different voltage and/or frequency levels. 
Generally, P0 is the highest state resulting in maximum performance, while P1, P2, and so on, will save power but at some penalty to CPU performance.</p> <p>It is possible to force the legacy CPU driver (<tt class="docutils literal">acpi_cpufreq</tt>) using the <tt class="docutils literal">intel_pstate=disable</tt> option on the kernel command line.</p> <p>See also:</p> <ul class="simple"> <li><a class="reference external" href="https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt">Documentation of the intel-pstate driver</a></li> <li><a class="reference external" href="https://plus.google.com/+ArjanvandeVen/posts/dLn9T4ehywL">Some basics on CPU P states on Intel processors</a> (2013) by Arjan van de Ven (Intel)</li> <li><a class="reference external" href="https://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf">Balancing Power and Performance in the Linux Kernel</a> talk at LinuxCon Europe 2015 by Kristen Accardi (Intel)</li> <li><a class="reference external" href="https://software.intel.com/en-us/blogs/2008/05/29/what-exactly-is-a-p-state-pt-1">What exactly is a P-state? (Pt. 1)</a> (2008) by Taylor K. 
(Intel)</li> </ul> </div> <div class="section" id="idle-states-c-states"> <h2>Idle states: C-states</h2> <p>C-states are idle power saving states, in contrast to P-states, which are execution power saving states.</p> <p>During a P-state, the processor is still executing instructions, whereas during a C-state (other than C0), the processor is idle, meaning that nothing is executing.</p> <p>C-states:</p> <ul class="simple"> <li>C0 is the operational state, meaning that the CPU is doing useful work</li> <li>C1 is the first idle state</li> <li>C2 is the second idle state: the external I/O Controller Hub blocks interrupts to the processor.</li> <li>etc.</li> </ul> <p>When a logical processor is idle (any C-state other than C0), its frequency is typically 0 (HALT).</p> <p>The <tt class="docutils literal">cpupower <span class="pre">idle-info</span></tt> command lists supported C-states:</p> <pre class="literal-block"> selma$ cpupower idle-info CPUidle driver: intel_idle CPUidle governor: menu analyzing CPU 0: Number of idle states: 6 Available idle states: POLL C1-IVB C1E-IVB C3-IVB C6-IVB C7-IVB ... 
</pre> <p>The <tt class="docutils literal">cpupower monitor</tt> command shows statistics on C-states:</p> <pre class="literal-block"> smithers$ sudo cpupower monitor -m Idle_Stats |Idle_Stats CPU | POLL | C1-S | C1E- | C3-S | C6-S 0| 0,00| 0,19| 0,09| 0,58| 96,23 4| 0,00| 0,00| 0,00| 0,00| 99,90 1| 0,00| 2,34| 0,00| 0,00| 97,63 5| 0,00| 0,00| 0,17| 0,00| 98,02 2| 0,00| 0,00| 0,00| 0,00| 0,00 6| 0,00| 0,00| 0,00| 0,00| 0,00 3| 0,00| 0,00| 0,00| 0,00| 0,00 7| 0,00| 0,00| 0,00| 0,00| 49,97 </pre> <p>See also: <a class="reference external" href="https://software.intel.com/en-us/articles/power-management-states-p-states-c-states-and-package-c-states">Power Management States: P-States, C-States, and Package C-States</a>.</p> </div> <div class="section" id="turbo-boost-1"> <h2>Turbo Boost</h2> <p>In 2005, Intel introduced <a class="reference external" href="https://en.wikipedia.org/wiki/SpeedStep">SpeedStep</a>, a series of dynamic frequency scaling technologies to reduce the power consumption of laptop CPUs. Turbo Boost is an enhancement of these technologies, now also used on desktop and server CPUs.</p> <p>Turbo Boost allows one or many CPU cores to run at higher P-states than usual. The maximum P-state is constrained by the following factors:</p> <ul class="simple"> <li>The number of active cores (in C0 or C1 state)</li> <li>The estimated current consumption of the processor (Imax)</li> <li>The estimated power consumption (TDP - Thermal Design Power) of the processor</li> <li>The temperature of the processor</li> </ul> <p>Example on my laptop:</p> <pre class="literal-block"> selma$ cat /proc/cpuinfo model name : Intel(R) Core(TM) i7-3520M CPU &#64; 2.90GHz ... selma$ sudo cpupower frequency-info analyzing CPU 0: driver: intel_pstate ... boost state support: Supported: yes Active: yes 3400 MHz max turbo 4 active cores 3400 MHz max turbo 3 active cores 3400 MHz max turbo 2 active cores 3600 MHz max turbo 1 active cores </pre> <p>The CPU base frequency is 2.9 GHz. 
If more than one physical core is &quot;active&quot; (busy), their frequency can be increased up to 3.4 GHz. If only one physical core is active, its frequency can be increased up to 3.6 GHz.</p> <p>In this example, Turbo Boost is supported and active.</p> <p>See also the <a class="reference external" href="https://www.kernel.org/doc/Documentation/cpu-freq/boost.txt">Linux cpu-freq documentation on CPU boost</a>.</p> <div class="section" id="turbo-boost-msr"> <h3>Turbo Boost MSR</h3> <p>Bit 38 of the <a class="reference external" href="https://en.wikipedia.org/wiki/Model-specific_register">Model-specific register (MSR)</a> <tt class="docutils literal">0x1a0</tt> can be used to check if Turbo Boost is enabled:</p> <pre class="literal-block"> selma$ sudo rdmsr -f 38:38 0x1a0 0 </pre> <p><tt class="docutils literal">0</tt> means that Turbo Boost is enabled, whereas <tt class="docutils literal">1</tt> means disabled (no turbo). (The <tt class="docutils literal"><span class="pre">-f</span> 38:38</tt> option asks to only display bit 38.)</p> <p>If the command doesn't work, you may have to load the <tt class="docutils literal">msr</tt> kernel module:</p> <pre class="literal-block"> sudo modprobe msr </pre> <p>Note: I'm not sure that all Intel CPUs use the same MSR.</p> </div> <div class="section" id="intel-state-no-turbo"> <h3>intel_pstate/no_turbo</h3> <p>Turbo Boost can also be disabled at runtime in the <tt class="docutils literal">intel_pstate</tt> driver.</p> <p>Check if Turbo Boost is enabled:</p> <pre class="literal-block"> selma$ cat /sys/devices/system/cpu/intel_pstate/no_turbo 0 </pre> <p>where <tt class="docutils literal">0</tt> means that Turbo Boost is enabled. 
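</p>
<p>The bit test performed by <tt class="docutils literal">rdmsr <span class="pre">-f</span> 38:38</tt> can be reproduced on a raw MSR value in a few lines; a minimal sketch (the sample values below are made up for illustration):</p>

```python
def turbo_boost_enabled(msr_0x1a0: int) -> bool:
    """Interpret bit 38 of MSR 0x1a0: 0 means Turbo Boost enabled,
    1 means Turbo Boost disabled (no turbo)."""
    return (msr_0x1a0 >> 38) & 1 == 0

print(turbo_boost_enabled(0x850089))  # bit 38 clear: turbo enabled (True)
print(turbo_boost_enabled(1 << 38))   # bit 38 set: turbo disabled (False)
```

<p>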
Disable Turbo Boost:</p> <pre class="literal-block"> selma$ echo 1|sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo </pre> </div> <div class="section" id="cpu-flag-ida"> <h3>CPU flag &quot;ida&quot;</h3> <p>It looks like the Turbo Boost status (supported or not) can also be read by the CPUID(6): &quot;Thermal/Power Management&quot;. It gives access to the flag <a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Dynamic_Acceleration">Intel Dynamic Acceleration (IDA)</a>.</p> <p>The <tt class="docutils literal">ida</tt> flag can also be seen in the CPU flags of <tt class="docutils literal">/proc/cpuinfo</tt>.</p> </div> </div> <div class="section" id="read-the-cpu-frequency"> <h2>Read the CPU frequency</h2> <p>General information using <tt class="docutils literal">cpupower <span class="pre">frequency-info</span></tt>:</p> <pre class="literal-block"> selma$ cpupower -c 0 frequency-info analyzing CPU 0: driver: intel_pstate ... hardware limits: 1.20 GHz - 3.60 GHz ... </pre> <p>The frequency of CPUs is between 1.2 GHz and 3.6 GHz (the base frequency is 2.9 GHz on this CPU).</p> <div class="section" id="get-the-frequency-of-cpus-turbostat"> <h3>Get the frequency of CPUs: turbostat</h3> <p>It looks like the most reliable way to get a realistic estimation of the CPU frequency is to use the <tt class="docutils literal">turbostat</tt> tool:</p> <pre class="literal-block"> selma$ sudo turbostat CPU Avg_MHz Busy% Bzy_MHz TSC_MHz - 224 7.80 2878 2893 0 448 15.59 2878 2893 1 0 0.01 2762 2893 CPU Avg_MHz Busy% Bzy_MHz TSC_MHz - 139 5.65 2469 2893 0 278 11.29 2469 2893 1 0 0.01 2686 2893 ... 
</pre> <ul class="simple"> <li><tt class="docutils literal">Avg_MHz</tt>: average frequency, based on APERF</li> <li><tt class="docutils literal">Busy%</tt>: CPU usage in percent</li> <li><tt class="docutils literal">Bzy_MHz</tt>: busy frequency, based on MPERF</li> <li><tt class="docutils literal">TSC_MHz</tt>: fixed frequency, TSC stands for <a class="reference external" href="https://en.wikipedia.org/wiki/Time_Stamp_Counter">Time Stamp Counter</a></li> </ul> <p>APERF (actual performance) and MPERF (maximum performance) are MSR registers that provide feedback on the current CPU frequency.</p> </div> <div class="section" id="other-tools-to-get-the-cpu-frequency"> <h3>Other tools to get the CPU frequency</h3> <p>It looks like the following tools are less reliable to estimate the CPU frequency.</p> <p>cpuinfo:</p> <pre class="literal-block"> selma$ grep MHz /proc/cpuinfo cpu MHz : 1372.289 cpu MHz : 3401.042 </pre> <p>In April 2016, Len Brown proposed a patch modifying cpuinfo to use the APERF and MPERF MSRs to estimate the CPU frequency: <a class="reference external" href="https://lkml.org/lkml/2016/4/1/7">x86: Calculate MHz using APERF/MPERF for cpuinfo and scaling_cur_freq</a>.</p> <p>The <tt class="docutils literal">tsc</tt> clock source logs the CPU frequency in the kernel logs:</p> <pre class="literal-block"> selma$ dmesg|grep 'MHz processor' [ 0.000000] tsc: Detected 2893.331 MHz processor </pre> <p>cpupower frequency-info:</p> <pre class="literal-block"> selma$ for core in $(seq 0 1); do sudo cpupower -c $core frequency-info|grep 'current CPU'; done current CPU frequency: 3.48 GHz (asserted by call to hardware) current CPU frequency: 3.40 GHz (asserted by call to hardware) </pre> <p>cpupower monitor:</p> <pre class="literal-block"> selma$ sudo cpupower monitor -m 'Mperf' |Mperf CPU | C0 | Cx | Freq 0| 4.77| 95.23| 1924 1| 0.01| 99.99| 1751 </pre> </div> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>Modern Intel CPUs use various technologies to provide the best 
performance without wasting too much power. It has become harder to monitor and understand CPU performance than with older CPUs, since performance now depends on many more factors.</p> <p>It is also becoming common to get an integrated graphics processor (IGP) in the same package, which makes the exact performance even more complex to predict, since the IGP produces heat and so has an impact on the CPU P-state.</p> <p>I should also explain that P-states are &quot;voted&quot; on between CPU cores, but I didn't fully understand this part. I'm not sure that understanding the exact algorithm matters much. I tried not to give too much information.</p> </div> <div class="section" id="annex-amt-and-the-me-power-management-coprocessor"> <h2>Annex: AMT and the ME (power management coprocessor)</h2> <p>Computers with Intel vPro technology include <a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Active_Management_Technology">Intel Active Management Technology (AMT)</a>: &quot;hardware and firmware technology for remote out-of-band management of personal computers&quot;. AMT has many features, including power management.</p> <p><a class="reference external" href="https://en.wikipedia.org/wiki/Intel_Active_Management_Technology#Hardware">Management Engine (ME)</a> is the hardware part: an isolated and protected coprocessor, embedded as a non-optional part in all current (as of 2015) Intel chipsets. The coprocessor is a special 32-bit ARC microprocessor (RISC architecture) that's physically located inside the PCH chipset (or MCH on older chipsets). 
The coprocessor can for example be found on the Intel MCH chipsets Q35 and Q45.</p> <p>See <a class="reference external" href="https://boingboing.net/2016/06/15/intel-x86-processors-ship-with.html">Intel x86s hide another CPU that can take over your machine (you can't audit it)</a> for more information on the coprocessor.</p> <p>More recently, the Intel Xeon Phi CPU (2016) also includes a coprocessor for power management. I couldn't determine whether it is the same coprocessor or not.</p> </div> Visualize the system noise using perf and CPU isolation2016-06-16T13:30:00+02:002016-06-16T13:30:00+02:00Victor Stinnertag:vstinner.github.io,2016-06-16:/perf-visualize-system-noise-with-cpu-isolation.html<p>I developed a new <a class="reference external" href="http://perf.readthedocs.io/">perf module</a> designed to run stable benchmarks, give fine control on benchmark parameters and compute statistics on results. With such a tool, it becomes simple to <em>visualize</em> sources of noise. The CPU isolation will be used to visualize the system noise. Running a benchmark on isolated CPUs …</p><p>I developed a new <a class="reference external" href="http://perf.readthedocs.io/">perf module</a> designed to run stable benchmarks, give fine control on benchmark parameters and compute statistics on results. With such a tool, it becomes simple to <em>visualize</em> sources of noise. The CPU isolation will be used to visualize the system noise. Running a benchmark on isolated CPUs isolates it from the system noise.</p> <div class="section" id="isolate-cpus"> <h2>Isolate CPUs</h2> <p>My computer has 4 physical CPU cores. I isolated half of them using the <tt class="docutils literal">isolcpus=2,3</tt> parameter of the Linux kernel. 
I manually modified the kernel command line in GRUB to add this parameter.</p> <p>Check that CPUs are isolated:</p> <pre class="literal-block"> $ cat /sys/devices/system/cpu/isolated 2-3 </pre> <p>The CPU supports HyperThreading, but I disabled it in the BIOS.</p> </div> <div class="section" id="run-a-benchmark"> <h2>Run a benchmark</h2> <p>The <tt class="docutils literal">perf</tt> module automatically detects and uses isolated CPU cores. I will use the <tt class="docutils literal"><span class="pre">--affinity=0,1</span></tt> option to force running the benchmark on the CPUs which are not isolated.</p> <p>Microbenchmark with and without CPU isolation:</p> <pre class="literal-block"> $ python3 -m perf.timeit --json-file=timeit_isolcpus.json --verbose -s 'x=1; y=2' 'x+y' Pin process to isolated CPUs: 2-3 ......................... Median +- std dev: 36.6 ns +- 0.1 ns (25 runs x 3 samples x 10^7 loops; 1 warmup) $ python3 -m perf.timeit --affinity=0,1 --json-file=timeit_no_isolcpus.json --verbose -s 'x=1; y=2' 'x+y' Pin process to CPUs: 0-1 ......................... Median +- std dev: 36.7 ns +- 1.3 ns (25 runs x 3 samples x 10^7 loops; 1 warmup) </pre> <p>My computer was not 100% idle: I was using it while the benchmarks were running.</p> <p>The median is almost the same (36.6 ns and 36.7 ns). 
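</p>
<p>The &quot;Median +- std dev&quot; summary printed above can be recomputed from raw samples with the standard <tt class="docutils literal">statistics</tt> module; a minimal sketch (the sample values are made up for illustration, this is not the perf module's code):</p>

```python
import statistics

def median_std(samples):
    """Summary used by the perf module: median +- standard deviation."""
    return statistics.median(samples), statistics.stdev(samples)

med, dev = median_std([36.5, 36.6, 36.6, 36.7, 36.7])
print(f"Median +- std dev: {med:.1f} ns +- {dev:.1f} ns")
```

<p>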
The first major difference is the standard deviation: it is much larger without CPU isolation: 0.1 ns =&gt; 1.3 ns (13x larger).</p> <p>Just in case, check the CPU affinity manually in the metadata:</p> <pre class="literal-block"> $ python3 -m perf show timeit_isolcpus.json --metadata | grep cpu - cpu_affinity: 2-3 (isolated) - cpu_count: 4 - cpu_model_name: Intel(R) Core(TM) i7-2600 CPU &#64; 3.40GHz $ python3 -m perf show timeit_no_isolcpus.json --metadata | grep cpu_affinity - cpu_affinity: 0-1 </pre> </div> <div class="section" id="statistics"> <h2>Statistics</h2> <p>The <tt class="docutils literal">perf stats</tt> command computes statistics on the distribution of samples:</p> <pre class="literal-block"> $ python3 -m perf stats timeit_isolcpus.json Number of samples: 75 Minimum: 36.5 ns (-0.1%) Median +- std dev: 36.6 ns +- 0.1 ns (36.5 ns .. 36.7 ns) Maximum: 36.7 ns (+0.4%) $ python3 -m perf stats timeit_no_isolcpus.json Number of samples: 75 Minimum: 36.5 ns (-0.5%) Median +- std dev: 36.7 ns +- 1.3 ns (35.4 ns .. 38.0 ns) Maximum: 43.0 ns (+17.0%) </pre> <p>The minimum is the same. The second major difference is the maximum: it is much larger without CPU isolation: 36.7 ns (+0.4%) =&gt; 43.0 ns (+17.0%).</p> <p>The difference between the maximum and the median is 63x larger without CPU isolation: 0.1 ns (<tt class="docutils literal">36.7 - 36.6</tt>) =&gt; 6.3 ns (<tt class="docutils literal">43.0 - 36.7</tt>).</p> <p>Depending on the system load, a single sample of the microbenchmark is up to 17% slower (maximum of 43.0 ns with a median of 36.7 ns) without CPU isolation. 
The difference is smaller with CPU isolation: only 0.4% slower (for the maximum, and 0.1% faster for the minimum).</p> </div> <div class="section" id="histogram"> <h2>Histogram</h2> <p>Another way to analyze the distribution of samples is to render a histogram:</p> <pre class="literal-block"> $ python3 -m perf hist --bins=8 timeit_isolcpus.json timeit_no_isolcpus.json [ timeit_isolcpus ] 36.1 ns: 75 ################################################ 36.9 ns: 0 | 37.7 ns: 0 | 38.5 ns: 0 | 39.3 ns: 0 | 40.1 ns: 0 | 40.9 ns: 0 | 41.7 ns: 0 | 42.5 ns: 0 | [ timeit_no_isolcpus ] 36.1 ns: 52 ################################################ 36.9 ns: 13 ############ 37.7 ns: 1 # 38.5 ns: 4 #### 39.3 ns: 2 ## 40.1 ns: 0 | 40.9 ns: 1 # 41.7 ns: 0 | 42.5 ns: 2 ## </pre> <p>I chose the number of bars to get a small histogram and to get all samples of the first benchmark on the same bar. With 8 bars, each bar is a range of 0.8 ns.</p> <p>The last major difference is the shape of these histograms. Without CPU isolation, there is a &quot;long tail&quot; to the right of the median: <a class="reference external" href="https://en.wikipedia.org/wiki/Outlier">outliers</a> in the range [37.7 ns; 42.5 ns]. The outliers come from the &quot;noise&quot; caused by the multitasking system.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>The <tt class="docutils literal">perf</tt> module provides multiple tools to analyze the distribution of benchmark samples. 
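</p>
<p>A text histogram like the one above can be rendered in a few lines; a minimal sketch (bin count, bar width and the sample data are arbitrary, and this is not the perf module's actual implementation):</p>

```python
def render_hist(samples, bins=8, width=48):
    """Render a small text histogram, one line per bin."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / bins or 1.0   # guard against all-equal samples
    counts = [0] * bins
    for value in samples:
        index = min(int((value - lo) / step), bins - 1)
        counts[index] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        bar = "#" * (count * width // peak) if count else "|"
        lines.append(f"{lo + i * step:4.1f} ns: {count:2d} {bar}")
    return "\n".join(lines)

print(render_hist([36.1, 36.1, 36.1, 36.2, 36.9, 42.5], bins=4))
```

<p>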
Three tools show a major difference without CPU isolation compared to results with CPU isolation:</p> <ul class="simple"> <li>Standard deviation: 13x larger without isolation</li> <li>Maximum: difference to median 63x larger without isolation</li> <li>Shape of the histogram: long tail to the right of the median</li> </ul> <p>This explains why CPU isolation helps to make benchmarks more stable.</p> </div> My journey to stable benchmark, part 3 (average)2016-05-23T23:00:00+02:002016-05-23T23:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-05-23:/journey-to-stable-benchmark-average.html<p class="first last">My journey to stable benchmark, part 3 (average)</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/stanzim/11100202065/"> <img alt="Fog" src="https://vstinner.github.io/images/fog.jpg" /> </a> <p><em>Stable benchmarks are so close, but ...</em></p> <div class="section" id="address-space-layout-randomization"> <h2>Address Space Layout Randomization</h2> <p>When I started to work on removing the noise of the system, I was told that disabling <a class="reference external" href="https://en.wikipedia.org/wiki/Address_space_layout_randomization">Address Space Layout Randomization (ASLR)</a> makes benchmarks more stable.</p> <p>I followed this advice without trying to understand it. 
We will see in this article that it was a bad idea, but I had to hit other issues to really understand the root problem with disabling ASLR.</p> <p>Example of a command to see the effect of ASLR; the first number of the output is the start address of the heap memory:</p> <pre class="literal-block"> $ python -c 'import os; os.system(&quot;grep heap /proc/%s/maps&quot; % os.getpid())' 55e6a716c000-55e6a7235000 rw-p 00000000 00:00 0 [heap] </pre> <p>Heap address of 3 runs with ASLR enabled (random):</p> <ul class="simple"> <li>55e6a716c000</li> <li>561c218eb000</li> <li>55e6f628f000</li> </ul> <p>Disable ASLR:</p> <pre class="literal-block"> sudo bash -c 'echo 0 &gt;| /proc/sys/kernel/randomize_va_space' </pre> <p>Heap addresses of 3 runs with ASLR disabled (all the same):</p> <ul class="simple"> <li>555555756000</li> <li>555555756000</li> <li>555555756000</li> </ul> <p>Note: To reenable ASLR, it's better to use the value 2; the value 1 only partially enables the feature:</p> <pre class="literal-block"> sudo bash -c 'echo 2 &gt;| /proc/sys/kernel/randomize_va_space' </pre> </div> <div class="section" id="python-randomized-hash-function"> <h2>Python randomized hash function</h2> <p>With <a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-system.html">system tuning (part 1)</a>, a <a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html">Python compiled with PGO (part 2)</a> and ASLR disabled, I still failed to get the same result when running <tt class="docutils literal">bm_call_simple.py</tt> manually.</p> <p>On Python 3, the hash function is now randomized by default: <a class="reference external" href="http://bugs.python.org/issue13703">issue #13703</a>. 
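</p>
<p>The randomization can be observed by computing the same string hash in fresh interpreters; a small sketch (the string <tt class="docutils literal">'abc'</tt> is an arbitrary example):</p>

```python
import os
import subprocess
import sys

def hash_with_seed(seed: int) -> int:
    """hash('abc') as computed by a new interpreter
    started with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env, text=True)
    return int(out)

# With a fixed seed, the hash function is reproducible across processes;
# different seeds almost always give different hash values.
print(hash_with_seed(1) == hash_with_seed(1))  # True
```

<p>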
The problem is that for a microbenchmark, the number of hash collisions in a &quot;hot&quot; dictionary has a non-negligible impact on performance.</p> <p>The <tt class="docutils literal">PYTHONHASHSEED</tt> environment variable can be used to get a fixed hash function. Example with the patch:</p> <pre class="literal-block"> $ PYTHONHASHSEED=1 taskset -c 1 ./python bm_call_simple.py -n 1 0.198 $ PYTHONHASHSEED=2 taskset -c 1 ./python bm_call_simple.py -n 1 0.201 $ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 0.207 $ PYTHONHASHSEED=4 taskset -c 1 ./python bm_call_simple.py -n 1 0.187 $ PYTHONHASHSEED=5 taskset -c 1 ./python bm_call_simple.py -n 1 0.180 </pre> <p>Timings of the reference python:</p> <pre class="literal-block"> $ PYTHONHASHSEED=1 taskset -c 1 ./ref_python bm_call_simple.py -n 1 0.204 $ PYTHONHASHSEED=2 taskset -c 1 ./ref_python bm_call_simple.py -n 1 0.206 $ PYTHONHASHSEED=3 taskset -c 1 ./ref_python bm_call_simple.py -n 1 0.195 $ PYTHONHASHSEED=4 taskset -c 1 ./ref_python bm_call_simple.py -n 1 0.192 $ PYTHONHASHSEED=5 taskset -c 1 ./ref_python bm_call_simple.py -n 1 0.187 </pre> <p>The minimum is 187 ms for the reference and 180 ms for the patch. The patched Python is 3% faster, yeah!</p> <p>Wait. What if we only test PYTHONHASHSEED from 1 to 3? In this case, the minimum is 195 ms for the reference and 198 ms for the patch. The patched Python becomes 2% slower, oh no!</p> <p>Faster? Slower? Who is right?</p> <p>Maybe I should write a script to find a <tt class="docutils literal">PYTHONHASHSEED</tt> value for which my patch is always faster :-)</p> </div> <div class="section" id="command-line-and-environment-variables"> <h2>Command line and environment variables</h2> <p>Well, let's say that we will use a fixed PYTHONHASHSEED value. Anyway, my patch doesn't touch the hash function. 
So it doesn't matter.</p> <p>While running benchmarks, I noticed differences when running the benchmark from a different directory:</p> <pre class="literal-block"> $ cd /home/haypo/prog/python/fastcall $ PYTHONHASHSEED=3 taskset -c 1 pgo/python ../benchmarks/performance/bm_call_simple.py -n 1 0.215 $ cd /home/haypo/prog/python/benchmarks $ PYTHONHASHSEED=3 taskset -c 1 ../fastcall/pgo/python ../benchmarks/performance/bm_call_simple.py -n 1 0.203 $ cd /home/haypo/prog/python $ PYTHONHASHSEED=3 taskset -c 1 fastcall/pgo/python benchmarks/performance/bm_call_simple.py -n 1 0.200 </pre> <p>In fact, a different command line is enough to get different results (added arguments are ignored):</p> <pre class="literal-block"> $ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 0.201 $ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1 0.198 $ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1 arg2 arg3 0.203 $ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1 arg2 arg3 arg4 arg5 0.206 $ PYTHONHASHSEED=3 taskset -c 1 ./python bm_call_simple.py -n 1 arg1 arg2 arg3 arg4 arg5 arg6 0.210 </pre> <p>I also noticed minor differences when the environment changes (added variables are ignored):</p> <pre class="literal-block"> $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 1 0.201 $ taskset -c 1 env -i PYTHONHASHSEED=3 VAR1=1 VAR2=2 VAR3=3 VAR4=4 ./python bm_call_simple.py -n 1 0.202 $ taskset -c 1 env -i PYTHONHASHSEED=3 VAR1=1 VAR2=2 VAR3=3 VAR4=4 VAR5=5 ./python bm_call_simple.py -n 1 0.198 </pre> <p>Using <tt class="docutils literal">strace</tt> and <tt class="docutils literal">ltrace</tt>, I saw that the memory addresses are different when something (command line, env var, etc.) 
changes.</p> </div> <div class="section" id="average-and-standard-deviation"> <h2>Average and standard deviation</h2> <p>Basically, it looks like a lot of &quot;external factors&quot; have an impact on the exact memory addresses, even if ASLR is disabled and PYTHONHASHSEED is set. I started to think how to get <em>exactly</em> the same command line, the same environment (easy), the same current directory (easy), etc. The problem is that it's just not possible to control all external factors (having an effect on the exact memory addresses).</p> <p>Maybe I was plain wrong from the beginning and ASLR must be enabled, as it is the default on Linux:</p> <pre class="literal-block"> $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py 0.198 $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py 0.202 $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py 0.199 $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py 0.207 $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py 0.200 $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py 0.201 </pre> <p>These results look &quot;random&quot;. Yes, they are. It's exactly the purpose of ASLR.</p> <p>But how can we compare performances if results are random? Take the minimum?</p> <p>No! You must never (ever again) use the minimum for benchmarking! Compute the average and some statistics like the standard deviation:</p> <pre class="literal-block"> $ python3 Python 3.4.3 &gt;&gt;&gt; timings=[0.198, 0.202, 0.199, 0.207, 0.200, 0.201] &gt;&gt;&gt; import statistics &gt;&gt;&gt; statistics.mean(timings) 0.2011666666666667 &gt;&gt;&gt; statistics.stdev(timings) 0.0031885210782848245 </pre> <p>In this example, the average is 201 ms +/- 3 ms. IMHO the standard deviation is quite small (reliable) which means that my benchmark is stable. To get a good distribution, it's better to have many samples. It looks like at least 25 processes are needed. 
Each process tests a different memory layout and a different hash function.</p> <p>Result of 5 runs, each run using 25 processes (ASLR enabled, random hash function):</p> <ul class="simple"> <li>Average: 205.2 ms +/- 3.0 ms (min: 201.1 ms, max: 214.9 ms)</li> <li>Average: 205.6 ms +/- 3.3 ms (min: 201.4 ms, max: 216.5 ms)</li> <li>Average: 206.0 ms +/- 3.9 ms (min: 201.1 ms, max: 215.3 ms)</li> <li>Average: 205.7 ms +/- 3.6 ms (min: 201.5 ms, max: 217.8 ms)</li> <li>Average: 206.4 ms +/- 3.5 ms (min: 201.9 ms, max: 214.9 ms)</li> </ul> <p>While the memory layout and hash function are random again, the result looks <em>less</em> random, and so more reliable, than before!</p> <p>With ASLR enabled, the effect of the environment variables, command line and current directory is negligible on the (average) result.</p> </div> <div class="section" id="the-average-solves-issues-with-uniform-random-noises"> <h2>The average solves issues with uniform random noises</h2> <p>The user will run the application with default system settings, which means ASLR enabled and the Python hash function randomized. Running a benchmark in one specific environment is a mistake because it is not representative of the performance in practice.</p> <p>Computing the average and standard deviation &quot;fixes&quot; the issue with hash randomization. It's much better to use random hash functions and compute the average than to use a fixed hash function (setting the <tt class="docutils literal">PYTHONHASHSEED</tt> variable to a value).</p> <p>Oh wow, already 3 big articles explaining how to get stable benchmarks. Please tell me that it was the last one! 
Nope, more is coming...</p> </div> <div class="section" id="annex-why-only-n1"> <h2>Annex: why only -n1?</h2> <p>In this article, I ran <tt class="docutils literal">bm_call_simple.py</tt> with <tt class="docutils literal"><span class="pre">-n</span> 1</tt>, which only runs one iteration.</p> <p>Usually, a single iteration is not reliable at all: at least 50 iterations are needed. But thanks to system tuning, compilation with PGO, ASLR disabled and <tt class="docutils literal">PYTHONHASHSEED</tt> set, a single iteration is enough.</p> <p>Example of 3 runs, each with 3 iterations:</p> <pre class="literal-block"> $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 3 0.201 0.201 0.201 $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 3 0.201 0.201 0.201 $ taskset -c 1 env -i PYTHONHASHSEED=3 ./python bm_call_simple.py -n 3 0.201 0.201 0.201 </pre> <p>Always the same timing!</p> </div> My journey to stable benchmark, part 2 (deadcode)2016-05-22T22:00:00+02:002016-05-22T22:00:00+02:00Victor Stinnertag:vstinner.github.io,2016-05-22:/journey-to-stable-benchmark-deadcode.html<p class="first last">My journey to stable benchmark, part 2 (deadcode)</p> <a class="reference external image-reference" href="https://www.flickr.com/photos/uw67/16875152403/"> <img alt="Snail" src="https://vstinner.github.io/images/snail.jpg" /> </a> <p>With <a class="reference external" href="https://vstinner.github.io/journey-to-stable-benchmark-system.html">the system tuning (part 1)</a>, I expected to get very stable benchmarks and so I started to benchmark seriously my <a class="reference external" href="https://bugs.python.org/issue26814">FASTCALL branch</a> of CPython (a new calling convention avoiding temporary tuples).</p> <p>I was disappointed to get many slowdowns in the CPython benchmark suite. 
I started to analyze why my change introduced performance regressions.</p> <p>I took my overall patch and slowly reverted more and more code to check which changes introduced most of the slowdowns.</p> <p>I focused on the <tt class="docutils literal">call_simple</tt> benchmark which does only one thing: call Python functions which do nothing. Making Python function calls slower would be a big and unacceptable mistake in my work.</p> <div class="section" id="linux-perf"> <h2>Linux perf</h2> <p>I started to learn how to use the great <a class="reference external" href="https://perf.wiki.kernel.org/index.php/Main_Page">Linux perf</a> tool to analyze why <tt class="docutils literal">call_simple</tt> was slower. I tried to find a major difference between my reference python and the patched python.</p> <p>I analyzed cache misses on the L1 instruction and data caches. I analyzed stalled CPU cycles. I analyzed all memory events, branch events, etc. Basically, I tried all perf events and spent a lot of time running benchmarks multiple times.</p> <p>By the way, I strongly suggest using <tt class="docutils literal">perf stat</tt> with the <tt class="docutils literal"><span class="pre">--repeat</span></tt> command line option to get an average on multiple runs and see the standard deviation. It helps to get more reliable numbers. 
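</p>
<p>A script that runs a command under <tt class="docutils literal">perf stat</tt> multiple times and aggregates the timings can be sketched in a few lines; the &quot;seconds time elapsed&quot; line format parsed here is an assumption about <tt class="docutils literal">perf stat</tt>'s stderr output:</p>

```python
import re
import statistics
import subprocess

# perf stat prints e.g. "       1.234567 seconds time elapsed" on stderr.
ELAPSED_RE = re.compile(r"([\d.]+)\s+seconds time elapsed")

def parse_elapsed(perf_output: str) -> float:
    """Extract the elapsed time (in seconds) from perf stat output."""
    match = ELAPSED_RE.search(perf_output)
    if match is None:
        raise ValueError("no elapsed time found")
    return float(match.group(1))

def repeat_stat(cmd, runs=5):
    """Run `perf stat cmd` several times; return (mean, stdev) of timings."""
    timings = []
    for _ in range(runs):
        proc = subprocess.run(["perf", "stat"] + cmd,
                              capture_output=True, text=True, check=True)
        timings.append(parse_elapsed(proc.stderr))
    return statistics.mean(timings), statistics.stdev(timings)
```

<p>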
I even wrote a Python script implementing <tt class="docutils literal"><span class="pre">--repeat</span></tt> (run perf multiple times, parse the output), before seeing that it was already a builtin feature!</p> <p>Use <tt class="docutils literal">perf list</tt> to list all available (pre-defined) events.</p> <p>After many days, I decided to give up on perf.</p> </div> <div class="section" id="cachegrind"> <h2>Cachegrind</h2> <a class="reference external image-reference" href="http://valgrind.org/"> <img alt="Logo of the Valgrind project" src="https://vstinner.github.io/images/valgrind.png" /> </a> <p><a class="reference external" href="http://valgrind.org/">Valgrind</a> is a great tool known to detect memory leaks, but it also contains gems like the <a class="reference external" href="http://valgrind.org/docs/manual/cg-manual.html">Cachegrind tool</a> which <em>simulates</em> the CPU caches.</p> <p>I used Cachegrind with the nice <a class="reference external" href="http://kcachegrind.sourceforge.net/">Kcachegrind GUI</a>. Sadly, I also failed to see anything obvious in cache misses between the reference python and the patched python.</p> </div> <div class="section" id="strace-and-ltrace"> <h2>strace and ltrace</h2> <img alt="strace and ltrace" src="https://vstinner.github.io/images/strace_ltrace.png" /> <p>I also tried the <tt class="docutils literal">strace</tt> and <tt class="docutils literal">ltrace</tt> tools to try to see a difference in the execution of the reference and the patched pythons. I saw different memory addresses, but no major difference which could explain a difference in the timing.</p> <p>Moreover, the hot code simply does not call any syscall or library function. 
It's pure CPU-bound code.</p> </div> <div class="section" id="compiler-options"> <h2>Compiler options</h2> <a class="reference external image-reference" href="https://gcc.gnu.org/"> <img alt="GCC logo" class="align-right" src="https://vstinner.github.io/images/gcc.png" /> </a> <p>I used <a class="reference external" href="https://gcc.gnu.org/">GCC</a> to build the code. Just in case, I tried the LLVM compiler, but it didn't &quot;fix&quot; the issue.</p> <p>I also tried different optimization levels: <tt class="docutils literal"><span class="pre">-O0</span></tt>, <tt class="docutils literal"><span class="pre">-O1</span></tt>, <tt class="docutils literal"><span class="pre">-O2</span></tt> and <tt class="docutils literal"><span class="pre">-O3</span></tt>.</p> <p>I read that the exact address of functions can have an impact on the CPU L1 cache: <a class="reference external" href="https://stackoverflow.com/questions/19470873/why-does-gcc-generate-15-20-faster-code-if-i-optimize-for-size-instead-of-speed">Why does gcc generate 15-20% faster code if I optimize for size instead of speed?</a>. I tried various values of the <tt class="docutils literal"><span class="pre">-falign-functions=N</span></tt> option (1, 2, 6, 12).</p> <p>I also tried <tt class="docutils literal"><span class="pre">-fno-omit-frame-pointer</span></tt> (keep the frame pointer) to record the call graph with <tt class="docutils literal">perf record</tt>.</p> <p>I also tried <tt class="docutils literal"><span class="pre">-flto</span></tt>: Link Time Optimization (LTO).</p> <p>These compiler options didn't fix the issue.</p> <p>The truth is out there.</p> <p><strong>UPDATE:</strong> See also the <a class="reference external" href="https://lwn.net/Articles/534735/">Rethinking optimization for size</a> article on Linux Weekly News (LWN): <em>&quot;Such an option has obvious value if one is compiling for a space-constrained environment like a small device. 
But it turns out that, in some situations, optimizing for space can also produce faster code.&quot;</em></p> </div> <div class="section" id="when-cpython-performance-depends-on-dead-code"> <h2>When CPython performance depends on dead code</h2> <p>I continued to revert changes. At the end, my giant patch was reduced to very few changes only adding code which was never called (at least, I was sure that it was not called in the <tt class="docutils literal">call_simple</tt> benchmark).</p> <p>Let me rephrase: <em>adding dead code</em> makes Python slower. What?</p> <p>A colleague suggested removing the body (replacing it with <tt class="docutils literal">return;</tt>) of the added function: the code became faster. Ok, now I'm completely lost. To be clear, I didn't expect that adding dead code would have <em>any</em> impact on the performance.</p> <p>My email <a class="reference external" href="https://mail.python.org/pipermail/speed/2016-April/000341.html">When CPython performance depends on dead code...</a> explains how to reproduce the issue and contains a lot of information.</p> </div> <div class="section" id="solution-pgo"> <h2>Solution: PGO</h2> <p>The solution is called Profile-Guided Optimization, &quot;PGO&quot;. The Python build system supports it with a single command: <tt class="docutils literal">make <span class="pre">profile-opt</span></tt>. It profiles the execution of the Python test suite.</p> <p>Using PGO, adding dead code no longer has an impact on the performance.</p> <p>With system tuning and PGO compilation, benchmarks must now be stable this time, no? ... No, sorry, not yet. 
We will see more sources of noise in the following articles ;-)</p> </div> My journey to stable benchmark, part 1 (system)2016-05-21T16:50:00+02:002016-05-21T16:50:00+02:00Victor Stinnertag:vstinner.github.io,2016-05-21:/journey-to-stable-benchmark-system.html<p class="first last">My journey to stable benchmark, part 1</p> <div class="section" id="background"> <h2>Background</h2> <p>In the CPython development, it became common to require the result of the <a class="reference external" href="https://hg.python.org/benchmarks">CPython benchmark suite</a> (&quot;The Grand Unified Python Benchmark Suite&quot;) to evaluate the effect of an optimization patch. The minimum requirement is to not introduce performance regressions.</p> <p>I used the CPython benchmark suite and I had many bad surprises when trying to analyze (understand) results. A change expected to be faster makes some benchmarks slower without any obvious reason. At best, the change is faster on some specific benchmarks, but has no impact on the other benchmarks. The slowdown is usually between 5% and 10%. I am not comfortable with any kind of slowdown.</p> <p>Many benchmarks look unstable. The problem is to trust the overall report. Some developers started to say that they learnt to ignore some benchmarks known to be unstable.</p> <p>It's not the first time that I have been totally disappointed by microbenchmark results, so I decided to analyze the issue completely and go as deep as possible to really understand the problem.</p> </div> <div class="section" id="how-to-get-stable-benchmarks-on-a-busy-linux-system"> <h2>How to get stable benchmarks on a busy Linux system</h2> <p>A common advice to get stable benchmarks is to stay away from the keyboard (&quot;freeze!&quot;) and stop all other applications, to only run one application: the benchmark.</p> <p>Well, I'm working on a single computer and the full CPython benchmark suite takes up to 2 hours in rigorous mode. 
I just cannot stop working for 2 hours to wait for the result of a benchmark. I like running benchmarks locally. It is convenient to run benchmarks on the same computer used to develop.</p> <p>The goal here is to &quot;remove the noise of the system&quot;: get the same result on a busy system as on an idle system. My simple <a class="reference external" href="https://github.com/vstinner/misc/blob/master/bin/system_load.py">system_load.py</a> program can be used to increase the system load. For example, run <tt class="docutils literal">system_load.py 10</tt> in a terminal to get a system load of at least 10 (busy system) and run the benchmark in a different terminal. Use CTRL+c to stop <tt class="docutils literal">system_load.py</tt>.</p> </div> <div class="section" id="cpu-isolation"> <h2>CPU isolation</h2> <p>In 2016, it is common to get a CPU with multiple physical cores. For example, my Intel CPU has 4 physical cores and 8 logical cores thanks to <a class="reference external" href="https://en.wikipedia.org/wiki/Hyper-threading">Hyper-Threading</a>. It is possible to configure the Linux kernel to not schedule processes on some CPUs using the &quot;CPU isolation&quot; feature. It is the <tt class="docutils literal">isolcpus</tt> parameter of the Linux command line; the value is a list of CPUs. Example:</p> <pre class="literal-block">
isolcpus=2,3,6,7
</pre> <p>Check with:</p> <pre class="literal-block">
$ cat /sys/devices/system/cpu/isolated
2-3,6-7
</pre> <p>If you have Hyper-Threading, you must isolate the two logical cores of each isolated physical core. You can use the <tt class="docutils literal">lscpu <span class="pre">--all</span> <span class="pre">--extended</span></tt> command to identify physical cores.
Example:</p> <pre class="literal-block">
$ lscpu -a -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    5900,0000 1600,0000
1   0    0      1    1:1:1:0       yes    5900,0000 1600,0000
2   0    0      2    2:2:2:0       yes    5900,0000 1600,0000
3   0    0      3    3:3:3:0       yes    5900,0000 1600,0000
4   0    0      0    0:0:0:0       yes    5900,0000 1600,0000
5   0    0      1    1:1:1:0       yes    5900,0000 1600,0000
6   0    0      2    2:2:2:0       yes    5900,0000 1600,0000
7   0    0      3    3:3:3:0       yes    5900,0000 1600,0000
</pre> <p>The physical core <tt class="docutils literal">0</tt> (CORE column) is made of two logical cores (CPU column): <tt class="docutils literal">0</tt> and <tt class="docutils literal">4</tt>.</p> </div> <div class="section" id="nohz-mode"> <h2>NOHZ mode</h2> <p>By default, the Linux kernel uses a scheduling-clock which interrupts the running application <tt class="docutils literal">HZ</tt> times per second to run the scheduler. <tt class="docutils literal">HZ</tt> is usually between 100 and 1000: a time slice between 1 ms and 10 ms.</p> <p>Linux supports a <a class="reference external" href="https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt">NOHZ mode</a> which is able to disable the scheduling-clock when the system is idle to reduce power consumption. Linux 3.10 introduced a <a class="reference external" href="https://lwn.net/Articles/549580/">full tickless mode</a>, NOHZ full, which is able to disable the scheduling-clock when only one application is running on a CPU.</p> <p>NOHZ full is disabled by default. It can be enabled with the <tt class="docutils literal">nohz_full</tt> parameter of the Linux command line; the value is a list of CPUs.
Example:</p> <pre class="literal-block">
nohz_full=2,3,6,7
</pre> <p>Check with:</p> <pre class="literal-block">
$ cat /sys/devices/system/cpu/nohz_full
2-3,6-7
</pre> </div> <div class="section" id="interrupts-irq"> <h2>Interrupts (IRQ)</h2> <p>The Linux kernel can also be configured to not run <a class="reference external" href="https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29">interrupt request (IRQ)</a> handlers on some CPUs using the <tt class="docutils literal">/proc/irq/default_smp_affinity</tt> and <tt class="docutils literal"><span class="pre">/proc/irq/&lt;number&gt;/smp_affinity</span></tt> files. The value is not a list of CPUs but a bitmask.</p> <p>The <tt class="docutils literal">/proc/interrupts</tt> file can be read to see the number of interrupts per CPU.</p> <p>Read the <a class="reference external" href="https://www.kernel.org/doc/Documentation/IRQ-affinity.txt">Linux SMP IRQ affinity</a> documentation.</p> </div> <div class="section" id="example-of-effect-of-cpu-isolation-on-a-microbenchmark"> <h2>Example of effect of CPU isolation on a microbenchmark</h2> <p>Example with Linux parameters:</p> <pre class="literal-block">
isolcpus=2,3,6,7 nohz_full=2,3,6,7
</pre> <p>Microbenchmark on an idle system (without CPU isolation):</p> <pre class="literal-block">
$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 229 msec per loop
</pre> <p>Result on a busy system using <tt class="docutils literal">system_load.py 10</tt> and <tt class="docutils literal">find /</tt> commands running in other terminals:</p> <pre class="literal-block">
$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 372 msec per loop
</pre> <p>The microbenchmark is about 62% slower because of the high system load!</p> <p>Result on the same busy system but using isolated CPUs.
The <tt class="docutils literal">taskset</tt> command pins an application to specific CPUs:</p> <pre class="literal-block">
$ taskset -c 1,3 python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 230 msec per loop
</pre> <p>Just to check, a new run without CPU isolation:</p> <pre class="literal-block">
$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 357 msec per loop
</pre> <p>The result with CPU isolation on a busy system is the same as the result on an idle system! CPU isolation removes most of the noise of the system.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>Great job, Linux!</p> <p>Ok! Now, the benchmark is super stable, no? ... Sorry, no, it's not stable yet. I found a lot of other sources of &quot;noise&quot;. We will see them in the following articles ;-)</p> </div> Status of Python 3 in OpenStack Mitaka2016-03-02T14:00:00+01:002016-03-02T14:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-03-02:/openstack_mitaka_python3.html<p class="first last">Status of Python 3 in OpenStack Mitaka</p> <p>Now that most OpenStack services have reached feature freeze for the Mitaka cycle (November 2015-April 2016), it's time to look back on the progress made for Python 3 support.</p> <p>Previous status update: <a class="reference external" href="http://techs.enovance.com/7807/python-3-status-openstack-liberty">Python 3 Status in OpenStack Liberty</a> (September 2015).</p> <div class="section" id="services-ported-to-python-3"> <h2>Services ported to Python 3</h2> <p>13 services were ported to Python 3 during the Mitaka cycle:</p> <ul class="simple"> <li>Cinder</li> <li>Congress</li> <li>Designate</li> <li>Glance</li> <li>Heat</li> <li>Horizon</li> <li>Manila</li> <li>Mistral</li> <li>Octavia</li> <li>Searchlight</li> <li>Solum</li> <li>Watcher</li> <li>Zaqar</li> </ul> <p>Red Hat contributed to the Cinder, Designate, Glance and Horizon service porting efforts.</p> <p>&quot;Ported to Python 3&quot; means that all unit
tests pass on Python 3.4, which is verified by a voting gate job. This is not enough to run applications in production with Python 3: integration and functional tests are not run on Python 3 yet. See the section dedicated to these tests below.</p> <p>See the <a class="reference external" href="https://wiki.openstack.org/wiki/Python3">Python 3 wiki page</a> for the current status of the OpenStack port to Python 3; especially the list of services ported to Python 3.</p> </div> <div class="section" id="services-not-ported-yet"> <h2>Services not ported yet</h2> <p>It has become easier to list services which are not compatible with Python 3 than to list services already ported to Python 3!</p> <p>9 services still need to be ported:</p> <ul class="simple"> <li>Work-in-progress:<ul> <li>Magnum: 83% (959 unit tests/1,161)</li> <li>Cue: 81% (208 unit tests/257)</li> <li>Nova: 74% (10,859 unit tests/14,726)</li> <li>Barbican: 34% (392 unit tests/1168)</li> <li>Murano: 29% (133 unit tests/455)</li> <li>Keystone: 27% (1200 unit tests/4455)</li> <li>Swift: 0% (3 unit tests/4,435)</li> <li>Neutron-LBaaS: 0% (1 unit test/806)</li> </ul> </li> <li>Port not started yet:<ul> <li>Trove: no python34 gate</li> </ul> </li> </ul> <p>Red Hat contributed Python 3 patches to Cue, Neutron-LBaaS, Swift and Trove during the Mitaka cycle.</p> <p>Trove developers are ready to start the port at the beginning of the next cycle (Newton). The py34 test environment was blocked by the MySQL-Python dependency (it was not possible to build the test environment), but this dependency is now skipped on Python 3.
Later, it will be <a class="reference external" href="https://review.openstack.org/#/c/225915/">replaced with PyMySQL</a> on Python 2 and Python 3.</p> </div> <div class="section" id="python-3-issues-in-eventlet"> <h2>Python 3 issues in Eventlet</h2> <p>Four Python 3 issues were fixed in Eventlet:</p> <ul class="simple"> <li><a class="reference external" href="https://github.com/eventlet/eventlet/issues/295">Issue #295: Python 3: wsgi doesn't handle correctly partial write of socket send() when using writelines()</a></li> <li>PR #275: <a class="reference external" href="https://github.com/eventlet/eventlet/pull/275">Issue #274: Fix GreenSocket.recv_into()</a>. Issue: <a class="reference external" href="https://github.com/eventlet/eventlet/issues/274">On Python 3, sock.makefile('rb').readline() doesn't handle blocking errors correctly</a></li> <li>PR #257: <a class="reference external" href="https://github.com/eventlet/eventlet/pull/257">Fix GreenFileIO.readall() for regular file</a></li> <li><a class="reference external" href="https://github.com/eventlet/eventlet/issues/248">Issue #248: eventlet.monkey_patch() on Python 3.4 makes stdout non-blocking</a>: pull request <a class="reference external" href="https://github.com/eventlet/eventlet/pull/250">Fix GreenFileIO.write()</a></li> </ul> </div> <div class="section" id="next-milestone-functional-and-integration-tests"> <h2>Next Milestone: Functional and integration tests</h2> <p>The next major milestone will be to run functional and integration tests on Python 3.</p> <ul class="simple"> <li>functional tests are restricted to one component (ex: only Glance)</li> <li>integration tests, like Tempest, test the integration of multiple components</li> </ul> <p>It is now possible to install some packages on Python 3 in DevStack using <tt class="docutils literal">USE_PYTHON3</tt> and <tt class="docutils literal">PYTHON3_VERSION</tt> variables: <a class="reference external" 
href="https://review.openstack.org/#/c/181165/">Enable optional Python 3 support</a>. It means that it is possible to run tests with some services running on Python 3, and the remaining services on Python 2.</p> <p>The port to Python 3 of the Glance, Heat and Neutron functional and integration tests has already started.</p> <p>For Glance, 159 functional tests already pass on Python 3.4.</p> <p>Heat:</p> <ul class="simple"> <li>project-config: <a class="reference external" href="https://review.openstack.org/#/c/228194/">Add python34 integration test job for Heat</a> (WIP)</li> <li>heat: <a class="reference external" href="https://review.openstack.org/#/c/188033/">py34: integration tests</a> (WIP)</li> </ul> <p>Neutron: the <a class="reference external" href="https://review.openstack.org/#/c/231897/">Add the functional-py34 and dsvm-functional-py34 targets to tox.ini</a> change was merged, but a gate job hasn't been added for it yet.</p> <p>Another pending project is to fix issues specific to Python 3.5, but the gate doesn’t use Python 3.5 yet. There are some minor issues, probably easy to fix.</p> </div> <div class="section" id="how-to-port-remaining-code"> <h2>How to port remaining code?</h2> <p>The <a class="reference external" href="https://wiki.openstack.org/wiki/Python3">Python 3 wiki page</a> contains a lot of information about adding Python 3 support to Python 2 code.</p> <p>Join us in the <tt class="docutils literal"><span class="pre">#openstack-python3</span></tt> IRC channel on Freenode to discuss Python 3!</p> </div> Fast _PyAccu, _PyUnicodeWriter and _PyBytesWriter APIs to produce strings in CPython2016-03-01T16:00:00+01:002016-03-01T16:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-03-01:/pybyteswriter.html<p class="first last">_PyBytesWriter API</p> <p>This article describes the _PyBytesWriter and _PyUnicodeWriter private APIs of CPython.
These APIs are designed to optimize code producing strings when the output size is not known in advance.</p> <p>I created the _PyUnicodeWriter API in response to complaints that Python 3 was much slower than Python 2, especially with the new Unicode implementation (PEP 393).</p> <div class="section" id="pyaccu-api"> <h2>_PyAccu API</h2> <p>Issue #12778: In 2011, Antoine Pitrou found a performance issue in the JSON serializer when serializing many small objects: it used way too much memory for temporary objects compared to the final output string.</p> <p>The JSON serializer used a list of strings and joined all strings at the end to create the final output string. Pseudocode:</p> <pre class="literal-block">
def serialize(self):
    pieces = [serialize(item) for item in self]
    return ''.join(pieces)
</pre> <p>Antoine introduced an accumulator compacting the temporary list of &quot;small&quot; strings and putting the result in a second list of &quot;large&quot; strings. At the end, the list of &quot;large&quot; strings is also compacted to build the final output string. Pseudocode:</p> <pre class="literal-block">
def serialize(self):
    small = []
    large = []
    for item in self:
        small.append(serialize(item))
        if len(small) &gt; 10000:
            large.append(''.join(small))
            small.clear()
    if small:
        large.append(''.join(small))
    return ''.join(large)
</pre> <p>The threshold of 10,000 strings is justified by this comment:</p> <pre class="literal-block">
/* Each item in a list of unicode objects has an overhead (in 64-bit
 * builds) of:
 * - 8 bytes for the list slot
 * - 56 bytes for the header of the unicode object
 * that is, 64 bytes. 100000 such objects waste more than 6MB
 * compared to a single concatenated string.
 */
</pre> <p>Issue #12911: Antoine Pitrou found a similar performance issue in repr(list), and so proposed to convert his accumulator code into a new private _PyAccu API. He added the _PyAccu API to Python 2.7.5 and 3.2.3.
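The accumulator scheme above can be sketched as runnable Python. The class and method names here are illustrative only: the real _PyAccu API is private C code inside CPython.

```python
class StringAccumulator:
    """Sketch of the _PyAccu idea: compact many small strings into
    fewer large strings to reduce per-object memory overhead."""

    def __init__(self, threshold=10000):
        self.small = []          # short strings, not yet compacted
        self.large = []          # already-joined long strings
        self.threshold = threshold

    def add(self, text):
        self.small.append(text)
        if len(self.small) > self.threshold:
            # Compact: one long string replaces thousands of objects.
            self.large.append(''.join(self.small))
            self.small.clear()

    def finish(self):
        if self.small:
            self.large.append(''.join(self.small))
        return ''.join(self.large)
```

For example, feeding 1,000 pieces through an accumulator with a low threshold produces the same result as a plain `''.join()`, while never holding more than `threshold + 1` small string objects at once.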
Title of the repr(list) change: &quot;Fix memory consumption when calculating the repr() of huge tuples or lists&quot;.</p> </div> <div class="section" id="the-pyunicodewriter-api"> <h2>The _PyUnicodeWriter API</h2> <div class="section" id="inefficient-implementation-of-the-pep-393"> <h3>Inefficient implementation of the PEP 393</h3> <p>In 2010, Python 3.3 got a completely new Unicode implementation of the Python type <tt class="docutils literal">str</tt>, with the PEP 393. The implementation of the PEP was the topic of a Google Summer of Code 2011 with the student Torsten Becker mentored by Martin v. Löwis (author of the PEP). The project was successful: the PEP 393 was implemented, it worked!</p> <p>The first implementation of the PEP 393 used a lot of 32-bit character buffers (<tt class="docutils literal">Py_UCS4</tt>) which use a lot of memory and require expensive conversions to 8-bit (<tt class="docutils literal">Py_UCS1</tt>, ASCII and Latin1) or 16-bit (<tt class="docutils literal">Py_UCS2</tt>, BMP) characters.</p> <p>The new internal structures for Unicode strings are very complex and require care when building a new string to avoid memory copies. I created the _PyUnicodeWriter API to try to reduce expensive memory copies, and even completely avoid memory copies in the best cases.</p> </div> <div class="section" id="design-of-the-pyunicodewriter-api"> <h3>Design of the _PyUnicodeWriter API</h3> <p>According to benchmarks, creating a <tt class="docutils literal">Py_UCS1*</tt> buffer and then expanding it to <tt class="docutils literal">Py_UCS2*</tt> or <tt class="docutils literal">Py_UCS4*</tt> is more efficient, since <tt class="docutils literal">Py_UCS1*</tt> is the most common format.</p> <p>The Python <tt class="docutils literal">str</tt> type is used for a wide range of usages. For example, it is used for variable names in the Python language itself.
Variable names are almost always ASCII.</p> <p>The worst case for _PyUnicodeWriter is when a long <tt class="docutils literal">Py_UCS1*</tt> buffer must be converted to <tt class="docutils literal">Py_UCS2*</tt>, and then converted to <tt class="docutils literal">Py_UCS4*</tt>. Each conversion is expensive: it needs to allocate a second memory block and convert characters to the new format.</p> <p>_PyUnicodeWriter features:</p> <ul class="simple"> <li>Optional overallocation: overallocate the buffer by 50% on Windows and 25% on Linux. The ratio changes depending on the OS; it is a rough heuristic to get the best performance from the <tt class="docutils literal">malloc()</tt> memory allocator.</li> <li>The buffer can be a shared read-only string if the buffer was only created from a single string. Micro-optimization for <tt class="docutils literal">&quot;%s&quot; % str</tt>.</li> </ul> <p>The API allows disabling overallocation before the last write. For example, <tt class="docutils literal">&quot;%s%s&quot; % ('abc', 'def')</tt> disables the overallocation before writing <tt class="docutils literal">'def'</tt>.</p> <p>The _PyUnicodeWriter API was introduced in issue #14716 (change 7be716a47e9d):</p> <blockquote> Close #14716: str.format() now uses the new &quot;unicode writer&quot; API instead of the PyAccu API. For example, it makes str.format() from 25% to 30% faster on Linux.</blockquote> </div> <div class="section" id="fast-path-for-ascii"> <h3>Fast-path for ASCII</h3> <p>The cool and <em>unexpected</em> side-effect of _PyUnicodeWriter is that many intermediate operations got a fast-path for <tt class="docutils literal">Py_UCS1*</tt>, especially for ASCII strings.
For example, padding a number with spaces as in <tt class="docutils literal">'%10i' % 123</tt> is implemented with <tt class="docutils literal">memset()</tt>.</p> <p>Formatting a floating point number uses the <tt class="docutils literal">PyOS_double_to_string()</tt> function which creates an ASCII buffer. If the writer buffer uses Py_UCS1, a <tt class="docutils literal">memcpy()</tt> is enough to copy the formatted number.</p> </div> <div class="section" id="avoid-temporary-buffers"> <h3>Avoid temporary buffers</h3> <p>Since the beginning, I had the idea of avoiding temporary buffers thanks to a unified API to handle a &quot;Unicode buffer&quot;. Slowly, I spread my changes to all functions producing Unicode strings.</p> <p>The obvious targets were <tt class="docutils literal">str % args</tt> and <tt class="docutils literal">str.format(args)</tt>. Both instructions use very different code, but it was possible to share a few functions, especially the code to format integers in bases 2 (binary), 8 (octal), 10 (decimal) and 16 (hexadecimal).</p> <p>The function formatting an integer computes the exact size of the output, requests that number of characters and then writes the characters. The characters are written directly in the writer buffer. No temporary memory block is needed anymore, and moreover no Py_UCS conversion is needed: <tt class="docutils literal">_PyLong_Format()</tt> writes characters directly in the character format (Py_UCS1, Py_UCS2 or Py_UCS4) of the buffer.</p> </div> <div class="section" id="performance-compared-to-python-2"> <h3>Performance compared to Python 2</h3> <p>The PEP 393 uses a complex storage for strings, so the exact performance now depends on the character set used in the benchmark. For benchmarks using a character set other than ASCII, the results are trickier to understand.</p> <p>To compare performance with Python 2, I focused my benchmarks on ASCII.
I compared Python 3 str with Python 2 unicode, but also sometimes with Python 2 str (bytes). On ASCII, Python 3.3 was as fast as Python 2, or even faster in some very specific cases, but these cases are probably artificial and never seen in real applications.</p> <p>In the best case, Python 3 str (Unicode) was faster than Python 2 bytes.</p> </div> </div> <div class="section" id="pybyteswriter-api-first-try-big-fail"> <h2>_PyBytesWriter API: first try, big fail</h2> <p>Since Python was <em>much</em> faster with _PyUnicodeWriter, I expected to get a good speedup with a similar API for bytes. The holy grail would be to share code for bytes and Unicode (Spoiler alert! I reached this goal, but only for a single function: formatting an integer in decimal).</p> <p>My first attempt at a _PyBytesWriter API was in 2013: <a class="reference external" href="https://bugs.python.org/issue17742">Issue #17742: Add _PyBytesWriter API</a>. But quickly, I noticed with microbenchmarks that my change made Python slower! I spent hours trying to understand why GCC produced less efficient machine code. When I started to dig into the &quot;strict aliasing&quot; optimization issue, I realized that I had reached a dead end.</p> <p>Extract of the _PyBytesWriter structure:</p> <pre class="literal-block">
typedef struct {
    /* Current position in the buffer */
    char *str;
    /* Start of the buffer */
    char *start;
    /* End of the buffer */
    char *end;
    ...
} _PyBytesWriter;
</pre> <p>The problem is that GCC emitted less efficient machine code for the C code (see my <a class="reference external" href="https://bugs.python.org/issue17742#msg187595">msg187595</a>):</p> <pre class="literal-block">
while (collstart++ &lt; collend)
    *writer.str++ = '?';
</pre> <p>For the <tt class="docutils literal">writer.str++</tt> instruction, the new pointer value is written immediately to the structure. The pointer value is read again at each iteration.
So we have 1 LOAD and 1 STORE per iteration.</p> <p>GCC emits better code for the original C code:</p> <pre class="literal-block">
while (collstart++ &lt; collend)
    *str++ = '?';
</pre> <p>The <tt class="docutils literal">str</tt> variable is stored in a register and the new value of <tt class="docutils literal">str</tt> is only written <em>once</em>, at the end of the loop (instead of writing it at each iteration). The pointer value is <em>only read once</em> before the loop. So we have 0 LOAD and 0 STORE (related to the pointer value) in the loop body.</p> <p>It looks like an aliasing issue, but I didn't find how to tell GCC that the new value of <tt class="docutils literal">writer.str</tt> can be written only once, at the end of the loop. I tried the <tt class="docutils literal">__restrict__</tt> keyword: the LOAD (get the pointer value) was moved out of the loop, but the STORE was still in the loop body.</p> <p>I wrote to gcc-help: <a class="reference external" href="https://gcc.gnu.org/ml/gcc-help/2013-04/msg00192.html">Missed optimization when using a structure</a>, but I didn't get any reply. I just gave up.</p> </div> <div class="section" id="pybyteswriter-api-new-try-the-good-one"> <h2>_PyBytesWriter API: new try, the good one</h2> <p>In 2015, I created the <a class="reference external" href="https://bugs.python.org/issue25318">Issue #25318: Add _PyBytesWriter API to optimize Unicode encoders</a>. I redesigned the API to avoid the aliasing issue.</p> <p>The new _PyBytesWriter doesn't contain the <tt class="docutils literal">char*</tt> pointers anymore: they are now local variables in functions. Instead, the functions of the API require two parameters: the bytes writer and a <tt class="docutils literal">char*</tt> parameter.
Example:</p> <pre class="literal-block">
PyObject *
_PyBytesWriter_Finish(_PyBytesWriter *writer, char *str)
</pre> <p>The idea is to keep <tt class="docutils literal">char*</tt> pointers in functions to keep the most efficient machine code in loops. The compiler doesn't have to apply complex aliasing rules to decide if a CPU register can be used or not.</p> <p>_PyBytesWriter features:</p> <ul class="simple"> <li>Optional overallocation: overallocate the buffer by 25% on Windows and 50% on Linux. Same idea as _PyUnicodeWriter.</li> <li>Support for the <tt class="docutils literal">bytes</tt> and <tt class="docutils literal">bytearray</tt> types as output formats, to avoid an expensive memory copy from <tt class="docutils literal">bytes</tt> to <tt class="docutils literal">bytearray</tt>.</li> <li>Small buffer of 512 bytes allocated on the stack to avoid allocating a buffer on the heap before creating the final <tt class="docutils literal">bytes</tt>/<tt class="docutils literal">bytearray</tt> object.</li> </ul> <p>A _PyBytesWriter structure must always be allocated on the stack (to get fast memory allocation of the small buffer).</p> <p>While _PyUnicodeWriter has 5 functions and 1 macro to write a single character, write strings, write a substring, etc., _PyBytesWriter has a single _PyBytesWriter_WriteBytes() function to write a string, since all other writes are done directly with regular C code on <tt class="docutils literal">char*</tt> pointers.</p> <p>The API itself doesn't make the code faster. Disabling overallocation before the last write and using the small buffer allocated on the stack are what may make it faster.</p> <p>In Python 3.6, I optimized error handlers in various codecs: ASCII, Latin1 and UTF-8.
For example, the UTF-8 encoder is now up to 75 times as fast with the error handlers <tt class="docutils literal">ignore</tt>, <tt class="docutils literal">replace</tt>, <tt class="docutils literal">surrogateescape</tt> and <tt class="docutils literal">surrogatepass</tt>. The <tt class="docutils literal">bytes % int</tt> instruction became between 30% and 50% faster on a microbenchmark.</p> <p>Later, I replaced the <tt class="docutils literal">char*</tt> type with <tt class="docutils literal">void*</tt> to avoid compiler warnings in functions using <tt class="docutils literal">Py_UCS1*</tt> or <tt class="docutils literal">unsigned char*</tt>, which are unsigned types.</p> </div> My contributions to CPython during 2015 Q42016-03-01T15:00:00+01:002016-03-01T15:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-03-01:/contrib-cpython-2015q4.html<p class="first last">My contributions to CPython during 2015 Q4</p> <p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2015 Q4 (October, November, December):</p> <pre class="literal-block">
hg log -r 'date(&quot;2015-10-01&quot;):date(&quot;2015-12-31&quot;)' --no-merges -u Stinner
</pre> <p>Statistics: 100 non-merge commits + 25 merge commits (total: 125 commits).</p> <p>As usual, I pushed changes of various contributors and helped them to polish their changes.</p> <p>I fought against a recursion error, a regression introduced by my recent work on the Python test suite.</p> <p>I focused on optimizing the bytes type during this quarter. It started with issue #24870 opened by <strong>INADA Naoki</strong>, who works on PyMySQL: decoding bytes using the surrogateescape error handler was the bottleneck of his benchmark. For me, it was an opportunity for a new attempt to implement a fast &quot;bytes writer API&quot;.</p> <p>I pushed my first change related to <a class="reference external" href="http://faster-cpython.readthedocs.org/fat_python.html">FAT Python</a>!
Fix parser and AST: fill lineno and col_offset of the &quot;arg&quot; node when compiling an AST from Python objects.</p> <p>Previous report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q3.html">My contributions to CPython during 2015 Q3</a>. Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2016q1.html">My contributions to CPython during 2016 Q1</a>.</p> <div class="section" id="recursion-error"> <h2>Recursion error</h2> <div class="section" id="the-bug-issue-25274"> <h3>The bug: issue #25274</h3> <p>During the previous quarter, I refactored the huge Lib/test/regrtest.py file (1,600 lines) into a new Lib/test/libregrtest/ library (8 files). The problem is that test_sys started to crash with &quot;Fatal Python error: Cannot recover from stack overflow&quot; in test_recursionlimit_recovery(). The regression was introduced by a change to regrtest which indirectly added one more Python frame to the code executing test_sys.</p> <p>CPython has a limit on the depth of the call stack: <tt class="docutils literal">sys.getrecursionlimit()</tt>, 1000 by default. The limit is a weak protection against overflow of the C stack. Weak, because it only counts Python frames; intermediate C functions may allocate a lot of memory on the stack.</p> <p>When we reach the limit, an &quot;overflow&quot; flag is set, but we still allow up to limit+50 frames, because handling a RecursionError may need a few more frames. The overflow flag is cleared when the stack level goes below a &quot;low-water mark&quot;.</p> <p>After the regrtest change, test_recursionlimit_recovery() was called at stack level 36. Before, it was called at level 35. The test triggers a RecursionError.
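The low-water mark mentioned above can be sketched in Python to see the problem. The function names here are mine; the two formulas are the ones from the bug and from its fix.

```python
def low_water_mark_old(limit):
    # Original formula: not monotonic, there is a gap near limit == 100
    # (100 -> 75, but 101 -> 51).
    if limit > 100:
        return limit - 50
    else:
        return 3 * limit // 4

def low_water_mark_fixed(limit):
    # Fixed formula: switching at 200 removes the gap, because
    # 3 * 200 // 4 == 150 == 200 - 50.
    if limit > 200:
        return limit - 50
    else:
        return 3 * limit // 4
```

With the old formula, raising the limit slightly past 100 makes the low-water mark drop; the fixed formula is monotonic, so a higher limit never lowers the mark.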
The problem is that we never go below the low-water mark again, so the overflow flag is never cleared.</p> </div> <div class="section" id="the-fix"> <h3>The fix</h3> <p>Another problem is that the function used to compute the &quot;low-water mark&quot; was not monotonic:</p> <pre class="literal-block">
if limit &gt; 100:
    low_water_mark = limit - 50
else:
    low_water_mark = 3 * limit // 4
</pre> <p>The gap occurs near a limit of 100 frames:</p> <ul class="simple"> <li>limit = 99 =&gt; low_water_mark = 74</li> <li>limit = 100 =&gt; low_water_mark = 75</li> <li>limit = 101 =&gt; low_water_mark = 51</li> </ul> <p>The formula was replaced with:</p> <pre class="literal-block">
if limit &gt; 200:
    low_water_mark = limit - 50
else:
    low_water_mark = 3 * limit // 4
</pre> <p>The fix (<a class="reference external" href="https://hg.python.org/cpython/rev/eb0c76442cee">change eb0c76442cee</a>) also modified the <tt class="docutils literal">sys.setrecursionlimit()</tt> function to raise a <tt class="docutils literal">RecursionError</tt> exception if the new limit is too low depending on the <em>current</em> stack depth.</p> </div> </div> <div class="section" id="optimizations"> <h2>Optimizations</h2> <p>As usual for performance work, Serhiy Storchaka was very helpful with reviews, running independent benchmarks, etc.</p> <p>Optimizations on the <tt class="docutils literal">bytes</tt> type and the ASCII, Latin1 and UTF-8 codecs:</p> <ul class="simple"> <li>Issue #25318: Add _PyBytesWriter API. Add a new private API to optimize Unicode encoders. It uses a small buffer of 512 bytes allocated on the stack and supports configurable overallocation.</li> <li>Use the _PyBytesWriter API for the UCS1 (ASCII and Latin1) and UTF-8 encoders.
Enable overallocation for the UTF-8 encoder with error handlers.</li> <li>unicode_encode_ucs1(): initialize collend to collstart+1 to not check the current character twice; we already know that it is not ASCII.</li> <li>Issue #25267: The UTF-8 encoder is now up to 75 times as fast for the error handlers: <tt class="docutils literal">ignore</tt>, <tt class="docutils literal">replace</tt>, <tt class="docutils literal">surrogateescape</tt>, <tt class="docutils literal">surrogatepass</tt>. Patch co-written with <strong>Serhiy Storchaka</strong>.</li> <li>Issue #25301: The UTF-8 decoder is now up to 15 times as fast for the error handlers: <tt class="docutils literal">ignore</tt>, <tt class="docutils literal">replace</tt> and <tt class="docutils literal">surrogateescape</tt>.</li> <li>Issue #25318: Optimize the backslashreplace and xmlcharrefreplace error handlers in the UTF-8 encoder. Also optimize the backslashreplace error handler for the ASCII and Latin1 encoders.</li> <li>Issue #25349: Optimize bytes % args using the new private _PyBytesWriter API.</li> <li>Optimize the error handlers of the ASCII and Latin1 encoders when the replacement string is pure ASCII: use _PyBytesWriter_WriteBytes(), don't check individual characters.</li> <li>Issue #25349: Optimize bytes % int. Formatting is between 30% and 50% faster on a microbenchmark.</li> <li>Issue #25357: Add an optional newline parameter to binascii.b2a_base64(). base64.b64encode() uses it to avoid a memory copy.</li> <li>Issue #25353: Optimize the unicode escape and raw unicode escape encoders: use the new _PyBytesWriter API.</li> <li>Rewrite PyBytes_FromFormatV() using the _PyBytesWriter API.</li> <li>Issue #25399: Optimize bytearray % args.
Most formatting operations are now between 2.5 and 5 times faster.</li> <li>Issue #25401: Optimize bytes.fromhex() and bytearray.fromhex(): they are now between 2x and 3.5x faster.</li> </ul> </div> <div class="section" id="changes"> <h2>Changes</h2> <ul class="simple"> <li>Issue #25003: On Solaris 11.3 or newer, os.urandom() now uses the getrandom() function instead of the getentropy() function. The getentropy() function blocks to generate very high-quality entropy; os.urandom() doesn't need such high-quality entropy.</li> <li>Issue #22806: Add <tt class="docutils literal">python <span class="pre">-m</span> test <span class="pre">--list-tests</span></tt> command to list tests.</li> <li>Issue #25670: Remove duplicate getattr() in ast.NodeTransformer</li> <li>Issue #25557: Refactor _PyDict_LoadGlobal(). Don't fall back to PyDict_GetItemWithError() if the hash is unknown: compute the hash instead. Also add comments to explain the _PyDict_LoadGlobal() optimization.</li> <li>Issue #25868: Try to make test_eintr.test_sigwaitinfo() more reliable, especially on slow buildbots</li> </ul> </div> <div class="section" id="changes-specific-to-python-2-7"> <h2>Changes specific to Python 2.7</h2> <ul class="simple"> <li>Closes #25742: locale.setlocale() now accepts a Unicode string for its second parameter.</li> </ul> </div> <div class="section" id="bugfixes"> <h2>Bugfixes</h2> <ul class="simple"> <li>Fix regrtest --coverage on Windows</li> <li>Fix pytime on OpenBSD</li> <li>More fixes for test_eintr on FreeBSD</li> <li>Close #25373: Fix regrtest --slow with interrupted test</li> <li>Issue #25555: Fix parser and AST: fill lineno and col_offset of &quot;arg&quot; node when compiling AST from Python objects.
First contribution related to FAT Python ;-)</li> <li>Issue #25696: Fix installation of Python on UNIX with make -j9.</li> </ul> </div> My contributions to CPython during 2015 Q32016-02-18T01:00:00+01:002016-02-18T01:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-02-18:/contrib-cpython-2015q3.html<p class="first last">My contributions to CPython during 2015 Q3</p> <p>A few years ago, someone asked me: &quot;Why do you contribute to CPython? Python is perfect, there are no more bugs, right?&quot;. The article lists most of my contributions to CPython during 2015 Q3 (July, August, September). It gives an idea of which areas of Python are not perfect yet :-)</p> <p>My contributions to <a class="reference external" href="https://www.python.org/">CPython</a> during 2015 Q3 (July, August, September):</p> <pre class="literal-block">
hg log -r 'date(&quot;2015-07-01&quot;):date(&quot;2015-09-30&quot;)' --no-merges -u Stinner
</pre> <p>Statistics: 153 non-merge commits + 75 merge commits (total: 228 commits).</p> <p>The major event in Python of this quarter was the release of Python 3.5.0.</p> <p>As usual, I helped various contributors to refine their changes and I pushed their final changes.</p> <p>Next report: <a class="reference external" href="https://vstinner.github.io/contrib-cpython-2015q4.html">My contributions to CPython during 2015 Q4</a>.</p> <div class="section" id="freebsd-kernel-bug"> <h2>FreeBSD kernel bug</h2> <p>It took me a while to polish the implementation of the <a class="reference external" href="https://www.python.org/dev/peps/pep-0475/">PEP 475 (retry syscall on EINTR)</a>, especially its unit test <tt class="docutils literal">test_eintr</tt>. The unit test is supposed to test Python, but as usual, it also indirectly tests the operating system.</p> <p>I spent some days investigating a random hang on the FreeBSD buildbots: <a class="reference external" href="https://bugs.python.org/issue25122">issue #25122</a>.
I quickly found the guilty test (test_eintr.test_open), but it took me a while to understand that it was a kernel bug in the FIFO driver. Fortunately, in the end, I was able to reproduce the bug with a short C program in my FreeBSD VM. It is the best way to ask for a fix upstream.</p> <p>My <a class="reference external" href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203162">FreeBSD bug report #203162</a> (&quot;when close(fd) on a fifo fails with EINTR, the file descriptor is not really closed&quot;) was quickly fixed. The FreeBSD team is reactive!</p> <p>I like free software because it's possible to investigate bugs deep in the code, and it's usually quick to get a fix.</p> </div> <div class="section" id="timestamp-rounding-issue"> <h2>Timestamp rounding issue</h2> <p>Even if the <a class="reference external" href="http://bugs.python.org/issue23517">issue #23517</a> is well defined and simple to fix, it took me days (weeks?) to understand exactly how timestamps are supposed to be rounded and agree on the &quot;right&quot; rounding method. Alexander Belopolsky reminded me of the important property:</p> <pre class="literal-block">
(datetime(1970,1,1) + timedelta(seconds=t)) == datetime.utcfromtimestamp(t)
</pre> <p>Tim Peters helped me to understand why Python rounds to nearest with ties going to even (ROUND_HALF_EVEN) in <tt class="docutils literal">round(float)</tt> and other functions.
At first glance, the rounding method doesn't look natural or logical:</p> <pre class="literal-block">
&gt;&gt;&gt; round(0.5)
0
&gt;&gt;&gt; round(1.5)
2
</pre> <p>See my previous article on the _PyTime API for the long story of rounding methods between Python 3.2 and Python 3.6: <a class="reference external" href="https://vstinner.github.io/pytime.html">History of the Python private C API _PyTime</a>.</p> </div> <div class="section" id="enhancements"> <h2>Enhancements</h2> <ul class="simple"> <li>type_call() now detects C bugs in type __new__() and __init__() methods.</li> <li>Issue #25220: Enhancements of the test runner: add more info when regrtest runs tests in parallel, fix some features of regrtest, add functional tests to test_regrtest.</li> </ul> </div> <div class="section" id="optimizations"> <h2>Optimizations</h2> <ul class="simple"> <li>Issue #25227: Optimize ASCII and latin1 encoders with the <tt class="docutils literal">surrogateescape</tt> error handler: the encoders are now up to 3 times as fast.</li> </ul> </div> <div class="section" id="changes"> <h2>Changes</h2> <ul class="simple"> <li>Polish the implementation of the PEP 475 (retry syscall on EINTR)</li> <li>Work on the &quot;What's New in Python 3.5&quot; document: add my changes (PEP 475, socket timeout, os.urandom)</li> <li>Work on asyncio: fix ResourceWarning warnings, fixes specific to Windows</li> <li>test_time: rewrite rounding tests of the private pytime API</li> <li>Issue #24707: Remove an assertion in the monotonic clock. No longer check at runtime that the monotonic clock doesn't go backward. Yes, it happens!
It occurs a few times each month on a Debian buildbot slave running in a VM.</li> <li>test_eintr: replace os.fork() with subprocess (fork+exec) to make the test more reliable</li> </ul> </div> <div class="section" id="changes-specific-to-python-2-7"> <h2>Changes specific to Python 2.7</h2> <ul class="simple"> <li>Backport python-gdb.py changes: enhance the py-bt command</li> <li>Issue #23375: Fix test_py3kwarn for modules implemented in C</li> </ul> </div> <div class="section" id="bug-fixes"> <h2>Bug fixes</h2> <ul class="simple"> <li>Closes #23247: Fix a crash in the StreamWriter.reset() of CJK codecs</li> <li>Issue #24732, #23834: Fix sock_accept_impl() on Windows. Regression of the PEP 475 (retry syscall on EINTR)</li> <li>test_gdb: fix the regex to parse the GDB version and fix a ResourceWarning on error</li> <li>Fix test_warnings: don't modify warnings.filters, to fix random failures of the test.</li> <li>Issue #24891: Fix a race condition at Python startup if the file descriptor of stdin (0), stdout (1) or stderr (2) is closed while Python is creating sys.stdin, sys.stdout and sys.stderr objects.</li> <li>Issue #24684: socket.socket.getaddrinfo() now calls PyUnicode_AsEncodedString() instead of calling the encode() method of the host, to correctly handle a custom string with an encode() method which doesn't return a byte string. The encoder of the IDNA codec is now called directly instead of calling the encode() method of the string.</li> <li>Issue #25118: Fix a regression of Python 3.5.0 in os.waitpid() on Windows. Add a unit test on os.waitpid()</li> <li>Issue #25122: Fix test_eintr, kill the child process on error</li> <li>Issue #25155: Add the _PyTime_AsTimevalTime_t() function to fix a regression: support years after 2038 again.</li> <li>Issue #25150: Hide the private _Py_atomic_xxx symbols from the public Python.h header to fix a compilation error with OpenMP.
PyThreadState_GET() becomes an alias to PyThreadState_Get() to avoid ABI incompatibilities.</li> <li>Issue #25003: On Solaris 11.3 or newer, os.urandom() now uses the getrandom() function instead of the getentropy() function.</li> </ul> </div> History of the Python private C API _PyTime2016-02-17T22:00:00+01:002016-02-17T22:00:00+01:00Victor Stinnertag:vstinner.github.io,2016-02-17:/pytime.html<p class="first last">History of the Python private C API _PyTime</p> <p>I added functions to the private &quot;pytime&quot; library to convert timestamps from/to various formats. I expected to spend a few days; in the end I spent 3 years (2012-2015) on them!</p> <div class="section" id="python-3-3"> <h2>Python 3.3</h2> <p>In 2012, I proposed the <a class="reference external" href="https://www.python.org/dev/peps/pep-0410/">PEP 410 -- Use decimal.Decimal type for timestamps</a> because storing timestamps as floating point numbers loses precision. The PEP was rejected because it modified many functions and had a bad API. At least, os.stat() got 3 new fields (atime_ns, mtime_ns, ctime_ns): timestamps as a number of nanoseconds (<tt class="docutils literal">int</tt>).</p> <p>My <a class="reference external" href="https://www.python.org/dev/peps/pep-0418/">PEP 418 -- Add monotonic time, performance counter, and process time functions</a> was accepted, and Python 3.3 got a new <tt class="docutils literal">time.monotonic()</tt> function (and a few others). Again, I spent much more time than I expected on a problem which looked simple at first glance.</p> <p>With the <a class="reference external" href="http://bugs.python.org/issue14180">issue #14180</a>, I added functions to convert timestamps to the private &quot;pytime&quot; API to factorize the code of various modules.
Timestamps were rounded towards +infinity (ROUND_CEILING), but it was not a deliberate choice.</p> </div> <div class="section" id="python-3-4"> <h2>Python 3.4</h2> <p>To correctly fix a performance issue in asyncio (<a class="reference external" href="https://bugs.python.org/issue20311">issue #20311</a>), I added two rounding modes to the pytime API: _PyTime_ROUND_DOWN (round towards zero) and _PyTime_ROUND_UP (round away from zero). Polling for events (ex: using <tt class="docutils literal">select.select()</tt>) with a non-zero timeout must not call the underlying C function in non-blocking mode.</p> </div> <div class="section" id="python-3-5"> <h2>Python 3.5</h2> <p>When working on the <a class="reference external" href="https://bugs.python.org/issue22117">issue #22117</a>, I noticed that the implementation of the rounding methods was buggy for negative timestamps. I replaced _PyTime_ROUND_DOWN with _PyTime_ROUND_FLOOR (round towards minus infinity), and _PyTime_ROUND_UP with _PyTime_ROUND_CEILING (round towards infinity).</p> <p>This issue also introduced a new private <tt class="docutils literal">_PyTime_t</tt> type to support nanosecond resolution. The type is an opaque integer type to store timestamps. In practice, it's a signed 64-bit integer. Since it's an integer, it's easy and natural to compute the sum or difference of two timestamps: <tt class="docutils literal">t1 + t2</tt> and <tt class="docutils literal">t2 - t1</tt>. I added _PyTime_XXX() functions to create a timestamp and _PyTime_AsXXX() functions to convert a timestamp to a different format.</p> <p>I had to keep three _PyTime_ObjectToXXX() functions for the fromtimestamp() methods of the datetime module.
These methods must support extreme timestamps (year 1..9999), whereas _PyTime_t is &quot;limited&quot; to a delta of +/- 292 years (year 1678..2262).</p> </div> <div class="section" id="python-3-6"> <h2>Python 3.6</h2> <p>In 2015, the <a class="reference external" href="http://bugs.python.org/issue23517">issue #23517</a> reported that Python 2 and Python 3 don't use the same rounding method in datetime.datetime.fromtimestamp(): there was a difference of 1 microsecond.</p> <p>After a long discussion, I modified the fromtimestamp() methods of the datetime module to round to nearest with ties going to even (ROUND_HALF_EVEN), as done by round() in Python 3.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>It took me three years to stabilize the API and fix all the issues. Well, I didn't spend all my days on it, but it shows that handling time is not a simple issue.</p> <p>At the Python level, nothing changed: timestamps are still stored as float (except for the 3 new fields of os.stat()).</p> <p>Python 3.5 only supports timezones with a fixed offset; it does not support the local timezone, for example.
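That fixed-offset support can be sketched with the datetime.timezone class added in Python 3.2 (a minimal illustration, not code from the original post; the timezone name "CEST" is just an example label):

```python
from datetime import datetime, timedelta, timezone

# timezone is a fixed-offset tzinfo: its offset never changes, so it
# cannot model DST transitions, which is why the local timezone (whose
# offset varies over the year) is out of its reach.
cest = timezone(timedelta(hours=2), "CEST")
dt = datetime(2015, 7, 1, 12, 0, tzinfo=cest)
print(dt.utcoffset())                    # 2:00:00, constant all year
print(dt.astimezone(timezone.utc).hour)  # 10
```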
Timezones are still a hot topic: the <a class="reference external" href="https://mail.python.org/mailman/listinfo/datetime-sig">datetime-sig mailing list</a> was created to enhance timezone support in Python.</p> </div> Status of the FAT Python project, January 12, 20162016-01-12T13:42:00+01:002016-01-12T13:42:00+01:00Victor Stinnertag:vstinner.github.io,2016-01-12:/fat-python-status-janv12-2016.html<p class="first last">Status of the FAT Python project, January 12, 2016</p> <a class="reference external image-reference" href="http://faster-cpython.readthedocs.org/fat_python.html"> <img alt="FAT Python project" class="align-right" src="https://vstinner.github.io/images/fat_python.jpg" /> </a> <p>Previous status: <a class="reference external" href="https://vstinner.github.io/fat-python-status-nov26-2015.html">Status of the FAT Python project, November 26, 2015</a>.</p> <div class="section" id="summary"> <h2>Summary</h2> <ul class="simple"> <li>New optimizations implemented:<ul> <li>constant propagation</li> <li>constant folding</li> <li>dead code elimination</li> <li>simplify iterable</li> <li>replace builtin __debug__ variable with its value</li> </ul> </li> <li>Major API refactoring to make the API more generic and reusable by other projects, and maybe for different use cases.</li> <li>Work on 3 different Python Enhancement Proposals (PEP): an API for pluggable static optimizers and function specialization</li> </ul> <p>The two previously known major bugs, &quot;Wrong Line Numbers (and Tracebacks)&quot; and &quot;exec(code, dict)&quot;, are now fixed.</p> </div> <div class="section" id="python-enhancement-proposals-pep"> <h2>Python Enhancement Proposals (PEP)</h2> <p>I proposed an API to support function specialization and static optimizers.
I split the changes into 3 different Python Enhancement Proposals (PEP):</p> <ul class="simple"> <li><a class="reference external" href="https://www.python.org/dev/peps/pep-0509/">PEP 509 - Add a private version to dict</a>: &quot;Add a new private version to builtin <tt class="docutils literal">dict</tt> type, incremented at each change, to implement fast guards on namespaces.&quot;</li> <li><a class="reference external" href="https://www.python.org/dev/peps/pep-0510/">PEP 510 - Specialize functions</a>: &quot;Add functions to the Python C API to specialize pure Python functions: add specialized codes with guards. It allows to implement static optimizers respecting the Python semantics.&quot;</li> <li><a class="reference external" href="https://www.python.org/dev/peps/pep-0511/">PEP 511 - API for AST transformers</a>: &quot;Propose an API to support AST transformers.&quot;</li> </ul> <p>The PEP 509 was sent to the python-ideas mailing list for a first round, and then to the python-dev mailing list. The PEP 510 was sent to python-ideas for a first round.
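The dictionary version guard at the heart of PEP 509 can be sketched in pure Python (a toy model of my own; in CPython the version field lives in the C dict structure and guard checks are done in C):

```python
class VersionedDict(dict):
    """Toy model of PEP 509: bump a counter on every mutation.

    Only __setitem__ and __delitem__ are instrumented here; the real
    implementation covers every mutation path of the dict type.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

namespace = VersionedDict(len=len)
snapshot = namespace.version  # recorded when a function is specialized

# A guard check is a single integer comparison, much cheaper than a lookup:
print(namespace.version == snapshot)  # True: specialized code is still valid
namespace["len"] = lambda obj: 0      # the namespace is mutated
print(namespace.version == snapshot)  # False: fall back to the original bytecode
```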
The last PEP has not been published yet; I'm still working on it.</p> </div> <div class="section" id="major-api-refactor"> <h2>Major API refactor</h2> <p>The API has been deeply refactored to write the Python Enhancement Proposals.</p> <p>First set of changes for function specialization (PEP 510):</p> <ul class="simple"> <li>astoptimizer now adds <tt class="docutils literal">import fat</tt> to optimized code when specialization is used</li> <li>Remove the function subtype: add the <tt class="docutils literal">specialize()</tt> method directly to functions</li> <li>Add support for any callable object in <tt class="docutils literal">func.specialize()</tt>, not only code objects (bytecode)</li> <li>Create guard objects:<ul> <li>fat.Guard</li> <li>fat.GuardArgType</li> <li>fat.GuardBuiltins</li> <li>fat.GuardDict</li> <li>fat.GuardFunc</li> </ul> </li> <li>Add functions to create guards:<ul> <li>fat.GuardGlobals</li> <li>fat.GuardTypeDict</li> </ul> </li> <li>Move code.replace_consts() to fat.replace_consts()</li> </ul> <p>Second set of changes for AST transformers (PEP 511):</p> <ul class="simple"> <li>Add sys.implementation.ast_transformers and sys.implementation.optim_tag</li> <li>Rename sys.asthook to sys.ast_transformers</li> <li>Add the -X fat command line option to enable the FAT mode: register the astoptimizer in AST transformers</li> <li>Replace the -F command line option with -o OPTIM_TAG</li> <li>Remove sys.flags.fat (Python flag) and Py_FatPython (C variable)</li> <li>Rewrite how an AST transformer is registered</li> <li>importlib skips .py if optim_tag is not 'opt' and required AST transformers are missing.
Raise ImportError if the .pyc file is missing.</li> </ul> <p>Third set of changes for dictionary versioning, updates after the first round of the PEP 509 on python-ideas:</p> <ul class="simple"> <li>Remove the dict.__version__ read-only property: the version is now only accessible from the C API</li> <li>Change the type of the C field <tt class="docutils literal">ma_version</tt> from <tt class="docutils literal">size_t</tt> to <tt class="docutils literal">unsigned PY_INT64_T</tt> to also use a 64-bit unsigned integer on 32-bit platforms. The risk of missing a change in a guard with a 32-bit version is too high, whereas the risk with a 64-bit version is very, very low.</li> </ul> <p>Fourth set of changes for function specialization, updates after the first round of the PEP 510 on python-ideas:</p> <ul class="simple"> <li>Remove func.specialize() and func.get_specialized() at the Python level, replace them with C functions. Expose them again as fat.specialize(func, ...) and fat.get_specialized(func)</li> <li>fat.get_specialized() now returns a list of tuples, instead of a list of dicts</li> <li>Make the fat.Guard type private: rename it to fat._Guard</li> <li>Add fat.PyGuard: a toy to implement a guard in pure Python</li> <li>Guard C API: rename first_check to init and support reporting errors</li> </ul> </div> <div class="section" id="change-log"> <h2>Change log</h2> <p>Detailed changes of the FAT Python between November 24, 2015 and January 12, 2016.</p> <div class="section" id="end-of-november"> <h3>End of November</h3> <p>Major change:</p> <ul class="simple"> <li>Add a __version__ read-only property to dict, remove the verdict subtype of dict.
As a consequence, dictionary guards now hold a strong reference to the dict value</li> </ul> <p>Minor changes:</p> <ul class="simple"> <li>Dynamically allocate memory for specialized code and guards, don't use fixed-size arrays anymore</li> <li>astoptimizer: enhance scope detection</li> <li>optimize astoptimizer: don't copy a whole AST tree anymore with copy.deepcopy(), only copy modified nodes.</li> <li>Add Config.max_constant_size</li> <li>Reenable checks on cell variables: allow cell variables if they are the same</li> <li>Reenable optimizations on methods calling super(), but never copy the super() builtin to constants. If super() is replaced with a string, the required free variable (reference to the current class) is not created by the compiler</li> <li>Add PureBuiltin config</li> <li>NodeVisitor now calls generic_visit() before visit_XXX()</li> <li>Loop unrolling now also optimizes tuple iterators</li> <li>At the end of Python initialization, create a copy of the builtins dictionary to be able to detect later if a builtin name was replaced.</li> <li>Implement collections.UserDict.__version__</li> </ul> </div> <div class="section" id="december-first-half"> <h3>December (first half)</h3> <p>Major changes:</p> <ul class="simple"> <li>Implement 4 new optimizations:<ul> <li>constant propagation</li> <li>constant folding</li> <li>replace builtin __debug__ variable with its value</li> <li>dead code elimination</li> </ul> </li> <li>Add support for per-module configuration using an __astoptimizer__ variable</li> <li>code.co_lnotab now supports negative line number deltas. Change the type of the line number delta in co_lnotab from an unsigned 8-bit integer to a signed 8-bit integer.
This change fixes almost all issues about line numbers.</li> </ul> <p>Minor changes:</p> <ul class="simple"> <li>Change the .pyc magic number to 3600</li> <li>Remove the unused fat.specialized_method() function</li> <li>Remove Lib/fat.py, rename Modules/_fat.c to Modules/fat.c: the fat module is now only implemented in C</li> <li>Fix more tests of the Python test suite</li> <li>A builtin guard now adds a guard on globals. Also ignore the specialization if globals()[name] already exists.</li> <li>Ignore duplicated guards</li> <li>Implement a namespace following the control flow for constant propagation</li> <li>Config.max_int_bits becomes a simple integer</li> <li>Fix bytecode compilation for tuple constants. Don't merge (0, 0) and (0.0, 0.0) constants, they are different.</li> <li>Call more builtin functions</li> <li>Optimize the optimizer: write a metaclass to discover visitors when the class is created, not when the class is instantiated</li> </ul> </div> <div class="section" id="december-second-half"> <h3>December (second half)</h3> <p>Major changes:</p> <ul class="simple"> <li>Implement the &quot;simplify iterable&quot; optimization. The loop unrolling optimization now relies on it to replace <tt class="docutils literal">range(n)</tt>.</li> <li>Split the function optimization into two stages: first apply optimizations which don't require specialization, then apply optimizations which require specialization.</li> <li>Replace the builtin __fat__ variable with a new sys.flags.fat flag</li> </ul> <p>Minor changes:</p> <ul class="simple"> <li>Extend optimizations to optimize more cases (more builtins, more loop unrolling, remove more dead code, etc.)</li> <li>Add a Config.logger attribute. astoptimizer logs into sys.stderr when Python is started in verbose mode (python3 -v)</li> <li>Move func.patch_constants() to code.replace_consts()</li> <li>Enhance marshal to fix tests: call frozenset() to get the empty frozenset singleton</li> <li>Don't remove code which must raise a SyntaxError.
Don't remove code containing the continue instruction.</li> <li>Restrict GlobalNonlocalVisitor to the current namespace</li> <li>Emit logs when optimizations are skipped</li> <li>Use some maths to avoid optimizing pow() if the result is an integer and will be larger than the configured limit. For example, don't optimize 2 ** (2**100).</li> </ul> </div> <div class="section" id="january"> <h3>January</h3> <p>Major changes:</p> <ul class="simple"> <li>astoptimizer now produces a single builtin guard with all names, instead of a guard per name.</li> <li>Major API refactoring, detailed in a dedicated section above</li> </ul> <p>Minor changes:</p> <ul class="simple"> <li>Start to write PEPs</li> <li>Dictionary guards now expect a list of names, instead of a single name, to reduce the cost of guards.</li> <li>GuardFunc now uses a strong reference to the function, instead of a weak reference, to simplify the code</li> <li>Initialize the dictionary version to 0</li> </ul> </div> </div> Status of the FAT Python project, November 26, 20152015-11-26T17:30:00+01:002015-11-26T17:30:00+01:00Victor Stinnertag:vstinner.github.io,2015-11-26:/fat-python-status-nov26-2015.html<p class="first last">Status of the FAT Python project, November 26, 2015</p> <a class="reference external image-reference" href="http://faster-cpython.readthedocs.org/fat_python.html"> <img alt="FAT Python project" class="align-right" src="https://vstinner.github.io/images/fat_python.jpg" /> </a> <p>Previous status: [python-dev] <a class="reference external" href="https://mail.python.org/pipermail/python-dev/2015-November/142113.html">Second milestone of FAT Python</a> (Nov 4, 2015).</p> <div class="section" id="documentation"> <h2>Documentation</h2> <p>I combined the documentation of various optimization projects into a single document: <a class="reference external" href="http://faster-cpython.readthedocs.org/">Faster CPython</a>.
My previous optimization projects:</p> <ul class="simple"> <li><a class="reference external" href="http://faster-cpython.readthedocs.org/old_ast_optimizer.html">&quot;old&quot; astoptimizer</a> (now replaced with a &quot;new&quot; astoptimizer included in the FAT Python)</li> <li><a class="reference external" href="http://faster-cpython.readthedocs.org/registervm.html">registervm</a></li> <li><a class="reference external" href="http://faster-cpython.readthedocs.org/readonly.html">read-only Python</a></li> </ul> <p>The FAT Python project has its own page: <a class="reference external" href="http://faster-cpython.readthedocs.org/fat_python.html">FAT Python project</a>.</p> </div> <div class="section" id="copy-builtins-to-constants-optimization"> <h2>Copy builtins to constants optimization</h2> <p>The <tt class="docutils literal">LOAD_GLOBAL</tt> instruction is used to load a builtin function. The instruction requires two dictionary lookups: one in the global namespace (which almost always fails) and then one in the builtin namespace.</p> <p>It's rare to replace builtins, so the idea here is to replace the dynamic <tt class="docutils literal">LOAD_GLOBAL</tt> instruction with a static <tt class="docutils literal">LOAD_CONST</tt> instruction which loads the function from a C array, a fast O(1) lookup.</p> <p>It is not possible to inject a builtin function during compilation. Python code objects are serialized by the marshal module which only supports simple types like integers, strings and tuples, not functions. The trick is to modify the constants at runtime when the module is loaded.
I added a new <tt class="docutils literal">patch_constants()</tt> method to functions.</p> <p>Example:</p> <pre class="literal-block">
def log(message):
    print(message)
</pre> <p>This function is specialized to:</p> <pre class="literal-block">
def log(message):
    'LOAD_GLOBAL print'(message)

log.patch_constants({'LOAD_GLOBAL print': print})
</pre> <p>The specialized bytecode uses two guards on the builtin and global namespaces to disable the optimization if the builtin function is replaced.</p> <p>See <a class="reference external" href="https://faster-cpython.readthedocs.org/fat_python.html#copy-builtin-functions-to-constants">Copy builtin functions to constants</a> for more information.</p> </div> <div class="section" id="loop-unrolling-optimization"> <h2>Loop unrolling optimization</h2> <p>A simple optimization is to &quot;unroll&quot; a loop to reduce its cost. The optimization generates assignment statements (for the loop index variable) and duplicates the loop body.</p> <p>Example:</p> <pre class="literal-block">
def func():
    for i in (1, 2, 3):
        print(i)
</pre> <p>The function is specialized to:</p> <pre class="literal-block">
def func():
    i = 1
    print(i)
    i = 2
    print(i)
    i = 3
    print(i)
</pre> <p>If the iterator uses the builtin <tt class="docutils literal">range</tt> function, two guards are required on the builtin and global namespaces.</p> <p>The optimization also handles tuple iterators. No guard is needed in this case (the code is always optimized).</p> <p>See <a class="reference external" href="https://faster-cpython.readthedocs.org/fat_python.html#loop-unrolling">Loop unrolling</a> for more information.</p> </div> <div class="section" id="lot-of-enhancements-of-the-ast-optimizer"> <h2>Lot of enhancements of the AST optimizer</h2> <p>New optimizations helped to find bugs in the <a class="reference external" href="https://faster-cpython.readthedocs.org/new_ast_optimizer.html">AST optimizer</a>.
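The kind of AST rewriting involved, such as the loop unrolling shown above, can be sketched with the standard ast module (a toy unroller for tuple iterators, not FAT Python's actual optimizer; ast.unparse requires Python 3.9+):

```python
import ast

source = "for i in (1, 2, 3):\n    print(i)"
loop = ast.parse(source).body[0]

# Unroll: emit "i = <element>" followed by a copy of the loop body
# for each element of the tuple iterator.
unrolled = []
for element in loop.iter.elts:
    target = ast.Name(id=loop.target.id, ctx=ast.Store())
    unrolled.append(ast.Assign(targets=[target], value=element))
    unrolled.extend(loop.body)

module = ast.Module(body=unrolled, type_ignores=[])
ast.fix_missing_locations(module)
print(ast.unparse(module))
# i = 1
# print(i)
# i = 2
# print(i)
# i = 3
# print(i)
```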
Many fixes and various enhancements were done in the AST optimizer.</p> <p>The number of lines of code more than doubled: 500 to 1200 lines.</p> <p>Optimization: <tt class="docutils literal">copy.deepcopy()</tt> is no longer used to duplicate a full tree. The new <tt class="docutils literal">NodeTransformer</tt> class now only copies a single node, if at least one field is modified.</p> <p>The <tt class="docutils literal">VariableVisitor</tt> class which detects local and global variables was heavily modified. It understands many more kinds of AST nodes: <tt class="docutils literal">For</tt>, <tt class="docutils literal">AugAssign</tt>, <tt class="docutils literal">AsyncFunctionDef</tt>, <tt class="docutils literal">ClassDef</tt>, etc. It now also detects non-local variables (<tt class="docutils literal">nonlocal</tt> keyword). The scope is now limited to the current function; it doesn't enter inside nested <tt class="docutils literal">DictComp</tt>, <tt class="docutils literal">FunctionDef</tt>, <tt class="docutils literal">Lambda</tt>, etc. These nodes create a new separate namespace.</p> <p>The optimizer is now able to optimize a function without guards: it's needed to unroll a loop using a tuple as iterator.</p> </div> <div class="section" id="known-bugs"> <h2>Known bugs</h2> <p>See the <a class="reference external" href="https://hg.python.org/sandbox/fatpython/file/0d30dba5fa64/TODO.rst">TODO.rst file</a> for known bugs.</p> <div class="section" id="wrong-line-numbers-and-tracebacks"> <h3>Wrong Line Numbers (and Tracebacks)</h3> <p>AST nodes have <tt class="docutils literal">lineno</tt> and <tt class="docutils literal">col_offset</tt> fields, so an AST optimizer is not &quot;supposed&quot; to break line numbers. In practice, line numbers, and so tracebacks, are completely wrong in FAT mode. The problem is probably that the AST optimizer can copy and move instructions. Line numbers are no longer monotonic.
CPython probably doesn't handle this case (negative line delta).</p> <p>It should be possible to fix it, but right now I prefer to focus on new optimizations and fixing other bugs.</p> </div> <div class="section" id="exec-code-dict"> <h3>exec(code, dict)</h3> <p>In FAT mode, some optimizations require guards on the global namespace. If <tt class="docutils literal">exec()</tt> is called with a Python <tt class="docutils literal">dict</tt> for globals, an exception is raised because <tt class="docutils literal">func.specialize()</tt> requires a <tt class="docutils literal">fat.verdict</tt> for globals.</p> <p>It's not possible to implicitly convert the <tt class="docutils literal">dict</tt> to a <tt class="docutils literal">fat.verdict</tt>, because the <tt class="docutils literal">dict</tt> is expected to be mutated, and the guards will be on the <tt class="docutils literal">fat.verdict</tt>, not on the original <tt class="docutils literal">dict</tt>.</p> <p>I worked around the bug by manually creating a <tt class="docutils literal">fat.verdict</tt> in FAT mode, instead of a <tt class="docutils literal">dict</tt>.</p> <p>This bug will go away if the versioning feature is moved directly into the builtin <tt class="docutils literal">dict</tt> type (and the <tt class="docutils literal">fat.verdict</tt> type is removed).</p> </div> </div> Port your Python 2 applications to Python 3 with sixer2015-06-16T15:00:00+02:002015-06-16T15:00:00+02:00Victor Stinnertag:vstinner.github.io,2015-06-16:/python3-sixer.html<p class="first last">Port your Python 2 applications to Python 3 with sixer</p> <div class="section" id="from-2to3-to-2to6"> <h2>From 2to3 to 2to6</h2> <p>When Python 3.0 was released, the official statement was to port your application using <a class="reference external" href="https://docs.python.org/3.5/library/2to3.html">2to3</a> and drop Python 2 support. It didn't work because you had to port all libraries first.
If a library drops Python 2 support, existing applications running on Python 2 cannot use this library anymore.</p> <p>This chicken-and-egg issue was solved by the creation of the <a class="reference external" href="https://pythonhosted.org/six/">six module</a> by <a class="reference external" href="https://benjamin.pe/">Benjamin Peterson</a>. Thank you so much Benjamin! Using the six module, it is possible to write a single code base working on Python 2 and Python 3.</p> <p>2to3 was hacked to create the <a class="reference external" href="http://python-modernize.readthedocs.org/">modernize</a> and <a class="reference external" href="https://github.com/limodou/2to6">2to6</a> projects to <em>add Python 3 support</em> without losing Python 2 support. Problem solved!</p> </div> <div class="section" id="creation-of-the-sixer-tool"> <h2>Creation of the sixer tool</h2> <p>Problem solved? Well, not for my specific use case. I'm porting the huge OpenStack project to Python 3. modernize and 2to6 modify a lot of things at once, add unwanted changes (ex: add <tt class="docutils literal">from __future__ import absolute_import</tt> at the top of each file), and don't respect the OpenStack coding style (especially the <a class="reference external" href="http://docs.openstack.org/developer/hacking/#imports">complex rules to sort and group Python imports</a>).</p> <p>I wrote the <a class="reference external" href="https://pypi.python.org/pypi/sixer">sixer</a> project to <em>generate</em> patches for OpenStack. The problem is that OpenStack code changes very quickly, so it's common to have to fix conflicts the day after submitting a change. At the beginning, it took at least one week to get Python 3 changes merged, whereas many changes are merged every day, so being able to regenerate patches helped a lot.</p> <p>I created the <a class="reference external" href="https://pypi.python.org/pypi/sixer">sixer</a> tool using a list of regular expressions to replace one pattern with another.
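</p>
<p>A minimal sketch of what one such regular-expression operation could look like (illustrative code, not sixer's actual implementation):</p>

```python
import re

# Rewrite "unicode(...)" calls into "six.text_type(...)" with a single
# regular expression, in the spirit of a sixer operation.
UNICODE_CALL = re.compile(r"\bunicode\(")

def replace_unicode(source):
    return UNICODE_CALL.sub("six.text_type(", source)

print(replace_unicode("name = unicode(user.name)"))
# name = six.text_type(user.name)
```

<p>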
For example, it replaces <tt class="docutils literal">dict.itervalues()</tt> with <tt class="docutils literal">six.itervalues(dict)</tt>. The code was very simple. The most difficult part was to respect the OpenStack coding style for Python imports.</p> <p>sixer has been a success since its creation: it helped me fix all the obvious Python 3 issues: replace <tt class="docutils literal">unicode(x)</tt> with <tt class="docutils literal">six.text_type(x)</tt>, replace <tt class="docutils literal">dict.itervalues()</tt> with <tt class="docutils literal">six.itervalues(dict)</tt>, etc. These changes are simple, but it's boring to have to manually modify many files. The OpenStack Nova project has almost 1500 Python files, for example.</p> <p>The development version of sixer supports the following operations:</p> <ul class="simple"> <li>all</li> <li>basestring</li> <li>dict0</li> <li>dict_add</li> <li>iteritems</li> <li>iterkeys</li> <li>itertools</li> <li>itervalues</li> <li>long</li> <li>next</li> <li>raise</li> <li>six_moves</li> <li>stringio</li> <li>unicode</li> <li>urllib</li> <li>xrange</li> </ul> </div> <div class="section" id="creation-of-the-sixer-test-suite"> <h2>Creation of the Sixer Test Suite</h2> <p>Slowly, I added more and more patterns to sixer. The code became too complex to check for regressions manually, so I also started to write unit tests. Now each operation has at least one unit test. Some complex operations have four tests or more.</p> <p>At the beginning, tests called the Python functions directly. This is fast and convenient, but it failed to catch regressions in the command line program. So I added tests running sixer as a black box: pass an input file and check the output file. Then I added specific tests for the code parsing command line options.</p> </div> <div class="section" id="the-new-all-operation"> <h2>The new &quot;all&quot; operation</h2> <p>At the beginning, I used sixer to generate a patch for a single pattern.
For example, replace <tt class="docutils literal">unicode()</tt> in a whole project.</p> <p>Later, I started to use it differently: I fixed all Python 3 issues at once, but only in some selected files. I did that once we reached a minimum set of tests passing on Python 3, to have a green py34 check on Jenkins. Then we ported tests one by one. It's better to write short patches: they are easier and faster to review. And the review process is the bottleneck of the OpenStack development process.</p> <p>To fix all Python 3 issues at once, I added an <tt class="docutils literal">all</tt> operation which simply applies each operation sequentially. So <tt class="docutils literal">sixer</tt> can now be used like <tt class="docutils literal">modernize</tt> and <tt class="docutils literal">2to6</tt> to fix all Python 3 issues at once in a whole project.</p> <p>I also added the ability to pass filenames, instead of having to pass a directory to modify all files in all subdirectories.</p> </div> <div class="section" id="new-urllib-six-moves-and-stringio-operations"> <h2>New urllib, six_moves and stringio operations</h2> <div class="section" id="urllib"> <h3>urllib</h3> <p>I tried to keep the sixer code simple. But some changes are boring to write, like replacing <tt class="docutils literal">urllib</tt> imports with <tt class="docutils literal">six.moves.urllib</tt> imports. Python 2 has 3 modules (<tt class="docutils literal">urllib</tt>, <tt class="docutils literal">urllib2</tt>, <tt class="docutils literal">urlparse</tt>), whereas Python 3 uses a single <tt class="docutils literal">urllib</tt> namespace with submodules (<tt class="docutils literal">urllib.request</tt>, <tt class="docutils literal">urllib.parse</tt>, <tt class="docutils literal">urllib.error</tt>). Some Python 2 functions moved to one submodule, whereas others moved to another submodule.
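</p>
<p>The mapping can be sketched as a small lookup table (the target names below are real <tt class="docutils literal">six.moves.urllib</tt> locations, but the table and the helper function are hypothetical, for illustration only):</p>

```python
# Map a few Python 2 "module.symbol" names to their six.moves.urllib
# locations, and rewrite expressions using a plain prefix match.
URLLIB_MAP = {
    "urllib.quote": "six.moves.urllib.parse.quote",
    "urllib2.urlopen": "six.moves.urllib.request.urlopen",
    "urlparse.urljoin": "six.moves.urllib.parse.urljoin",
}

def rewrite_call(expr):
    for old, new in URLLIB_MAP.items():
        if expr.startswith(old):
            return new + expr[len(old):]
    return expr

print(rewrite_call("urllib2.urlopen(url)"))
# six.moves.urllib.request.urlopen(url)
```

<p>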
It requires knowing the old and new layouts well.</p> <p>After losing many hours writing <tt class="docutils literal">urllib</tt> patches manually, I decided to add a <tt class="docutils literal">urllib</tt> operation. In fact, implementing it didn't take long, compared to the time spent writing patches manually.</p> </div> <div class="section" id="stringio"> <h3>stringio</h3> <p>Handling StringIO is also a little bit tricky because StringIO.StringIO and cStringIO.StringIO don't have the same performance on Python 2. Producing patches without killing performance requires picking the right module or symbol from six: <tt class="docutils literal">six.StringIO()</tt> or <tt class="docutils literal">six.moves.cStringIO</tt>, for example.</p> </div> <div class="section" id="six-moves"> <h3>six_moves</h3> <p>The generic <tt class="docutils literal">six_moves</tt> operation replaces various Python 2 imports with imports from <tt class="docutils literal">six.moves</tt>:</p> <ul class="simple"> <li>BaseHTTPServer</li> <li>ConfigParser</li> <li>Cookie</li> <li>HTMLParser</li> <li>Queue</li> <li>SimpleHTTPServer</li> <li>SimpleXMLRPCServer</li> <li>__builtin__</li> <li>cPickle</li> <li>cookielib</li> <li>htmlentitydefs</li> <li>httplib</li> <li>repr</li> <li>xmlrpclib</li> </ul> </div> </div> <div class="section" id="kiss-emit-warnings-instead-of-complex-implementation"> <h2>KISS: emit warnings instead of complex implementation</h2> <p>As I wrote, I tried to keep sixer simple (KISS principle: Keep It Simple, Stupid). I'm also lazy, and I didn't try to write a perfect tool. I don't want to spend hours on the sixer project.</p> <p>When it is too tricky to make a decision or to implement a pattern, sixer emits &quot;warnings&quot; instead.
For example, a warning is emitted on <tt class="docutils literal">def next(self):</tt> to remind that a <tt class="docutils literal">__next__ = next</tt> alias is probably needed on this class for Python 3.</p> </div> <div class="section" id="conclusion"> <h2>Conclusion</h2> <p>The sixer tool is incomplete and generates some invalid changes. For example, it replaces patterns in comments, docstrings and strings, whereas usually these changes don't make sense. But I'm happy because the tool helped me a lot to port OpenStack: it saved me hours.</p> <p>I hope that the tool will now be useful to others! Don't hesitate to give me feedback.</p> </div>
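<p>To illustrate the <tt class="docutils literal">def next(self):</tt> warning above: a class implementing the iterator protocol on both Python versions typically needs the alias, because Python 2 calls <tt class="docutils literal">next()</tt> while Python 3 calls <tt class="docutils literal">__next__()</tt>. A sketch with a made-up <tt class="docutils literal">Countdown</tt> class:</p>

```python
class Countdown(object):
    """Iterator counting from n down to 1, on Python 2 and Python 3."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):        # Python 3 iterator protocol
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

    next = __next__            # Python 2 alias that sixer's warning suggests

print(list(Countdown(3)))
# [3, 2, 1]
```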