<p>Analyzing a TCP idle connection pool implemented in Rust</p> <p>Sharp Liu, 2023-01-23, https://cppcoffee.github.io/network/2023/01/23/%E5%88%86%E6%9E%90rust%E5%AE%9E%E7%8E%B0%E7%9A%84TCP%E8%BF%9E%E6%8E%A5%E6%B1%A0</p> <h2 id="简介">Introduction</h2> <p>In C, a TCP idle connection pool is usually built by registering the idle fds with epoll and waiting in epoll_wait for event notifications (for example, the peer closing the connection). In higher-level languages such as Go or Rust, copying the epoll_wait approach means reaching for the inner fd, which gives up the encapsulation the language provides.</p> <p>While reading open-source code recently, I came across the TCP idle connection pool in ureq, a library written in Rust. It can serve as a reference for implementing a connection pool in a high-level language.</p> <h2 id="结构体">Structs</h2> <p>The ureq connection pool is implemented in pool.rs. The pool structs are defined as follows:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span><span class="p">(</span><span class="n">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="n">ConnectionPool</span> <span class="p">{</span> <span class="n">inner</span><span class="p">:</span> <span class="n">Mutex</span><span class="o">&lt;</span><span class="n">Inner</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">max_idle_connections</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">max_idle_connections_per_host</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="p">}</span> <span class="k">struct</span> <span class="n">Inner</span> <span class="p">{</span> <span class="c">// the actual pooled connection. 
however only one per hostname:port.</span> <span class="n">recycle</span><span class="p">:</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">PoolKey</span><span class="p">,</span> <span class="n">VecDeque</span><span class="o">&lt;</span><span class="n">Stream</span><span class="o">&gt;&gt;</span><span class="p">,</span> <span class="c">// This is used to keep track of which streams to expire when the</span> <span class="c">// pool reaches MAX_IDLE_CONNECTIONS. The corresponding PoolKeys for</span> <span class="c">// recently used Streams are added to the back of the queue;</span> <span class="c">// old streams are removed from the front.</span> <span class="n">lru</span><span class="p">:</span> <span class="n">VecDeque</span><span class="o">&lt;</span><span class="n">PoolKey</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> </code></pre></div></div> <p>Idle connections are stored in a <code class="language-plaintext highlighter-rouge">HashMap&lt;PoolKey, VecDeque&lt;Stream&gt;&gt;</code>: <code class="language-plaintext highlighter-rouge">host:port</code> serves as the key, and the connections for that key are kept in a queue.</p> <h2 id="空闲连接获取">Getting an idle connection</h2> <p>The ureq crate obtains TCP connections through <code class="language-plaintext highlighter-rouge">connect_socket</code>. When the <code class="language-plaintext highlighter-rouge">use_pooled</code> argument is <code class="language-plaintext highlighter-rouge">true</code>, it first tries to take a connection from the pool.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">/// Connect the socket, either by using the pool or grab a new one.</span> <span class="k">fn</span> <span class="nf">connect_socket</span><span class="p">(</span><span class="n">unit</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">Unit</span><span class="p">,</span> <span class="n">hostname</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">,</span> <span class="n">use_pooled</span><span 
class="p">:</span> <span class="nb">bool</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Result</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Stream</span><span class="p">,</span> <span class="nb">bool</span><span class="p">),</span> <span class="n">Error</span><span class="o">&gt;</span> <span class="p">{</span> <span class="o">...</span> <span class="k">if</span> <span class="n">use_pooled</span> <span class="p">{</span> <span class="k">let</span> <span class="n">pool</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">unit</span><span class="py">.agent.state.pool</span><span class="p">;</span> <span class="k">let</span> <span class="n">proxy</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">unit</span><span class="py">.agent.config.proxy</span><span class="p">;</span> <span class="c">// The connection may have been closed by the server</span> <span class="c">// due to idle timeout while it was sitting in the pool.</span> <span class="c">// Loop until we find one that is still good or run out of connections.</span> <span class="k">while</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">stream</span><span class="p">)</span> <span class="o">=</span> <span class="n">pool</span><span class="nf">.try_get_connection</span><span class="p">(</span><span class="o">&amp;</span><span class="n">unit</span><span class="py">.url</span><span class="p">,</span> <span class="n">proxy</span><span class="nf">.clone</span><span class="p">())</span> <span class="p">{</span> <span class="k">let</span> <span class="n">server_closed</span> <span class="o">=</span> <span class="n">stream</span><span class="nf">.server_closed</span><span class="p">()</span><span class="o">?</span><span class="p">;</span> <span class="k">if</span> <span class="o">!</span><span class="n">server_closed</span> <span class="p">{</span> <span 
class="k">return</span> <span class="nf">Ok</span><span class="p">((</span><span class="n">stream</span><span class="p">,</span> <span class="k">true</span><span class="p">));</span> <span class="p">}</span> <span class="nd">debug!</span><span class="p">(</span><span class="s">"dropping stream from pool; closed by server: {:?}"</span><span class="p">,</span> <span class="n">stream</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="k">let</span> <span class="n">stream</span> <span class="o">=</span> <span class="k">match</span> <span class="n">unit</span><span class="py">.url</span><span class="nf">.scheme</span><span class="p">()</span> <span class="p">{</span> <span class="s">"http"</span> <span class="k">=&gt;</span> <span class="nn">stream</span><span class="p">::</span><span class="nf">connect_http</span><span class="p">(</span><span class="n">unit</span><span class="p">,</span> <span class="n">hostname</span><span class="p">),</span> <span class="s">"https"</span> <span class="k">=&gt;</span> <span class="nn">stream</span><span class="p">::</span><span class="nf">connect_https</span><span class="p">(</span><span class="n">unit</span><span class="p">,</span> <span class="n">hostname</span><span class="p">),</span> <span class="s">"test"</span> <span class="k">=&gt;</span> <span class="nf">connect_test</span><span class="p">(</span><span class="n">unit</span><span class="p">),</span> <span class="n">scheme</span> <span class="k">=&gt;</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">ErrorKind</span><span class="p">::</span><span class="n">UnknownScheme</span><span class="nf">.msg</span><span class="p">(</span><span class="nd">format!</span><span class="p">(</span><span class="s">"unknown scheme {}"</span><span class="p">,</span> <span class="n">scheme</span><span class="p">))),</span> <span class="p">};</span> <span class="nf">Ok</span><span class="p">((</span><span 
class="n">stream</span><span class="o">?</span><span class="p">,</span> <span class="k">false</span><span class="p">))</span> <span class="p">}</span> </code></pre></div></div> <p>The function loops, taking connections from the pool and calling <code class="language-plaintext highlighter-rouge">server_closed</code> to check whether each idle connection is still usable (not closed by the peer, and with no leftover data). If the pool runs out, it falls through and opens a new connection.</p> <p>The logic that checks whether an idle connection has been closed:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c">// Check if the server has closed a stream by performing a one-byte</span> <span class="c">// non-blocking read. If this returns EOF, the server has closed the</span> <span class="c">// connection: return true. If this returns a successful read, there are</span> <span class="c">// some bytes on the connection even though there was no inflight request.</span> <span class="c">// For plain HTTP streams, that might mean an HTTP 408 was pushed; it</span> <span class="c">// could also mean a buggy server that sent more bytes than a response's</span> <span class="c">// Content-Length. For HTTPS streams, that might mean a close_notify alert,</span> <span class="c">// which is the proper way to shut down an idle stream.</span> <span class="c">// Either way, bytes available on the stream before we've made a request</span> <span class="c">// means the stream is not usable, so we should discard it.</span> <span class="c">// If this returns WouldBlock (aka EAGAIN),</span> <span class="c">// that means the connection is still open: return false. 
Otherwise</span> <span class="c">// return an error.</span> <span class="k">fn</span> <span class="nf">serverclosed_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">:</span> <span class="o">&amp;</span><span class="nn">std</span><span class="p">::</span><span class="nn">net</span><span class="p">::</span><span class="n">TcpStream</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nn">io</span><span class="p">::</span><span class="n">Result</span><span class="o">&lt;</span><span class="nb">bool</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">let</span> <span class="k">mut</span> <span class="n">buf</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">;</span> <span class="mi">1</span><span class="p">];</span> <span class="n">stream</span><span class="nf">.set_nonblocking</span><span class="p">(</span><span class="k">true</span><span class="p">)</span><span class="o">?</span><span class="p">;</span> <span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="k">match</span> <span class="n">stream</span><span class="nf">.peek</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">buf</span><span class="p">)</span> <span class="p">{</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="p">{</span> <span class="nd">debug!</span><span class="p">(</span> <span class="s">"peek on reused connection returned {}, not WouldBlock; discarding"</span><span class="p">,</span> <span class="n">n</span> <span class="p">);</span> <span class="nf">Ok</span><span class="p">(</span><span class="k">true</span><span class="p">)</span> <span class="p">}</span> <span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">if</span> <span 
class="n">e</span><span class="nf">.kind</span><span class="p">()</span> <span class="o">==</span> <span class="nn">io</span><span class="p">::</span><span class="nn">ErrorKind</span><span class="p">::</span><span class="n">WouldBlock</span> <span class="k">=&gt;</span> <span class="nf">Ok</span><span class="p">(</span><span class="k">false</span><span class="p">),</span> <span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">=&gt;</span> <span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">),</span> <span class="p">};</span> <span class="n">stream</span><span class="nf">.set_nonblocking</span><span class="p">(</span><span class="k">false</span><span class="p">)</span><span class="o">?</span><span class="p">;</span> <span class="n">result</span> <span class="p">}</span> </code></pre></div></div> <p>The stream is switched to non-blocking mode and <code class="language-plaintext highlighter-rouge">peek</code> is called to detect whether the peer has closed the connection or left data behind. Blocking mode is restored before the result is returned.</p> <h2 id="参考">References</h2> <p><a href="https://github.com/algesten/ureq/blob/main/src/unit.rs">https://github.com/algesten/ureq/blob/main/src/unit.rs</a></p> <p><a href="https://github.com/algesten/ureq/blob/main/src/stream.rs">https://github.com/algesten/ureq/blob/main/src/stream.rs</a></p> <p>Implementing and optimizing a mini coroutine library with ucontext</p> <p>Sharp Liu, 2022-02-02, https://cppcoffee.github.io/system/program/2022/02/02/ucontext%E5%AE%9E%E7%8E%B0mini%E5%8D%8F%E7%A8%8B%E5%BA%93%E4%B8%8E%E4%BC%98%E5%8C%96</p> <h2 id="简介">Introduction</h2> <p>Linux provides the <code class="language-plaintext highlighter-rouge">ucontext</code> family of APIs, which can be used to implement coroutines whose scheduling is controlled by the developer.</p> <p><code class="language-plaintext highlighter-rouge">ucontext</code> is a more capable version of <code class="language-plaintext highlighter-rouge">setjmp</code>/<code class="language-plaintext highlighter-rouge">longjmp</code>: it supports starting a context at a function that is called with arguments.</p> <p><code class="language-plaintext 
highlighter-rouge">ucontext</code> APIs:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;ucontext.h&gt; int getcontext(ucontext_t *ucp); int setcontext(const ucontext_t *ucp); void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...); int swapcontext(ucontext_t *restrict oucp, const ucontext_t *restrict ucp); </code></pre></div></div> <p>A coroutine library built on the ucontext APIs needs to provide the basic coroutine <code class="language-plaintext highlighter-rouge">yield</code>/<code class="language-plaintext highlighter-rouge">resume</code> interface, where:</p> <ul> <li><code class="language-plaintext highlighter-rouge">resume</code>: continue the coroutine from the point where it was suspended</li> <li><code class="language-plaintext highlighter-rouge">yield</code>: suspend the coroutine at the current point</li> </ul> <h2 id="实现">Implementation</h2> <h3 id="coroutine-状态">Coroutine status</h3> <p>A coroutine is always in one of four states:</p> <ol> <li>Ready (created, not yet started)</li> <li>Running (after resume)</li> <li>Suspended (after yield)</li> <li>Finished (done)</li> </ol> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">enum</span> <span class="p">{</span> <span class="n">COROUTINE_READY</span><span class="p">,</span> <span class="n">COROUTINE_RUNNING</span><span class="p">,</span> <span class="n">COROUTINE_SUSPEND</span><span class="p">,</span> <span class="n">COROUTINE_DEAD</span><span class="p">,</span> <span class="p">}</span> <span class="n">coroutine_status_e</span><span class="p">;</span> </code></pre></div></div> <h3 id="coroutine-结构体">Coroutine struct</h3> <p>The coroutine struct holds the stack pointer, the stack size, and the coroutine's state. <code class="language-plaintext highlighter-rouge">stack_id</code> stores the ID returned when the stack is registered with valgrind, which avoids the stack-switch warnings valgrind otherwise reports.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span> <span class="n">ucontext_t</span> <span class="n">main</span><span class="p">;</span> <span class="n">ucontext_t</span> <span class="n">ctx</span><span class="p">;</span> <span class="c1">// 
entry function and its argument</span> <span class="n">coroutine_pt</span> <span class="n">func</span><span class="p">;</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ud</span><span class="p">;</span> <span class="c1">// stack pointer and stack size</span> <span class="kt">void</span> <span class="o">*</span><span class="n">stack</span><span class="p">;</span> <span class="kt">size_t</span> <span class="n">stack_size</span><span class="p">;</span> <span class="c1">// running status</span> <span class="n">coroutine_status_e</span> <span class="n">status</span><span class="p">;</span> <span class="kt">int</span> <span class="n">stack_id</span><span class="p">;</span> <span class="c1">// set once the entry function has returned</span> <span class="kt">unsigned</span> <span class="n">done</span><span class="o">:</span><span class="mi">1</span><span class="p">;</span> <span class="p">}</span> <span class="n">coroutine_t</span><span class="p">;</span> </code></pre></div></div> <h3 id="coroutine_create">coroutine_create</h3> <p>Creates a coroutine from an entry function, its argument, and the stack size the coroutine needs.</p> <p>If the requested stack size is 0, the size defined by <code class="language-plaintext highlighter-rouge">SIGSTKSZ</code> is used.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">coroutine_t</span> <span class="o">*</span> <span class="nf">coroutine_create</span><span class="p">(</span><span class="n">coroutine_pt</span> <span class="n">fn</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ud</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">stack_size</span><span class="p">)</span> <span class="p">{</span> <span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span> <span class="kt">size_t</span> <span class="n">size</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">stack_size</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span 
class="p">{</span> <span class="n">stack_size</span> <span class="o">=</span> <span class="n">SIGSTKSZ</span><span class="p">;</span> <span class="p">}</span> <span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">co</span><span class="p">)</span> <span class="o">+</span> <span class="n">stack_size</span><span class="p">;</span> <span class="n">co</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">size</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">co</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> <span class="n">memset</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">co</span><span class="p">));</span> <span class="c1">// set the entry function and its argument</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">func</span> <span class="o">=</span> <span class="n">fn</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">ud</span> <span class="o">=</span> <span class="n">ud</span><span class="p">;</span> <span class="c1">// the stack starts right after the struct</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">stack</span> <span class="o">=</span> <span class="n">co</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">stack_size</span> <span class="o">=</span> <span class="n">stack_size</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">status</span> <span 
class="o">=</span> <span class="n">COROUTINE_READY</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">stack_id</span> <span class="o">=</span> <span class="n">VALGRIND_STACK_REGISTER</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="p">)</span> <span class="n">co</span> <span class="o">+</span> <span class="n">size</span><span class="p">);</span> <span class="k">return</span> <span class="n">co</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <h3 id="coroutine_resume">coroutine_resume</h3> <p>Switches to the coroutine: starts it on first use or resumes it where it left off, and updates its status.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">coroutine_resume</span><span class="p">(</span><span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span><span class="p">)</span> <span class="p">{</span> <span class="k">switch</span> <span class="p">(</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">status</span><span class="p">)</span> <span class="p">{</span> <span class="k">case</span> <span class="n">COROUTINE_READY</span><span class="p">:</span> <span class="k">if</span> <span class="p">(</span><span class="n">getcontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span> <span class="p">}</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">status</span> <span class="o">=</span> <span class="n">COROUTINE_RUNNING</span><span class="p">;</span> <span 
class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_sp</span> <span class="o">=</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">stack</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_size</span> <span class="o">=</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">stack_size</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_flags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_link</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">main</span><span class="p">;</span> <span class="c1">// the coroutine starts in coroutine_mainfunc</span> <span class="n">makecontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="p">)(</span><span class="kt">void</span><span class="p">))</span> <span class="n">coroutine_mainfunc</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">co</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">swapcontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span 
class="n">main</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span> <span class="k">case</span> <span class="n">COROUTINE_SUSPEND</span><span class="p">:</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">status</span> <span class="o">=</span> <span class="n">COROUTINE_RUNNING</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">swapcontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">main</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span> <span class="nl">default:</span> <span class="cm">/* unreachable */</span> <span class="n">assert</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="p">}</span> <span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">done</span><span class="p">)</span> <span class="p">{</span> <span class="n">coroutine_destroy</span><span class="p">(</span><span class="n">co</span><span class="p">);</span> <span class="p">}</span> <span class="k">return</span> <span class="n">CO_OK</span><span 
class="p">;</span> <span class="p">}</span> </code></pre></div></div> <h3 id="coroutine_mainfunc">coroutine_mainfunc</h3> <p>The function every coroutine starts in: it calls the user-supplied entry function and, once that returns, sets the done flag.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">coroutine_mainfunc</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">)</span> <span class="p">{</span> <span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">func</span><span class="p">(</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">ud</span><span class="p">);</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">done</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <h3 id="coroutine_yield">coroutine_yield</h3> <p>Suspends the coroutine and switches back to the main context.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">coroutine_yield</span><span class="p">(</span><span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span><span class="p">)</span> <span class="p">{</span> <span class="n">co</span><span class="o">-&gt;</span><span class="n">status</span> <span class="o">=</span> <span class="n">COROUTINE_SUSPEND</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">swapcontext</span><span class="p">(</span><span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">co</span><span class="o">-&gt;</span><span 
class="n">main</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span> <span class="p">}</span> <span class="k">return</span> <span class="n">CO_OK</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <h2 id="协程优化">Optimizing the coroutine</h2> <p>A running coroutine library calls <code class="language-plaintext highlighter-rouge">swapcontext</code> and <code class="language-plaintext highlighter-rouge">getcontext</code> very often. If we keep using the <code class="language-plaintext highlighter-rouge">ucontext</code> data structures, the key optimization is to trim the assembly instructions that the <code class="language-plaintext highlighter-rouge">ucontext</code> calls execute:</p> <ol> <li>Remove the <code class="language-plaintext highlighter-rouge">sigprocmask</code> system call that <code class="language-plaintext highlighter-rouge">swapcontext</code> makes internally to save and restore the signal mask</li> <li>Remove the operations on the argument registers (on x64: RDI, RSI, RDX, RCX, R8 and R9)</li> <li>Remove the floating-point register operations</li> </ol> <h3 id="ucontext_ih">ucontext_i.h</h3> <p>Defines the offsets at which the registers are stored.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SIG_BLOCK 0 #define SIG_SETMASK 2 #define _NSIG8 8 #define oRBP 120 #define oRSP 160 #define oRBX 128 #define oR8 40 #define oR9 48 #define oR10 56 #define oR11 64 #define oR12 72 #define oR13 80 #define oR14 88 #define oR15 96 #define oRDI 104 #define oRSI 112 #define oRDX 136 #define oRAX 144 #define oRCX 152 #define oRIP 168 #define oEFL 176 #define oFPREGS 224 #define oSIGMASK 296 #define oFPREGSMEM 424 #define oMXCSR 448 </span></code></pre></div></div> <h3 id="lightweight_getcontext">lightweight_getcontext</h3> <p>A lightweight getcontext implementation.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* lightweight_getcontext.S */</span> <span class="cp">#include "ucontext_i.h" </span> <span 
class="p">.</span><span class="n">globl</span> <span class="n">lightweight_getcontext</span><span class="p">;</span> <span class="p">.</span><span class="n">type</span> <span class="n">lightweight_getcontext</span><span class="p">,</span> <span class="err">@</span><span class="n">function</span><span class="p">;</span> <span class="n">lightweight_getcontext</span><span class="o">:</span> <span class="p">.</span><span class="n">cfi_startproc</span><span class="p">;</span> <span class="cm">/* Save the preserved registers, the registers used for passing args, and the return address. */</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rbx</span><span class="p">,</span> <span class="n">oRBX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rbp</span><span class="p">,</span> <span class="n">oRBP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r12</span><span class="p">,</span> <span class="n">oR12</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r13</span><span class="p">,</span> <span class="n">oR13</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r14</span><span class="p">,</span> <span class="n">oR14</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r15</span><span class="p">,</span> <span class="n">oR15</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> 
<span class="o">%</span><span class="n">rdi</span><span class="p">,</span> <span class="n">oRDI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rsi</span><span class="p">,</span> <span class="n">oRSI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rdx</span><span class="p">,</span> <span class="n">oRDX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRCX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r8</span><span class="p">,</span> <span class="n">oR8</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r9</span><span class="p">,</span> <span class="n">oR9</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRIP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">leaq</span> <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="cm">/* Exclude the return 
address. */</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRSP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="cm">/* We have separate floating-point register content memory on the stack. We use the __fpregs_mem block in the context. Set the links up correctly. */</span> <span class="n">leaq</span> <span class="n">oFPREGSMEM</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oFPREGS</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="cm">/* Save the floating-point environment. */</span> <span class="n">fnstenv</span> <span class="p">(</span><span class="o">%</span><span class="n">rcx</span><span class="p">)</span> <span class="n">stmxcsr</span> <span class="n">oMXCSR</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="cm">/* Formerly here: a call to sigprocmask. Deleted because unnecessary for our application. */</span> <span class="cm">/* All done, return 0 for success. 
*/</span> <span class="n">xorl</span> <span class="o">%</span><span class="n">eax</span><span class="p">,</span> <span class="o">%</span><span class="n">eax</span> <span class="n">ret</span> <span class="p">.</span><span class="n">cfi_endproc</span><span class="p">;</span> </code></pre></div></div> <h3 id="lightweight_swapcontext">lightweight_swapcontext</h3> <p>A lightweight swapcontext implementation that omits the sigprocmask system call used to save and restore the signal mask.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include "ucontext_i.h" </span> <span class="p">.</span><span class="n">globl</span> <span class="n">lightweight_swapcontext</span><span class="p">;</span> <span class="p">.</span><span class="n">type</span> <span class="n">lightweight_swapcontext</span><span class="p">,</span> <span class="err">@</span><span class="n">function</span><span class="p">;</span> <span class="n">lightweight_swapcontext</span><span class="o">:</span> <span class="p">.</span><span class="n">cfi_startproc</span><span class="p">;</span> <span class="cm">/* Save the preserved registers, the registers used for passing args, and the return address. 
*/</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rbx</span><span class="p">,</span> <span class="n">oRBX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rbp</span><span class="p">,</span> <span class="n">oRBP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r12</span><span class="p">,</span> <span class="n">oR12</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r13</span><span class="p">,</span> <span class="n">oR13</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r14</span><span class="p">,</span> <span class="n">oR14</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r15</span><span class="p">,</span> <span class="n">oR15</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="cm">/* Don't bother saving and restoring argument registers */</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rdi</span><span class="p">,</span> <span class="n">oRDI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rsi</span><span class="p">,</span> <span class="n">oRSI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span 
class="n">rdx</span><span class="p">,</span> <span class="n">oRDX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRCX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r8</span><span class="p">,</span> <span class="n">oR8</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="o">%</span><span class="n">r9</span><span class="p">,</span> <span class="n">oR9</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">movq</span> <span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRIP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="n">leaq</span> <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="cm">/* Exclude the return address. */</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRSP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="cm">/* We have separate floating-point register content memory on the stack. We use the __fpregs_mem block in the context. Set the links up correctly. 
*/</span> <span class="n">leaq</span> <span class="n">oFPREGSMEM</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oFPREGS</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="cm">/* Save the floating-point environment. */</span> <span class="n">fnstenv</span> <span class="p">(</span><span class="o">%</span><span class="n">rcx</span><span class="p">)</span> <span class="n">stmxcsr</span> <span class="n">oMXCSR</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span> <span class="cm">/* Formerly here: a call to sigprocmask. Deleted because unnecessary for our application. */</span> <span class="cm">/* Restore the floating-point context. Not the registers, only the rest. */</span> <span class="n">movq</span> <span class="n">oFPREGS</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="n">fldenv</span> <span class="p">(</span><span class="o">%</span><span class="n">rcx</span><span class="p">)</span> <span class="n">ldmxcsr</span> <span class="n">oMXCSR</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">)</span> <span class="cm">/* Load the new stack pointer and the preserved registers. 
*/</span> <span class="n">movq</span> <span class="n">oRSP</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rsp</span> <span class="n">movq</span> <span class="n">oRBX</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rbx</span> <span class="n">movq</span> <span class="n">oRBP</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rbp</span> <span class="n">movq</span> <span class="n">oR12</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r12</span> <span class="n">movq</span> <span class="n">oR13</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r13</span> <span class="n">movq</span> <span class="n">oR14</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r14</span> <span class="n">movq</span> <span class="n">oR15</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r15</span> <span class="cm">/* The following ret should return to the address set with getcontext. Therefore push the address on the stack. 
*/</span> <span class="n">movq</span> <span class="n">oRIP</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="n">pushq</span> <span class="o">%</span><span class="n">rcx</span> <span class="cm">/* Setup registers used for passing args--don't bother with this */</span> <span class="n">movq</span> <span class="n">oRDI</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rdi</span> <span class="n">movq</span> <span class="n">oRDX</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rdx</span> <span class="n">movq</span> <span class="n">oRCX</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="n">movq</span> <span class="n">oR8</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r8</span> <span class="n">movq</span> <span class="n">oR9</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r9</span> <span class="cm">/* Setup finally %rsi. */</span> <span class="n">movq</span> <span class="n">oRSI</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rsi</span> <span class="cm">/* Clear rax to indicate success. 
*/</span> <span class="n">xorl</span> <span class="o">%</span><span class="n">eax</span><span class="p">,</span> <span class="o">%</span><span class="n">eax</span> <span class="n">ret</span> <span class="p">.</span><span class="n">cfi_endproc</span> </code></pre></div></div> <h2 id="references">References</h2> <p><a href="https://github.com/cppcoffee/coroutine">https://github.com/cppcoffee/coroutine</a></p> <p><a href="https://man7.org/linux/man-pages/man3/swapcontext.3.html">https://man7.org/linux/man-pages/man3/swapcontext.3.html</a></p> <p><a href="https://github.com/cloudwu/coroutine">https://github.com/cloudwu/coroutine</a></p> <p><a href="https://rethinkdb.com/blog/making-coroutines-fast/">https://rethinkdb.com/blog/making-coroutines-fast/</a></p>Sharp LiuA mini coroutine library built on ucontext, and its optimization IntroductionIP Firewall – XDP Implementation2021-10-17T00:00:00+00:002021-10-17T00:00:00+00:00https://cppcoffee.github.io/linux/kernel/2021/10/17/IP%E9%98%B2%E7%81%AB%E5%A2%99--XDP%E5%AE%9E%E7%8E%B0<p>IP Firewall – XDP Implementation</p> <h3 id="xdp-简介">Introduction to XDP</h3> <p>XDP was introduced in Linux kernel 4.8. It runs at the earliest point on the packet receive path, before the kernel has allocated a <code class="language-plaintext highlighter-rouge">struct sk_buff</code> for the packet, so a program can rewrite, drop, or forward the packet directly.</p> <p>This article implements an IP firewall that applies rules passed in from user space, each of which either drops or allows packets.</p> <h3 id="ip-block">IP Block</h3> <p>The implementation has two parts: the user-space interface and the kernel part.</p> <p>The user-space interface consists of two programs, a loader and an IP rule editor:</p> <p><strong>ipblock-loader</strong>: the XDP loader, which attaches the IP Block XDP program to the kernel:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># attach IP Block to eth2</span> ./ipblock-loader <span class="nt">-d</span> eth2 <span class="c"># detach IP Block from eth2</span> ./ipblock-loader <span class="nt">-d</span> eth2 <span class="nt">-u</span> </code></pre></div></div> <p><strong>ipblock-rule</strong>: changes the IP Block rules through the maps that the XDP program exposes:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># drop IP packets from ::ffff:c612:13/128</span> <span class="nv">$ 
</span>./ipblock-rule <span class="nt">-a</span> ::ffff:c612:13/128 <span class="nt">-p</span> deny <span class="c"># allow IP packets from 192.168.31.0/24</span> <span class="nv">$ </span>./ipblock-rule <span class="nt">-a</span> 192.168.31.0/24 <span class="nt">-p</span> allow <span class="c"># delete rules</span> <span class="nv">$ </span>./ipblock-rule <span class="nt">-d</span> ::ffff:c612:13/128 <span class="nv">$ </span>./ipblock-rule <span class="nt">-d</span> 192.168.31.0/24 </code></pre></div></div> <h4 id="map-存储结构">Map storage layout</h4> <p>The ipblock XDP program defines two longest-prefix-match trie maps, one for IPv4 and one for IPv6, so that user-space programs can operate on them through the BPF map API.</p> <p>The map key is a <code class="language-plaintext highlighter-rouge">bpf_lpm_trie_key</code> followed by the <code class="language-plaintext highlighter-rouge">sockaddr</code> address bytes; the map value is an <code class="language-plaintext highlighter-rouge">enum xdp_action</code>.</p> <p>An IPv4 address is stored as a <code class="language-plaintext highlighter-rouge">uint32_t</code> (the same memory layout as <code class="language-plaintext highlighter-rouge">struct in_addr</code>); an IPv6 address is stored as a <code class="language-plaintext highlighter-rouge">struct in6_addr</code>.</p> <p>The definitions are as follows:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">lpm_v4_key</span> <span class="p">{</span> <span class="k">struct</span> <span class="n">bpf_lpm_trie_key</span> <span class="n">lpm</span><span class="p">;</span> <span class="kt">uint32_t</span> <span class="n">addr</span><span class="p">;</span> <span class="p">};</span> <span class="k">struct</span> <span class="n">lpm_v6_key</span> <span class="p">{</span> <span class="k">struct</span> <span class="n">bpf_lpm_trie_key</span> <span class="n">lpm</span><span class="p">;</span> <span class="k">struct</span> <span class="n">in6_addr</span> <span class="n">addr</span><span class="p">;</span> <span class="p">};</span> <span class="c1">// IPv4 map</span> <span class="k">struct</span> 
<span class="p">{</span> <span class="n">__uint</span><span class="p">(</span><span class="n">type</span><span class="p">,</span> <span class="n">BPF_MAP_TYPE_LPM_TRIE</span><span class="p">);</span> <span class="n">__uint</span><span class="p">(</span><span class="n">max_entries</span><span class="p">,</span> <span class="n">MAX_RULES</span><span class="p">);</span> <span class="n">__type</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="k">struct</span> <span class="n">lpm_v4_key</span><span class="p">);</span> <span class="n">__type</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="k">enum</span> <span class="n">xdp_action</span><span class="p">);</span> <span class="n">__uint</span><span class="p">(</span><span class="n">map_flags</span><span class="p">,</span> <span class="n">BPF_F_NO_PREALLOC</span><span class="p">);</span> <span class="p">}</span> <span class="n">ipv4_map</span> <span class="nf">SEC</span><span class="p">(</span><span class="s">".maps"</span><span class="p">);</span> <span class="c1">// IPv6 map</span> <span class="k">struct</span> <span class="p">{</span> <span class="n">__uint</span><span class="p">(</span><span class="n">type</span><span class="p">,</span> <span class="n">BPF_MAP_TYPE_LPM_TRIE</span><span class="p">);</span> <span class="n">__uint</span><span class="p">(</span><span class="n">max_entries</span><span class="p">,</span> <span class="n">MAX_RULES</span><span class="p">);</span> <span class="n">__type</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="k">struct</span> <span class="n">lpm_v6_key</span><span class="p">);</span> <span class="n">__type</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="k">enum</span> <span class="n">xdp_action</span><span class="p">);</span> <span class="n">__uint</span><span class="p">(</span><span 
class="n">map_flags</span><span class="p">,</span> <span class="n">BPF_F_NO_PREALLOC</span><span class="p">);</span> <span class="p">}</span> <span class="n">ipv6_map</span> <span class="nf">SEC</span><span class="p">(</span><span class="s">".maps"</span><span class="p">);</span> </code></pre></div></div> <h4 id="xdp-实现逻辑">XDP implementation logic</h4> <p>The header-parsing functions are highly repetitive, so they are generated by a macro.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define PARSE_FUNC_DECLARATION(STRUCT) \ static __always_inline \ struct STRUCT *parse_ ## STRUCT (struct cursor *c) \ { \ struct STRUCT *ret = c-&gt;pos; \ if (c-&gt;pos + sizeof(struct STRUCT) &gt; c-&gt;end) { \ return NULL; \ } \ c-&gt;pos += sizeof(struct STRUCT); \ return ret; \ } </span> <span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">ethhdr</span><span class="p">)</span> <span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">vlanhdr</span><span class="p">)</span> <span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">iphdr</span><span class="p">)</span> <span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">ipv6hdr</span><span class="p">)</span> </code></pre></div></div> <p><code class="language-plaintext highlighter-rouge">struct cursor</code> records the position of the data still to be parsed, along with the end of the packet used for the bounds check.</p> <p><code class="language-plaintext highlighter-rouge">PARSE_FUNC_DECLARATION(iphdr)</code> expands to the following code:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">__always_inline</span> <span class="k">struct</span> <span class="n">iphdr</span> <span class="o">*</span><span class="nf">parse_iphdr</span><span class="p">(</span><span class="k">struct</span> <span class="n">cursor</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span> <span class="p">{</span> <span class="k">struct</span> 
<span class="n">iphdr</span> <span class="o">*</span><span class="n">ret</span> <span class="o">=</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">pos</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">c</span><span class="o">-&gt;</span><span class="n">pos</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">iphdr</span><span class="p">)</span> <span class="o">&gt;</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">end</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> <span class="n">c</span><span class="o">-&gt;</span><span class="n">pos</span> <span class="o">+=</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">iphdr</span><span class="p">);</span> <span class="k">return</span> <span class="n">ret</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>The packet-processing logic is as follows:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SEC</span><span class="p">(</span><span class="s">"xdp"</span><span class="p">)</span> <span class="kt">int</span> <span class="nf">xdp_prog</span><span class="p">(</span><span class="k">struct</span> <span class="n">xdp_md</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">XDP_PASS</span><span class="p">;</span> <span class="n">cursor_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span> <span class="c1">// parse the Ethernet header</span> <span class="n">eth</span> <span 
class="o">=</span> <span class="n">parse_eth</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">eth_proto</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">eth</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">goto</span> <span class="n">pass</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// parse the IP header</span> <span class="k">if</span> <span class="p">(</span><span class="n">eth_proto</span> <span class="o">==</span> <span class="n">bpf_htons</span><span class="p">(</span><span class="n">ETH_P_IP</span><span class="p">))</span> <span class="p">{</span> <span class="n">iph</span> <span class="o">=</span> <span class="n">parse_iphdr</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">iph</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">goto</span> <span class="n">pass</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// look up the action in the IPv4 map</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">ip_map_lookup_value</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ipv4_map</span><span class="p">,</span> <span class="n">iph</span><span class="o">-&gt;</span><span class="n">saddr</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">eth_proto</span> <span class="o">==</span> <span class="n">bpf_htons</span><span class="p">(</span><span class="n">ETH_P_IPV6</span><span class="p">))</span> <span class="p">{</span> <span class="n">ip6h</span> <span class="o">=</span> 
<span class="n">parse_ipv6hdr</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">ip6h</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">goto</span> <span class="n">pass</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// look up the action in the IPv6 map</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">ip6_map_lookup_value</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ipv6_map</span><span class="p">,</span> <span class="n">ip6h</span><span class="o">-&gt;</span><span class="n">saddr</span><span class="p">);</span> <span class="p">}</span> <span class="nl">pass:</span> <span class="k">return</span> <span class="n">rc</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <h4 id="ipblock-loader">ipblock-loader</h4> <p>ipblock-loader is the XDP loader; it attaches the XDP program to the specified network interface.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">do_load</span><span class="p">(</span><span class="k">struct</span> <span class="n">options</span> <span class="o">*</span><span class="n">opt</span><span class="p">,</span> <span class="k">struct</span> <span class="n">ipblock_bpf</span> <span class="o">*</span><span class="n">skel</span><span class="p">)</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">err</span><span class="p">;</span> <span class="c1">// attach the XDP program to the specified interface</span> <span class="n">err</span> <span class="o">=</span> <span class="n">xdp_link_attach</span><span class="p">(</span><span class="n">opt</span><span class="o">-&gt;</span><span class="n">ifindex</span><span class="p">,</span> <span class="n">opt</span><span class="o">-&gt;</span><span 
class="n">xdp_flags</span><span class="p">,</span> <span class="n">bpf_program__fd</span><span class="p">(</span><span class="n">skel</span><span class="o">-&gt;</span><span class="n">progs</span><span class="p">.</span><span class="n">xdp_prog</span><span class="p">));</span> <span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">err</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// pin the maps to the BPF filesystem</span> <span class="n">err</span> <span class="o">=</span> <span class="n">pin_maps_in_bpf_object</span><span class="p">(</span><span class="n">skel</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="n">err</span><span class="p">;</span> <span class="p">}</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>After a successful attach, the maps are pinned to the BPF filesystem at the following paths:</p> <ul> <li>/sys/fs/bpf/ipblock/ipv4_map</li> <li>/sys/fs/bpf/ipblock/ipv6_map</li> </ul> <h4 id="ipblock-rule">ipblock-rule</h4> <p>ipblock-rule is the rule-control program, used to add, delete, and update rules.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span> <span class="nf">do_add_cmd</span><span class="p">(</span><span class="n">options_t</span> <span class="o">*</span><span class="n">opt</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="c1">// open the BPF map that matches the IP address family</span> <span class="n">fd</span> <span class="o">=</span> <span class="n">open_bpf_map</span><span class="p">(</span><span class="n">opt</span><span class="o">-&gt;</span><span 
class="n">cidr</span><span class="p">.</span><span class="n">af</span><span class="p">);</span> <span class="p">...</span> <span class="c1">// fill in the bpf_lpm_trie_key</span> <span class="n">lpm</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">lpm</span><span class="p">)</span> <span class="o">+</span> <span class="n">opt</span><span class="o">-&gt;</span><span class="n">cidr</span><span class="p">.</span><span class="n">socklen</span><span class="p">);</span> <span class="n">lpm</span><span class="o">-&gt;</span><span class="n">prefixlen</span> <span class="o">=</span> <span class="n">opt</span><span class="o">-&gt;</span><span class="n">cidr</span><span class="p">.</span><span class="n">prefixlen</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="n">lpm</span><span class="o">-&gt;</span><span class="n">data</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">opt</span><span class="o">-&gt;</span><span class="n">cidr</span><span class="p">.</span><span class="n">sockaddr</span><span class="p">,</span> <span class="n">opt</span><span class="o">-&gt;</span><span class="n">cidr</span><span class="p">.</span><span class="n">socklen</span><span class="p">);</span> <span class="c1">// BPF_ANY: create the rule or update an existing one</span> <span class="k">if</span> <span class="p">(</span><span class="n">bpf_map_update_elem</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">lpm</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">opt</span><span class="o">-&gt;</span><span class="n">action</span><span class="p">,</span> <span class="n">BPF_ANY</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span> <span 
class="p">...</span> <span class="p">}</span> </code></pre></div></div> <p>The complete code is in the GitHub repository linked at the end of this article.</p> <h3 id="reference">Reference</h3> <p><a href="https://docs.cilium.io/en/v1.10/bpf/">BPF and XDP Reference Guide</a></p> <p><a href="https://github.com/libbpf/libbpf">github libbpf</a></p> <p><a href="https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html">BPF Portability and CO-RE</a></p> <p><a href="https://github.com/cppcoffee/ipblock">https://github.com/cppcoffee/ipblock</a></p>Sharp LiuIP Firewall – XDP ImplementationHugepage Memory Allocator – Rust Implementation2021-07-24T00:00:00+00:002021-07-24T00:00:00+00:00https://cppcoffee.github.io/system/program/2021/07/24/hugepage%E5%86%85%E5%AD%98%E5%88%86%E9%85%8D%E5%99%A8--rust%E5%AE%9E%E7%8E%B0<p>Hugepage Memory Allocator – Rust Implementation</p> <h3 id="hugepage简介">Introduction to HugePages</h3> <p>The default Linux page size is 4 KB on x86 and x86_64. The hugepage feature lets the kernel manage memory pages larger than the default size (huge pages).</p> <p>The CPU keeps a TLB (Translation Lookaside Buffer), a cache of recent virtual-to-physical address translations taken from the page tables. Every access to a virtual address goes through a TLB lookup; on a miss, the translation has to be fetched by walking the page tables.</p> <p>With HugePages enabled, the system needs fewer page-table entries, which reduces the cost of maintaining and accessing the page tables. Huge pages stay resident in memory and are not swapped out, so the kernel swap daemon has nothing to manage for them and performs no page lookups on their behalf. Having fewer pages lowers the overhead of memory operations and makes it less likely that page-table access becomes a bottleneck.</p> <p>The default huge page size is 4 MB on 32-bit x86 (2 MB with PAE) and 2 MB on x86_64.</p> <p>For more detail on HugePages, see the References at the end of this article.</p> <h3 id="hugepage-api">HugePage API</h3> <p>Linux allocates huge pages through mmap with the MAP_HUGETLB flag. The call below maps len bytes backed by huge pages, passing MAP_HUGETLB in the flags argument:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_HUGETLB</span><span class="p">;</span> <span class="kt">void</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">len</span><span 
class="p">,</span> <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span> </code></pre></div></div> <h3 id="allocator">Allocator</h3> <p>Next, we implement a hugepage allocator in Rust.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">MEMINFO_PATH</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span> <span class="o">=</span> <span class="s">"/proc/meminfo"</span><span class="p">;</span> <span class="k">const</span> <span class="n">TOKEN</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span> <span class="o">=</span> <span class="s">"Hugepagesize:"</span><span class="p">;</span> <span class="nd">lazy_static!</span> <span class="p">{</span> <span class="c">// Initialize the global HUGEPAGE_SIZE by parsing 'Hugepagesize' from '/proc/meminfo'.</span> <span class="c">// The Allocator uses HUGEPAGE_SIZE to align allocation sizes.</span> <span class="k">static</span> <span class="k">ref</span> <span class="n">HUGEPAGE_SIZE</span><span class="p">:</span> <span class="nb">isize</span> <span class="o">=</span> <span class="p">{</span> <span class="k">let</span> <span class="n">buf</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="n">MEMINFO_PATH</span><span class="p">)</span><span class="nf">.map_or</span><span class="p">(</span><span class="s">""</span><span class="nf">.to_owned</span><span class="p">(),</span> <span class="p">|</span><span class="k">mut</span> <span class="n">f</span><span class="p">|</span> <span class="p">{</span> <span class="k">let</span> <span class="k">mut</span> <span class="n">s</span> <span class="o">=</span> <span 
class="nn">String</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span> <span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="n">f</span><span class="nf">.read_to_string</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="n">s</span><span class="p">);</span> <span class="n">s</span> <span class="p">});</span> <span class="nf">parse_hugepage_size</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">)</span> <span class="p">};</span> <span class="p">}</span> <span class="c">// Parse Hugepagesize.</span> <span class="c">// meminfo has many lines; scan line by line for TOKEN='Hugepagesize:' and parse its value.</span> <span class="k">fn</span> <span class="nf">parse_hugepage_size</span><span class="p">(</span><span class="n">s</span><span class="p">:</span> <span class="o">&amp;</span><span class="nb">str</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">isize</span> <span class="p">{</span> <span class="k">for</span> <span class="n">line</span> <span class="n">in</span> <span class="n">s</span><span class="nf">.lines</span><span class="p">()</span> <span class="p">{</span> <span class="c">// Found the 'Hugepagesize:' prefix.</span> <span class="k">if</span> <span class="n">line</span><span class="nf">.starts_with</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span> <span class="p">{</span> <span class="k">let</span> <span class="k">mut</span> <span class="n">parts</span> <span class="o">=</span> <span class="n">line</span><span class="p">[</span><span class="n">TOKEN</span><span class="nf">.len</span><span class="p">()</span><span class="o">..</span><span class="p">]</span><span class="nf">.split_whitespace</span><span class="p">();</span> <span class="c">// parse size</span> <span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="n">parts</span><span 
class="nf">.next</span><span class="p">()</span><span class="nf">.unwrap_or</span><span class="p">(</span><span class="s">"0"</span><span class="p">);</span> <span class="k">let</span> <span class="k">mut</span> <span class="n">hugepage_size</span> <span class="o">=</span> <span class="n">p</span><span class="py">.parse</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">isize</span><span class="o">&gt;</span><span class="p">()</span><span class="nf">.unwrap_or</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span> <span class="c">// parse unit</span> <span class="n">hugepage_size</span> <span class="o">*=</span> <span class="n">parts</span><span class="nf">.next</span><span class="p">()</span><span class="nf">.map_or</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">|</span><span class="n">x</span><span class="p">|</span> <span class="k">match</span> <span class="n">x</span> <span class="p">{</span> <span class="c">// Only the kB unit is currently supported.</span> <span class="s">"kB"</span> <span class="k">=&gt;</span> <span class="mi">1024</span><span class="p">,</span> <span class="mi">_</span> <span class="k">=&gt;</span> <span class="mi">1</span><span class="p">,</span> <span class="p">});</span> <span class="k">return</span> <span class="n">hugepage_size</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>Define the allocator as a unit struct: it needs no internal state, so it has no fields.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span><span class="p">(</span><span class="n">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="n">HugePageAllocator</span><span class="p">;</span> </code></pre></div></div> <p>The libc crate 
provides the binding used to call <strong>libc::mmap</strong>. Next, implement the std::alloc::GlobalAlloc trait:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Implement the GlobalAlloc trait.</span> <span class="k">unsafe</span> <span class="k">impl</span> <span class="n">GlobalAlloc</span> <span class="k">for</span> <span class="n">HugePageAllocator</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">alloc</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">layout</span><span class="p">:</span> <span class="n">Layout</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span> <span class="p">{</span> <span class="c">// Round the allocation size up to a multiple of HUGEPAGE_SIZE with the align_to helper.</span> <span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="nf">align_to</span><span class="p">(</span><span class="n">layout</span><span class="nf">.size</span><span class="p">(),</span> <span class="o">*</span><span class="n">HUGEPAGE_SIZE</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">);</span> <span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">mmap</span><span class="p">(</span> <span class="nf">null_mut</span><span class="p">(),</span> <span class="n">len</span><span class="p">,</span> <span class="n">PROT_READ</span> <span class="p">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span> <span class="n">MAP_PRIVATE</span> <span class="p">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="p">|</span> <span class="n">MAP_HUGETLB</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span 
class="p">);</span> <span class="c">// Return null if the hugepage mapping fails.</span> <span class="k">if</span> <span class="n">p</span> <span class="o">==</span> <span class="n">MAP_FAILED</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">null_mut</span><span class="p">();</span> <span class="p">}</span> <span class="n">p</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span> <span class="p">}</span> <span class="c">// Deallocation also takes the layout: munmap on a hugepage mapping needs</span> <span class="c">// the same hugepage-aligned length that alloc mapped.</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">dealloc</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">p</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">layout</span><span class="p">:</span> <span class="n">Layout</span><span class="p">)</span> <span class="p">{</span> <span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="nf">align_to</span><span class="p">(</span><span class="n">layout</span><span class="nf">.size</span><span class="p">(),</span> <span class="o">*</span><span class="n">HUGEPAGE_SIZE</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">);</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">munmap</span><span class="p">(</span><span class="n">p</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">c_void</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="c">// Helper that rounds size up to a multiple of align (align must be a power of two).</span> <span class="k">fn</span> <span class="nf">align_to</span><span class="p">(</span><span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">align</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">usize</span> <span class="p">{</span> <span class="p">(</span><span class="n">size</span> <span class="o">+</span> <span class="n">align</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span 
class="o">&amp;</span> <span class="o">!</span><span class="p">(</span><span class="n">align</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="p">}</span> </code></pre></div></div> <p>That completes a simple hugepage allocator.</p> <h3 id="boxed">Boxed</h3> <p>With the allocator in place, export a global instance for Box to use.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// lib.rs defines a global default_allocator() accessor for the whole crate.</span> <span class="nd">lazy_static!</span> <span class="p">{</span> <span class="k">static</span> <span class="k">ref</span> <span class="n">HUGEPAGE_ALLOCATOR</span><span class="p">:</span> <span class="n">HugePageAllocator</span> <span class="o">=</span> <span class="n">HugePageAllocator</span><span class="p">;</span> <span class="p">}</span> <span class="c">// Visible only within this crate.</span> <span class="k">pub</span><span class="p">(</span><span class="n">crate</span><span class="p">)</span> <span class="k">fn</span> <span class="nf">default_allocator</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nv">'static</span> <span class="n">HugePageAllocator</span> <span class="p">{</span> <span class="o">&amp;</span><span class="n">HUGEPAGE_ALLOCATOR</span> <span class="p">}</span> </code></pre></div></div> <p>Next, implement a simple Box that supports deref and frees its memory automatically when it goes out of scope:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">data</span><span class="p">:</span> <span class="n">NonNull</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span 
class="n">T</span><span class="o">&gt;</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">let</span> <span class="n">layout</span> <span class="o">=</span> <span class="nn">Layout</span><span class="p">::</span><span class="nn">new</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">();</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="nn">NonNull</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nf">default_allocator</span><span class="p">()</span><span class="nf">.alloc</span><span class="p">(</span><span class="n">layout</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">T</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span> <span class="c">// write() initializes the fresh memory without dropping its uninitialized contents.</span> <span class="n">p</span><span class="nf">.as_ptr</span><span class="p">()</span><span class="nf">.write</span><span class="p">(</span><span class="n">data</span><span class="p">);</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">data</span><span class="p">:</span> <span class="n">p</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">from_raw</span><span 
class="p">(</span><span class="n">raw</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">T</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">data</span><span class="p">:</span> <span class="nn">NonNull</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">raw</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">(),</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Drop</span> <span class="k">for</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="k">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="c">// Run T's destructor before returning the memory to the allocator.</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">drop_in_place</span><span class="p">(</span><span class="k">self</span><span class="py">.data</span><span class="nf">.as_ptr</span><span class="p">());</span> <span class="nf">default_allocator</span><span class="p">()</span><span class="nf">.dealloc</span><span class="p">(</span><span class="k">self</span><span class="py">.data</span><span class="nf">.as_ptr</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="nn">Layout</span><span class="p">::</span><span class="nn">new</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">());</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">Deref</span> <span 
class="k">for</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">type</span> <span class="n">Target</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">deref</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="n">T</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="py">.data</span><span class="nf">.as_ref</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">DerefMut</span> <span class="k">for</span> <span class="nb">Box</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="nf">deref_mut</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">T</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="py">.data</span><span class="nf">.as_mut</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>The complete code is in the GitHub repository listed at the end of this article.</p> <h3 id="reference">Reference</h3> <p><a href="https://lwn.net/Articles/374424/">Huge pages part 1 (Introduction)</a></p> <p><a href="https://lwn.net/Articles/375096/">Huge pages part 2: Interfaces</a></p> <p><a href="https://lwn.net/Articles/376606/">Huge pages part 3: 
Administration</a></p> <p><a href="https://lwn.net/Articles/378641/">Huge pages part 4: benchmarking with huge pages</a></p> <p><a href="https://lwn.net/Articles/379748/">Huge pages part 5: A deeper look at TLBs and costs</a></p> <p><a href="https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt">https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt</a></p> <p><a href="https://man7.org/linux/man-pages/man2/mmap.2.html">https://man7.org/linux/man-pages/man2/mmap.2.html</a></p> <p><a href="https://github.com/cppcoffee/hugepage-rs">https://github.com/cppcoffee/hugepage-rs</a></p>Sharp LiuHugepage 内存分配器 – Rust实现自旋读写锁实现2021-05-13T00:00:00+00:002021-05-13T00:00:00+00:00https://cppcoffee.github.io/system/program/2021/05/13/%E8%87%AA%E6%97%8B%E8%AF%BB%E5%86%99%E9%94%81%E5%AE%9E%E7%8E%B0<p>Implementing a spinning read-write lock</p> <h3 id="读写锁">Read-write locks</h3> <p>A read-write lock is a synchronization primitive for concurrency control, also known as a shared-exclusive lock or a multiple-readers/single-writer lock. Any number of readers may hold the lock at the same time, while a writer holds it exclusively.</p> <p>There are several ways to implement one; this article describes a <strong>spinning read-write lock</strong>.</p> <h3 id="优先策略">Priority policies</h3> <p>Read-write locks follow one of these policies:</p> <ul> <li>Read-preferring: allows maximum concurrency, but writers can starve.</li> <li>Write-preferring: a waiting writer acquires the lock as soon as all readers that have already started finish.</li> <li>Unspecified priority.</li> </ul> <p>The lock implemented here is <strong>write-preferring</strong>.</p> <h3 id="自旋读写锁的设计">Design of the spinning read-write lock</h3> <p>The lock state is a single uint64_t (64-bit integer).</p> <p>The most significant bit is reserved for the writer; the remaining bits count readers.</p> <p>The writer bit is 0x8000000000000000 in hexadecimal. Only one writer can hold the lock at a time, and it clears the bit on unlock.</p> <p>The reader bits allow up to 0x7FFFFFFFFFFFFFFF concurrent readers. Each read lock increments the count by 1, and each read unlock decrements it by 1.</p> <p><strong>Note</strong>: both read and write locking use CAS operations.</p> <h3 id="实现">Implementation</h3> <p>The spinning read-write lock in C:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include &lt;stdio.h&gt; #include &lt;stdlib.h&gt; #include &lt;stdint.h&gt; #include &lt;string.h&gt; #include &lt;stdbool.h&gt; #include &lt;assert.h&gt; </span> <span class="k">static</span> <span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">SHARED_LOCK_INIT</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="k">static</span> <span class="k">const</span> <span 
class="kt">uint64_t</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="o">=</span> <span class="mi">1ULL</span> <span class="o">&lt;&lt;</span> <span class="mi">63</span><span class="p">;</span> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">shared_rwlock_s</span> <span class="n">shared_rwlock_t</span><span class="p">;</span> <span class="k">struct</span> <span class="n">shared_rwlock_s</span> <span class="p">{</span> <span class="kt">uint64_t</span> <span class="n">lock</span><span class="p">;</span> <span class="p">};</span> <span class="kt">void</span> <span class="nf">shared_lock_init</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span> <span class="o">=</span> <span class="n">SHARED_LOCK_INIT</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span> <span class="nf">shared_read_lock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="kt">uint64_t</span> <span class="n">value</span><span class="p">;</span> <span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span> <span class="c1">// atomic load, so the compiler cannot hoist the read out of the loop.</span> <span class="n">value</span> <span class="o">=</span> <span class="n">__atomic_load_n</span><span class="p">(</span><span class="o">&amp;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">,</span> <span class="n">__ATOMIC_ACQUIRE</span><span class="p">);</span> <span class="c1">// is write locked?</span> <span class="k">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">&gt;=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">)</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// increment the reader count.</span> <span 
class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&amp;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">value</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="kt">void</span> <span class="nf">shared_read_unlock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="n">assert</span><span class="p">(</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span> <span class="o">&gt;</span> <span class="n">SHARED_LOCK_INIT</span><span class="p">);</span> <span class="n">__sync_sub_and_fetch</span><span class="p">(</span><span class="o">&amp;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span> <span class="p">}</span> <span class="kt">void</span> <span class="nf">shared_write_lock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="kt">uint64_t</span> <span class="n">value</span><span class="p">;</span> <span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span> <span class="c1">// atomic load, so the compiler cannot hoist the read out of the loop.</span> <span class="n">value</span> <span class="o">=</span> <span class="n">__atomic_load_n</span><span class="p">(</span><span class="o">&amp;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">,</span> <span class="n">__ATOMIC_ACQUIRE</span><span class="p">);</span> <span class="c1">// is write locked?</span> <span class="k">if</span> <span class="p">(</span><span 
class="n">value</span> <span class="o">&gt;=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">)</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// set write lock bit.</span> <span class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&amp;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">value</span> <span class="o">|</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">))</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="c1">// wait for active readers; the atomic load forces a fresh read on every iteration.</span> <span class="k">while</span> <span class="p">(</span><span class="n">__atomic_load_n</span><span class="p">(</span><span class="o">&amp;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">,</span> <span class="n">__ATOMIC_ACQUIRE</span><span class="p">)</span> <span class="o">!=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* void */</span> <span class="p">}</span> <span class="p">}</span> <span class="kt">void</span> <span class="nf">shared_write_unlock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="n">assert</span><span class="p">(</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span> <span class="o">==</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">);</span> <span class="n">__sync_sub_and_fetch</span><span class="p">(</span><span class="o">&amp;</span><span class="n">p</span><span class="o">-&gt;</span><span class="n">lock</span><span class="p">,</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">);</span> <span class="p">}</span> 
</code></pre></div></div> <h3 id="附加功能">Additional feature</h3> <p>Add a spin limit inside the lock: when the spin count reaches the threshold and the lock still has not been acquired, yield the current CPU time slice.</p> <p>Pseudocode:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">is_write_locked</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rwlock</span><span class="p">))</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">count</span> <span class="o">&gt;=</span> <span class="n">limit_rate</span><span class="p">)</span> <span class="p">{</span> <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">sched_yield</span><span class="p">();</span> <span class="p">}</span> <span class="p">}</span> <span class="p">...</span> <span class="p">}</span> </code></pre></div></div> <h3 id="rust-实现版本">Rust version</h3> <p>This version of the lock uses RAII guards and adds an owner field, which lets it detect a deadlock caused by the owning thread trying to take the lock again.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">cell</span><span class="p">::</span><span class="n">UnsafeCell</span><span class="p">;</span> <span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::{</span><span class="n">Deref</span><span class="p">,</span> <span class="n">DerefMut</span><span class="p">};</span> <span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nn">atomic</span><span class="p">::{</span><span 
class="n">AtomicU64</span><span class="p">,</span> <span class="n">Ordering</span><span class="p">};</span> <span class="k">use</span> <span class="nn">crate</span><span class="p">::{</span><span class="n">Error</span><span class="p">,</span> <span class="n">Result</span><span class="p">};</span> <span class="c">// The writer lock bit.</span> <span class="k">const</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">:</span> <span class="nb">u64</span> <span class="o">=</span> <span class="mi">1u64</span> <span class="o">&lt;&lt;</span> <span class="mi">63</span><span class="p">;</span> <span class="c">// Bound T so the lock is only shared across threads when the protected data allows it.</span> <span class="k">unsafe</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nb">Send</span><span class="o">&gt;</span> <span class="nb">Send</span> <span class="k">for</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{}</span> <span class="k">unsafe</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nb">Send</span> <span class="o">+</span> <span class="n">Sync</span><span class="o">&gt;</span> <span class="n">Sync</span> <span class="k">for</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{}</span> <span class="cm">/* * A reader-writer lock */</span> <span class="k">pub</span> <span class="k">struct</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">inner</span><span class="p">:</span> <span class="n">AtomicU64</span><span class="p">,</span> <span class="n">owner</span><span class="p">:</span> <span class="n">AtomicU64</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">UnsafeCell</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> 
<span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">SharedLock</span> <span class="p">{</span> <span class="n">inner</span><span class="p">:</span> <span class="nn">AtomicU64</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span> <span class="n">owner</span><span class="p">:</span> <span class="nn">AtomicU64</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span> <span class="n">data</span><span class="p">:</span> <span class="nn">UnsafeCell</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">t</span><span class="p">),</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">read</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Result</span><span class="o">&lt;</span><span class="n">SharedLockReadGuard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span 
class="o">&gt;&gt;</span> <span class="p">{</span> <span class="nn">SharedLockReadGuard</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="p">}</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">write</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Result</span><span class="o">&lt;</span><span class="n">SharedLockWriteGuard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span> <span class="nn">SharedLockWriteGuard</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="p">}</span> <span class="k">fn</span> <span class="nf">is_hold</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">bool</span> <span class="p">{</span> <span class="k">let</span> <span class="n">tid</span> <span class="o">=</span> <span class="k">self</span><span class="py">.owner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span> <span class="n">tid</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">tid</span> <span class="o">==</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">pthread_self</span><span class="p">()</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span> <span class="p">}</span> <span class="k">fn</span> <span class="nf">set_owner_id</span><span class="p">(</span><span 
class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">tid</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="p">{</span> <span class="k">self</span><span class="py">.owner</span><span class="nf">.store</span><span class="p">(</span><span class="n">tid</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="cm">/* * RAII structure used to release the shared read access of a lock when dropped. * This structure is created by the read methods on SharedLock. */</span> <span class="k">pub</span> <span class="k">struct</span> <span class="n">SharedLockReadGuard</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">lock</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">SharedLockReadGuard</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">lock</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span 
class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Result</span><span class="o">&lt;</span><span class="n">SharedLockReadGuard</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span> <span class="k">if</span> <span class="n">lock</span><span class="nf">.is_hold</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">Error</span><span class="p">::</span><span class="n">DeadLockError</span><span class="p">);</span> <span class="p">}</span> <span class="k">loop</span> <span class="p">{</span> <span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="n">lock</span><span class="py">.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span> <span class="k">if</span> <span class="n">value</span> <span class="o">&gt;=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>
<span class="c">// Taking the lock uses Acquire on success so that reads of the data cannot move ahead of it.</span>
<span class="k">if</span> <span class="n">lock</span> <span class="py">.inner</span> <span class="nf">.compare_exchange</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">value</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Relaxed</span><span class="p">)</span> <span class="nf">.is_ok</span><span class="p">()</span> <span class="p">{</span> <span 
class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">SharedLockReadGuard</span> <span class="p">{</span> <span class="n">lock</span> <span class="p">})</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">Deref</span> <span class="k">for</span> <span class="n">SharedLockReadGuard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">type</span> <span class="n">Target</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">deref</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="n">T</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;*</span><span class="k">self</span><span class="py">.lock.data</span><span class="nf">.get</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">Drop</span> <span class="k">for</span> <span class="n">SharedLockReadGuard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="k">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span 
class="k">self</span><span class="p">)</span> <span class="p">{</span> <span class="k">self</span><span class="py">.lock.inner</span><span class="nf">.fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="cm">/* * RAII structure used to release the exclusive write access of a lock when dropped. * This structure is created by the write methods on SharedLock. */</span> <span class="k">pub</span> <span class="k">struct</span> <span class="n">SharedLockWriteGuard</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nv">'a</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">lock</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">SharedLockWriteGuard</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">lock</span><span class="p">:</span> <span class="o">&amp;</span><span class="nv">'a</span> <span class="n">SharedLock</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">)</span> <span 
class="k">-&gt;</span> <span class="n">Result</span><span class="o">&lt;</span><span class="n">SharedLockWriteGuard</span><span class="o">&lt;</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;&gt;</span> <span class="p">{</span> <span class="k">if</span> <span class="n">lock</span><span class="nf">.is_hold</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">Error</span><span class="p">::</span><span class="n">DeadLockError</span><span class="p">);</span> <span class="p">}</span> <span class="k">loop</span> <span class="p">{</span> <span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="n">lock</span><span class="py">.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span> <span class="k">if</span> <span class="n">value</span> <span class="o">&gt;=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span> <span class="k">if</span> <span class="n">lock</span> <span class="py">.inner</span> <span class="nf">.compare_exchange</span><span class="p">(</span> <span class="n">value</span><span class="p">,</span> <span class="n">value</span> <span class="p">|</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Relaxed</span><span class="p">,</span> <span class="p">)</span> <span class="nf">.is_ok</span><span class="p">()</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span 
class="k">if</span> <span class="n">lock</span><span class="py">.owner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">Error</span><span class="p">::</span><span class="n">Poisoned</span><span class="p">);</span> <span class="p">}</span> <span class="n">lock</span><span class="nf">.set_owner_id</span><span class="p">(</span><span class="k">unsafe</span> <span class="p">{</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">pthread_self</span><span class="p">()</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span> <span class="c">// wait for active readers.</span> <span class="k">while</span> <span class="n">lock</span><span class="py">.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">)</span> <span class="o">!=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{}</span> <span class="nf">Ok</span><span class="p">(</span><span class="n">SharedLockWriteGuard</span> <span class="p">{</span> <span class="n">lock</span> <span class="p">})</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">Deref</span> <span class="k">for</span> <span class="n">SharedLockWriteGuard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">type</span> <span 
class="n">Target</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">deref</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="n">T</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;*</span><span class="k">self</span><span class="py">.lock.data</span><span class="nf">.get</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">Drop</span> <span class="k">for</span> <span class="n">SharedLockWriteGuard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="k">drop</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span> <span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="k">self</span><span class="py">.lock.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span> <span class="k">if</span> <span class="n">value</span> <span class="o">!=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{</span> <span class="nd">panic!</span><span class="p">(</span><span class="s">"write unlock inner value: {}"</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span> <span class="p">}</span> <span class="c">// reset owner id.</span> 
<span class="k">if</span> <span class="o">!</span><span class="k">self</span><span class="py">.lock</span><span class="nf">.is_hold</span><span class="p">()</span> <span class="p">{</span> <span class="nd">panic!</span><span class="p">(</span> <span class="s">"Poisoned!!! owner id: {}"</span><span class="p">,</span> <span class="k">self</span><span class="py">.lock.owner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">)</span> <span class="p">);</span> <span class="p">}</span> <span class="k">self</span><span class="py">.lock</span><span class="nf">.set_owner_id</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="k">self</span><span class="py">.lock</span> <span class="py">.inner</span> <span class="nf">.fetch_sub</span><span class="p">(</span><span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">&gt;</span> <span class="n">DerefMut</span> <span class="k">for</span> <span class="n">SharedLockWriteGuard</span><span class="o">&lt;</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="nf">deref_mut</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="k">mut</span> <span class="n">T</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;</span><span 
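class="k">mut</span> <span class="o">*</span><span class="k">self</span><span class="py">.lock.data</span><span class="nf">.get</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <h3 id="参考">References</h3> <p><a href="https://github.com/cppcoffee/sharelock-rs">https://github.com/cppcoffee/sharelock-rs</a></p> <p><a href="https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock">https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock</a></p>Sharp Liu Spin read-write lock implementation Process crash print stacktrace – C Library 2021-04-25T00:00:00+00:00 2021-04-25T00:00:00+00:00 https://cppcoffee.github.io/system/program/2021/04/25/Process-Crash-Print-Stacktrace--C-Library<p>Process crash print stacktrace – C Library</p> <h3 id="简述">Overview</h3> <p>When developing in a language that is not memory-safe and manipulates raw pointers directly, crashes caused by dangling pointers, use-after-free, and similar errors are hard to avoid.</p> <p>If core dumps are enabled, Linux writes the state of a crashed process to a file. To view the crash stack from the resulting core file without installing the debug-info packages, use the following gdb command:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># in batch mode, run bt to print the stack</span>
gdb <span class="nt">-batch</span> <span class="nt">-c</span> ./coredump-nginx-pid <span class="nt">-ex</span> bt /bin/nginx </code></pre></div></div> <p>With core dumps enabled, a crashing process that holds a lot of memory takes a long time to dump to disk, and the load on a mechanical disk stays high for the whole dump.</p> <p>Limiting the number of core dumps or the size of core files, however, means that crashes under some conditions produce no usable core dump and go unnoticed.</p> <p>This post describes the implementation of a C library that prints the stack trace when a process crashes.</p> <h3 id="crash-调用栈信息输出">Printing the call stack on crash</h3> <p>If the call stack that led to the crash is written out when the process crashes, the problem becomes much easier to track down.</p> <p>The main logic is:</p> <ol> <li>After the process starts, it calls the library's initialization function, which registers handlers for the crash signals.</li> <li>When a crash occurs, the signal handler is invoked.</li> <li>The handler writes the call stack to stderr.</li> <li>The handler restores the default signal handler and re-raises the signal so that it propagates as usual.</li> </ol> <p>The C library depends on libbfd (the Binary File Descriptor library), which it uses to parse the ELF sections and resolve the function names and source lines of the call stack.</p> <p>libbfd is provided by the binutils package. On CentOS it can be installed with the following command:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>yum <span 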
class="nb">install </span>binutils-devel <span class="nt">-y</span> </code></pre></div></div> <h3 id="主要实现">Main implementation</h3> <p>The following code writes the crash information to stderr.</p> <p>The complete code is in the libstacktrace repository on GitHub, linked at the end of this post.</p> <p>The core of the logic is shown below:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define _GNU_SOURCE
#include &lt;execinfo.h&gt;
#include &lt;assert.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;limits.h&gt;
#include &lt;signal.h&gt;
#include &lt;unistd.h&gt;
</span>
<span class="cp">#include "symbol_table.h"
</span>
<span class="c1">// The max number of levels in the stack trace</span>
<span class="cp">#define STACK_TRACE_MAX_LEVELS 100
#define BUFFER_LENGTH 4096
</span>
<span class="k">typedef</span> <span class="nf">void</span> <span class="p">(</span><span class="o">*</span><span class="n">signal_handler_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">,</span> <span class="n">siginfo_t</span> <span class="o">*</span><span class="n">info</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">);</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">register_crash_handlers</span><span class="p">();</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">backtrace_symbol_write</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">text</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">addr</span><span class="p">);</span>
<span class="c1">// store process full path.</span>
<span class="k">static</span> <span class="kt">char</span> <span class="n">program_path</span><span class="p">[</span><span 
class="n">PATH_MAX</span><span class="p">];</span> <span class="c1">// the current program binary symbol table.</span> <span class="k">static</span> <span class="n">symbol_table_t</span> <span class="n">symtab</span><span class="p">;</span> <span class="c1">// initialize stacktrace library.</span> <span class="kt">int</span> <span class="nf">init_stacktrace</span><span class="p">()</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">n</span><span class="p">;</span> <span class="n">n</span> <span class="o">=</span> <span class="n">readlink</span><span class="p">(</span><span class="s">"/proc/self/exe"</span><span class="p">,</span> <span class="n">program_path</span><span class="p">,</span> <span class="n">PATH_MAX</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">n</span> <span class="o">&gt;=</span> <span class="n">PATH_MAX</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="p">}</span> <span class="n">program_path</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'\0'</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">symbol_table_build</span><span class="p">(</span><span class="n">program_path</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">symtab</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="o">-</span><span class="mi">2</span><span class="p">;</span> <span class="p">}</span> <span class="n">register_crash_handlers</span><span class="p">();</span> <span class="k">return</span> <span 
class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">stack_trace_dump</span><span class="p">()</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">btl</span><span class="p">;</span> <span class="kt">char</span> <span class="o">**</span><span class="n">strings</span><span class="p">;</span> <span class="kt">void</span> <span class="o">*</span><span class="n">stack</span><span class="p">[</span><span class="n">STACK_TRACE_MAX_LEVELS</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">msg</span> <span class="o">=</span> <span class="s">" - STACK TRACE: </span><span class="se">\n</span><span class="s">"</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">STDERR_FILENO</span><span class="p">,</span> <span class="n">program_path</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">program_path</span><span class="p">))</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">STDERR_FILENO</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">msg</span><span class="p">))</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span> 
<span class="n">memset</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">stack</span><span class="p">));</span> <span class="k">if</span> <span class="p">((</span><span class="n">btl</span> <span class="o">=</span> <span class="n">backtrace</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span> <span class="n">STACK_TRACE_MAX_LEVELS</span><span class="p">))</span> <span class="o">&gt;</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span> <span class="n">strings</span> <span class="o">=</span> <span class="n">backtrace_symbols</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span> <span class="n">btl</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="n">strings</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">btl</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span> <span class="n">backtrace_symbol_write</span><span class="p">(</span><span class="n">STDERR_FILENO</span><span class="p">,</span> <span class="n">strings</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">stack</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span> <span class="p">}</span> <span class="n">free</span><span class="p">(</span><span class="n">strings</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span 
class="n">backtrace_symbols_fd</span><span class="p">(</span><span class="n">stack</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="n">btl</span> <span class="o">-</span> <span class="mi">2</span><span class="p">,</span> <span class="n">STDERR_FILENO</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span>
<span class="c1">// Reset a signal handler to the default handler.</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">signal_reset_default</span><span class="p">(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">)</span> <span class="p">{</span> <span class="k">struct</span> <span class="n">sigaction</span> <span class="n">act</span><span class="p">;</span> <span class="n">act</span><span class="p">.</span><span class="n">sa_handler</span> <span class="o">=</span> <span class="n">SIG_DFL</span><span class="p">;</span> <span class="n">act</span><span class="p">.</span><span class="n">sa_flags</span> <span class="o">=</span> <span class="n">SA_NODEFER</span> <span class="o">|</span> <span class="n">SA_ONSTACK</span> <span class="o">|</span> <span class="n">SA_RESETHAND</span><span class="p">;</span> <span class="n">sigemptyset</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">act</span><span class="p">.</span><span class="n">sa_mask</span><span class="p">));</span>
<span class="c1">// Call sigaction() outside assert() so it still runs when NDEBUG is defined.</span>
<span class="kt">int</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">sigaction</span><span class="p">(</span><span class="n">signo</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">act</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span> <span class="n">assert</span><span class="p">(</span><span class="n">rc</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">rc</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">signal_crash_handler</span><span 
class="p">(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">,</span> <span class="n">siginfo_t</span> <span class="o">*</span><span class="n">siginfo</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">)</span> <span class="p">{</span> <span class="n">stack_trace_dump</span><span class="p">();</span> <span class="n">signal_reset_default</span><span class="p">(</span><span class="n">signo</span><span class="p">);</span>
<span class="c1">// throw signal to default handler.</span>
<span class="n">raise</span><span class="p">(</span><span class="n">signo</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">set_signal</span><span class="p">(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">,</span> <span class="n">signal_handler_t</span> <span class="n">handler</span><span class="p">)</span> <span class="p">{</span> <span class="k">struct</span> <span class="n">sigaction</span> <span class="n">act</span><span class="p">;</span> <span class="n">act</span><span class="p">.</span><span class="n">sa_handler</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span> <span class="n">act</span><span class="p">.</span><span class="n">sa_sigaction</span> <span class="o">=</span> <span class="n">handler</span><span class="p">;</span> <span class="n">act</span><span class="p">.</span><span class="n">sa_flags</span> <span class="o">=</span> <span class="n">SA_SIGINFO</span><span class="p">;</span> <span class="n">sigemptyset</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">act</span><span class="p">.</span><span class="n">sa_mask</span><span class="p">));</span>
<span class="c1">// Call sigaction() outside assert() so it still runs when NDEBUG is defined.</span>
<span class="kt">int</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">sigaction</span><span class="p">(</span><span class="n">signo</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">act</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span> <span class="n">assert</span><span class="p">(</span><span class="n">rc</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">rc</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">register_crash_handlers</span><span class="p">()</span> <span class="p">{</span> <span class="n">set_signal</span><span class="p">(</span><span class="n">SIGBUS</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span> <span class="n">set_signal</span><span class="p">(</span><span class="n">SIGSEGV</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span> <span class="n">set_signal</span><span class="p">(</span><span class="n">SIGILL</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span> <span class="n">set_signal</span><span class="p">(</span><span class="n">SIGTRAP</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span> <span class="n">set_signal</span><span class="p">(</span><span class="n">SIGFPE</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span> <span class="n">set_signal</span><span class="p">(</span><span class="n">SIGABRT</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">backtrace_symbol_format</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">prefix</span><span class="p">,</span> 
<span class="n">frame_record_t</span> <span class="n">fr</span><span class="p">)</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">n</span><span class="p">;</span> <span class="kt">char</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">buf</span><span class="p">;</span> <span class="c1">// file name</span> <span class="k">if</span> <span class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">filename</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">"%s %s"</span><span class="p">,</span> <span class="n">prefix</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">filename</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">"%s ??"</span><span class="p">,</span> <span class="n">prefix</span><span class="p">);</span> <span class="p">}</span> <span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span> <span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span> <span class="c1">// function name</span> <span class="k">if</span> <span class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">functionname</span> <span class="o">!=</span> <span class="nb">NULL</span> <span class="o">&amp;&amp;</span> <span class="o">*</span><span class="n">fr</span><span class="p">.</span><span 
class="n">functionname</span> <span class="o">!=</span> <span class="sc">'\0'</span><span class="p">)</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">" %s()"</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">functionname</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">" ??"</span><span class="p">);</span> <span class="p">}</span> <span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span> <span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span> <span class="c1">// line</span> <span class="k">if</span> <span class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">line</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">":%u"</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">line</span><span class="p">);</span> <span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span> <span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// discriminator</span> <span class="k">if</span> <span 
class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">discriminator</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">" (discriminator %u)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">discriminator</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span> <span class="p">}</span> <span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span> <span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span> <span class="k">return</span> <span class="n">p</span> <span class="o">-</span> <span class="n">buf</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span> <span class="kt">int</span> <span class="nf">backtrace_symbol_write</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">text</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">addr</span><span class="p">)</span> <span class="p">{</span> <span class="n">frame_record_t</span> <span class="n">fr</span><span class="p">;</span> <span class="kt">int</span> <span 
class="n">n</span><span class="p">;</span> <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="n">BUFFER_LENGTH</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span> <span class="k">if</span> <span class="p">(</span><span class="n">symbol_table_find</span><span class="p">(</span><span class="o">&amp;</span><span class="n">symtab</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">fr</span><span class="p">))</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">backtrace_symbol_format</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">BUFFER_LENGTH</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">fr</span><span class="p">);</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">BUFFER_LENGTH</span><span class="p">,</span> <span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">text</span><span class="p">);</span> <span class="p">}</span> <span class="n">buf</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'\0'</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span 
class="p">{</span> <span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span> <span class="p">}</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <h3 id="崩溃栈输出">Crash stack output</h3> <p>example.c contains code that dereferences a null pointer and crashes. The complete code is:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#include &lt;stdio.h&gt;</span> <span class="c">#include "stacktrace.h"</span> static void bar<span class="o">()</span> <span class="o">{</span> // dereference a null pointer to trigger a crash char <span class="k">*</span>p <span class="o">=</span> 0<span class="p">;</span> <span class="k">*</span>p <span class="o">=</span> <span class="s1">'a'</span><span class="p">;</span> <span class="o">}</span> static void foo<span class="o">()</span> <span class="o">{</span> bar<span class="o">()</span><span class="p">;</span> <span class="o">}</span> int main<span class="o">()</span> <span class="o">{</span> // initialize the library init_stacktrace<span class="o">()</span><span class="p">;</span> foo<span class="o">()</span><span class="p">;</span> <span class="k">return </span>0<span class="p">;</span> <span class="o">}</span> </code></pre></div></div> <p>When run, the example process crashes and prints:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@localhost libstacktrace]# ./example /home/sharp/libstacktrace/example - STACK TRACE: /lib64/libc.so.6<span class="o">(</span>+0x36450<span class="o">)</span> <span class="o">[</span>0x7fca8db52450] ./example<span class="o">()</span> <span class="o">[</span>0x4036f0] example.c bar<span class="o">()</span> ./example<span class="o">()</span> <span class="o">[</span>0x403703] example.c foo<span class="o">()</span> ./example<span class="o">()</span> <span class="o">[</span>0x40371d] ?? 
main<span class="o">()</span> /lib64/libc.so.6<span class="o">(</span>__libc_start_main+0xf5<span class="o">)</span> <span class="o">[</span>0x7fca8db3e555] ./example<span class="o">()</span> <span class="o">[</span>0x402c9a] ?? _start<span class="o">()</span> Segmentation fault </code></pre></div></div> <p>When compiled with -g, the crash output is more detailed and includes the exact source line of the crash:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@localhost libstacktrace]# ./example /home/sharp/libstacktrace/example - STACK TRACE: /lib64/libc.so.6<span class="o">(</span>+0x36450<span class="o">)</span> <span class="o">[</span>0x7fa21a2e0450] ./example<span class="o">()</span> <span class="o">[</span>0x4036f0] /home/sharp/libstacktrace/example.c bar<span class="o">()</span>:8 ./example<span class="o">()</span> <span class="o">[</span>0x403703] /home/sharp/libstacktrace/example.c foo<span class="o">()</span>:15 ./example<span class="o">()</span> <span class="o">[</span>0x40371d] /home/sharp/libstacktrace/example.c main<span class="o">()</span>:24 /lib64/libc.so.6<span class="o">(</span>__libc_start_main+0xf5<span class="o">)</span> <span class="o">[</span>0x7fa21a2cc555] ./example<span class="o">()</span> <span class="o">[</span>0x402c9a] ?? 
_start<span class="o">()</span> Segmentation fault </code></pre></div></div> <h3 id="参考">References</h3> <p><a href="https://github.com/cppcoffee/libstacktrace">https://github.com/cppcoffee/libstacktrace</a></p> <p><a href="https://man7.org/linux/man-pages/man1/gdb.1.html">https://man7.org/linux/man-pages/man1/gdb.1.html</a></p> <p><a href="https://man7.org/linux/man-pages/man1/addr2line.1.html">https://man7.org/linux/man-pages/man1/addr2line.1.html</a></p> <p><a href="https://github.com/apache/trafficserver/blob/master/src/tscore/signals.cc">https://github.com/apache/trafficserver/blob/master/src/tscore/signals.cc</a></p> <p><a href="https://sourceware.org/binutils/docs/bfd/">https://sourceware.org/binutils/docs/bfd/</a></p>Sharp LiuProcess crash print stacktrace – C LibraryLock-Free Stack Implement2021-04-07T00:00:00+00:002021-04-07T00:00:00+00:00https://cppcoffee.github.io/datastructure/2021/04/07/lock-free-stack-implement<h2 id="lock-free-stack-implement">Lock-Free Stack Implement</h2> <h3 id="无锁链式栈">Lock-free linked stack</h3> <p>A stack is a LIFO (Last In First Out) data structure. A common implementation is array-based, pushing and popping by manipulating an array index; another is a linked stack, which pushes and pops by manipulating pointers. This post discusses lock-free operations on a linked stack.</p> <p>A linked stack is a singly linked list: each node has a next pointer to the node below it in the stack.</p> <p>The basic operations are initialization, push, and pop.</p> <h3 id="实现">Implementation</h3> <p>The implementation is in Rust and uses the <strong>crossbeam-epoch</strong> crate to handle the ABA and memory reclamation problems of lock-free structures.</p> <h4 id="结构体">Structs</h4> <p>The stack struct needs one pointer to the current top of the stack. Because only this single top pointer has to be updated atomically, the implementation stays simple.</p> <p>The stack struct and the stack node struct are defined as follows:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// linked stack node struct</span> <span class="k">struct</span> <span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">:</span> <span class="n">Send</span><span class="o">&gt;</span> <span class="p">{</span> <span class="nl">next:</span> <span class="n">Atomic</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">,</span> <span 
class="c1">// next node</span> <span class="nl">value:</span> <span class="n">Option</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="c1">// stored value</span> <span class="p">}</span> <span class="c1">// stack object struct</span> <span class="n">pub</span> <span class="k">struct</span> <span class="n">Stack</span><span class="o">&lt;</span><span class="n">T</span><span class="o">:</span> <span class="n">Send</span><span class="o">&gt;</span> <span class="p">{</span> <span class="nl">top:</span> <span class="n">Atomic</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">,</span> <span class="p">}</span> </code></pre></div></div> <h4 id="初始化">Initialization</h4> <p>Initialization needs no atomic operations. Two constructors are provided:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="o">:</span> <span class="n">Send</span><span class="o">&gt;</span> <span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="c1">// regular node</span> <span class="n">fn</span> <span class="n">new</span><span class="p">(</span><span class="n">v</span><span class="o">:</span> <span class="n">T</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">Self</span> <span class="p">{</span> <span class="nl">next:</span> <span class="n">Atomic</span><span class="o">::</span><span class="n">null</span><span class="p">(),</span> <span class="nl">value:</span> <span class="n">Some</span><span class="p">(</span><span class="n">v</span><span class="p">),</span> <span class="p">}</span> <span class="p">}</span> <span class="c1">// sentinel node</span> <span class="n">fn</span> <span class="n">sentinel</span><span 
class="p">()</span> <span class="o">-&gt;</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">Self</span> <span class="p">{</span> <span class="nl">next:</span> <span class="n">Atomic</span><span class="o">::</span><span class="n">null</span><span class="p">(),</span> <span class="nl">value:</span> <span class="n">None</span><span class="p">,</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <h4 id="push压栈">push</h4> <p>A push sets the top pointer to the newly pushed node.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pub</span> <span class="n">fn</span> <span class="nf">push</span><span class="p">(</span><span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="n">v</span><span class="o">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span> <span class="n">unsafe</span> <span class="p">{</span> <span class="n">self</span><span class="p">.</span><span class="n">try_push</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span> <span class="n">unsafe</span> <span class="n">fn</span> <span class="nf">try_push</span><span class="p">(</span><span class="o">&amp;</span><span class="n">self</span><span class="p">,</span> <span class="n">v</span><span class="o">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span> <span class="n">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">epoch</span><span class="o">::</span><span class="n">pin</span><span class="p">();</span> <span class="n">let</span> <span class="n">node</span> <span class="o">=</span> <span class="n">Owned</span><span class="o">::</span><span class="n">new</span><span class="p">(</span><span class="n">Node</span><span class="o">::</span><span class="n">new</span><span 
class="p">(</span><span class="n">v</span><span class="p">)).</span><span class="n">into_shared</span><span class="p">(</span><span class="n">guard</span><span class="p">);</span> <span class="n">loop</span> <span class="p">{</span> <span class="n">let</span> <span class="n">top_ptr</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">top</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span> <span class="c1">// point the new node's next at the current top</span> <span class="p">(</span><span class="o">*</span><span class="n">node</span><span class="p">.</span><span class="n">as_raw</span><span class="p">()).</span><span class="n">next</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">top_ptr</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">);</span> <span class="c1">// set top to the new node</span> <span class="k">if</span> <span class="n">self</span> <span class="p">.</span><span class="n">top</span> <span class="p">.</span><span class="n">compare_exchange</span><span class="p">(</span><span class="n">top_ptr</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span> <span class="p">.</span><span class="n">is_ok</span><span class="p">()</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <h4 id="pop出栈">pop</h4> <p>A pop sets the top pointer to the node after the current top.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pub</span> <span class="n">fn</span> <span class="nf">pop</span><span 
class="p">(</span><span class="o">&amp;</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Option</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">unsafe</span> <span class="p">{</span> <span class="n">self</span><span class="p">.</span><span class="n">try_pop</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> <span class="n">unsafe</span> <span class="n">fn</span> <span class="n">try_pop</span><span class="p">(</span><span class="o">&amp;</span><span class="n">self</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Option</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">epoch</span><span class="o">::</span><span class="n">pin</span><span class="p">();</span> <span class="n">loop</span> <span class="p">{</span> <span class="n">let</span> <span class="n">top_ptr</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">top</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span> <span class="n">let</span> <span class="n">next_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">top_ptr</span><span class="p">.</span><span class="n">as_raw</span><span class="p">()).</span><span class="n">next</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span> <span class="k">if</span> <span class="n">next_ptr</span><span class="p">.</span><span 
class="n">is_null</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">None</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// set the top pointer to the node after the current top</span> <span class="k">if</span> <span class="n">self</span> <span class="p">.</span><span class="n">top</span> <span class="p">.</span><span class="n">compare_exchange</span><span class="p">(</span><span class="n">top_ptr</span><span class="p">,</span> <span class="n">next_ptr</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span> <span class="p">.</span><span class="n">is_ok</span><span class="p">()</span> <span class="p">{</span> <span class="n">let</span> <span class="n">top_ptr</span> <span class="o">=</span> <span class="n">top_ptr</span><span class="p">.</span><span class="n">as_raw</span><span class="p">()</span> <span class="n">as</span> <span class="o">*</span><span class="n">mut</span> <span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">;</span> <span class="k">return</span> <span class="p">(</span><span class="o">*</span><span class="n">top_ptr</span><span class="p">).</span><span class="n">value</span><span class="p">.</span><span class="n">take</span><span class="p">();</span> <span class="p">}</span> <span class="p">}</span> <span class="err">}</span> </code></pre></div></div> <p>A link to the complete code is included in the <strong>references</strong> at the end of this post.</p> <h3 id="性能测试">Benchmarks</h3> <p>The Stack in lib.rs is benchmarked against the standard library's Mutex&lt;LinkedList&gt; type.</p> <p>The laptop's CPU specifications are:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>machdep.cpu.brand_string: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz machdep.cpu.core_count: 2 machdep.cpu.thread_count: 4 </code></pre></div></div> <h4 id="压测描述">Benchmark description</h4> <div class="language-plaintext 
highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stack_loop_n(n): one stack object; push and pop in a loop n times. stack_thread_n_m(n, m): one shared stack object; n threads push and pop, each looping m times. </code></pre></div></div> <h4 id="结果对比">Results</h4> <table> <thead> <tr> <th>Benchmark</th> <th>Total time</th> <th>Average time per iteration</th> </tr> </thead> <tbody> <tr> <td>stack_loop_n(100000)</td> <td>56.523467ms</td> <td>565ns</td> </tr> <tr> <td>list_loop_n(100000)</td> <td>67.573497ms</td> <td>675ns</td> </tr> <tr> <td>stack_thread_n_m(2, 100000)</td> <td>115.590207ms</td> <td>577ns</td> </tr> <tr> <td>list_thread_n_m(2, 100000)</td> <td>161.359683ms</td> <td>806ns</td> </tr> <tr> <td>stack_thread_n_m(4, 100000)</td> <td>440.585874ms</td> <td>1.101µs</td> </tr> <tr> <td>list_thread_n_m(4, 100000)</td> <td>562.439723ms</td> <td>1.406µs</td> </tr> <tr> <td>stack_thread_n_m(8, 100000)</td> <td>1.886768172s</td> <td>2.358µs</td> </tr> <tr> <td>list_thread_n_m(8, 100000)</td> <td>2.120945074s</td> <td>2.651µs</td> </tr> </tbody> </table> <h3 id="参考">References</h3> <p><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.8674">Implementing Lock-Free Queues (1994)</a></p> <p><a href="https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html">https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html</a></p> <p><a href="https://lib.rs/crates/crossbeam-epoch">https://lib.rs/crates/crossbeam-epoch</a></p> <p><a href="https://github.com/cppcoffee/stack-rs">https://github.com/cppcoffee/stack-rs</a></p>Sharp LiuLock-Free Stack ImplementLock-Free Queues Implement2021-03-25T00:00:00+00:002021-03-25T00:00:00+00:00https://cppcoffee.github.io/datastructure/2021/03/25/lock-free-queues-implements<h2 id="lock-free-queues-implement">Lock-Free Queues Implement</h2> <p>A queue is a FIFO abstract data structure. The lock-free queue implementation described here was proposed in the paper <code class="language-plaintext highlighter-rouge">Implementing Lock-Free Queues(1994)</code>.</p> <p>Lock-free queue operations rely on the CPU's CAS (Compare And Swap) instruction. On Intel CPUs the corresponding instruction is <code class="language-plaintext highlighter-rouge">lock 
cmpxchg</code>; the <code class="language-plaintext highlighter-rouge">lock</code> prefix makes the instruction atomic.</p> <p>Many newer languages ship with CAS functions, and low-level toolchains provide built-ins as well, for example GCC's built-in CAS functions:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...) type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...) </code></pre></div></div> <h3 id="队列结构体">Queue structs</h3> <p>A lock-free queue needs two pointers: a head pointer to the front of the queue and a tail pointer to the back.</p> <p>Each node struct has a next pointer to the following node, forming a linked queue.</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// queue node struct</span> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">node_s</span> <span class="n">node_t</span><span class="p">;</span> <span class="k">struct</span> <span class="n">node_s</span> <span class="p">{</span> <span class="n">node_t</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span> <span class="kt">void</span> <span class="o">*</span><span class="n">value</span><span class="p">;</span> <span class="p">};</span> <span class="c1">// queue struct</span> <span class="k">typedef</span> <span class="k">struct</span> <span class="n">queue_s</span> <span class="n">queue_t</span><span class="p">;</span> <span class="k">struct</span> <span class="n">queue_s</span> <span class="p">{</span> <span class="n">node_t</span> <span class="o">*</span><span class="n">head</span><span class="p">;</span> <span class="n">node_t</span> <span class="o">*</span><span class="n">tail</span><span class="p">;</span> <span class="p">};</span> </code></pre></div></div> <h3 id="初始化">Initialization</h3> <p>The paper creates a dummy node at initialization and uses it as the initial value of both head and tail.</p> <p>The dummy node avoids boundary problems when the queue is empty or holds only one node.</p> <p>Initialization is implemented as follows:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">queue_init</span><span 
class="p">(</span><span class="n">queue_t</span> <span class="o">*</span><span class="n">q</span><span class="p">)</span> <span class="p">{</span> <span class="n">node_t</span> <span class="o">*</span><span class="n">dummy</span> <span class="o">=</span> <span class="p">(</span><span class="n">node_t</span> <span class="o">*</span><span class="p">)</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">node_t</span><span class="p">));</span> <span class="k">if</span> <span class="p">(</span><span class="n">dummy</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span> <span class="p">}</span> <span class="n">memset</span><span class="p">(</span><span class="n">dummy</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">node_t</span><span class="p">));</span> <span class="n">q</span><span class="o">-&gt;</span><span class="n">head</span> <span class="o">=</span> <span class="n">q</span><span class="o">-&gt;</span><span class="n">tail</span> <span class="o">=</span> <span class="n">dummy</span><span class="p">;</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>The function has to check for memory allocation failure.</p> <h3 id="入队列">Enqueue</h3> <p>Following the paper's pseudocode, the implementation is:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">enqueue</span><span class="p">(</span><span class="n">queue_t</span> <span class="o">*</span><span class="n">q</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span> <span class="n">node_t</span> 
<span class="o">*</span><span class="n">node</span><span class="p">,</span> <span class="o">*</span><span class="n">tail</span><span class="p">,</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span> <span class="n">node</span> <span class="o">=</span> <span class="p">(</span><span class="n">node_t</span> <span class="o">*</span><span class="p">)</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">node_t</span><span class="p">));</span> <span class="k">if</span> <span class="p">(</span><span class="n">node</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span> <span class="p">}</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">value</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span> <span class="n">node</span><span class="o">-&gt;</span><span class="n">next</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span> <span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span> <span class="n">tail</span> <span class="o">=</span> <span class="n">q</span><span class="o">-&gt;</span><span class="n">tail</span><span class="p">;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">tail</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">tail</span> <span class="o">!=</span> <span class="n">q</span><span class="o">-&gt;</span><span class="n">tail</span><span class="p">)</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span> <span class="k">if</span> <span class="p">(</span><span 
class="n">next</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">tail</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">),</span> <span class="n">next</span><span class="p">,</span> <span class="n">node</span><span class="p">))</span> <span class="p">{</span> <span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">tail</span><span class="p">),</span> <span class="n">tail</span><span class="p">,</span> <span class="n">node</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">tail</span><span class="p">),</span> <span class="n">tail</span><span class="p">,</span> <span class="n">next</span><span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <h3 id="出队列">Dequeue</h3> <p>The dequeue implementation is comparatively simple:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="o">*</span><span class="nf">dequeue</span><span class="p">(</span><span class="n">queue_t</span> <span class="o">*</span><span class="n">q</span><span class="p">)</span> <span class="p">{</span> <span class="kt">void</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span> <span 
class="n">node_t</span> <span class="o">*</span><span class="n">head</span><span class="p">,</span> <span class="o">*</span><span class="n">tail</span><span class="p">,</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span> <span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span> <span class="n">head</span> <span class="o">=</span> <span class="n">q</span><span class="o">-&gt;</span><span class="n">head</span><span class="p">;</span> <span class="n">tail</span> <span class="o">=</span> <span class="n">q</span><span class="o">-&gt;</span><span class="n">tail</span><span class="p">;</span> <span class="n">next</span> <span class="o">=</span> <span class="n">head</span><span class="o">-&gt;</span><span class="n">next</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">head</span> <span class="o">!=</span> <span class="n">q</span><span class="o">-&gt;</span><span class="n">head</span><span class="p">)</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span> <span class="k">if</span> <span class="p">(</span><span class="n">head</span> <span class="o">==</span> <span class="n">tail</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> <span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">tail</span><span class="p">),</span> <span class="n">tail</span><span class="p">,</span> <span class="n">next</span><span class="p">);</span> <span 
class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span> <span class="n">v</span> <span class="o">=</span> <span class="n">next</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span> <span class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&amp;</span><span class="p">(</span><span class="n">q</span><span class="o">-&gt;</span><span class="n">head</span><span class="p">),</span> <span class="n">head</span><span class="p">,</span> <span class="n">next</span><span class="p">))</span> <span class="p">{</span> <span class="c1">// FIXME: freeing here triggers the classic ABA and memory reclamation problems of concurrent structures</span> <span class="c1">//free(head);</span> <span class="k">return</span> <span class="n">v</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <h3 id="aba-问题">The ABA problem</h3> <p>In multithreaded code, the ABA problem arises during synchronization when a location is read twice and both reads return the same value, and "same value" is taken to mean "nothing changed". Another thread can, however, run between the two reads, change the value, do other work, and then change the value back, fooling the first thread into believing "nothing changed" even though the second thread's work violates that assumption:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>T1 reads A from shared memory (A = Load(A)) and is then suspended. T2 is scheduled to run. T2 modifies shared memory: CAS(A, B) changes A to B and, before T2 is descheduled, CAS(B, A) changes B back to A. T1 is scheduled again and sees that A has apparently not changed. </code></pre></div></div> <p>Avoiding this requires that the memory is neither freed immediately (other threads may still reference it) nor reused immediately. This is the most common pitfall of CAS-based lock-free structures. Real projects usually avoid the ABA problem with a 128-bit CAS, but hardware support for 128-bit CAS is not universal, so pointer compression (a tagged pointer) is needed instead.</p> <h4 id="tagged-pointer">Tagged Pointer</h4> <p>On x86_64, the high-order part of the address space is reserved for the kernel, so user-space code can use the high bits of a pointer as a tag.</p> <p>A 64-bit address looks like this:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>0000 0000 0000 0000 </code></pre></div></div> <p>According to the Linux mm documentation, the user-space virtual address range (with 4-level page tables) is 0000000000000000 - 00007fffffffffff.</p> <p>In other words, the high 16 bits can be used as a tag.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0000 FFFF FFFF FFFF
^^^^ Free Data!
</code></pre></div></div> <h3 id="内存回收问题">The memory reclamation problem</h3> <p>In multithreaded code, memory cannot simply be freed, because another thread may still be accessing it. Doing so causes a <strong>use after free</strong>:</p> <blockquote> <p>T1 is descheduled just as it reaches next = tail-&gt;next;. T2 runs dequeue and frees the memory that tail points to. When T1 is scheduled again, its access to tail-&gt;next is a use after free.</p> </blockquote> <p>Keeping such accesses safe requires a deferred reclamation scheme such as reference counting, hazard pointers, or epoch-based reclamation.</p> <h3 id="rust-实现">Rust implementation</h3> <p>Finally, here is a complete lock-free queue written in Rust. It uses the <strong>crossbeam_epoch</strong> crate to deal with the ABA and memory reclamation problems.</p> <p>lib.rs</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nn">atomic</span><span class="p">::</span><span class="nn">Ordering</span><span class="p">::{</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">Release</span><span class="p">};</span> <span class="k">use</span> <span class="nn">crossbeam_epoch</span><span class="p">::{</span><span class="k">self</span> <span class="k">as</span> <span class="n">epoch</span><span class="p">,</span> <span class="n">Atomic</span><span class="p">,</span> <span class="nb">Owned</span><span class="p">,</span> <span class="n">Shared</span><span class="p">};</span> <span class="k">unsafe</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">&gt;</span> <span class="n">Sync</span> <span class="k">for</span> <span class="n">Queue</span><span class="o">&lt;</span><span 
class="n">T</span><span class="o">&gt;</span> <span class="p">{}</span> <span class="k">struct</span> <span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">next</span><span class="p">:</span> <span class="n">Atomic</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">&gt;</span> <span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">next</span><span class="p">:</span> <span class="nn">Default</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span> <span class="n">data</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="n">v</span><span class="p">),</span> <span class="p">}</span> <span class="p">}</span> <span class="k">fn</span> <span class="nf">sentinel</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">Self</span> <span class="p">{</span> <span class="n">next</span><span class="p">:</span> <span 
class="nn">Atomic</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span> <span class="n">data</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">pub</span> <span class="k">struct</span> <span class="n">Queue</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">&gt;</span> <span class="p">{</span> <span class="n">head</span><span class="p">:</span> <span class="n">Atomic</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">,</span> <span class="n">tail</span><span class="p">:</span> <span class="n">Atomic</span><span class="o">&lt;</span><span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">&gt;</span> <span class="n">Queue</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="n">Self</span> <span class="p">{</span> <span class="k">let</span> <span class="n">q</span> <span class="o">=</span> <span class="n">Queue</span> <span class="p">{</span> <span class="n">head</span><span class="p">:</span> <span class="nn">Atomic</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span> <span class="n">tail</span><span class="p">:</span> <span class="nn">Atomic</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span> <span class="p">};</span> 
<span class="k">let</span> <span class="n">sentinel</span> <span class="o">=</span> <span class="nn">Owned</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Node</span><span class="p">::</span><span class="nf">sentinel</span><span class="p">());</span> <span class="k">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;</span><span class="nn">epoch</span><span class="p">::</span><span class="nf">unprotected</span><span class="p">()</span> <span class="p">};</span> <span class="k">let</span> <span class="n">sentinel</span> <span class="o">=</span> <span class="n">sentinel</span><span class="nf">.into_shared</span><span class="p">(</span><span class="n">guard</span><span class="p">);</span> <span class="n">q</span><span class="py">.head</span><span class="nf">.store</span><span class="p">(</span><span class="n">sentinel</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">);</span> <span class="n">q</span><span class="py">.tail</span><span class="nf">.store</span><span class="p">(</span><span class="n">sentinel</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">);</span> <span class="n">q</span> <span class="p">}</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">enq</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">v</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="nf">.try_enq</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">try_enq</span><span class="p">(</span><span 
class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="n">v</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span> <span class="k">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&amp;</span><span class="nn">epoch</span><span class="p">::</span><span class="nf">pin</span><span class="p">();</span> <span class="k">let</span> <span class="n">node</span> <span class="o">=</span> <span class="nn">Owned</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Node</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">v</span><span class="p">))</span><span class="nf">.into_shared</span><span class="p">(</span><span class="n">guard</span><span class="p">);</span> <span class="k">loop</span> <span class="p">{</span> <span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="k">self</span><span class="py">.tail</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span> <span class="py">.next</span> <span class="nf">.compare_exchange</span><span class="p">(</span><span class="nn">Shared</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span> <span class="n">node</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span> <span class="nf">.is_ok</span><span class="p">()</span> <span class="p">{</span> <span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="k">self</span><span 
class="py">.tail</span><span class="nf">.compare_exchange</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="k">self</span><span class="py">.tail</span><span class="nf">.compare_exchange</span><span class="p">(</span> <span class="n">p</span><span class="p">,</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">),</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">,</span> <span class="p">);</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">deq</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="nf">.try_deq</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">try_deq</span><span class="p">(</span><span class="o">&amp;</span><span 
class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">Option</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&amp;</span><span class="nn">epoch</span><span class="p">::</span><span class="nf">pin</span><span class="p">();</span> <span class="k">loop</span> <span class="p">{</span> <span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="k">self</span><span class="py">.head</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span> <span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nb">None</span><span class="p">;</span> <span class="p">}</span> <span class="k">if</span> <span class="k">self</span> <span class="py">.head</span> <span class="nf">.compare_exchange</span><span class="p">(</span> <span class="n">p</span><span class="p">,</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">),</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">,</span> <span 
class="p">)</span> <span class="nf">.is_ok</span><span class="p">()</span> <span class="p">{</span> <span class="k">let</span> <span class="n">next</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span><span class="nf">.as_raw</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Node</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">;</span> <span class="k">return</span> <span class="p">(</span><span class="o">*</span><span class="n">next</span><span class="p">)</span><span class="py">.data</span><span class="nf">.take</span><span class="p">();</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <h3 id="benchmark">benchmark</h3> <p>This section benchmarks the Queue from lib.rs against the standard library's Mutex&lt;LinkedList&gt;.</p> <h4 id="压测代码">Benchmark code</h4> <p>The benchmark code for Queue and for Mutex&lt;LinkedList&gt; is nearly identical. The only difference is that enq corresponds to push_front and deq corresponds to pop_back.</p> <p>Two of the Queue benchmark functions are shown here. See the queue-rs repository linked at the end of this post for the rest.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// n benchmark operations</span> <span class="k">fn</span> <span class="nf">queue_loop_n</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Duration</span> <span class="p">{</span> <span class="k">let</span> <span class="n">q</span> <span class="o">=</span> <span class="nn">Queue</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span> <span class="k">let</span> <span 
class="n">earlier</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span> <span class="k">for</span> <span class="n">i</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">n</span> <span class="p">{</span> <span class="n">q</span><span class="nf">.enq</span><span class="p">(</span><span class="n">i</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">);</span> <span class="p">}</span> <span class="k">for</span> <span class="mi">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">n</span> <span class="p">{</span> <span class="n">q</span><span class="nf">.deq</span><span class="p">();</span> <span class="p">}</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">()</span><span class="nf">.duration_since</span><span class="p">(</span><span class="n">earlier</span><span class="p">)</span> <span class="p">}</span> <span class="c">// n threads, m operations each</span> <span class="k">fn</span> <span class="nf">queue_thread_n_m</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">m</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="n">Duration</span> <span class="p">{</span> <span class="k">let</span> <span class="k">mut</span> <span class="n">handles</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span> <span class="k">let</span> <span class="n">elapsed</span> <span class="o">=</span> <span class="nn">Arc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">AtomicU64</span><span 
class="p">::</span><span class="nf">default</span><span class="p">());</span> <span class="k">for</span> <span class="mi">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">n</span> <span class="p">{</span> <span class="k">let</span> <span class="n">q</span> <span class="o">=</span> <span class="nn">Queue</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span> <span class="k">let</span> <span class="n">elapsed_clone</span> <span class="o">=</span> <span class="n">elapsed</span><span class="nf">.clone</span><span class="p">();</span> <span class="n">handles</span><span class="nf">.push</span><span class="p">(</span><span class="nn">thread</span><span class="p">::</span><span class="nf">spawn</span><span class="p">(</span><span class="k">move</span> <span class="p">||</span> <span class="p">{</span> <span class="k">let</span> <span class="n">start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span> <span class="k">for</span> <span class="n">i</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">m</span> <span class="p">{</span> <span class="n">q</span><span class="nf">.enq</span><span class="p">(</span><span class="n">i</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">);</span> <span class="p">}</span> <span class="k">for</span> <span class="mi">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">m</span> <span class="p">{</span> <span class="n">q</span><span class="nf">.deq</span><span class="p">();</span> <span class="p">}</span> <span class="k">let</span> <span class="n">nanos</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">()</span><span 
class="nf">.duration_since</span><span class="p">(</span><span class="n">start</span><span class="p">)</span><span class="nf">.as_nanos</span><span class="p">();</span> <span class="n">elapsed_clone</span><span class="nf">.fetch_add</span><span class="p">(</span><span class="n">nanos</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">SeqCst</span><span class="p">);</span> <span class="p">}));</span> <span class="p">}</span> <span class="k">for</span> <span class="n">handle</span> <span class="n">in</span> <span class="n">handles</span> <span class="p">{</span> <span class="n">handle</span><span class="nf">.join</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span> <span class="p">}</span> <span class="nn">Duration</span><span class="p">::</span><span class="nf">from_nanos</span><span class="p">(</span><span class="nn">Arc</span><span class="p">::</span><span class="nf">try_unwrap</span><span class="p">(</span><span class="n">elapsed</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.into_inner</span><span class="p">())</span> <span class="p">}</span> </code></pre></div></div> <h4 id="结果对比">Results</h4> <p>The benchmarks ran on a laptop with the following CPU:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>machdep.cpu.brand_string: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
machdep.cpu.core_count: 2
machdep.cpu.thread_count: 4
</code></pre></div></div> <p><strong>Note</strong>: the faster result in each pair is shown in bold.</p> <p>Output:</p> <table> <thead> <tr> <th>Benchmark</th> <th>Total time</th> <th>Average time</th> </tr> </thead> <tbody> <tr> <td>queue_loop_n(100000)</td> <td><strong>17.843828ms</strong></td> <td><strong>178ns</strong></td> </tr> <tr> <td>list_loop_n(100000)</td> <td>23.066353ms</td> <td>230ns</td> </tr> <tr> <td>queue_thread_n_m(2, 100000)</td> <td><strong>64.018836ms</strong></td> 
<td><strong>320ns</strong></td> </tr> <tr> <td>list_thread_n_m(2, 100000)</td> <td>74.660454ms</td> <td>373ns</td> </tr> <tr> <td>queue_thread_n_m(4, 100000)</td> <td><strong>149.736868ms</strong></td> <td><strong>374ns</strong></td> </tr> <tr> <td>list_thread_n_m(4, 100000)</td> <td>189.6352ms</td> <td>474ns</td> </tr> <tr> <td>queue_thread_n_m(8, 100000)</td> <td><strong>544.476377ms</strong></td> <td><strong>680ns</strong></td> </tr> <tr> <td>list_thread_n_m(8, 100000)</td> <td>980.688619ms</td> <td>1225ns</td> </tr> </tbody> </table> <h3 id="参考">References</h3> <p><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.8674">Implementing Lock-Free Queues (1994)</a></p> <p><a href="https://en.wikipedia.org/wiki/ABA_problem">https://en.wikipedia.org/wiki/ABA_problem</a></p> <p><a href="https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html">https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html</a></p> <p><a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf">Keir Fraser’s epoch-based reclamation</a></p> <p><a href="https://lib.rs/crates/crossbeam-epoch">crossbeam-epoch crate</a></p> <p><a href="https://en.wikipedia.org/wiki/Tagged_pointer">https://en.wikipedia.org/wiki/Tagged_pointer</a></p> <p><a href="https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt">https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt</a></p> <p><a href="https://github.com/cppcoffee/queue-rs">https://github.com/cppcoffee/queue-rs</a></p>Sharp LiuLock-Free Queues ImplementTwo Pitfalls of Thread Condition Signal2021-02-27T00:00:00+00:002021-02-27T00:00:00+00:00https://cppcoffee.github.io/system/program/2021/02/27/Thread-Condition-Signal-%E7%9A%84%E4%B8%A4%E4%B8%AA%E9%99%B7%E9%98%B1<h2 id="thread-condition-signal">Thread Condition Signal</h2> <p>Thread condition signals usually come up when implementing a producer/consumer pattern. After reading the man pages, it is natural to wonder why cond has to depend on an external mutex.</p> <p>The man pages give no example to follow, so it is easy to write the following code without a second thought, and it contains a trap:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code><span class="c1">// producer</span> <span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="n">pthread_cond_signal</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cond</span><span class="p">);</span> <span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="c1">// consumer</span> <span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="n">pthread_cond_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cond</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> </code></pre></div></div> <p>Code written this way falls into the <em>lost signal trap</em>.</p> <h3 id="信号丢失的陷阱">The lost signal trap</h3> <p>When signal happens before wait, the signal is lost.</p> <pre><code class="language-flow">
     +    +----------+         +----------+
     |    | producer |         | consumer |
     |    +----------+         +----------+
     |
     |    +----------+
     |    |   lock   |
     |    +----------+
     |    +----------+
     |    |  signal  |
     |    +----------+
Time |    +----------+
     |    |  unlock  |
     |    +----------+
     |-------------------------------
     |                         +----------+
     |                         |   lock   |
     |                         +----------+
     |                         +----------+
     |                         |   wait   |
     |                         +----------+
     |                         +----------+
     |                         |  unlock  |
     v                         +----------+
</code></pre> <p>This is the one-producer, one-consumer case. The producer runs first, so the signal is lost and the consumer waits forever.</p> <h3 id="虚假唤醒的陷阱">The spurious wakeup trap</h3> <p>The <code class="language-plaintext highlighter-rouge">pthread_cond_broadcast</code> man page describes the <code class="language-plaintext highlighter-rouge">spurious wakeup</code> problem in its <code class="language-plaintext highlighter-rouge">Multiple Awakenings by Condition Signal</code> section.</p> 
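<p>Before walking through that scenario, the lost signal trap from the previous section can be reproduced in a few lines. The sketch below is not taken from the man page: the helper name <code>signal_before_wait</code> and the 200 ms timeout are assumptions made for illustration. The producer signals while nobody is waiting, and the consumer uses <code>pthread_cond_timedwait</code> rather than <code>pthread_cond_wait</code> so that the missed signal shows up as a timeout instead of a hang.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Producer: signals while no thread is waiting on cond. */
static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&mutex);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Hypothetical helper: runs the producer to completion, then waits as the
 * consumer with a 200 ms deadline. Returns the result of the timed wait. */
int signal_before_wait(void) {
    pthread_t tid;
    struct timespec deadline;
    int rc;

    pthread_create(&tid, NULL, producer, NULL);
    pthread_join(tid, NULL); /* the signal has already been sent here */

    /* Build an absolute deadline 200 ms from now. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000L * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Consumer: waits without checking any predicate, as in the trap. */
    pthread_mutex_lock(&mutex);
    rc = pthread_cond_timedwait(&cond, &mutex, &deadline);
    pthread_mutex_unlock(&mutex);
    return rc;
}
```

<p>Calling <code>signal_before_wait()</code> from <code>main</code> (built with <code>-pthread</code>) is expected to return <code>ETIMEDOUT</code>: a condition variable keeps no record of a signal that arrived while no thread was waiting.</p>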
<p>Consider one producer and multiple consumers:</p> <p>One thread is about to wait on the condition variable, another thread is concurrently executing <code class="language-plaintext highlighter-rouge">pthread_cond_signal</code>, and a third thread is already waiting.</p> <p>The pseudocode implementation below is annotated with the order of execution (the trailing numbers):</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pthread_cond_wait</span><span class="p">(</span><span class="n">mutex</span><span class="p">,</span> <span class="n">cond</span><span class="p">)</span><span class="o">:</span> <span class="n">value</span> <span class="o">=</span> <span class="n">cond</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">;</span> <span class="cm">/* 1 */</span> <span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 2 */</span> <span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="n">cond</span><span class="o">-&gt;</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 10 */</span> <span class="k">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">==</span> <span class="n">cond</span><span class="o">-&gt;</span><span class="n">value</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* 11 */</span> <span class="n">me</span><span class="o">-&gt;</span><span class="n">next_cond</span> <span class="o">=</span> <span class="n">cond</span><span class="o">-&gt;</span><span class="n">waiter</span><span class="p">;</span> <span class="n">cond</span><span class="o">-&gt;</span><span class="n">waiter</span> <span class="o">=</span> <span class="n">me</span><span class="p">;</span> <span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="n">cond</span><span class="o">-&gt;</span><span class="n">mutex</span><span class="p">);</span> <span class="n">unable_to_run</span><span class="p">(</span><span class="n">me</span><span class="p">);</span> <span class="p">}</span> <span 
class="k">else</span> <span class="nf">pthread_mutex_unlock</span><span class="p">(</span><span class="n">cond</span><span class="o">-&gt;</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 12 */</span> <span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 13 */</span> <span class="n">pthread_cond_signal</span><span class="p">(</span><span class="n">cond</span><span class="p">)</span><span class="o">:</span> <span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="n">cond</span><span class="o">-&gt;</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 3 */</span> <span class="n">cond</span><span class="o">-&gt;</span><span class="n">value</span><span class="o">++</span><span class="p">;</span> <span class="cm">/* 4 */</span> <span class="k">if</span> <span class="p">(</span><span class="n">cond</span><span class="o">-&gt;</span><span class="n">waiter</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* 5 */</span> <span class="n">sleeper</span> <span class="o">=</span> <span class="n">cond</span><span class="o">-&gt;</span><span class="n">waiter</span><span class="p">;</span> <span class="cm">/* 6 */</span> <span class="n">cond</span><span class="o">-&gt;</span><span class="n">waiter</span> <span class="o">=</span> <span class="n">sleeper</span><span class="o">-&gt;</span><span class="n">next_cond</span><span class="p">;</span> <span class="cm">/* 7 */</span> <span class="n">able_to_run</span><span class="p">(</span><span class="n">sleeper</span><span class="p">);</span> <span class="cm">/* 8 */</span> <span class="p">}</span> <span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="n">cond</span><span class="o">-&gt;</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 9 */</span> </code></pre></div></div> <p>A single call to <code 
class="language-plaintext highlighter-rouge">pthread_cond_signal</code> can therefore cause more than one consumer thread to return from <code class="language-plaintext highlighter-rouge">pthread_cond_wait</code> or <code class="language-plaintext highlighter-rouge">pthread_cond_timedwait</code>. This is called a <code class="language-plaintext highlighter-rouge">spurious wakeup</code>.</p> <h3 id="解决方法">Solution</h3> <p>When implementing thread condition signal logic, the external mutex is there for correctness: it protects a shared predicate (the condition_ flag in the code that follows) that records that the event happened, so the wakeup cannot be lost. The consumer rechecks that predicate in a loop, so a spurious wakeup is harmless.</p> <p>The correct form is:</p> <div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// producer</span> <span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="n">condition_</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span> <span class="n">pthread_cond_signal</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cond</span><span class="p">);</span> <span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="c1">// consumer</span> <span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">condition_</span><span class="p">)</span> <span class="p">{</span> <span class="n">pthread_cond_wait</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cond</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> <span class="p">}</span> <span class="n">condition_</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span> <span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&amp;</span><span class="n">mutex</span><span class="p">);</span> 
</code></pre></div></div> <p>With multiple producers and multiple consumers, the boolean condition can be replaced with a count.</p> <h3 id="参考">References</h3> <p><a href="https://man7.org/linux/man-pages/man3/pthread_cond_broadcast.3p.html">https://man7.org/linux/man-pages/man3/pthread_cond_broadcast.3p.html</a></p> <p><a href="https://code.woboq.org/userspace/glibc/nptl/pthread_cond_wait.c.html">https://code.woboq.org/userspace/glibc/nptl/pthread_cond_wait.c.html</a></p>Sharp LiuThread Condition SignalA top Tool for Linux File Fragmentation – Implemented in Rust2021-02-11T00:00:00+00:002021-02-11T00:00:00+00:00https://cppcoffee.github.io/filesystem/rust/2021/02/11/Linux%E6%96%87%E4%BB%B6%E7%A2%8E%E7%89%87top%E5%B7%A5%E5%85%B7--Rust%E5%AE%9E%E7%8E%B0<p>A top tool for Linux file fragmentation – implemented in Rust</p> <h2 id="fragtop-rs">fragtop-rs</h2> <p>The previous post covered how the Linux <code class="language-plaintext highlighter-rouge">filefrag</code> tool is implemented. It can show the fragments of a file, but it cannot scan a directory and rank files by fragment count, so let's build one for fun.</p> <p>The project is named <code class="language-plaintext highlighter-rouge">fragtop-rs</code>, after the top tool. <code class="language-plaintext highlighter-rouge">fragtop-rs</code> counts the fragments of the files matched by a glob and prints them sorted, top style.</p> <p><code class="language-plaintext highlighter-rouge">fragtop-rs</code> is written in Rust. There is a <code class="language-plaintext highlighter-rouge">fiemap</code> crate, <a href="https://docs.rs/fiemap/">https://docs.rs/fiemap/</a>, with a clean interface that can be used directly.</p> <p>Functionality: take a glob pattern, query the fragments of every matching file, and print the top-n list.</p> <h3 id="cargotoml">Cargo.toml</h3> <p>First create the project and enter its directory:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span> <span class="n">cargo</span> <span class="n">new</span> <span class="o">--</span><span class="n">bin</span> <span class="n">fragtop</span><span class="o">-</span><span class="n">rs</span> <span class="n">Created</span> <span class="nf">binary</span> <span class="p">(</span><span class="n">application</span><span class="p">)</span> <span class="err">`</span><span class="n">fragtop</span><span class="o">-</span><span 
class="n">rs</span><span class="err">`</span> <span class="n">package</span> <span class="o">%</span> <span class="n">cd</span> <span class="n">fragtop</span><span class="o">-</span><span class="n">rs</span> </code></pre></div></div> <p>The required crates are:</p> <ul> <li><code class="language-plaintext highlighter-rouge">clap</code>: command-line argument parsing</li> <li><code class="language-plaintext highlighter-rouge">glob</code>: matching file paths</li> <li><code class="language-plaintext highlighter-rouge">anyhow</code>: error handling</li> <li><code class="language-plaintext highlighter-rouge">fiemap</code>: looking up file fragments on Linux</li> </ul> <p>Add the dependencies one by one:</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">clap</span> <span class="n">Updating</span> <span class="nv">'https</span><span class="p">:</span><span class="c">//github.com/rust-lang/crates.io-index' index</span> <span class="n">Adding</span> <span class="n">clap</span> <span class="n">v2</span><span class="na">.33.3</span> <span class="n">to</span> <span class="n">dependencies</span> <span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">glob</span> <span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">anyhow</span> <span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">fiemap</span> </code></pre></div></div> <p>After adding the crates, the <code class="language-plaintext highlighter-rouge">dependencies</code> section of <code class="language-plaintext highlighter-rouge">Cargo.toml</code> looks like this:</p> <div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[dependencies]</span> <span class="py">clap</span> <span class="p">=</span> <span class="s">"2.33"</span> <span class="py">glob</span> <span class="p">=</span> <span class="s">"0.3"</span> <span 
class="py">anyhow</span> <span class="p">=</span> <span class="s">"1.0"</span> <span class="py">fiemap</span> <span class="p">=</span> <span class="s">"0.1"</span> </code></pre></div></div> <h3 id="逻辑实现">Implementation</h3> <p>Start with the clap command-line handling. The tool takes a <code class="language-plaintext highlighter-rouge">-p</code> argument for the glob pattern and a <code class="language-plaintext highlighter-rouge">-n</code> argument for the top-n count.</p> <p><code class="language-plaintext highlighter-rouge">-p</code> is required; <code class="language-plaintext highlighter-rouge">-n</code> defaults to 20, so when many files match, only the top 20 are shown.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[macro_use]</span> <span class="k">extern</span> <span class="n">crate</span> <span class="n">clap</span><span class="p">;</span> <span class="k">use</span> <span class="nn">clap</span><span class="p">::</span><span class="n">Arg</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">anyhow</span><span class="p">::</span><span class="n">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">let</span> <span class="n">matches</span> <span class="o">=</span> <span class="nn">clap</span><span class="p">::</span><span class="nd">app_from_crate!</span><span class="p">()</span> <span class="nf">.arg</span><span class="p">(</span> <span class="nn">Arg</span><span class="p">::</span><span class="nf">with_name</span><span class="p">(</span><span class="s">"path"</span><span class="p">)</span> <span class="nf">.short</span><span class="p">(</span><span class="s">"p"</span><span class="p">)</span> <span class="nf">.help</span><span class="p">(</span><span class="s">"Set the glob file path"</span><span class="p">)</span> <span class="nf">.required</span><span class="p">(</span><span class="k">true</span><span 
class="p">)</span> <span class="nf">.takes_value</span><span class="p">(</span><span class="k">true</span><span class="p">),</span> <span class="p">)</span> <span class="nf">.arg</span><span class="p">(</span> <span class="nn">Arg</span><span class="p">::</span><span class="nf">with_name</span><span class="p">(</span><span class="s">"top-n"</span><span class="p">)</span> <span class="nf">.short</span><span class="p">(</span><span class="s">"n"</span><span class="p">)</span> <span class="nf">.help</span><span class="p">(</span><span class="s">"Top fragment file"</span><span class="p">)</span> <span class="nf">.default_value</span><span class="p">(</span><span class="s">"20"</span><span class="p">)</span> <span class="nf">.takes_value</span><span class="p">(</span><span class="k">true</span><span class="p">),</span> <span class="p">)</span> <span class="nf">.get_matches</span><span class="p">();</span> <span class="k">let</span> <span class="n">path</span> <span class="o">=</span> <span class="n">matches</span><span class="nf">.value_of</span><span class="p">(</span><span class="s">"path"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span> <span class="k">let</span> <span class="n">top_n</span> <span class="o">=</span> <span class="n">matches</span><span class="nf">.value_of</span><span class="p">(</span><span class="s">"top-n"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="py">.parse</span><span class="p">::</span><span class="o">&lt;</span><span class="nb">usize</span><span class="o">&gt;</span><span class="p">()</span><span class="o">?</span><span class="p">;</span> <span class="nd">println!</span><span class="p">(</span><span class="s">"path: {}, top: {}"</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">top_n</span><span class="p">);</span> <span class="nf">Ok</span><span class="p">(())</span> <span class="p">}</span> 
</code></pre></div></div> <p>Output of a run:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% cargo run -- -p /tmp/ path: /tmp/, top: 20 </code></pre></div></div> <p>The output looks right. Next, iterate over the files matching the glob pattern and record each file's fragment count.</p> <p>A <code class="language-plaintext highlighter-rouge">BTreeSet&lt;Tuple(fragments, path)&gt;</code> records each file together with its fragment count. The advantage of a BTreeSet is that it iterates in ascending order, and with <code class="language-plaintext highlighter-rouge">rev()</code> it iterates in descending order.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">...</span> <span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">collections</span><span class="p">::</span><span class="n">BTreeSet</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">anyhow</span><span class="p">::</span><span class="n">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span> <span class="o">...</span> <span class="k">let</span> <span class="k">mut</span> <span class="n">records</span> <span class="o">=</span> <span class="nn">BTreeSet</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span> <span class="k">for</span> <span class="n">entry</span> <span class="n">in</span> <span class="nn">glob</span><span class="p">::</span><span class="nf">glob</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">?</span> <span class="p">{</span> <span class="k">let</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">entry</span><span class="o">?</span><span class="p">;</span> <span class="c">// print the file currently being processed</span> <span class="c">// a leading \r overwrites the same terminal line</span> <span class="nd">print!</span><span class="p">(</span><span class="s">"</span><span class="se">\r</span><span 
class="s">In progress: {}"</span><span class="p">,</span> <span class="n">entry</span><span class="nf">.display</span><span class="p">());</span> <span class="c">// query the file's fragments, then store the fragment count and file path</span> <span class="k">let</span> <span class="n">count</span> <span class="o">=</span> <span class="nn">fiemap</span><span class="p">::</span><span class="nf">fiemap</span><span class="p">(</span><span class="o">&amp;</span><span class="n">entry</span><span class="p">)</span><span class="o">?</span><span class="nf">.count</span><span class="p">();</span> <span class="n">records</span><span class="nf">.insert</span><span class="p">((</span><span class="n">count</span><span class="p">,</span> <span class="n">entry</span><span class="p">));</span> <span class="p">}</span> <span class="nf">Ok</span><span class="p">(())</span> <span class="p">}</span> </code></pre></div></div> <p>With each file path and its fragment count recorded, the last step is to iterate over the records and print a summary.</p> <div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="nn">anyhow</span><span class="p">::</span><span class="n">Result</span><span class="o">&lt;</span><span class="p">()</span><span class="o">&gt;</span> <span class="p">{</span> <span class="o">...</span> <span class="k">if</span> <span class="n">records</span><span class="nf">.is_empty</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">anyhow</span><span class="p">::</span><span class="nd">anyhow!</span><span class="p">(</span><span class="s">"no files are scanned."</span><span class="p">));</span> <span class="p">}</span> <span class="nd">println!</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">Scan total file: {}"</span><span class="p">,</span> <span class="n">records</span><span class="nf">.len</span><span 
class="p">());</span> <span class="c">// rev() iterates in descending order; take(top_n) keeps only the first top_n items</span> <span class="k">for</span> <span class="p">(</span><span class="n">count</span><span class="p">,</span> <span class="n">entry</span><span class="p">)</span> <span class="n">in</span> <span class="n">records</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.rev</span><span class="p">()</span><span class="nf">.take</span><span class="p">(</span><span class="n">top_n</span><span class="p">)</span> <span class="p">{</span> <span class="nd">println!</span><span class="p">(</span><span class="s">"{:&lt;48} {}"</span><span class="p">,</span> <span class="n">entry</span><span class="nf">.display</span><span class="p">(),</span> <span class="n">count</span><span class="p">);</span> <span class="p">}</span> <span class="nf">Ok</span><span class="p">(())</span> <span class="p">}</span> </code></pre></div></div> <p>That completes the code for <code class="language-plaintext highlighter-rouge">fragtop-rs</code>.</p> <p>Use it to scan all the log files under <code class="language-plaintext highlighter-rouge">/var/log/</code> and list them by fragment count:</p> <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./target/debug/fragtop-rs <span class="nt">-p</span> <span class="s1">'/var/log/**/*'</span> In progress: /var/log/yum.log-20210101 Scan total file: 657 /var/log/access.log 266 /var/log/wtmp 39 /var/log/messages-20210131 22 /var/log/messages-20210207 21 /var/log/messages-20210117 20 /var/log/audit/audit.log.2 20 /var/log/audit/audit.log.4 19 /var/log/messages-20210124 18 /var/log/audit/audit.log.1 18 /var/log/nginx/access.log 17 /var/log/audit/audit.log.3 17 /var/log/cron-20210207 14 /var/log/cron-20210131 14 /var/log/cron-20210124 14 /var/log/cron-20210117 14 /var/log/messages 13 /var/log/audit/audit.log 13 /var/log/yum.log-20200511 10 /var/log/tuned/tuned.log 10 /var/log/grubby 8 </code></pre></div></div> <h3 id="参考">References</h3> <p><a 
href="https://docs.rs/fiemap/0.1.1/fiemap/">https://docs.rs/fiemap/0.1.1/fiemap/</a></p> <p><a href="https://github.com/cppcoffee/fragtop-rs">https://github.com/cppcoffee/fragtop-rs</a></p>Sharp LiuA Linux File Fragmentation top Tool, Implemented in Rust
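<p>The ranking step of the fragtop-rs post can be tried without touching the filesystem. The following is a minimal sketch, not the code from the fragtop-rs repository: the helper name <code class="language-plaintext highlighter-rouge">top_n</code> and the sample paths and counts are assumptions made for illustration, and only the standard library is used. It shows why a <code class="language-plaintext highlighter-rouge">BTreeSet</code> of <code class="language-plaintext highlighter-rouge">(count, path)</code> tuples needs no explicit sort: tuples compare by their first field, so reverse iteration yields the most fragmented files first.</p>

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical helper: return the `n` records with the highest
/// fragment counts, largest first.
fn top_n(records: &BTreeSet<(usize, PathBuf)>, n: usize) -> Vec<(usize, PathBuf)> {
    // A BTreeSet iterates in ascending order; tuples are ordered by
    // count first and by path second, so rev() walks from the largest count.
    records.iter().rev().take(n).cloned().collect()
}

fn main() {
    // Made-up sample data standing in for real fiemap results.
    let mut records = BTreeSet::new();
    records.insert((13, PathBuf::from("/var/log/messages")));
    records.insert((266, PathBuf::from("/var/log/access.log")));
    records.insert((39, PathBuf::from("/var/log/wtmp")));
    records.insert((8, PathBuf::from("/var/log/grubby")));

    for (count, path) in top_n(&records, 2) {
        println!("{} {}", count, path.display());
    }
}
```

<p>One consequence of this design is that files with equal fragment counts are ordered by path, and asking for more entries than exist simply returns everything in the set.</p>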