<p>Analyzing a TCP idle connection pool implemented in Rust</p>
<p>Sharp Liu, 2023-01-23, https://cppcoffee.github.io/network/2023/01/23/%E5%88%86%E6%9E%90rust%E5%AE%9E%E7%8E%B0%E7%9A%84TCP%E8%BF%9E%E6%8E%A5%E6%B1%A0</p>
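<h2 id="简介">Introduction</h2>
<p>In C, a TCP idle connection pool is usually implemented by handing the idle fds to epoll and waiting in epoll_wait for event notifications (for example, the peer closing the connection). In higher-level languages such as Go or Rust, copying the epoll_wait approach means reaching for the inner fd, which throws away the abstractions the language provides.</p>
<p>While reading open-source code recently, I came across the TCP idle connection pool in ureq, a library written in Rust. It can serve as a reference for implementing a connection pool in a higher-level language.</p>
<h2 id="结构体">Structs</h2>
<p>ureq implements its connection pool in pool.rs. The pool structs are defined as follows:</p>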
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span><span class="p">(</span><span class="n">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="n">ConnectionPool</span> <span class="p">{</span>
<span class="n">inner</span><span class="p">:</span> <span class="n">Mutex</span><span class="o"><</span><span class="n">Inner</span><span class="o">></span><span class="p">,</span>
<span class="n">max_idle_connections</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
<span class="n">max_idle_connections_per_host</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">struct</span> <span class="n">Inner</span> <span class="p">{</span>
<span class="c">// the actual pooled connection. however only one per hostname:port.</span>
<span class="n">recycle</span><span class="p">:</span> <span class="n">HashMap</span><span class="o"><</span><span class="n">PoolKey</span><span class="p">,</span> <span class="n">VecDeque</span><span class="o"><</span><span class="n">Stream</span><span class="o">>></span><span class="p">,</span>
<span class="c">// This is used to keep track of which streams to expire when the</span>
<span class="c">// pool reaches MAX_IDLE_CONNECTIONS. The corresponding PoolKeys for</span>
<span class="c">// recently used Streams are added to the back of the queue;</span>
<span class="c">// old streams are removed from the front.</span>
<span class="n">lru</span><span class="p">:</span> <span class="n">VecDeque</span><span class="o"><</span><span class="n">PoolKey</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Idle connections are stored in a <code class="language-plaintext highlighter-rouge">HashMap<PoolKey, VecDeque<Stream>></code>: <code class="language-plaintext highlighter-rouge">host:port</code> serves as the key, and the connections for that key are kept in a queue.</p>
<h2 id="空闲连接获取">Getting an idle connection</h2>
<p>The ureq crate obtains TCP connections through <code class="language-plaintext highlighter-rouge">connect_socket</code>. If the <code class="language-plaintext highlighter-rouge">use_pooled</code> argument is <code class="language-plaintext highlighter-rouge">true</code>, it first tries to take a connection from the pool.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">/// Connect the socket, either by using the pool or grab a new one.</span>
<span class="k">fn</span> <span class="nf">connect_socket</span><span class="p">(</span><span class="n">unit</span><span class="p">:</span> <span class="o">&</span><span class="n">Unit</span><span class="p">,</span> <span class="n">hostname</span><span class="p">:</span> <span class="o">&</span><span class="nb">str</span><span class="p">,</span> <span class="n">use_pooled</span><span class="p">:</span> <span class="nb">bool</span><span class="p">)</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="p">(</span><span class="n">Stream</span><span class="p">,</span> <span class="nb">bool</span><span class="p">),</span> <span class="n">Error</span><span class="o">></span> <span class="p">{</span>
<span class="o">...</span>
<span class="k">if</span> <span class="n">use_pooled</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">pool</span> <span class="o">=</span> <span class="o">&</span><span class="n">unit</span><span class="py">.agent.state.pool</span><span class="p">;</span>
<span class="k">let</span> <span class="n">proxy</span> <span class="o">=</span> <span class="o">&</span><span class="n">unit</span><span class="py">.agent.config.proxy</span><span class="p">;</span>
<span class="c">// The connection may have been closed by the server</span>
<span class="c">// due to idle timeout while it was sitting in the pool.</span>
<span class="c">// Loop until we find one that is still good or run out of connections.</span>
<span class="k">while</span> <span class="k">let</span> <span class="nf">Some</span><span class="p">(</span><span class="n">stream</span><span class="p">)</span> <span class="o">=</span> <span class="n">pool</span><span class="nf">.try_get_connection</span><span class="p">(</span><span class="o">&</span><span class="n">unit</span><span class="py">.url</span><span class="p">,</span> <span class="n">proxy</span><span class="nf">.clone</span><span class="p">())</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">server_closed</span> <span class="o">=</span> <span class="n">stream</span><span class="nf">.server_closed</span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
<span class="k">if</span> <span class="o">!</span><span class="n">server_closed</span> <span class="p">{</span>
<span class="k">return</span> <span class="nf">Ok</span><span class="p">((</span><span class="n">stream</span><span class="p">,</span> <span class="k">true</span><span class="p">));</span>
<span class="p">}</span>
<span class="nd">debug!</span><span class="p">(</span><span class="s">"dropping stream from pool; closed by server: {:?}"</span><span class="p">,</span> <span class="n">stream</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">let</span> <span class="n">stream</span> <span class="o">=</span> <span class="k">match</span> <span class="n">unit</span><span class="py">.url</span><span class="nf">.scheme</span><span class="p">()</span> <span class="p">{</span>
<span class="s">"http"</span> <span class="k">=></span> <span class="nn">stream</span><span class="p">::</span><span class="nf">connect_http</span><span class="p">(</span><span class="n">unit</span><span class="p">,</span> <span class="n">hostname</span><span class="p">),</span>
<span class="s">"https"</span> <span class="k">=></span> <span class="nn">stream</span><span class="p">::</span><span class="nf">connect_https</span><span class="p">(</span><span class="n">unit</span><span class="p">,</span> <span class="n">hostname</span><span class="p">),</span>
<span class="s">"test"</span> <span class="k">=></span> <span class="nf">connect_test</span><span class="p">(</span><span class="n">unit</span><span class="p">),</span>
<span class="n">scheme</span> <span class="k">=></span> <span class="nf">Err</span><span class="p">(</span><span class="nn">ErrorKind</span><span class="p">::</span><span class="n">UnknownScheme</span><span class="nf">.msg</span><span class="p">(</span><span class="nd">format!</span><span class="p">(</span><span class="s">"unknown scheme {}"</span><span class="p">,</span> <span class="n">scheme</span><span class="p">))),</span>
<span class="p">};</span>
<span class="nf">Ok</span><span class="p">((</span><span class="n">stream</span><span class="o">?</span><span class="p">,</span> <span class="k">false</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div></div>
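<p>The function loops, taking connections from the pool and calling <code class="language-plaintext highlighter-rouge">server_closed</code> to check whether each idle connection is still usable (not closed by the peer, and with no leftover data). If the pool runs out, it falls through and opens a new connection.</p>
<p>The logic that decides whether an idle connection has been closed:</p>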
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c">// Check if the server has closed a stream by performing a one-byte</span>
<span class="c">// non-blocking read. If this returns EOF, the server has closed the</span>
<span class="c">// connection: return true. If this returns a successful read, there are</span>
<span class="c">// some bytes on the connection even though there was no inflight request.</span>
<span class="c">// For plain HTTP streams, that might mean an HTTP 408 was pushed; it</span>
<span class="c">// could also mean a buggy server that sent more bytes than a response's</span>
<span class="c">// Content-Length. For HTTPS streams, that might mean a close_notify alert,</span>
<span class="c">// which is the proper way to shut down an idle stream.</span>
<span class="c">// Either way, bytes available on the stream before we've made a request</span>
<span class="c">// means the stream is not usable, so we should discard it.</span>
<span class="c">// If this returns WouldBlock (aka EAGAIN),</span>
<span class="c">// that means the connection is still open: return false. Otherwise</span>
<span class="c">// return an error.</span>
<span class="k">fn</span> <span class="nf">serverclosed_stream</span><span class="p">(</span><span class="n">stream</span><span class="p">:</span> <span class="o">&</span><span class="nn">std</span><span class="p">::</span><span class="nn">net</span><span class="p">::</span><span class="n">TcpStream</span><span class="p">)</span> <span class="k">-></span> <span class="nn">io</span><span class="p">::</span><span class="n">Result</span><span class="o"><</span><span class="nb">bool</span><span class="o">></span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">buf</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">;</span> <span class="mi">1</span><span class="p">];</span>
<span class="n">stream</span><span class="nf">.set_nonblocking</span><span class="p">(</span><span class="k">true</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="k">let</span> <span class="n">result</span> <span class="o">=</span> <span class="k">match</span> <span class="n">stream</span><span class="nf">.peek</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="n">buf</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">Ok</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">=></span> <span class="p">{</span>
<span class="nd">debug!</span><span class="p">(</span>
<span class="s">"peek on reused connection returned {}, not WouldBlock; discarding"</span><span class="p">,</span>
<span class="n">n</span>
<span class="p">);</span>
<span class="nf">Ok</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
<span class="p">}</span>
<span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">if</span> <span class="n">e</span><span class="nf">.kind</span><span class="p">()</span> <span class="o">==</span> <span class="nn">io</span><span class="p">::</span><span class="nn">ErrorKind</span><span class="p">::</span><span class="n">WouldBlock</span> <span class="k">=></span> <span class="nf">Ok</span><span class="p">(</span><span class="k">false</span><span class="p">),</span>
<span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">=></span> <span class="nf">Err</span><span class="p">(</span><span class="n">e</span><span class="p">),</span>
<span class="p">};</span>
<span class="n">stream</span><span class="nf">.set_nonblocking</span><span class="p">(</span><span class="k">false</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="n">result</span>
<span class="p">}</span>
</code></pre></div></div>
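<p>The stream is switched to non-blocking mode and <code class="language-plaintext highlighter-rouge">peek</code> is called to find out whether the peer has already closed the connection or left residual data; blocking mode is restored before returning.</p>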
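<p>For comparison, the same check can be written against a raw fd in C, the setting the introduction starts from. This is a minimal sketch and not ureq code: the name <code class="language-plaintext highlighter-rouge">idle_conn_unusable</code> and its return convention are assumptions made for illustration, and it uses <code class="language-plaintext highlighter-rouge">recv</code> with <code class="language-plaintext highlighter-rouge">MSG_PEEK | MSG_DONTWAIT</code> (available on Linux) instead of toggling the blocking mode of the socket.</p>

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper, mirroring the idea of serverclosed_stream:
 *   returns  1 if the idle connection should be discarded
 *              (peer closed it, or unexpected bytes are pending),
 *   returns  0 if the connection is still open and quiet,
 *   returns -1 on any other error (errno is set by recv).
 */
int
idle_conn_unusable(int fd)
{
    char     buf[1];
    ssize_t  n;

    assert(fd >= 0);

    /* MSG_PEEK leaves pending data in the socket buffer;
     * MSG_DONTWAIT makes only this one call non-blocking. */
    n = recv(fd, buf, sizeof(buf), MSG_PEEK | MSG_DONTWAIT);

    if (n >= 0) {
        /* 0 is EOF, > 0 is stray data: either way, do not reuse it. */
        return 1;
    }

    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        /* Nothing to read and no EOF: the connection is still idle. */
        return 0;
    }

    return -1;
}
```

<p>Because the flag applies to a single call, this variant does not need the pair of <code class="language-plaintext highlighter-rouge">set_nonblocking</code> calls that the Rust version uses.</p>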
<h2 id="参考">References</h2>
<p><a href="https://github.com/algesten/ureq/blob/main/src/unit.rs">https://github.com/algesten/ureq/blob/main/src/unit.rs</a></p>
<p><a href="https://github.com/algesten/ureq/blob/main/src/stream.rs">https://github.com/algesten/ureq/blob/main/src/stream.rs</a></p>
<p>Implementing and optimizing a mini coroutine library with ucontext</p>
<p>2022-02-02, https://cppcoffee.github.io/system/program/2022/02/02/ucontext%E5%AE%9E%E7%8E%B0mini%E5%8D%8F%E7%A8%8B%E5%BA%93%E4%B8%8E%E4%BC%98%E5%8C%96</p>
<h2 id="简介">Introduction</h2>
<p>Linux provides the <code class="language-plaintext highlighter-rouge">ucontext</code> family of APIs, which can be used to implement coroutines whose scheduling is controlled by the developer.</p>
<p><code class="language-plaintext highlighter-rouge">ucontext</code> is a more capable version of <code class="language-plaintext highlighter-rouge">setjmp</code>/<code class="language-plaintext highlighter-rouge">longjmp</code>: it supports calling an entry function with arguments.</p>
<p>The <code class="language-plaintext highlighter-rouge">ucontext</code> APIs:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <ucontext.h>
int getcontext(ucontext_t *ucp);
int setcontext(const ucontext_t *ucp);
void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
int swapcontext(ucontext_t *restrict oucp, const ucontext_t *restrict ucp);
</code></pre></div></div>
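<p>Before building a library on top of these calls, it may help to see them used directly. The following is a minimal sketch, assuming glibc on Linux; the names <code class="language-plaintext highlighter-rouge">worker</code>, <code class="language-plaintext highlighter-rouge">record</code>, <code class="language-plaintext highlighter-rouge">run_demo</code> and the 64 KiB stack size are made up for illustration. It switches between the caller and one function running on its own stack and records the order of execution.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static int        trace[4];
static int        ntrace;

/* Append one step number to the execution trace. */
static void
record(int step)
{
    assert(ntrace < (int) (sizeof(trace) / sizeof(trace[0])));
    trace[ntrace++] = step;
}

/* Runs on its own stack: records a step, pauses, records another. */
static void
worker(void)
{
    record(1);
    swapcontext(&co_ctx, &main_ctx);    /* pause: back to the caller */
    record(3);
    /* Returning here continues at uc_link, that is main_ctx. */
}

/* Returns 0 on success, -1 on failure. */
int
run_demo(void)
{
    size_t  stack_size = 64 * 1024;
    void   *stack = malloc(stack_size);

    if (stack == NULL || getcontext(&co_ctx) == -1) {
        free(stack);
        return -1;
    }

    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = stack_size;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, worker, 0);

    ntrace = 0;

    if (swapcontext(&main_ctx, &co_ctx) == -1) {    /* start worker */
        free(stack);
        return -1;
    }
    record(2);

    if (swapcontext(&main_ctx, &co_ctx) == -1) {    /* continue worker */
        free(stack);
        return -1;
    }
    record(4);

    free(stack);
    return 0;
}
```

<p>Each <code class="language-plaintext highlighter-rouge">swapcontext</code> saves the current context into its first argument and activates the second, so control alternates between the two sides.</p>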
<p>A coroutine library built on the ucontext APIs needs to provide the basic coroutine <code class="language-plaintext highlighter-rouge">yield</code>/<code class="language-plaintext highlighter-rouge">resume</code> interface, where:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">resume</code>: continue the coroutine from the point where it was paused</li>
<li><code class="language-plaintext highlighter-rouge">yield</code>: pause the coroutine at the current point</li>
</ul>
<h2 id="实现">Implementation</h2>
<h3 id="coroutine-状态">coroutine states</h3>
<p>A coroutine is always in one of four states:</p>
<ol>
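<li>Ready: created but not yet run (<code class="language-plaintext highlighter-rouge">COROUTINE_READY</code>)</li>
<li>Running: entered through resume (<code class="language-plaintext highlighter-rouge">COROUTINE_RUNNING</code>)</li>
<li>Suspended: paused through yield (<code class="language-plaintext highlighter-rouge">COROUTINE_SUSPEND</code>)</li>
<li>Finished: the entry function has returned (<code class="language-plaintext highlighter-rouge">COROUTINE_DEAD</code>)</li>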
</ol>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">enum</span> <span class="p">{</span>
<span class="n">COROUTINE_READY</span><span class="p">,</span>
<span class="n">COROUTINE_RUNNING</span><span class="p">,</span>
<span class="n">COROUTINE_SUSPEND</span><span class="p">,</span>
<span class="n">COROUTINE_DEAD</span><span class="p">,</span>
<span class="p">}</span> <span class="n">coroutine_status_e</span><span class="p">;</span>
</code></pre></div></div>
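<h3 id="coroutine-结构体">coroutine struct</h3>
<p>The coroutine struct holds the coroutine's stack, its stack size, and its state. <code class="language-plaintext highlighter-rouge">stack_id</code> is there to silence the stack-switch warnings that valgrind reports when tracing the program.</p>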
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">ucontext_t</span> <span class="n">main</span><span class="p">;</span>
<span class="n">ucontext_t</span> <span class="n">ctx</span><span class="p">;</span>
<span class="c1">// coroutine entry function and its argument</span>
<span class="n">coroutine_pt</span> <span class="n">func</span><span class="p">;</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">ud</span><span class="p">;</span>
<span class="c1">// coroutine stack pointer and stack size</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">stack</span><span class="p">;</span>
<span class="kt">size_t</span> <span class="n">stack_size</span><span class="p">;</span>
<span class="c1">// coroutine run status</span>
<span class="n">coroutine_status_e</span> <span class="n">status</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">stack_id</span><span class="p">;</span>
<span class="c1">// whether the coroutine has finished running</span>
<span class="kt">unsigned</span> <span class="n">done</span><span class="o">:</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span> <span class="n">coroutine_t</span><span class="p">;</span>
</code></pre></div></div>
<h3 id="coroutine_create">coroutine_create</h3>
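<p>Creates a coroutine from the entry function and its argument, plus the stack size the coroutine needs. The control struct and the stack are allocated together in a single <code class="language-plaintext highlighter-rouge">malloc</code> call.</p>
<p>If the requested stack size is 0, the size defined by <code class="language-plaintext highlighter-rouge">SIGSTKSZ</code> is used.</p>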
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">coroutine_t</span> <span class="o">*</span>
<span class="nf">coroutine_create</span><span class="p">(</span><span class="n">coroutine_pt</span> <span class="n">fn</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ud</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">stack_size</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span><span class="p">;</span>
<span class="kt">size_t</span> <span class="n">size</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">stack_size</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">stack_size</span> <span class="o">=</span> <span class="n">SIGSTKSZ</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">co</span><span class="p">)</span> <span class="o">+</span> <span class="n">stack_size</span><span class="p">;</span>
<span class="n">co</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="n">size</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memset</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">co</span><span class="p">));</span>
<span class="c1">// set the coroutine entry function and its argument</span>
<span class="n">co</span><span class="o">-></span><span class="n">func</span> <span class="o">=</span> <span class="n">fn</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">ud</span> <span class="o">=</span> <span class="n">ud</span><span class="p">;</span>
<span class="c1">// stack and stack size</span>
<span class="n">co</span><span class="o">-></span><span class="n">stack</span> <span class="o">=</span> <span class="n">co</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">stack_size</span> <span class="o">=</span> <span class="n">stack_size</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">status</span> <span class="o">=</span> <span class="n">COROUTINE_READY</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">stack_id</span> <span class="o">=</span> <span class="n">VALGRIND_STACK_REGISTER</span><span class="p">(</span><span class="n">co</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="p">)</span> <span class="n">co</span> <span class="o">+</span> <span class="n">size</span><span class="p">);</span>
<span class="k">return</span> <span class="n">co</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
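<p>The single-allocation layout used by <code class="language-plaintext highlighter-rouge">coroutine_create</code> (header first, stack right behind it, reached through <code class="language-plaintext highlighter-rouge">co + 1</code>) can be isolated in a small sketch. The type <code class="language-plaintext highlighter-rouge">task_t</code> and the functions <code class="language-plaintext highlighter-rouge">task_create</code> and <code class="language-plaintext highlighter-rouge">task_stack_end</code> are hypothetical stand-ins, reduced to the two fields that matter for the layout.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for coroutine_t. */
typedef struct {
    void    *stack;         /* lowest address of the stack region */
    size_t   stack_size;    /* size of the stack region in bytes */
} task_t;

/* Allocate the header and the stack in one block:
 *
 *   [ task_t header | stack_size bytes of stack ]
 *   ^ t             ^ t + 1
 */
task_t *
task_create(size_t stack_size)
{
    task_t  *t;

    assert(stack_size > 0);

    t = malloc(sizeof(*t) + stack_size);
    if (t == NULL) {
        return NULL;
    }

    /* Only the header is cleared; the stack is left untouched. */
    memset(t, 0, sizeof(*t));

    /* Pointer arithmetic on task_t *: t + 1 is the first byte
     * after the header. */
    t->stack = t + 1;
    t->stack_size = stack_size;

    return t;
}

/* One past the last byte of the stack region. char * arithmetic is
 * used because arithmetic on void * is a GNU extension. */
void *
task_stack_end(const task_t *t)
{
    return (char *) t->stack + t->stack_size;
}
```

<p>One allocation means one <code class="language-plaintext highlighter-rouge">free</code> releases both the header and the stack, at the cost of the stack not being guarded by an unmapped page.</p>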
<h3 id="coroutine_resume">coroutine_resume</h3>
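<p>Switches to (schedules) the coroutine: starts or resumes its execution and updates its status. Once the coroutine has finished, it is destroyed before the function returns.</p>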
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">coroutine_resume</span><span class="p">(</span><span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">status</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">COROUTINE_READY</span><span class="p">:</span>
<span class="k">if</span> <span class="p">(</span><span class="n">getcontext</span><span class="p">(</span><span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">co</span><span class="o">-></span><span class="n">status</span> <span class="o">=</span> <span class="n">COROUTINE_RUNNING</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_sp</span> <span class="o">=</span> <span class="n">co</span><span class="o">-></span><span class="n">stack</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_size</span> <span class="o">=</span> <span class="n">co</span><span class="o">-></span><span class="n">stack_size</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_stack</span><span class="p">.</span><span class="n">ss_flags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">.</span><span class="n">uc_link</span> <span class="o">=</span> <span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">main</span><span class="p">;</span>
<span class="c1">// main entry of the coroutine: coroutine_mainfunc</span>
<span class="n">makecontext</span><span class="p">(</span><span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span> <span class="p">(</span><span class="o">*</span><span class="p">)(</span><span class="kt">void</span><span class="p">))</span> <span class="n">coroutine_mainfunc</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="n">co</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">swapcontext</span><span class="p">(</span><span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">main</span><span class="p">,</span> <span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="n">COROUTINE_SUSPEND</span><span class="p">:</span>
<span class="n">co</span><span class="o">-></span><span class="n">status</span> <span class="o">=</span> <span class="n">COROUTINE_RUNNING</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">swapcontext</span><span class="p">(</span><span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">main</span><span class="p">,</span> <span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span>
<span class="nl">default:</span>
<span class="cm">/* unreachable */</span>
<span class="n">assert</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">done</span><span class="p">)</span> <span class="p">{</span>
<span class="n">coroutine_destroy</span><span class="p">(</span><span class="n">co</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">CO_OK</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="coroutine_mainfunc">coroutine_mainfunc</h3>
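<p>The entry function the coroutine actually starts in: it indirectly calls the user-supplied entry function and then sets the coroutine's done flag.</p>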
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span>
<span class="nf">coroutine_mainfunc</span><span class="p">(</span><span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span>
<span class="n">co</span><span class="o">-></span><span class="n">func</span><span class="p">(</span><span class="n">co</span><span class="o">-></span><span class="n">ud</span><span class="p">);</span>
<span class="n">co</span><span class="o">-></span><span class="n">done</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
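<p>One detail worth knowing about this entry function: <code class="language-plaintext highlighter-rouge">makecontext</code> is specified to pass <code class="language-plaintext highlighter-rouge">int</code> arguments, so handing it a pointer directly relies on glibc behaviour on platforms such as x86-64. A more conservative variant, sketched here under the assumption of glibc on Linux, splits the pointer into two 32-bit halves and reassembles it in the entry function. The names <code class="language-plaintext highlighter-rouge">entry</code> and <code class="language-plaintext highlighter-rouge">run_with_pointer</code> are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t caller_ctx, entry_ctx;

/* Entry function: rebuild the pointer from its two halves.
 * The double shift avoids shifting by the full width on 32-bit targets. */
static void
entry(unsigned int hi, unsigned int lo)
{
    uintptr_t  bits = ((uintptr_t) hi << 16 << 16) | (uintptr_t) lo;
    int       *out = (int *) bits;

    *out = 42;
    /* Returning continues at uc_link, that is caller_ctx. */
}

/* Runs entry() on a separate stack and lets it write through `out`.
 * Returns 0 on success, -1 on failure. */
int
run_with_pointer(int *out)
{
    uintptr_t  bits = (uintptr_t) out;
    size_t     stack_size = 64 * 1024;
    void      *stack;

    assert(out != NULL);

    stack = malloc(stack_size);
    if (stack == NULL || getcontext(&entry_ctx) == -1) {
        free(stack);
        return -1;
    }

    entry_ctx.uc_stack.ss_sp = stack;
    entry_ctx.uc_stack.ss_size = stack_size;
    entry_ctx.uc_link = &caller_ctx;

    /* Two int-sized arguments instead of one pointer. */
    makecontext(&entry_ctx, (void (*)(void)) entry, 2,
                (unsigned int) (bits >> 16 >> 16),
                (unsigned int) (bits & 0xffffffffu));

    if (swapcontext(&caller_ctx, &entry_ctx) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return 0;
}
```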
<h3 id="coroutine_yield">coroutine_yield</h3>
<p>Suspends the coroutine and switches back to the main context.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span>
<span class="nf">coroutine_yield</span><span class="p">(</span><span class="n">coroutine_t</span> <span class="o">*</span><span class="n">co</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">co</span><span class="o">-></span><span class="n">status</span> <span class="o">=</span> <span class="n">COROUTINE_SUSPEND</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">swapcontext</span><span class="p">(</span><span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">ctx</span><span class="p">,</span> <span class="o">&</span><span class="n">co</span><span class="o">-></span><span class="n">main</span><span class="p">)</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">CO_ERROR</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">CO_OK</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
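<p>To see how resume and yield cooperate, here is a compact, self-contained sketch of the same pattern, assuming glibc on Linux. It is not the library above: <code class="language-plaintext highlighter-rouge">mini_co_t</code>, <code class="language-plaintext highlighter-rouge">mini_resume</code>, <code class="language-plaintext highlighter-rouge">mini_yield</code>, <code class="language-plaintext highlighter-rouge">counter</code> and <code class="language-plaintext highlighter-rouge">sum_yielded</code> are hypothetical names, the coroutine is found through a global pointer instead of a <code class="language-plaintext highlighter-rouge">makecontext</code> argument, and the caller (not resume) frees the stack.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define MINI_STACK_SIZE  (64 * 1024)

typedef struct {
    ucontext_t  main;       /* context of whoever called mini_resume */
    ucontext_t  ctx;        /* context of the coroutine itself */
    char       *stack;
    int         started;
    int         done;
    int         value;      /* last value handed to the caller */
} mini_co_t;

static mini_co_t *current;  /* coroutine being resumed */

/* Pause the coroutine and hand `value` to the caller. */
static void
mini_yield(mini_co_t *co, int value)
{
    co->value = value;
    swapcontext(&co->ctx, &co->main);
}

/* Coroutine body: yields 1, 2, 3 and then finishes. */
static void
counter(void)
{
    mini_co_t  *co = current;
    int         i;

    for (i = 1; i <= 3; i++) {
        mini_yield(co, i);
    }
    co->done = 1;
}

/* Returns 1 if the coroutine yielded, 0 if it finished, -1 on error. */
static int
mini_resume(mini_co_t *co)
{
    assert(!co->done);

    if (!co->started) {
        if (getcontext(&co->ctx) == -1) {
            return -1;
        }
        co->ctx.uc_stack.ss_sp = co->stack;
        co->ctx.uc_stack.ss_size = MINI_STACK_SIZE;
        co->ctx.uc_link = &co->main;
        makecontext(&co->ctx, counter, 0);
        co->started = 1;
    }

    current = co;
    if (swapcontext(&co->main, &co->ctx) == -1) {
        return -1;
    }
    return co->done ? 0 : 1;
}

/* Drives the coroutine to completion and adds up what it yields.
 * Returns the sum, or -1 on failure. */
int
sum_yielded(void)
{
    mini_co_t  co = {0};
    int        sum = 0, rc;

    co.stack = malloc(MINI_STACK_SIZE);
    if (co.stack == NULL) {
        return -1;
    }
    while ((rc = mini_resume(&co)) == 1) {
        sum += co.value;
    }
    free(co.stack);
    return rc == 0 ? sum : -1;
}
```

<p>The driver needs four resumes for three values: the last one lets the loop in <code class="language-plaintext highlighter-rouge">counter</code> end so that the function returns and control arrives back through <code class="language-plaintext highlighter-rouge">uc_link</code>.</p>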
<h2 id="协程优化">Coroutine optimization</h2>
<p>A running coroutine calls <code class="language-plaintext highlighter-rouge">swapcontext</code> and <code class="language-plaintext highlighter-rouge">getcontext</code> very frequently. If we keep using the <code class="language-plaintext highlighter-rouge">ucontext</code> structures, the key to optimizing is to trim the assembly instructions these <code class="language-plaintext highlighter-rouge">ucontext</code> calls execute:</p>
<ol>
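<li>Remove the signal mask handling (the <code class="language-plaintext highlighter-rouge">sigprocmask</code> call) performed inside <code class="language-plaintext highlighter-rouge">swapcontext</code></li>
<li>Remove the saving and restoring of the argument registers (RDI, RSI, RDX, RCX, R8 and R9 on x64)</li>
<li>Remove the saving and restoring of the floating-point registers</li>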
</ol>
<h3 id="ucontext_ih">ucontext_i.h</h3>
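<p>Defines the offsets at which the registers are stored inside <code class="language-plaintext highlighter-rouge">ucontext_t</code>:</p>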
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SIG_BLOCK 0
#define SIG_SETMASK 2
#define _NSIG8 8
#define oRBP 120
#define oRSP 160
#define oRBX 128
#define oR8 40
#define oR9 48
#define oR10 56
#define oR11 64
#define oR12 72
#define oR13 80
#define oR14 88
#define oR15 96
#define oRDI 104
#define oRSI 112
#define oRDX 136
#define oRAX 144
#define oRCX 152
#define oRIP 168
#define oEFL 176
#define oFPREGS 224
#define oSIGMASK 296
#define oFPREGSMEM 424
#define oMXCSR 448
</span></code></pre></div></div>
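<p>Hard-coded offsets like these are only valid for one ABI, so it can be worth checking a few of them against the real <code class="language-plaintext highlighter-rouge">ucontext_t</code>. The sketch below does that for x86-64 glibc, under the assumption that the general registers are stored as an array of <code class="language-plaintext highlighter-rouge">greg_t</code> at the start of <code class="language-plaintext highlighter-rouge">uc_mcontext</code>, with R8 at index 0, RSP at index 15 and RIP at index 16. The function name <code class="language-plaintext highlighter-rouge">offsets_match</code> is hypothetical, and only four of the constants are repeated here.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

/* A few of the constants from ucontext_i.h, repeated for the check. */
#define oR8       40
#define oRSP      160
#define oRIP      168
#define oSIGMASK  296

/* Returns 1 if the constants match this platform's ucontext_t,
 * 0 if they do not, and -1 if the check does not apply here. */
int
offsets_match(void)
{
#if defined(__x86_64__) && !defined(__ILP32__) && defined(__GLIBC__)
    /* The register array starts at the beginning of uc_mcontext. */
    size_t  base = offsetof(ucontext_t, uc_mcontext);

    assert(sizeof(greg_t) == 8);

    return base == oR8
           && base + 15 * sizeof(greg_t) == oRSP
           && base + 16 * sizeof(greg_t) == oRIP
           && offsetof(ucontext_t, uc_sigmask) == oSIGMASK;
#else
    return -1;
#endif
}
```

<p>On any other architecture or C library the assembly that uses these offsets would have to be rewritten anyway, which is why the check reports "not applicable" there instead of failing.</p>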
<h3 id="lightweight_getcontext">lightweight_getcontext</h3>
<p>A lightweight implementation of getcontext:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* lightweight_getcontext.S */</span>
<span class="cp">#include "ucontext_i.h"
</span>
<span class="p">.</span><span class="n">globl</span> <span class="n">lightweight_getcontext</span><span class="p">;</span>
<span class="p">.</span><span class="n">type</span> <span class="n">lightweight_getcontext</span><span class="p">,</span> <span class="err">@</span><span class="n">function</span><span class="p">;</span>
<span class="n">lightweight_getcontext</span><span class="o">:</span>
<span class="p">.</span><span class="n">cfi_startproc</span><span class="p">;</span>
<span class="cm">/* Save the preserved registers, the registers used for passing
args, and the return address. */</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rbx</span><span class="p">,</span> <span class="n">oRBX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rbp</span><span class="p">,</span> <span class="n">oRBP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r12</span><span class="p">,</span> <span class="n">oR12</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r13</span><span class="p">,</span> <span class="n">oR13</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r14</span><span class="p">,</span> <span class="n">oR14</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r15</span><span class="p">,</span> <span class="n">oR15</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rdi</span><span class="p">,</span> <span class="n">oRDI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rsi</span><span class="p">,</span> <span class="n">oRSI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rdx</span><span class="p">,</span> <span class="n">oRDX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRCX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r8</span><span class="p">,</span> <span class="n">oR8</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r9</span><span class="p">,</span> <span class="n">oR9</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRIP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">leaq</span> <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="cm">/* Exclude the return address. */</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRSP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="cm">/* We have separate floating-point register content memory on the
stack. We use the __fpregs_mem block in the context. Set the
links up correctly. */</span>
<span class="n">leaq</span> <span class="n">oFPREGSMEM</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oFPREGS</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="cm">/* Save the floating-point environment. */</span>
<span class="n">fnstenv</span> <span class="p">(</span><span class="o">%</span><span class="n">rcx</span><span class="p">)</span>
<span class="n">stmxcsr</span> <span class="n">oMXCSR</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="cm">/* Formerly here: a call to sigprocmask.
Deleted because unnecessary for our application. */</span>
<span class="cm">/* All done, return 0 for success. */</span>
<span class="n">xorl</span> <span class="o">%</span><span class="n">eax</span><span class="p">,</span> <span class="o">%</span><span class="n">eax</span>
<span class="n">ret</span>
<span class="p">.</span><span class="n">cfi_endproc</span><span class="p">;</span>
</code></pre></div></div>
<h3 id="lightweight_swapcontext">lightweight_swapcontext</h3>
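<p>A lightweight swapcontext implementation that drops the <code class="language-plaintext highlighter-rouge">sigprocmask</code> system call glibc uses to save and restore the signal mask.</p>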
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include "ucontext_i.h"
</span>
<span class="p">.</span><span class="n">globl</span> <span class="n">lightweight_swapcontext</span><span class="p">;</span>
<span class="p">.</span><span class="n">type</span> <span class="n">lightweight_swapcontext</span><span class="p">,</span> <span class="err">@</span><span class="n">function</span><span class="p">;</span>
<span class="n">lightweight_swapcontext</span><span class="o">:</span>
<span class="p">.</span><span class="n">cfi_startproc</span><span class="p">;</span>
<span class="cm">/* Save the preserved registers, the registers used for passing args,
and the return address. */</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rbx</span><span class="p">,</span> <span class="n">oRBX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rbp</span><span class="p">,</span> <span class="n">oRBP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r12</span><span class="p">,</span> <span class="n">oR12</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r13</span><span class="p">,</span> <span class="n">oR13</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r14</span><span class="p">,</span> <span class="n">oR14</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r15</span><span class="p">,</span> <span class="n">oR15</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="cm">/* Don't bother saving and restoring argument registers */</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rdi</span><span class="p">,</span> <span class="n">oRDI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rsi</span><span class="p">,</span> <span class="n">oRSI</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rdx</span><span class="p">,</span> <span class="n">oRDX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRCX</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r8</span><span class="p">,</span> <span class="n">oR8</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">r9</span><span class="p">,</span> <span class="n">oR9</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">movq</span> <span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRIP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="n">leaq</span> <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="n">rsp</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span> <span class="cm">/* Exclude the return address. */</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oRSP</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="cm">/* We have separate floating-point register content memory on the
stack. We use the __fpregs_mem block in the context. Set the
links up correctly. */</span>
<span class="n">leaq</span> <span class="n">oFPREGSMEM</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span>
<span class="n">movq</span> <span class="o">%</span><span class="n">rcx</span><span class="p">,</span> <span class="n">oFPREGS</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="cm">/* Save the floating-point environment. */</span>
<span class="n">fnstenv</span> <span class="p">(</span><span class="o">%</span><span class="n">rcx</span><span class="p">)</span>
<span class="n">stmxcsr</span> <span class="n">oMXCSR</span><span class="p">(</span><span class="o">%</span><span class="n">rdi</span><span class="p">)</span>
<span class="cm">/* Formerly here: a call to sigprocmask.
Deleted because unnecessary for our application. */</span>
<span class="cm">/* Restore the floating-point context. Not the registers, only the
rest. */</span>
<span class="n">movq</span> <span class="n">oFPREGS</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span>
<span class="n">fldenv</span> <span class="p">(</span><span class="o">%</span><span class="n">rcx</span><span class="p">)</span>
<span class="n">ldmxcsr</span> <span class="n">oMXCSR</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">)</span>
<span class="cm">/* Load the new stack pointer and the preserved registers. */</span>
<span class="n">movq</span> <span class="n">oRSP</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rsp</span>
<span class="n">movq</span> <span class="n">oRBX</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rbx</span>
<span class="n">movq</span> <span class="n">oRBP</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rbp</span>
<span class="n">movq</span> <span class="n">oR12</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r12</span>
<span class="n">movq</span> <span class="n">oR13</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r13</span>
<span class="n">movq</span> <span class="n">oR14</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r14</span>
<span class="n">movq</span> <span class="n">oR15</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r15</span>
<span class="cm">/* The following ret should return to the address set with
getcontext. Therefore push the address on the stack. */</span>
<span class="n">movq</span> <span class="n">oRIP</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span>
<span class="n">pushq</span> <span class="o">%</span><span class="n">rcx</span>
<span class="cm">/* Setup registers used for passing args--don't bother with this */</span>
<span class="n">movq</span> <span class="n">oRDI</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rdi</span>
<span class="n">movq</span> <span class="n">oRDX</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rdx</span>
<span class="n">movq</span> <span class="n">oRCX</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rcx</span>
<span class="n">movq</span> <span class="n">oR8</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r8</span>
<span class="n">movq</span> <span class="n">oR9</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">r9</span>
<span class="cm">/* Setup finally %rsi. */</span>
<span class="n">movq</span> <span class="n">oRSI</span><span class="p">(</span><span class="o">%</span><span class="n">rsi</span><span class="p">),</span> <span class="o">%</span><span class="n">rsi</span>
<span class="cm">/* Clear rax to indicate success. */</span>
<span class="n">xorl</span> <span class="o">%</span><span class="n">eax</span><span class="p">,</span> <span class="o">%</span><span class="n">eax</span>
<span class="n">ret</span>
<span class="p">.</span><span class="n">cfi_endproc</span>
</code></pre></div></div>
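<p>The following is a minimal usage sketch, assuming <code class="language-plaintext highlighter-rouge">lightweight_swapcontext</code> keeps the same two-argument shape as glibc's <code class="language-plaintext highlighter-rouge">swapcontext</code> (save into the first context, resume the second), which is what the register usage above suggests. It uses the standard glibc <code class="language-plaintext highlighter-rouge">getcontext</code>/<code class="language-plaintext highlighter-rouge">makecontext</code>/<code class="language-plaintext highlighter-rouge">swapcontext</code> calls; the names <code class="language-plaintext highlighter-rouge">coroutine_body</code>, <code class="language-plaintext highlighter-rouge">run_demo</code> and <code class="language-plaintext highlighter-rouge">trace</code> are made up for illustration.</p>

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];   /* private stack for the coroutine */
static int trace[4];               /* records the order of execution */
static int trace_len;

static void coroutine_body(void)
{
    trace[trace_len++] = 1;            /* first slice of work */
    swapcontext(&co_ctx, &main_ctx);   /* yield back to the caller */
    trace[trace_len++] = 3;            /* resumed by the caller */
    /* returning here switches to uc_link, i.e. main_ctx */
}

/* Runs the coroutine to completion and returns the number of trace entries. */
static int run_demo(void)
{
    int rc;

    trace_len = 0;
    rc = getcontext(&co_ctx);
    assert(rc == 0);
    (void)rc;
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, coroutine_body, 0);

    swapcontext(&main_ctx, &co_ctx);   /* start the coroutine */
    trace[trace_len++] = 2;
    swapcontext(&main_ctx, &co_ctx);   /* resume it after its yield */
    trace[trace_len++] = 4;
    return trace_len;
}
```

<p>Each <code class="language-plaintext highlighter-rouge">swapcontext</code> call here is a point where the lightweight version would save the signal-mask system call, provided the coroutines never rely on per-context signal masks.</p>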
<h2 id="references">References</h2>
<p><a href="https://github.com/cppcoffee/coroutine">https://github.com/cppcoffee/coroutine</a></p>
<p><a href="https://man7.org/linux/man-pages/man3/swapcontext.3.html">https://man7.org/linux/man-pages/man3/swapcontext.3.html</a></p>
<p><a href="https://github.com/cloudwu/coroutine">https://github.com/cloudwu/coroutine</a></p>
<p><a href="https://rethinkdb.com/blog/making-coroutines-fast/">https://rethinkdb.com/blog/making-coroutines-fast/</a></p>
<p>Sharp Liu, 2021-10-17, https://cppcoffee.github.io/linux/kernel/2021/10/17/IP%E9%98%B2%E7%81%AB%E5%A2%99--XDP%E5%AE%9E%E7%8E%B0</p>
<p>IP Firewall: XDP Implementation</p>
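<h3 id="xdp-简介">Introduction to XDP</h3>
<p>XDP was introduced in Linux kernel 4.8. It runs at the earliest point on the packet receive path, before a <code class="language-plaintext highlighter-rouge">struct sk_buff</code> has been allocated, so it can rewrite, drop, or forward packets directly.</p>
<p>This post implements an IP firewall by applying rules passed in from user space (drop or allow) to incoming packets.</p>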
<h3 id="ip-block">IP Block</h3>
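<p>The implementation has two parts: the user-space interface and the kernel part.</p>
<p>The user-space interface consists of two programs, a loader and a rule editor:</p>
<p><strong>ipblock-loader</strong>: the XDP loader, which attaches the IP Block XDP program to a network interface:</p>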
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># attach IP Block to eth2</span>
./ipblock-loader <span class="nt">-d</span> eth2
<span class="c"># detach IP Block from eth2</span>
./ipblock-loader <span class="nt">-d</span> eth2 <span class="nt">-u</span>
</code></pre></div></div>
<p><strong>ipblock-rule</strong>: changes the IP Block rules through the maps exposed by the XDP program:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># drop IP packets from ::ffff:c612:13/128</span>
<span class="nv">$ </span>./ipblock-rule <span class="nt">-a</span> ::ffff:c612:13/128 <span class="nt">-p</span> deny
<span class="c"># allow IP packets from 192.168.31.0/24</span>
<span class="nv">$ </span>./ipblock-rule <span class="nt">-a</span> 192.168.31.0/24 <span class="nt">-p</span> allow
<span class="c"># delete rules</span>
<span class="nv">$ </span>./ipblock-rule <span class="nt">-d</span> ::ffff:c612:13/128
<span class="nv">$ </span>./ipblock-rule <span class="nt">-d</span> 192.168.31.0/24
</code></pre></div></div>
<h4 id="map-存储结构">Map layout</h4>
<p>The ipblock XDP program defines two longest-prefix-match (LPM) trie maps, one for IPv4 and one for IPv6, which user space can update through the bpf map API.</p>
<p>The map key type is <code class="language-plaintext highlighter-rouge">bpf_lpm_trie_key</code> + <code class="language-plaintext highlighter-rouge">sockaddr</code> (the address bytes).
The map value type is <code class="language-plaintext highlighter-rouge">enum xdp_action</code>.</p>
<p>The IPv4 sockaddr is stored as a <code class="language-plaintext highlighter-rouge">uint32_t</code> (the same memory layout as <code class="language-plaintext highlighter-rouge">struct in_addr</code>).
The IPv6 sockaddr is stored as a <code class="language-plaintext highlighter-rouge">struct in6_addr</code>.</p>
<p>The definitions are:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">lpm_v4_key</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">bpf_lpm_trie_key</span> <span class="n">lpm</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">lpm_v6_key</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">bpf_lpm_trie_key</span> <span class="n">lpm</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">in6_addr</span> <span class="n">addr</span><span class="p">;</span>
<span class="p">};</span>
<span class="c1">// IPv4 map</span>
<span class="k">struct</span> <span class="p">{</span>
<span class="n">__uint</span><span class="p">(</span><span class="n">type</span><span class="p">,</span> <span class="n">BPF_MAP_TYPE_LPM_TRIE</span><span class="p">);</span>
<span class="n">__uint</span><span class="p">(</span><span class="n">max_entries</span><span class="p">,</span> <span class="n">MAX_RULES</span><span class="p">);</span>
<span class="n">__type</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="k">struct</span> <span class="n">lpm_v4_key</span><span class="p">);</span>
<span class="n">__type</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="k">enum</span> <span class="n">xdp_action</span><span class="p">);</span>
<span class="n">__uint</span><span class="p">(</span><span class="n">map_flags</span><span class="p">,</span> <span class="n">BPF_F_NO_PREALLOC</span><span class="p">);</span>
<span class="p">}</span> <span class="n">ipv4_map</span> <span class="nf">SEC</span><span class="p">(</span><span class="s">".maps"</span><span class="p">);</span>
<span class="c1">// IPv6 map</span>
<span class="k">struct</span> <span class="p">{</span>
<span class="n">__uint</span><span class="p">(</span><span class="n">type</span><span class="p">,</span> <span class="n">BPF_MAP_TYPE_LPM_TRIE</span><span class="p">);</span>
<span class="n">__uint</span><span class="p">(</span><span class="n">max_entries</span><span class="p">,</span> <span class="n">MAX_RULES</span><span class="p">);</span>
<span class="n">__type</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="k">struct</span> <span class="n">lpm_v6_key</span><span class="p">);</span>
<span class="n">__type</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="k">enum</span> <span class="n">xdp_action</span><span class="p">);</span>
<span class="n">__uint</span><span class="p">(</span><span class="n">map_flags</span><span class="p">,</span> <span class="n">BPF_F_NO_PREALLOC</span><span class="p">);</span>
<span class="p">}</span> <span class="n">ipv6_map</span> <span class="nf">SEC</span><span class="p">(</span><span class="s">".maps"</span><span class="p">);</span>
</code></pre></div></div>
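<p>To make the lookup semantics concrete, here is a user-space sketch of what a longest-prefix match returns for IPv4. It is only an illustration of the semantics with a linear scan, not the kernel's trie implementation; <code class="language-plaintext highlighter-rouge">struct rule</code>, <code class="language-plaintext highlighter-rouge">lpm_lookup</code>, <code class="language-plaintext highlighter-rouge">ip4</code> and the <code class="language-plaintext highlighter-rouge">ACT_*</code> names are hypothetical and do not come from the ipblock source.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Illustrative values; they mirror XDP_DROP (1) and XDP_PASS (2). */
enum action { ACT_DROP = 1, ACT_PASS = 2 };

struct rule {
    uint32_t prefixlen;  /* number of significant leading bits, 0..32 */
    uint32_t addr;       /* network byte order, like struct in_addr */
    int action;
};

/* Convert dotted-quad text to a network-byte-order address. */
static uint32_t ip4(const char *text)
{
    struct in_addr a;
    int ok = inet_pton(AF_INET, text, &a);
    assert(ok == 1);
    (void)ok;
    return a.s_addr;
}

/* True when the first prefixlen bits of addr_be equal those of the rule. */
static int prefix_covers(const struct rule *r, uint32_t addr_be)
{
    uint32_t mask;

    assert(r->prefixlen <= 32);
    mask = r->prefixlen == 0
        ? 0
        : htonl(~UINT32_C(0) << (32 - r->prefixlen));
    return (addr_be & mask) == (r->addr & mask);
}

/* Return the action of the most specific matching rule, or fallback. */
static int lpm_lookup(const struct rule *rules, size_t n,
                      uint32_t addr_be, int fallback)
{
    const struct rule *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (prefix_covers(&rules[i], addr_be) &&
            (best == NULL || rules[i].prefixlen > best->prefixlen)) {
            best = &rules[i];
        }
    }
    return best ? best->action : fallback;
}
```

<p>With a <code class="language-plaintext highlighter-rouge">/16</code> deny rule and a <code class="language-plaintext highlighter-rouge">/24</code> allow rule inside it, an address in the <code class="language-plaintext highlighter-rouge">/24</code> gets the allow action because the longer prefix wins.</p>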
<h4 id="xdp-实现逻辑">XDP program logic</h4>
<p>The header-parsing code is highly repetitive, so it is generated by a macro:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define PARSE_FUNC_DECLARATION(STRUCT) \
static __always_inline \
struct STRUCT *parse_ ## STRUCT (struct cursor *c) \
{ \
struct STRUCT *ret = c->pos; \
if (c->pos + sizeof(struct STRUCT) > c->end) { \
return NULL; \
} \
c->pos += sizeof(struct STRUCT); \
return ret; \
}
</span>
<span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">ethhdr</span><span class="p">)</span>
<span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">vlanhdr</span><span class="p">)</span>
<span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">iphdr</span><span class="p">)</span>
<span class="n">PARSE_FUNC_DECLARATION</span><span class="p">(</span><span class="n">ipv6hdr</span><span class="p">)</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">struct cursor</code> records the current position in the packet data that is still to be parsed.</p>
<p>The <code class="language-plaintext highlighter-rouge">PARSE_FUNC_DECLARATION(iphdr)</code> macro expands to the following code:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="n">__always_inline</span>
<span class="k">struct</span> <span class="n">iphdr</span> <span class="o">*</span><span class="nf">parse_iphdr</span><span class="p">(</span><span class="k">struct</span> <span class="n">cursor</span> <span class="o">*</span><span class="n">c</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">iphdr</span> <span class="o">*</span><span class="n">ret</span> <span class="o">=</span> <span class="n">c</span><span class="o">-></span><span class="n">pos</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">c</span><span class="o">-></span><span class="n">pos</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">iphdr</span><span class="p">)</span> <span class="o">></span> <span class="n">c</span><span class="o">-></span><span class="n">end</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">c</span><span class="o">-></span><span class="n">pos</span> <span class="o">+=</span> <span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">iphdr</span><span class="p">);</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
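<p>The bounds-checked cursor pattern can be tried in plain user space. The sketch below assumes a <code class="language-plaintext highlighter-rouge">struct cursor</code> with just a <code class="language-plaintext highlighter-rouge">pos</code> and an <code class="language-plaintext highlighter-rouge">end</code> pointer (the real definition is not shown in this post) and a made-up 4-byte header, <code class="language-plaintext highlighter-rouge">struct demo_hdr</code>.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: current parse position and one-past-the-end of the data. */
struct cursor {
    unsigned char *pos;
    unsigned char *end;
};

/* Hypothetical 4-byte header, byte fields only so no alignment is needed. */
struct demo_hdr {
    uint8_t type;
    uint8_t len;
    uint8_t id[2];
};

/* Return the header at the cursor and advance, or NULL if it does not fit. */
static struct demo_hdr *parse_demo_hdr(struct cursor *c)
{
    struct demo_hdr *ret = (struct demo_hdr *)c->pos;

    /* Same check as pos + sizeof > end, written as a length comparison. */
    if ((size_t)(c->end - c->pos) < sizeof(struct demo_hdr)) {
        return NULL;
    }
    c->pos += sizeof(struct demo_hdr);
    return ret;
}
```

<p>On failure the cursor is left untouched, so the caller can simply fall through to its default action.</p>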
<p>The packet-processing logic is as follows:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">SEC</span><span class="p">(</span><span class="s">"xdp"</span><span class="p">)</span>
<span class="kt">int</span> <span class="nf">xdp_prog</span><span class="p">(</span><span class="k">struct</span> <span class="n">xdp_md</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">rc</span> <span class="o">=</span> <span class="n">XDP_PASS</span><span class="p">;</span>
<span class="n">cursor_init</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">,</span> <span class="n">ctx</span><span class="p">);</span>
<span class="c1">// parse the Ethernet header</span>
<span class="n">eth</span> <span class="o">=</span> <span class="n">parse_eth</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">,</span> <span class="o">&</span><span class="n">eth_proto</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">eth</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">pass</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// parse the IP header</span>
<span class="k">if</span> <span class="p">(</span><span class="n">eth_proto</span> <span class="o">==</span> <span class="n">bpf_htons</span><span class="p">(</span><span class="n">ETH_P_IP</span><span class="p">))</span> <span class="p">{</span>
<span class="n">iph</span> <span class="o">=</span> <span class="n">parse_iphdr</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">iph</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">pass</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// look up the action in the IPv4 map</span>
<span class="n">rc</span> <span class="o">=</span> <span class="n">ip_map_lookup_value</span><span class="p">(</span><span class="o">&</span><span class="n">ipv4_map</span><span class="p">,</span> <span class="n">iph</span><span class="o">-></span><span class="n">saddr</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">eth_proto</span> <span class="o">==</span> <span class="n">bpf_htons</span><span class="p">(</span><span class="n">ETH_P_IPV6</span><span class="p">))</span> <span class="p">{</span>
<span class="n">ip6h</span> <span class="o">=</span> <span class="n">parse_ipv6hdr</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ip6h</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">goto</span> <span class="n">pass</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// look up the action in the IPv6 map</span>
<span class="n">rc</span> <span class="o">=</span> <span class="n">ip6_map_lookup_value</span><span class="p">(</span><span class="o">&</span><span class="n">ipv6_map</span><span class="p">,</span> <span class="n">ip6h</span><span class="o">-></span><span class="n">saddr</span><span class="p">);</span>
<span class="p">}</span>
<span class="nl">pass:</span>
<span class="k">return</span> <span class="n">rc</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
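<p><code class="language-plaintext highlighter-rouge">parse_eth</code> is not shown in this post. As a rough idea of what such a step has to do, the user-space sketch below extracts the EtherType from a raw frame and skips up to two VLAN tags. The function name <code class="language-plaintext highlighter-rouge">eth_payload</code>, the constants and the two-tag limit are assumptions for illustration, not the ipblock implementation.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETHERTYPE_8021Q  0x8100  /* single VLAN tag */
#define DEMO_ETHERTYPE_8021AD 0x88A8  /* outer tag of a stacked VLAN */
#define DEMO_MAX_VLAN_DEPTH   2

/*
 * Store the EtherType (host byte order) and the offset of the L3 header.
 * Returns 0 on success, -1 if the frame is too short.
 */
static int eth_payload(const uint8_t *pkt, size_t len,
                       uint16_t *proto, size_t *l3_off)
{
    size_t off = 12;  /* destination MAC (6) + source MAC (6) */
    uint16_t type;

    if (len < off + 2) {
        return -1;
    }
    type = (uint16_t)(pkt[off] << 8 | pkt[off + 1]);
    off += 2;

    for (int i = 0; i < DEMO_MAX_VLAN_DEPTH; i++) {
        if (type != DEMO_ETHERTYPE_8021Q && type != DEMO_ETHERTYPE_8021AD) {
            break;
        }
        /* A VLAN tag is 2 bytes of TCI followed by the inner EtherType. */
        if (len < off + 4) {
            return -1;
        }
        type = (uint16_t)(pkt[off + 2] << 8 | pkt[off + 3]);
        off += 4;
    }

    *proto = type;
    *l3_off = off;
    return 0;
}
```

<p>In the XDP program the returned protocol stays in network byte order, which is why it is compared against <code class="language-plaintext highlighter-rouge">bpf_htons(ETH_P_IP)</code> rather than <code class="language-plaintext highlighter-rouge">ETH_P_IP</code>; this sketch converts to host order only to keep the example readable.</p>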
<h4 id="ipblock-loader">ipblock-loader</h4>
<p>ipblock-loader is the XDP loader. It attaches the XDP program to the specified network interface.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span>
<span class="nf">do_load</span><span class="p">(</span><span class="k">struct</span> <span class="n">options</span> <span class="o">*</span><span class="n">opt</span><span class="p">,</span> <span class="k">struct</span> <span class="n">ipblock_bpf</span> <span class="o">*</span><span class="n">skel</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">err</span><span class="p">;</span>
<span class="c1">// attach the XDP program to the specified interface</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">xdp_link_attach</span><span class="p">(</span><span class="n">opt</span><span class="o">-></span><span class="n">ifindex</span><span class="p">,</span> <span class="n">opt</span><span class="o">-></span><span class="n">xdp_flags</span><span class="p">,</span>
<span class="n">bpf_program__fd</span><span class="p">(</span><span class="n">skel</span><span class="o">-></span><span class="n">progs</span><span class="p">.</span><span class="n">xdp_prog</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// pin the maps into the bpf filesystem</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">pin_maps_in_bpf_object</span><span class="p">(</span><span class="n">skel</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>After a successful attach, the maps are pinned to the bpf filesystem at:</p>
<ul>
<li>/sys/fs/bpf/ipblock/ipv4_map</li>
<li>/sys/fs/bpf/ipblock/ipv6_map</li>
</ul>
<h4 id="ipblock-rule">ipblock-rule</h4>
<p>ipblock-rule is the rule control program, used to add, delete, and update rules.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">int</span>
<span class="nf">do_add_cmd</span><span class="p">(</span><span class="n">options_t</span> <span class="o">*</span><span class="n">opt</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="c1">// open the bpf map that matches the IP address family</span>
<span class="n">fd</span> <span class="o">=</span> <span class="n">open_bpf_map</span><span class="p">(</span><span class="n">opt</span><span class="o">-></span><span class="n">cidr</span><span class="p">.</span><span class="n">af</span><span class="p">);</span>
<span class="p">...</span>
<span class="c1">// fill in the bpf_lpm_trie_key</span>
<span class="n">lpm</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">lpm</span><span class="p">)</span> <span class="o">+</span> <span class="n">opt</span><span class="o">-></span><span class="n">cidr</span><span class="p">.</span><span class="n">socklen</span><span class="p">);</span>
<span class="n">lpm</span><span class="o">-></span><span class="n">prefixlen</span> <span class="o">=</span> <span class="n">opt</span><span class="o">-></span><span class="n">cidr</span><span class="p">.</span><span class="n">prefixlen</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">lpm</span><span class="o">-></span><span class="n">data</span><span class="p">,</span> <span class="o">&</span><span class="n">opt</span><span class="o">-></span><span class="n">cidr</span><span class="p">.</span><span class="n">sockaddr</span><span class="p">,</span> <span class="n">opt</span><span class="o">-></span><span class="n">cidr</span><span class="p">.</span><span class="n">socklen</span><span class="p">);</span>
<span class="c1">// BPF_ANY: add the rule, or update it if it already exists</span>
<span class="k">if</span> <span class="p">(</span><span class="n">bpf_map_update_elem</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">lpm</span><span class="p">,</span> <span class="o">&</span><span class="n">opt</span><span class="o">-></span><span class="n">action</span><span class="p">,</span> <span class="n">BPF_ANY</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="p">...</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
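<p>How <code class="language-plaintext highlighter-rouge">opt-&gt;cidr</code> gets filled in is elided above. The sketch below shows one way a CIDR string such as <code class="language-plaintext highlighter-rouge">192.168.31.0/24</code> could be turned into the pieces an LPM key needs (address family, prefix length, address bytes) using only libc. <code class="language-plaintext highlighter-rouge">struct cidr_key</code> and <code class="language-plaintext highlighter-rouge">parse_cidr</code> are hypothetical names, not the parser used by ipblock-rule.</p>

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: what is needed to build a bpf_lpm_trie_key. */
struct cidr_key {
    int af;              /* AF_INET or AF_INET6 */
    size_t addrlen;      /* 4 or 16 */
    uint32_t prefixlen;  /* 0..32 or 0..128 */
    uint8_t data[16];    /* address bytes in network byte order */
};

/* Parse "addr" or "addr/prefixlen". Returns 0 on success, -1 on error. */
static int parse_cidr(const char *text, struct cidr_key *out)
{
    char buf[INET6_ADDRSTRLEN + 4];
    const char *slash = strchr(text, '/');
    size_t n = slash ? (size_t)(slash - text) : strlen(text);
    unsigned long v;
    uint32_t maxlen;
    char *endp;

    if (n == 0 || n >= sizeof(buf)) {
        return -1;
    }
    memcpy(buf, text, n);
    buf[n] = '\0';

    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, buf, out->data) == 1) {
        out->af = AF_INET;
        out->addrlen = 4;
    } else if (inet_pton(AF_INET6, buf, out->data) == 1) {
        out->af = AF_INET6;
        out->addrlen = 16;
    } else {
        return -1;
    }

    maxlen = (uint32_t)(out->addrlen * 8);
    if (slash == NULL) {
        out->prefixlen = maxlen;  /* bare address means a host rule */
        return 0;
    }
    v = strtoul(slash + 1, &endp, 10);
    if (endp == slash + 1 || *endp != '\0' || v > maxlen) {
        return -1;
    }
    out->prefixlen = (uint32_t)v;
    return 0;
}
```

<p>The resulting <code class="language-plaintext highlighter-rouge">prefixlen</code> and <code class="language-plaintext highlighter-rouge">data</code> correspond to the two fields that <code class="language-plaintext highlighter-rouge">do_add_cmd</code> copies into the key before calling <code class="language-plaintext highlighter-rouge">bpf_map_update_elem</code>.</p>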
<p>The full code is in the GitHub repository linked at the end of this post.</p>
<h3 id="reference">Reference</h3>
<p><a href="https://docs.cilium.io/en/v1.10/bpf/">BPF and XDP Reference Guide</a></p>
<p><a href="https://github.com/libbpf/libbpf">github libbpf</a></p>
<p><a href="https://facebookmicrosites.github.io/bpf/blog/2020/02/19/bpf-portability-and-co-re.html">BPF Portability and CO-RE</a></p>
<p><a href="https://github.com/cppcoffee/ipblock">https://github.com/cppcoffee/ipblock</a></p>
<p>Sharp Liu, 2021-07-24, https://cppcoffee.github.io/system/program/2021/07/24/hugepage%E5%86%85%E5%AD%98%E5%88%86%E9%85%8D%E5%99%A8--rust%E5%AE%9E%E7%8E%B0</p>
<p>Hugepage Memory Allocator: Rust Implementation</p>
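<h3 id="hugepage简介">Introduction to HugePages</h3>
<p>The default memory page size on Linux is 4KB (x86 and x86_64). The hugepage feature lets the kernel manage memory pages larger than the default page size (huge pages).</p>
<p>The CPU keeps a TLB (Translation Lookaside Buffer), a cache of virtual-to-physical address translations. Every access to a virtual address first looks in the TLB; on a miss, the page tables have to be walked to translate the address.</p>
<p>With HugePages enabled, the same amount of memory is covered by fewer page table entries, which reduces the cost of maintaining and accessing the page tables, and each TLB entry covers more memory. Huge pages stay resident in memory and are not swapped out, so the kernel swap daemon has no work to do for them. Having fewer pages lowers the overhead of memory operations and makes page table access less likely to become a bottleneck.</p>
<p>The huge page size is 4MB on x86 and 2MB on x86_64.</p>
<p>For more detail on hugepages, see the References at the end of this post.</p>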
<h3 id="hugepage-api">HugePage API</h3>
<p>Linux allocates huge pages through mmap with the MAP_HUGETLB flag. The call below maps len bytes of huge pages by passing MAP_HUGETLB in the flags argument:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">flags</span> <span class="o">=</span> <span class="n">MAP_PRIVATE</span> <span class="o">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="o">|</span> <span class="n">MAP_HUGETLB</span><span class="p">;</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">mmap</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="n">PROT_READ</span> <span class="o">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</code></pre></div></div>
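<p>A slightly fuller C sketch of the same call is shown here, assuming the caller already knows the huge page size (for example 2MB). It rounds the length up to a whole number of huge pages, since <code class="language-plaintext highlighter-rouge">munmap</code> on a huge page mapping expects an aligned length, and reports failure as NULL. <code class="language-plaintext highlighter-rouge">align_up</code>, <code class="language-plaintext highlighter-rouge">huge_alloc</code> and <code class="language-plaintext highlighter-rouge">huge_free</code> are illustrative names; the allocator in this post is written in Rust.</p>

```c
#define _GNU_SOURCE  /* for MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Round len up to a multiple of page, which must be a power of two. */
static size_t align_up(size_t len, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (len + page - 1) & ~(page - 1);
}

/* Map at least len bytes backed by huge pages; NULL if none are available. */
static void *huge_alloc(size_t len, size_t hugepage_size)
{
    size_t n = align_up(len, hugepage_size);
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Unmap a region returned by huge_alloc with the same len and page size. */
static void huge_free(void *p, size_t len, size_t hugepage_size)
{
    if (p != NULL) {
        munmap(p, align_up(len, hugepage_size));
    }
}
```

<p>The mapping fails when no huge pages have been reserved on the system, so a caller would normally fall back to a regular allocation when <code class="language-plaintext highlighter-rouge">huge_alloc</code> returns NULL.</p>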
<h3 id="allocator">Allocator</h3>
<p>Next, we implement a hugepage allocator in Rust.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">MEMINFO_PATH</span><span class="p">:</span> <span class="o">&</span><span class="nb">str</span> <span class="o">=</span> <span class="s">"/proc/meminfo"</span><span class="p">;</span>
<span class="k">const</span> <span class="n">TOKEN</span><span class="p">:</span> <span class="o">&</span><span class="nb">str</span> <span class="o">=</span> <span class="s">"Hugepagesize:"</span><span class="p">;</span>
<span class="nd">lazy_static!</span> <span class="p">{</span>
<span class="c">// Parse 'Hugepagesize' from '/proc/meminfo' to initialize the global HUGEPAGE_SIZE.</span>
<span class="c">// HUGEPAGE_SIZE is used by the Allocator to align allocation sizes.</span>
<span class="k">static</span> <span class="k">ref</span> <span class="n">HUGEPAGE_SIZE</span><span class="p">:</span> <span class="nb">isize</span> <span class="o">=</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">buf</span> <span class="o">=</span> <span class="nn">File</span><span class="p">::</span><span class="nf">open</span><span class="p">(</span><span class="n">MEMINFO_PATH</span><span class="p">)</span><span class="nf">.map_or</span><span class="p">(</span><span class="s">""</span><span class="nf">.to_owned</span><span class="p">(),</span> <span class="p">|</span><span class="k">mut</span> <span class="n">f</span><span class="p">|</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">s</span> <span class="o">=</span> <span class="nn">String</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="n">f</span><span class="nf">.read_to_string</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="n">s</span><span class="p">);</span>
<span class="n">s</span>
<span class="p">});</span>
<span class="nf">parse_hugepage_size</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">)</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="c">// Parse Hugepagesize.</span>
<span class="c">// meminfo has many lines; scan line by line for TOKEN ('Hugepagesize:') and parse its value.</span>
<span class="k">fn</span> <span class="nf">parse_hugepage_size</span><span class="p">(</span><span class="n">s</span><span class="p">:</span> <span class="o">&</span><span class="nb">str</span><span class="p">)</span> <span class="k">-></span> <span class="nb">isize</span> <span class="p">{</span>
<span class="k">for</span> <span class="n">line</span> <span class="n">in</span> <span class="n">s</span><span class="nf">.lines</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// Find the 'Hugepagesize:' prefix.</span>
<span class="k">if</span> <span class="n">line</span><span class="nf">.starts_with</span><span class="p">(</span><span class="n">TOKEN</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">parts</span> <span class="o">=</span> <span class="n">line</span><span class="p">[</span><span class="n">TOKEN</span><span class="nf">.len</span><span class="p">()</span><span class="o">..</span><span class="p">]</span><span class="nf">.split_whitespace</span><span class="p">();</span>
<span class="c">// parse size</span>
<span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="n">parts</span><span class="nf">.next</span><span class="p">()</span><span class="nf">.unwrap_or</span><span class="p">(</span><span class="s">"0"</span><span class="p">);</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">hugepage_size</span> <span class="o">=</span> <span class="n">p</span><span class="py">.parse</span><span class="p">::</span><span class="o"><</span><span class="nb">isize</span><span class="o">></span><span class="p">()</span><span class="nf">.unwrap_or</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="c">// parse unit</span>
<span class="n">hugepage_size</span> <span class="o">*=</span> <span class="n">parts</span><span class="nf">.next</span><span class="p">()</span><span class="nf">.map_or</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">|</span><span class="n">x</span><span class="p">|</span> <span class="k">match</span> <span class="n">x</span> <span class="p">{</span>
<span class="c">// Only the kB unit is currently supported.</span>
<span class="s">"kB"</span> <span class="k">=></span> <span class="mi">1024</span><span class="p">,</span>
<span class="mi">_</span> <span class="k">=></span> <span class="mi">1</span><span class="p">,</span>
<span class="p">});</span>
<span class="k">return</span> <span class="n">hugepage_size</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
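<p>The parser above signals failure with -1, which becomes a very large number once it is cast to usize (and an unparsable value followed by kB yields -1024). As a hedged alternative, the sketch below returns an <code>Option</code> instead. The function name <code>hugepage_size_bytes</code> is an assumption made for this example and is not taken from the repository.</p>

```rust
/// Parse the `Hugepagesize:` line of /proc/meminfo-style text and return
/// the size in bytes, or `None` if the line is missing or malformed.
fn hugepage_size_bytes(meminfo: &str) -> Option<usize> {
    const TOKEN: &str = "Hugepagesize:";

    // Find the first line that starts with the token.
    let line = meminfo.lines().find(|l| l.starts_with(TOKEN))?;
    let mut parts = line[TOKEN.len()..].split_whitespace();

    // First field is the numeric value, second (optional) field is the unit.
    let value: usize = parts.next()?.parse().ok()?;
    let unit = match parts.next() {
        Some("kB") => 1024,
        _ => 1,
    };

    // checked_mul avoids silent overflow on absurd inputs.
    value.checked_mul(unit)
}
```

<p>A caller can then decide explicitly what to do when no huge page size is available, for example by refusing to allocate rather than aligning to a bogus value.</p>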
<p>Define the Allocator as a unit struct. It needs no internal data, so it has no fields.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span><span class="p">(</span><span class="n">crate</span><span class="p">)</span> <span class="k">struct</span> <span class="n">HugePageAllocator</span><span class="p">;</span>
</code></pre></div></div>
<p>The <strong>libc::mmap</strong> call comes from the libc crate. Next, implement the std::alloc::GlobalAlloc trait:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Implement the GlobalAlloc trait.</span>
<span class="k">unsafe</span> <span class="k">impl</span> <span class="n">GlobalAlloc</span> <span class="k">for</span> <span class="n">HugePageAllocator</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">alloc</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">layout</span><span class="p">:</span> <span class="n">Layout</span><span class="p">)</span> <span class="k">-></span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span> <span class="p">{</span>
<span class="c">// The allocation size must be aligned to HUGEPAGE_SIZE; see the helper align_to.</span>
<span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="nf">align_to</span><span class="p">(</span><span class="n">layout</span><span class="nf">.size</span><span class="p">(),</span> <span class="o">*</span><span class="n">HUGEPAGE_SIZE</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">);</span>
<span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">mmap</span><span class="p">(</span>
<span class="nf">null_mut</span><span class="p">(),</span>
<span class="n">len</span><span class="p">,</span>
<span class="n">PROT_READ</span> <span class="p">|</span> <span class="n">PROT_WRITE</span><span class="p">,</span>
<span class="n">MAP_PRIVATE</span> <span class="p">|</span> <span class="n">MAP_ANONYMOUS</span> <span class="p">|</span> <span class="n">MAP_HUGETLB</span><span class="p">,</span>
<span class="o">-</span><span class="mi">1</span><span class="p">,</span>
<span class="mi">0</span><span class="p">,</span>
<span class="p">);</span>
<span class="c">// Return null if the hugepage mapping cannot be created.</span>
<span class="k">if</span> <span class="n">p</span> <span class="o">==</span> <span class="n">MAP_FAILED</span> <span class="p">{</span>
<span class="k">return</span> <span class="nf">null_mut</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">p</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span>
<span class="p">}</span>
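<span class="c">// dealloc also receives the layout. munmap on a hugetlb mapping needs the same</span>
<span class="c">// HUGEPAGE_SIZE-aligned length that alloc passed to mmap.</span>
<span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">dealloc</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">p</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="n">layout</span><span class="p">:</span> <span class="n">Layout</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">len</span> <span class="o">=</span> <span class="nf">align_to</span><span class="p">(</span><span class="n">layout</span><span class="nf">.size</span><span class="p">(),</span> <span class="o">*</span><span class="n">HUGEPAGE_SIZE</span> <span class="k">as</span> <span class="nb">usize</span><span class="p">);</span>
<span class="nn">libc</span><span class="p">::</span><span class="nf">munmap</span><span class="p">(</span><span class="n">p</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">c_void</span><span class="p">,</span> <span class="n">len</span><span class="p">);</span>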
<span class="p">}</span>
<span class="p">}</span>
<span class="c">// Helper: round size up to a multiple of align (align must be a power of two).</span>
<span class="k">fn</span> <span class="nf">align_to</span><span class="p">(</span><span class="n">size</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="n">align</span><span class="p">:</span> <span class="nb">usize</span><span class="p">)</span> <span class="k">-></span> <span class="nb">usize</span> <span class="p">{</span>
<span class="p">(</span><span class="n">size</span> <span class="o">+</span> <span class="n">align</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">&</span> <span class="o">!</span><span class="p">(</span><span class="n">align</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
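<p>The bit trick in <code>align_to</code> only works when <code>align</code> is a power of two, which holds for huge page sizes such as 2 MiB. The sketch below repeats the helper with a debug assertion added (the assertion is an addition for this example) so the rounding behaviour can be checked in isolation.</p>

```rust
const HUGE_2M: usize = 2 * 1024 * 1024;

/// Round `size` up to the next multiple of `align`.
/// `align` must be a power of two, otherwise the mask is meaningless.
fn align_to(size: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    // Adding align - 1 pushes size past the next boundary unless it is
    // already aligned; the mask then clears the low bits.
    (size + align - 1) & !(align - 1)
}
```

<p>With a 2 MiB huge page, a 1-byte request rounds up to 2 MiB, an exact 2 MiB request stays at 2 MiB, and 2 MiB + 1 rounds up to 4 MiB. Small allocations therefore consume a whole huge page each.</p>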
<p>That completes a simple hugepage allocator.</p>
<h3 id="boxed">Boxed</h3>
<p>With the Allocator in place, export a global instance for Box to use.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// lib.rs defines a global default_allocator() accessor for use across the crate.</span>
<span class="nd">lazy_static!</span> <span class="p">{</span>
<span class="k">static</span> <span class="k">ref</span> <span class="n">HUGEPAGE_ALLOCATOR</span><span class="p">:</span> <span class="n">HugePageAllocator</span> <span class="o">=</span> <span class="n">HugePageAllocator</span><span class="p">;</span>
<span class="p">}</span>
<span class="c">// Exposed only within this crate.</span>
<span class="k">pub</span><span class="p">(</span><span class="n">crate</span><span class="p">)</span> <span class="k">fn</span> <span class="nf">default_allocator</span><span class="p">()</span> <span class="k">-></span> <span class="o">&</span><span class="nv">'static</span> <span class="n">HugePageAllocator</span> <span class="p">{</span>
<span class="o">&</span><span class="n">HUGEPAGE_ALLOCATOR</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Implement a simple Box that supports deref and frees its memory automatically when it goes out of scope:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">pub</span> <span class="k">struct</span> <span class="nb">Box</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="n">data</span><span class="p">:</span> <span class="n">NonNull</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="nb">Box</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Box</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">layout</span> <span class="o">=</span> <span class="nn">Layout</span><span class="p">::</span><span class="nn">new</span><span class="p">::</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">();</span>
<span class="k">unsafe</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="nn">NonNull</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nf">default_allocator</span><span class="p">()</span><span class="nf">.alloc</span><span class="p">(</span><span class="n">layout</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">T</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="c">// write() moves data in without dropping the uninitialized old contents.</span>
<span class="n">p</span><span class="nf">.as_ptr</span><span class="p">()</span><span class="nf">.write</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>
<span class="n">Self</span> <span class="p">{</span> <span class="n">data</span><span class="p">:</span> <span class="n">p</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">pub</span> <span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">from_raw</span><span class="p">(</span><span class="n">raw</span><span class="p">:</span> <span class="o">*</span><span class="k">mut</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="n">Self</span> <span class="p">{</span>
<span class="n">data</span><span class="p">:</span> <span class="nn">NonNull</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">raw</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">(),</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">Drop</span> <span class="k">for</span> <span class="nb">Box</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="k">drop</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span>
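<span class="c">// Run T's destructor before returning the memory to the allocator.</span>
<span class="nn">std</span><span class="p">::</span><span class="nn">ptr</span><span class="p">::</span><span class="nf">drop_in_place</span><span class="p">(</span><span class="k">self</span><span class="py">.data</span><span class="nf">.as_ptr</span><span class="p">());</span>
<span class="nf">default_allocator</span><span class="p">()</span><span class="nf">.dealloc</span><span class="p">(</span><span class="k">self</span><span class="py">.data</span><span class="nf">.as_ptr</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">,</span> <span class="nn">Layout</span><span class="p">::</span><span class="nn">new</span><span class="p">::</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">());</span>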
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">Deref</span> <span class="k">for</span> <span class="nb">Box</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">type</span> <span class="n">Target</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">deref</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="o">&</span><span class="n">T</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="py">.data</span><span class="nf">.as_ref</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">DerefMut</span> <span class="k">for</span> <span class="nb">Box</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="nf">deref_mut</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="o">&</span><span class="k">mut</span> <span class="n">T</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="py">.data</span><span class="nf">.as_mut</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
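<p>Running the Box above requires huge pages to be reserved on the machine. To try the ownership logic without that setup, the following sketch swaps in the standard <code>System</code> allocator; the type name <code>SysBox</code> is an assumption made for this example. It shows the two details that matter for any hand-written Box: the value is moved in with <code>write</code>, and the destructor runs before the memory is released.</p>

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ops::Deref;
use std::ptr::NonNull;

/// A minimal owning pointer backed by the System allocator
/// (stand-in for the hugepage allocator in this sketch).
pub struct SysBox<T> {
    data: NonNull<T>,
}

impl<T> SysBox<T> {
    pub fn new(value: T) -> Self {
        let layout = Layout::new::<T>();
        // GlobalAlloc::alloc must not be called with a zero-sized layout.
        assert!(layout.size() > 0, "zero-sized types are not handled here");
        unsafe {
            let raw = System.alloc(layout) as *mut T;
            let data = NonNull::new(raw).expect("allocation failed");
            // Move the value into the fresh memory without reading or
            // dropping whatever bytes were there before.
            data.as_ptr().write(value);
            Self { data }
        }
    }
}

impl<T> Deref for SysBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { self.data.as_ref() }
    }
}

impl<T> Drop for SysBox<T> {
    fn drop(&mut self) {
        unsafe {
            // Run T's destructor first, then free the allocation.
            std::ptr::drop_in_place(self.data.as_ptr());
            System.dealloc(self.data.as_ptr() as *mut u8, Layout::new::<T>());
        }
    }
}
```

<p>Wrapping a <code>String</code> in <code>SysBox::new</code> and letting it fall out of scope frees both the heap buffer owned by the String and the box allocation itself.</p>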
<p>The full code is in the GitHub repository linked at the end of this article.</p>
<h3 id="reference">Reference</h3>
<p><a href="https://lwn.net/Articles/374424/">Huge pages part 1 (Introduction)</a></p>
<p><a href="https://lwn.net/Articles/375096/">Huge pages part 2: Interfaces</a></p>
<p><a href="https://lwn.net/Articles/376606/">Huge pages part 3: Administration</a></p>
<p><a href="https://lwn.net/Articles/378641/">Huge pages part 4: benchmarking with huge pages</a></p>
<p><a href="https://lwn.net/Articles/379748/">Huge pages part 5: A deeper look at TLBs and costs</a></p>
<p><a href="https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt">https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt</a></p>
<p><a href="https://man7.org/linux/man-pages/man2/mmap.2.html">https://man7.org/linux/man-pages/man2/mmap.2.html</a></p>
<p><a href="https://github.com/cppcoffee/hugepage-rs">https://github.com/cppcoffee/hugepage-rs</a></p>
<p>Spin Reader-Writer Lock Implementation</p>
<p>Sharp Liu, 2021-05-13, <a href="https://cppcoffee.github.io/system/program/2021/05/13/%E8%87%AA%E6%97%8B%E8%AF%BB%E5%86%99%E9%94%81%E5%AE%9E%E7%8E%B0">https://cppcoffee.github.io/system/program/2021/05/13/%E8%87%AA%E6%97%8B%E8%AF%BB%E5%86%99%E9%94%81%E5%AE%9E%E7%8E%B0</a></p>
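<h3 id="读写锁">Reader-Writer Locks</h3>
<p>A reader-writer lock is a synchronization mechanism for concurrency control, also called a "shared-exclusive lock" or a multiple-readers/single-writer lock. Read operations may hold the lock concurrently, while write operations are mutually exclusive.</p>
<p>Reader-writer locks can be implemented in several ways. This article describes a <strong>spin reader-writer lock</strong>.</p>
<h3 id="优先策略">Priority Policies</h3>
<p>Reader-writer lock policies fall into three groups:</p>
<ul>
<li>Read-preferring: allows maximum concurrency, but writers may starve.</li>
<li>Write-preferring: once all reads already in progress finish, a waiting writer acquires the lock immediately.</li>
<li>Unspecified priority.</li>
</ul>
<p>The lock implemented in this article is <strong>write-preferring</strong>.</p>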
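<h3 id="自旋读写锁的设计">Design of the Spin Reader-Writer Lock</h3>
<p>The lock's internal value is a uint64_t (64-bit integer).</p>
<p>The highest bit is reserved for the writer; the remaining bits count readers.</p>
<p>The writer bit is 0x8000000000000000 in hexadecimal. Only one writer can hold the lock at a time, and the bit is cleared on unlock.</p>
<p>The reader bits allow concurrent readers, up to 0x7FFFFFFFFFFFFFFF of them. Each read lock adds 1 and each read unlock subtracts 1.</p>
<p><strong>Note</strong>: both read and write lock acquisition use CAS operations.</p>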
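<p>The bit layout described above can be sketched as a pair of constants and two decoding helpers. The names <code>READER_MASK</code>, <code>is_write_locked</code> and <code>reader_count</code> are assumptions introduced for this illustration.</p>

```rust
/// Highest bit of the 64-bit lock word: set while a writer holds
/// (or is waiting to take) the lock.
const WRITER_BIT: u64 = 1u64 << 63;

/// The remaining 63 bits hold the number of active readers.
const READER_MASK: u64 = !WRITER_BIT;

/// True if the writer bit is set in the lock word.
fn is_write_locked(word: u64) -> bool {
    word & WRITER_BIT != 0
}

/// Number of active readers encoded in the lock word.
fn reader_count(word: u64) -> u64 {
    word & READER_MASK
}
```

<p>A word of <code>WRITER_BIT | 3</code> therefore means a writer has claimed the lock and is waiting for three readers to leave.</p>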
<h3 id="实现">Implementation</h3>
<p>C implementation of the spin reader-writer lock:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>
</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">SHARED_LOCK_INIT</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">static</span> <span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="o">=</span> <span class="mi">1ULL</span> <span class="o"><<</span> <span class="mi">63</span><span class="p">;</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">shared_rwlock_s</span> <span class="n">shared_rwlock_t</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">shared_rwlock_s</span> <span class="p">{</span>
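<span class="c1">// volatile so the spin loops re-read the value on every iteration.</span>
<span class="k">volatile</span> <span class="kt">uint64_t</span> <span class="n">lock</span><span class="p">;</span>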
<span class="p">};</span>
<span class="kt">void</span> <span class="nf">shared_lock_init</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">p</span><span class="o">-></span><span class="n">lock</span> <span class="o">=</span> <span class="n">SHARED_LOCK_INIT</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">shared_read_lock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">value</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">p</span><span class="o">-></span><span class="n">lock</span><span class="p">;</span>
<span class="c1">// is write locked?</span>
<span class="k">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">>=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">)</span> <span class="p">{</span>
<span class="k">continue</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// increase reader bit.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">lock</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">value</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">shared_read_unlock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">assert</span><span class="p">(</span><span class="n">p</span><span class="o">-></span><span class="n">lock</span> <span class="o">></span> <span class="n">SHARED_LOCK_INIT</span><span class="p">);</span>
<span class="n">__sync_sub_and_fetch</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">lock</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">shared_write_lock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">value</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">p</span><span class="o">-></span><span class="n">lock</span><span class="p">;</span>
<span class="c1">// is write locked?</span>
<span class="k">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">>=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">)</span> <span class="p">{</span>
<span class="k">continue</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// set write lock bit.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">lock</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">value</span> <span class="o">|</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">))</span> <span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// wait for active readers.</span>
<span class="k">while</span> <span class="p">(</span><span class="n">p</span><span class="o">-></span><span class="n">lock</span> <span class="o">!=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* void */</span> <span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">shared_write_unlock</span><span class="p">(</span><span class="n">shared_rwlock_t</span> <span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">assert</span><span class="p">(</span><span class="n">p</span><span class="o">-></span><span class="n">lock</span> <span class="o">==</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">);</span>
<span class="n">__sync_sub_and_fetch</span><span class="p">(</span><span class="o">&</span><span class="n">p</span><span class="o">-></span><span class="n">lock</span><span class="p">,</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
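<h3 id="附加功能">Additional Feature</h3>
<p>Add a spin limit to the lock: when the loop counter reaches the threshold and the lock still has not been acquired, yield the current CPU time slice.</p>
<p>Pseudocode:</p>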
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">is_write_locked</span><span class="p">(</span><span class="o">&</span><span class="n">rwlock</span><span class="p">))</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">++</span><span class="n">count</span> <span class="o">>=</span> <span class="n">limit_rate</span><span class="p">)</span> <span class="p">{</span>
<span class="n">count</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sched_yield</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</code></pre></div></div>
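<p>As a rough Rust rendering of the pseudocode, the sketch below applies the spin limit to the read-lock path only. The constant <code>SPIN_LIMIT</code> and its value of 100, as well as the free-function shape, are assumptions chosen for this example and are not taken from the author's implementation.</p>

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

const WRITER_BIT: u64 = 1u64 << 63;
/// Hypothetical threshold: spins allowed before yielding the CPU.
const SPIN_LIMIT: u32 = 100;

/// Acquire a shared (read) lock, yielding after SPIN_LIMIT failed spins.
fn read_lock(lock: &AtomicU64) {
    let mut spins = 0u32;
    loop {
        let value = lock.load(Ordering::Relaxed);
        if value & WRITER_BIT != 0 {
            // A writer holds or wants the lock: spin, and give up the
            // time slice once the counter reaches the limit.
            spins += 1;
            if spins >= SPIN_LIMIT {
                spins = 0;
                thread::yield_now();
            } else {
                std::hint::spin_loop();
            }
            continue;
        }
        // No writer: try to bump the reader count with a CAS.
        if lock
            .compare_exchange_weak(value, value + 1, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
    }
}

/// Release a shared (read) lock by decrementing the reader count.
fn read_unlock(lock: &AtomicU64) {
    lock.fetch_sub(1, Ordering::Release);
}
```

<p>Yielding keeps a waiting reader from burning a full time slice while the writer it is waiting for may not even be scheduled, at the cost of extra latency once the lock does become free.</p>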
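<h3 id="rust-实现版本">Rust Implementation</h3>
<p>This version of the lock uses RAII guards and adds an owner field, which lets it detect a deadlock caused by the same thread trying to acquire the lock again while it already holds it.</p>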
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">cell</span><span class="p">::</span><span class="n">UnsafeCell</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::{</span><span class="n">Deref</span><span class="p">,</span> <span class="n">DerefMut</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nn">atomic</span><span class="p">::{</span><span class="n">AtomicU64</span><span class="p">,</span> <span class="n">Ordering</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">crate</span><span class="p">::{</span><span class="n">Error</span><span class="p">,</span> <span class="n">Result</span><span class="p">};</span>
<span class="c">// The writer lock bit.</span>
<span class="k">const</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">:</span> <span class="nb">u64</span> <span class="o">=</span> <span class="mi">1u64</span> <span class="o"><<</span> <span class="mi">63</span><span class="p">;</span>
<span class="k">unsafe</span> <span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nb">Send</span><span class="o">></span> <span class="nb">Send</span> <span class="k">for</span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{}</span>
<span class="k">unsafe</span> <span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nb">Send</span> <span class="o">+</span> <span class="n">Sync</span><span class="o">></span> <span class="n">Sync</span> <span class="k">for</span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{}</span>
<span class="cm">/*
* A reader-writer lock
*/</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="p">{</span>
<span class="n">inner</span><span class="p">:</span> <span class="n">AtomicU64</span><span class="p">,</span>
<span class="n">owner</span><span class="p">:</span> <span class="n">AtomicU64</span><span class="p">,</span>
<span class="n">data</span><span class="p">:</span> <span class="n">UnsafeCell</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">t</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="n">SharedLock</span> <span class="p">{</span>
<span class="n">inner</span><span class="p">:</span> <span class="nn">AtomicU64</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span>
<span class="n">owner</span><span class="p">:</span> <span class="nn">AtomicU64</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span>
<span class="n">data</span><span class="p">:</span> <span class="nn">UnsafeCell</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">t</span><span class="p">),</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">read</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="n">SharedLockReadGuard</span><span class="o"><</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">>></span> <span class="p">{</span>
<span class="nn">SharedLockReadGuard</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="k">self</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">write</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="n">SharedLockWriteGuard</span><span class="o"><</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">>></span> <span class="p">{</span>
<span class="nn">SharedLockWriteGuard</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="k">self</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">is_hold</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="nb">bool</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">tid</span> <span class="o">=</span> <span class="k">self</span><span class="py">.owner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span>
<span class="n">tid</span> <span class="o">></span> <span class="mi">0</span> <span class="o">&&</span> <span class="n">tid</span> <span class="o">==</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">pthread_self</span><span class="p">()</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">set_owner_id</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">tid</span><span class="p">:</span> <span class="nb">u64</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="py">.owner</span><span class="nf">.store</span><span class="p">(</span><span class="n">tid</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="cm">/*
* RAII structure used to release the shared read access of a lock when dropped.
* This structure is created by the read methods on SharedLock.
*/</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">SharedLockReadGuard</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nv">'a</span><span class="o">></span> <span class="p">{</span>
<span class="n">lock</span><span class="p">:</span> <span class="o">&</span><span class="nv">'a</span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">SharedLockReadGuard</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">lock</span><span class="p">:</span> <span class="o">&</span><span class="nv">'a</span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">)</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="n">SharedLockReadGuard</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">>></span> <span class="p">{</span>
<span class="k">if</span> <span class="n">lock</span><span class="nf">.is_hold</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">Error</span><span class="p">::</span><span class="n">DeadLockError</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">loop</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="n">lock</span><span class="py">.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">>=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{</span>
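<span class="nn">std</span><span class="p">::</span><span class="nn">hint</span><span class="p">::</span><span class="nf">spin_loop</span><span class="p">();</span>
<span class="k">continue</span><span class="p">;</span>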
<span class="p">}</span>
<span class="k">if</span> <span class="n">lock</span>
<span class="py">.inner</span>
<span class="nf">.compare_exchange</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">value</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Relaxed</span><span class="p">)</span>
<span class="nf">.is_ok</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nf">Ok</span><span class="p">(</span><span class="n">SharedLockReadGuard</span> <span class="p">{</span> <span class="n">lock</span> <span class="p">})</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">Deref</span> <span class="k">for</span> <span class="n">SharedLockReadGuard</span><span class="o"><</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">type</span> <span class="n">Target</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">deref</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="o">&</span><span class="n">T</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span> <span class="o">&*</span><span class="k">self</span><span class="py">.lock.data</span><span class="nf">.get</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">Drop</span> <span class="k">for</span> <span class="n">SharedLockReadGuard</span><span class="o"><</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="k">drop</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="py">.lock.inner</span><span class="nf">.fetch_sub</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="cm">/*
* RAII structure used to release the exclusive write access of a lock when dropped.
* This structure is created by the write methods on SharedLock.
*/</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">SharedLockWriteGuard</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span> <span class="o">+</span> <span class="nv">'a</span><span class="o">></span> <span class="p">{</span>
<span class="n">lock</span><span class="p">:</span> <span class="o">&</span><span class="nv">'a</span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">SharedLockWriteGuard</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">lock</span><span class="p">:</span> <span class="o">&</span><span class="nv">'a</span> <span class="n">SharedLock</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">)</span> <span class="k">-></span> <span class="n">Result</span><span class="o"><</span><span class="n">SharedLockWriteGuard</span><span class="o"><</span><span class="nv">'a</span><span class="p">,</span> <span class="n">T</span><span class="o">>></span> <span class="p">{</span>
<span class="k">if</span> <span class="n">lock</span><span class="nf">.is_hold</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">Error</span><span class="p">::</span><span class="n">DeadLockError</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">loop</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="n">lock</span><span class="py">.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">>=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{</span>
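<span class="nn">std</span><span class="p">::</span><span class="nn">hint</span><span class="p">::</span><span class="nf">spin_loop</span><span class="p">();</span>
<span class="k">continue</span><span class="p">;</span>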
<span class="p">}</span>
<span class="k">if</span> <span class="n">lock</span>
<span class="py">.inner</span>
<span class="nf">.compare_exchange</span><span class="p">(</span>
<span class="n">value</span><span class="p">,</span>
<span class="n">value</span> <span class="p">|</span> <span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">,</span>
<span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">,</span>
<span class="nn">Ordering</span><span class="p">::</span><span class="n">Relaxed</span><span class="p">,</span>
<span class="p">)</span>
<span class="nf">.is_ok</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">lock</span><span class="py">.owner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span>
<span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">Error</span><span class="p">::</span><span class="n">Poisoned</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">lock</span><span class="nf">.set_owner_id</span><span class="p">(</span><span class="k">unsafe</span> <span class="p">{</span> <span class="nn">libc</span><span class="p">::</span><span class="nf">pthread_self</span><span class="p">()</span> <span class="p">}</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">);</span>
<span class="c">// wait for active readers.</span>
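<span class="k">while</span> <span class="n">lock</span><span class="py">.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">)</span> <span class="o">!=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{</span>
<span class="nn">std</span><span class="p">::</span><span class="nn">hint</span><span class="p">::</span><span class="nf">spin_loop</span><span class="p">();</span>
<span class="p">}</span>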
<span class="nf">Ok</span><span class="p">(</span><span class="n">SharedLockWriteGuard</span> <span class="p">{</span> <span class="n">lock</span> <span class="p">})</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">Deref</span> <span class="k">for</span> <span class="n">SharedLockWriteGuard</span><span class="o"><</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">type</span> <span class="n">Target</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">deref</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="o">&</span><span class="n">T</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span> <span class="o">&*</span><span class="k">self</span><span class="py">.lock.data</span><span class="nf">.get</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">Drop</span> <span class="k">for</span> <span class="n">SharedLockWriteGuard</span><span class="o"><</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="k">drop</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">value</span> <span class="o">=</span> <span class="k">self</span><span class="py">.lock.inner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">);</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">!=</span> <span class="n">SHARED_LOCK_WRITER_BIT</span> <span class="p">{</span>
<span class="nd">panic!</span><span class="p">(</span><span class="s">"write unlock inner value: {}"</span><span class="p">,</span> <span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
<span class="c">// reset owner id.</span>
<span class="k">if</span> <span class="o">!</span><span class="k">self</span><span class="py">.lock</span><span class="nf">.is_hold</span><span class="p">()</span> <span class="p">{</span>
<span class="nd">panic!</span><span class="p">(</span>
<span class="s">"Poisoned!!! owner id: {}"</span><span class="p">,</span>
<span class="k">self</span><span class="py">.lock.owner</span><span class="nf">.load</span><span class="p">(</span><span class="nn">Ordering</span><span class="p">::</span><span class="n">Acquire</span><span class="p">)</span>
<span class="p">);</span>
<span class="p">}</span>
<span class="k">self</span><span class="py">.lock</span><span class="nf">.set_owner_id</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="k">self</span><span class="py">.lock</span>
<span class="py">.inner</span>
<span class="nf">.fetch_sub</span><span class="p">(</span><span class="n">SHARED_LOCK_WRITER_BIT</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">Release</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="o">?</span><span class="n">Sized</span><span class="o">></span> <span class="n">DerefMut</span> <span class="k">for</span> <span class="n">SharedLockWriteGuard</span><span class="o"><</span><span class="nv">'_</span><span class="p">,</span> <span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="nf">deref_mut</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span> <span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="o">&</span><span class="k">mut</span> <span class="n">T</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span> <span class="o">&</span><span class="k">mut</span> <span class="o">*</span><span class="k">self</span><span class="py">.lock.data</span><span class="nf">.get</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="参考">References</h3>
<p><a href="https://github.com/cppcoffee/sharelock-rs">https://github.com/cppcoffee/sharelock-rs</a></p>
<p><a href="https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock">https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock</a></p>
<p>Author: Sharp Liu. Summary: a spin read-write lock implementation.</p>
<p>Process crash print stacktrace – C Library</p>
<p>2021-04-25, https://cppcoffee.github.io/system/program/2021/04/25/Process-Crash-Print-Stacktrace--C-Library</p>
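<h3 id="简述">Overview</h3>
<p>When developing in a language that is not memory-safe and manipulates raw pointers directly, you will sooner or later hit bugs that crash the process, such as dereferencing a wild pointer or accessing memory that has already been freed.</p>
<p>If core dumps are enabled, Linux writes the state of the crashed process to a file.
To view the crash stack trace from that core file without installing the debug-info packages, use the following gdb command:</p>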
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># In batch mode, run bt to print the stack trace</span>
gdb <span class="nt">-batch</span> <span class="nt">-c</span> ./coredump-nginx-pid <span class="nt">-ex</span> bt /bin/nginx
</code></pre></div></div>
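<p>With core dumps enabled, a crashing process that holds a lot of memory takes a long time to dump to disk, and the load on a mechanical disk stays high for the whole duration.</p>
<p>However, limiting the number of core dumps or the size of the core file means that some crashes leave no usable core dump and go unnoticed.</p>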
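<p>As one illustration of the size limit mentioned above, a process can cap its own core file size with setrlimit(RLIMIT_CORE). This is a minimal sketch assuming a POSIX system; the helper name limit_core_size is hypothetical and is not part of the library described in this article. It covers only the size limit, not the number of dumps.</p>

```c
#include <sys/resource.h>

/* Cap the soft core-file size limit of the calling process at max_bytes
 * (0 disables core dumps). Returns 0 on success, -1 on error.
 * limit_core_size is a hypothetical helper used only for illustration. */
int limit_core_size(rlim_t max_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }

    /* The soft limit may not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max) {
        max_bytes = rl.rlim_max;
    }
    rl.rlim_cur = max_bytes;

    return setrlimit(RLIMIT_CORE, &rl);
}
```

<p>A core file cut short by this limit may be incomplete, which is the trade-off described above.</p>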
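<p>This article describes the implementation of a C library that prints the stack trace when a process crashes.</p>
<h3 id="crash-调用栈信息输出">Printing the call stack on crash</h3>
<p>If the call stack that led to the crash is written out when the process crashes, the problem becomes much easier to track down.</p>
<p>The main logic is as follows:</p>
<ol>
<li>After the process starts, call the library's initialization function, which registers signal handlers for the crash signals.</li>
<li>When a crash occurs, the signal handler is invoked.</li>
<li>The handler writes the call stack to stderr.</li>
<li>The handler restores the default signal handler and re-raises the signal so that it propagates as usual.</li>
</ol>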
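<p>The four steps above can be sketched in a few lines of C. This is a minimal sketch, not the library's code: it skips the libbfd symbol lookup and relies only on glibc's backtrace() and backtrace_symbols_fd(). The names dump_stack, crash_handler and install_crash_handler are hypothetical.</p>

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write the current call stack to fd; returns the number of frames captured.
 * backtrace_symbols_fd() does not call malloc(), unlike backtrace_symbols(). */
int dump_stack(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    if (n > 0) {
        backtrace_symbols_fd(frames, n, fd);
    }
    return n;
}

/* Steps 2-4: dump the stack, restore the default action, re-raise. */
static void crash_handler(int signo, siginfo_t *info, void *ctx)
{
    struct sigaction dfl;

    (void) info;
    (void) ctx;

    dump_stack(STDERR_FILENO);

    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    sigaction(signo, &dfl, NULL);

    /* The signal is blocked while the handler runs, so it is delivered
     * with the default action as soon as the handler returns. */
    raise(signo);
}

/* Step 1: register the handler for one signal; returns 0 on success. */
int install_crash_handler(int signo)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = crash_handler;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);

    return sigaction(signo, &act, NULL);
}
```

<p>A program would call install_crash_handler() once at startup for each signal of interest, for example SIGSEGV, SIGBUS, SIGILL, SIGFPE and SIGABRT. Linking with -rdynamic lets backtrace_symbols_fd() print names for non-static functions in the executable; without symbol information it prints raw addresses.</p>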
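<p>The library depends on libbfd (the Binary File Descriptor library), which it uses to parse the ELF sections and resolve the function name and source line of each frame in the call stack.</p>
<p>libbfd is provided by the binutils package. On CentOS, install it with the following command:</p>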
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>yum <span class="nb">install </span>binutils-devel <span class="nt">-y</span>
</code></pre></div></div>
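<h3 id="主要实现">Implementation</h3>
<p>The following code writes the crash information to stderr.</p>
<p>For the complete code, see the libstacktrace GitHub repository linked at the end of this article.</p>
<p>The key parts are:</p>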
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define _GNU_SOURCE
#include <execinfo.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <signal.h>
#include <unistd.h>
</span>
<span class="cp">#include "symbol_table.h"
</span>
<span class="c1">// The max number of levels in the stack trace</span>
<span class="cp">#define STACK_TRACE_MAX_LEVELS 100
#define BUFFER_LENGTH 4096
</span>
<span class="k">typedef</span> <span class="nf">void</span> <span class="p">(</span><span class="o">*</span><span class="n">signal_handler_t</span><span class="p">)(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">,</span> <span class="n">siginfo_t</span> <span class="o">*</span><span class="n">info</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">ctx</span><span class="p">);</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">register_crash_handlers</span><span class="p">();</span>
<span class="k">static</span> <span class="kt">int</span> <span class="nf">backtrace_symbol_write</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">text</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">addr</span><span class="p">);</span>
<span class="c1">// store process full path.</span>
<span class="k">static</span> <span class="kt">char</span> <span class="n">program_path</span><span class="p">[</span><span class="n">PATH_MAX</span><span class="p">];</span>
<span class="c1">// the current program binary symbol table.</span>
<span class="k">static</span> <span class="n">symbol_table_t</span> <span class="n">symtab</span><span class="p">;</span>
<span class="c1">// initialize stacktrace library.</span>
<span class="kt">int</span> <span class="nf">init_stacktrace</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">n</span><span class="p">;</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">readlink</span><span class="p">(</span><span class="s">"/proc/self/exe"</span><span class="p">,</span> <span class="n">program_path</span><span class="p">,</span> <span class="n">PATH_MAX</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o"><</span> <span class="mi">0</span> <span class="o">||</span> <span class="n">n</span> <span class="o">>=</span> <span class="n">PATH_MAX</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">program_path</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'\0'</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">symbol_table_build</span><span class="p">(</span><span class="n">program_path</span><span class="p">,</span> <span class="o">&</span><span class="n">symtab</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">register_crash_handlers</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span>
<span class="nf">stack_trace_dump</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">,</span> <span class="n">btl</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">**</span><span class="n">strings</span><span class="p">;</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">stack</span><span class="p">[</span><span class="n">STACK_TRACE_MAX_LEVELS</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">msg</span> <span class="o">=</span> <span class="s">" - STACK TRACE: </span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">STDERR_FILENO</span><span class="p">,</span> <span class="n">program_path</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">program_path</span><span class="p">))</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">STDERR_FILENO</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">msg</span><span class="p">))</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memset</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">stack</span><span class="p">));</span>
<span class="k">if</span> <span class="p">((</span><span class="n">btl</span> <span class="o">=</span> <span class="n">backtrace</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span> <span class="n">STACK_TRACE_MAX_LEVELS</span><span class="p">))</span> <span class="o">></span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
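<span class="c1">// NOTE: backtrace_symbols() calls malloc(), which is not async-signal-safe;</span>
<span class="c1">// the backtrace_symbols_fd() fallback below does not allocate.</span>
<span class="n">strings</span> <span class="o">=</span> <span class="n">backtrace_symbols</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span> <span class="n">btl</span><span class="p">);</span>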
<span class="k">if</span> <span class="p">(</span><span class="n">strings</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">btl</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="n">backtrace_symbol_write</span><span class="p">(</span><span class="n">STDERR_FILENO</span><span class="p">,</span> <span class="n">strings</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">stack</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="p">}</span>
<span class="n">free</span><span class="p">(</span><span class="n">strings</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">backtrace_symbols_fd</span><span class="p">(</span><span class="n">stack</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="n">btl</span> <span class="o">-</span> <span class="mi">2</span><span class="p">,</span> <span class="n">STDERR_FILENO</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Reset a signal handler to the default handler.</span>
<span class="k">static</span> <span class="kt">void</span>
<span class="nf">signal_reset_default</span><span class="p">(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">sigaction</span> <span class="n">act</span><span class="p">;</span>
<span class="n">act</span><span class="p">.</span><span class="n">sa_handler</span> <span class="o">=</span> <span class="n">SIG_DFL</span><span class="p">;</span>
<span class="n">act</span><span class="p">.</span><span class="n">sa_flags</span> <span class="o">=</span> <span class="n">SA_NODEFER</span> <span class="o">|</span> <span class="n">SA_ONSTACK</span> <span class="o">|</span> <span class="n">SA_RESETHAND</span><span class="p">;</span>
<span class="n">sigemptyset</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">act</span><span class="p">.</span><span class="n">sa_mask</span><span class="p">));</span>
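<span class="c1">// Call sigaction() outside assert() so it still runs when NDEBUG is defined.</span>
<span class="kt">int</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">sigaction</span><span class="p">(</span><span class="n">signo</span><span class="p">,</span> <span class="o">&</span><span class="n">act</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">rc</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">rc</span><span class="p">;</span>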
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span>
<span class="nf">signal_crash_handler</span><span class="p">(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">,</span> <span class="n">siginfo_t</span> <span class="o">*</span><span class="n">siginfo</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">data</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">stack_trace_dump</span><span class="p">();</span>
<span class="n">signal_reset_default</span><span class="p">(</span><span class="n">signo</span><span class="p">);</span>
<span class="c1">// throw signal to default handler.</span>
<span class="n">raise</span><span class="p">(</span><span class="n">signo</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span>
<span class="nf">set_signal</span><span class="p">(</span><span class="kt">int</span> <span class="n">signo</span><span class="p">,</span> <span class="n">signal_handler_t</span> <span class="n">handler</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">sigaction</span> <span class="n">act</span><span class="p">;</span>
<span class="n">act</span><span class="p">.</span><span class="n">sa_handler</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">act</span><span class="p">.</span><span class="n">sa_sigaction</span> <span class="o">=</span> <span class="n">handler</span><span class="p">;</span>
<span class="n">act</span><span class="p">.</span><span class="n">sa_flags</span> <span class="o">=</span> <span class="n">SA_SIGINFO</span><span class="p">;</span>
<span class="n">sigemptyset</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">act</span><span class="p">.</span><span class="n">sa_mask</span><span class="p">));</span>
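<span class="c1">// Call sigaction() outside assert() so it still runs when NDEBUG is defined.</span>
<span class="kt">int</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">sigaction</span><span class="p">(</span><span class="n">signo</span><span class="p">,</span> <span class="o">&</span><span class="n">act</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">rc</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">rc</span><span class="p">;</span>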
<span class="p">}</span>
<span class="k">static</span> <span class="kt">void</span>
<span class="nf">register_crash_handlers</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">set_signal</span><span class="p">(</span><span class="n">SIGBUS</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span>
<span class="n">set_signal</span><span class="p">(</span><span class="n">SIGSEGV</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span>
<span class="n">set_signal</span><span class="p">(</span><span class="n">SIGILL</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span>
<span class="n">set_signal</span><span class="p">(</span><span class="n">SIGTRAP</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span>
<span class="n">set_signal</span><span class="p">(</span><span class="n">SIGFPE</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span>
<span class="n">set_signal</span><span class="p">(</span><span class="n">SIGABRT</span><span class="p">,</span> <span class="n">signal_crash_handler</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">int</span>
<span class="nf">backtrace_symbol_format</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">len</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">prefix</span><span class="p">,</span> <span class="n">frame_record_t</span> <span class="n">fr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">n</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">p</span> <span class="o">=</span> <span class="n">buf</span><span class="p">;</span>
<span class="c1">// file name</span>
<span class="k">if</span> <span class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">filename</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">"%s %s"</span><span class="p">,</span> <span class="n">prefix</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">filename</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">"%s ??"</span><span class="p">,</span> <span class="n">prefix</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span>
<span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span>
<span class="c1">// function name</span>
<span class="k">if</span> <span class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">functionname</span> <span class="o">!=</span> <span class="nb">NULL</span> <span class="o">&&</span> <span class="o">*</span><span class="n">fr</span><span class="p">.</span><span class="n">functionname</span> <span class="o">!=</span> <span class="sc">'\0'</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">" %s()"</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">functionname</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">" ??"</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span>
<span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span>
<span class="c1">// line</span>
<span class="k">if</span> <span class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">line</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">":%u"</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">line</span><span class="p">);</span>
<span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span>
<span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// discriminator</span>
<span class="k">if</span> <span class="p">(</span><span class="n">fr</span><span class="p">.</span><span class="n">discriminator</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">" (discriminator %u)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">fr</span><span class="p">.</span><span class="n">discriminator</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">p</span> <span class="o">+=</span> <span class="n">n</span><span class="p">;</span>
<span class="n">len</span> <span class="o">-=</span> <span class="n">n</span><span class="p">;</span>
<span class="k">return</span> <span class="n">p</span> <span class="o">-</span> <span class="n">buf</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">int</span>
<span class="nf">backtrace_symbol_write</span><span class="p">(</span><span class="kt">int</span> <span class="n">fd</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">text</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">addr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">frame_record_t</span> <span class="n">fr</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">n</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="n">BUFFER_LENGTH</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">symbol_table_find</span><span class="p">(</span><span class="o">&</span><span class="n">symtab</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="o">&</span><span class="n">fr</span><span class="p">))</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">backtrace_symbol_format</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">BUFFER_LENGTH</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">fr</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">snprintf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">BUFFER_LENGTH</span><span class="p">,</span> <span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">text</span><span class="p">);</span>
<span class="p">}</span>
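<span class="c1">// snprintf() already NUL-terminates and may return more than it wrote, so clamp n before indexing buf.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o"><</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o">></span> <span class="n">BUFFER_LENGTH</span><span class="p">)</span> <span class="p">{</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">BUFFER_LENGTH</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">buf</span><span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'\0'</span><span class="p">;</span>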
<span class="k">if</span> <span class="p">(</span><span class="n">write</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span> <span class="o">==</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
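<p>The formatter above advances <code>p</code> and shrinks <code>len</code> by whatever <code>snprintf</code> returns. Since <code>snprintf</code> returns the length the full output would have needed, a truncated write can make <code>len -= n</code> wrap around. The following is a minimal sketch of a bounded append, assuming a hypothetical helper named <code>buf_append</code> that is not part of libstacktrace:</p>

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * buf_append is a hypothetical helper (not part of libstacktrace).
 * It appends formatted text at offset `used` in a buffer of `cap` bytes
 * and returns the new offset, which never exceeds cap - 1.
 */
static size_t
buf_append(char *buf, size_t cap, size_t used, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && cap > 0 && used < cap);

    va_start(ap, fmt);
    n = vsnprintf(buf + used, cap - used, fmt, ap);
    va_end(ap);

    if (n < 0) {
        /* Output error: keep the previous contents. */
        buf[used] = '\0';
        return used;
    }

    if ((size_t) n >= cap - used) {
        /* Output was truncated; vsnprintf already wrote the terminator. */
        return cap - 1;
    }

    return used + (size_t) n;
}
```

<p>A caller would thread the returned offset through successive calls, for example <code>used = buf_append(buf, sizeof(buf), used, " %s()", name);</code>. Note that <code>snprintf</code> is not on the POSIX list of async-signal-safe functions, so using it inside a crash handler is a pragmatic trade-off rather than a guarantee.</p>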
<h3 id="崩溃栈输出">Crash stack output</h3>
<p>example.c contains code that writes through a null pointer and crashes the process. The complete code:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#include <stdio.h></span>
<span class="c">#include "stacktrace.h"</span>
static void bar<span class="o">()</span>
<span class="o">{</span>
// Write through a null pointer to trigger a crash
char <span class="k">*</span>p <span class="o">=</span> 0<span class="p">;</span>
<span class="k">*</span>p <span class="o">=</span> <span class="s1">'a'</span><span class="p">;</span>
<span class="o">}</span>
static void foo<span class="o">()</span>
<span class="o">{</span>
bar<span class="o">()</span><span class="p">;</span>
<span class="o">}</span>
int main<span class="o">()</span>
<span class="o">{</span>
// Initialize the library
init_stacktrace<span class="o">()</span><span class="p">;</span>
foo<span class="o">()</span><span class="p">;</span>
<span class="k">return </span>0<span class="p">;</span>
<span class="o">}</span>
</code></pre></div></div>
<p>When run, the example process crashes and prints:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@localhost libstacktrace]# ./example
/home/sharp/libstacktrace/example - STACK TRACE:
/lib64/libc.so.6<span class="o">(</span>+0x36450<span class="o">)</span> <span class="o">[</span>0x7fca8db52450]
./example<span class="o">()</span> <span class="o">[</span>0x4036f0] example.c bar<span class="o">()</span>
./example<span class="o">()</span> <span class="o">[</span>0x403703] example.c foo<span class="o">()</span>
./example<span class="o">()</span> <span class="o">[</span>0x40371d] ?? main<span class="o">()</span>
/lib64/libc.so.6<span class="o">(</span>__libc_start_main+0xf5<span class="o">)</span> <span class="o">[</span>0x7fca8db3e555]
./example<span class="o">()</span> <span class="o">[</span>0x402c9a] ?? _start<span class="o">()</span>
Segmentation fault
</code></pre></div></div>
<p>When the program is compiled with -g, the crash output is more detailed and includes the exact source line of the crash:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span>root@localhost libstacktrace]# ./example
/home/sharp/libstacktrace/example - STACK TRACE:
/lib64/libc.so.6<span class="o">(</span>+0x36450<span class="o">)</span> <span class="o">[</span>0x7fa21a2e0450]
./example<span class="o">()</span> <span class="o">[</span>0x4036f0] /home/sharp/libstacktrace/example.c bar<span class="o">()</span>:8
./example<span class="o">()</span> <span class="o">[</span>0x403703] /home/sharp/libstacktrace/example.c foo<span class="o">()</span>:15
./example<span class="o">()</span> <span class="o">[</span>0x40371d] /home/sharp/libstacktrace/example.c main<span class="o">()</span>:24
/lib64/libc.so.6<span class="o">(</span>__libc_start_main+0xf5<span class="o">)</span> <span class="o">[</span>0x7fa21a2cc555]
./example<span class="o">()</span> <span class="o">[</span>0x402c9a] ?? _start<span class="o">()</span>
Segmentation fault
</code></pre></div></div>
<h3 id="参考">References</h3>
<p><a href="https://github.com/cppcoffee/libstacktrace">https://github.com/cppcoffee/libstacktrace</a></p>
<p><a href="https://man7.org/linux/man-pages/man1/gdb.1.html">https://man7.org/linux/man-pages/man1/gdb.1.html</a></p>
<p><a href="https://man7.org/linux/man-pages/man1/addr2line.1.html">https://man7.org/linux/man-pages/man1/addr2line.1.html</a></p>
<p><a href="https://github.com/apache/trafficserver/blob/master/src/tscore/signals.cc">https://github.com/apache/trafficserver/blob/master/src/tscore/signals.cc</a></p>
<p><a href="https://sourceware.org/binutils/docs/bfd/">https://sourceware.org/binutils/docs/bfd/</a></p>Sharp LiuProcess crash print stacktrace – C LibraryLock-Free Stack Implement2021-04-07T00:00:00+00:002021-04-07T00:00:00+00:00https://cppcoffee.github.io/datastructure/2021/04/07/lock-free-stack-implement<h2 id="lock-free-stack-implement">Lock-Free Stack Implement</h2>
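<h3 id="无锁链式栈">Lock-free linked stack</h3>
<p>A stack is a LIFO (Last In, First Out) data structure. One common implementation is array-based and pushes and pops by moving an array index; another is the linked stack, which pushes and pops by updating pointers. This article discusses lock-free operations on a linked stack.</p>
<p>A linked stack is a singly linked list: each node has a next pointer that points to the node below it in the stack.</p>
<p>The basic operations are initialization, push, and pop.</p>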
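<p>Before any atomics are involved, the three basic operations can be sketched single-threaded. The C sketch below uses hypothetical names (<code>lstack_t</code>, <code>lstack_push</code>, <code>lstack_pop</code>) and is not the article's Rust code; it only illustrates how the top pointer moves on push and pop.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-threaded linked stack, for illustration only. */
typedef struct lnode_s lnode_t;

struct lnode_s {
    lnode_t *next;   /* node below this one */
    int      value;
};

typedef struct {
    lnode_t *top;    /* NULL when the stack is empty */
} lstack_t;

static void
lstack_init(lstack_t *s)
{
    assert(s != NULL);
    s->top = NULL;
}

/* Returns 0 on success, -1 if allocation fails. */
static int
lstack_push(lstack_t *s, int value)
{
    lnode_t *node = malloc(sizeof(lnode_t));
    if (node == NULL) {
        return -1;
    }

    node->value = value;
    node->next = s->top;   /* new node points at the old top */
    s->top = node;         /* top now points at the new node */
    return 0;
}

/* Returns 0 and stores the value in *out, or -1 if the stack is empty. */
static int
lstack_pop(lstack_t *s, int *out)
{
    lnode_t *node = s->top;
    if (node == NULL) {
        return -1;
    }

    s->top = node->next;   /* top moves down to the next node */
    *out = node->value;
    free(node);
    return 0;
}
```

<p>The lock-free version keeps the same two pointer updates but has to make the change to <code>top</code> atomic and must defer freeing popped nodes.</p>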
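<h3 id="实现">Implementation</h3>
<p>The implementation is in Rust and uses the <strong>crossbeam-epoch</strong> crate to handle the ABA problem and memory reclamation for the lock-free structure.</p>
<h4 id="结构体">Structs</h4>
<p>The stack struct needs one pointer to the current top of the stack. Because only this single top pointer has to be updated atomically, the implementation stays simple.</p>
<p>The stack and stack-node structs are defined as follows:</p>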
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Linked stack node</span>
<span class="k">struct</span> <span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">:</span> <span class="n">Send</span><span class="o">></span> <span class="p">{</span>
<span class="nl">next:</span> <span class="n">Atomic</span><span class="o"><</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>></span><span class="p">,</span> <span class="c1">// next node</span>
<span class="nl">value:</span> <span class="n">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span> <span class="c1">// stored value</span>
<span class="p">}</span>
<span class="c1">// Stack object</span>
<span class="n">pub</span> <span class="k">struct</span> <span class="n">Stack</span><span class="o"><</span><span class="n">T</span><span class="o">:</span> <span class="n">Send</span><span class="o">></span> <span class="p">{</span>
<span class="nl">top:</span> <span class="n">Atomic</span><span class="o"><</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>></span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="初始化">Initialization</h4>
<p>Initialization needs no atomic operations. Two constructors are provided:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">impl</span><span class="o"><</span><span class="n">T</span><span class="o">:</span> <span class="n">Send</span><span class="o">></span> <span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="c1">// Regular node</span>
<span class="n">fn</span> <span class="n">new</span><span class="p">(</span><span class="n">v</span><span class="o">:</span> <span class="n">T</span><span class="p">)</span> <span class="o">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="n">Self</span> <span class="p">{</span>
<span class="nl">next:</span> <span class="n">Atomic</span><span class="o">::</span><span class="n">null</span><span class="p">(),</span>
<span class="nl">value:</span> <span class="n">Some</span><span class="p">(</span><span class="n">v</span><span class="p">),</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Sentinel node</span>
<span class="n">fn</span> <span class="n">sentinel</span><span class="p">()</span> <span class="o">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="n">Self</span> <span class="p">{</span>
<span class="nl">next:</span> <span class="n">Atomic</span><span class="o">::</span><span class="n">null</span><span class="p">(),</span>
<span class="nl">value:</span> <span class="n">None</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="push压栈">push</h4>
<p>Push sets the top pointer to the newly pushed node.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pub</span> <span class="n">fn</span> <span class="nf">push</span><span class="p">(</span><span class="o">&</span><span class="n">self</span><span class="p">,</span> <span class="n">v</span><span class="o">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span>
<span class="n">unsafe</span> <span class="p">{</span> <span class="n">self</span><span class="p">.</span><span class="n">try_push</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>
<span class="n">unsafe</span> <span class="n">fn</span> <span class="nf">try_push</span><span class="p">(</span><span class="o">&</span><span class="n">self</span><span class="p">,</span> <span class="n">v</span><span class="o">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span>
<span class="n">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&</span><span class="n">epoch</span><span class="o">::</span><span class="n">pin</span><span class="p">();</span>
<span class="n">let</span> <span class="n">node</span> <span class="o">=</span> <span class="n">Owned</span><span class="o">::</span><span class="n">new</span><span class="p">(</span><span class="n">Node</span><span class="o">::</span><span class="n">new</span><span class="p">(</span><span class="n">v</span><span class="p">)).</span><span class="n">into_shared</span><span class="p">(</span><span class="n">guard</span><span class="p">);</span>
<span class="n">loop</span> <span class="p">{</span>
<span class="n">let</span> <span class="n">top_ptr</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">top</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span>
<span class="c1">// Point the new node's next at the current top</span>
<span class="p">(</span><span class="o">*</span><span class="n">node</span><span class="p">.</span><span class="n">as_raw</span><span class="p">()).</span><span class="n">next</span><span class="p">.</span><span class="n">store</span><span class="p">(</span><span class="n">top_ptr</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">);</span>
<span class="c1">// Set top to the new node</span>
<span class="k">if</span> <span class="n">self</span>
<span class="p">.</span><span class="n">top</span>
<span class="p">.</span><span class="n">compare_exchange</span><span class="p">(</span><span class="n">top_ptr</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span>
<span class="p">.</span><span class="n">is_ok</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="pop出栈">pop</h4>
<p>Pop sets the top pointer to the node below the current top.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pub</span> <span class="n">fn</span> <span class="nf">pop</span><span class="p">(</span><span class="o">&</span><span class="n">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="n">unsafe</span> <span class="p">{</span> <span class="n">self</span><span class="p">.</span><span class="n">try_pop</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="n">unsafe</span> <span class="n">fn</span> <span class="n">try_pop</span><span class="p">(</span><span class="o">&</span><span class="n">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="n">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&</span><span class="n">epoch</span><span class="o">::</span><span class="n">pin</span><span class="p">();</span>
<span class="n">loop</span> <span class="p">{</span>
<span class="n">let</span> <span class="n">top_ptr</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">top</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span>
<span class="n">let</span> <span class="n">next_ptr</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">top_ptr</span><span class="p">.</span><span class="n">as_raw</span><span class="p">()).</span><span class="n">next</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span>
<span class="k">if</span> <span class="n">next_ptr</span><span class="p">.</span><span class="n">is_null</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">None</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Set the top pointer to the node below the current top</span>
<span class="k">if</span> <span class="n">self</span>
<span class="p">.</span><span class="n">top</span>
<span class="p">.</span><span class="n">compare_exchange</span><span class="p">(</span><span class="n">top_ptr</span><span class="p">,</span> <span class="n">next_ptr</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span>
<span class="p">.</span><span class="n">is_ok</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">let</span> <span class="n">top_ptr</span> <span class="o">=</span> <span class="n">top_ptr</span><span class="p">.</span><span class="n">as_raw</span><span class="p">()</span> <span class="n">as</span> <span class="o">*</span><span class="n">mut</span> <span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="o">*</span><span class="n">top_ptr</span><span class="p">).</span><span class="n">value</span><span class="p">.</span><span class="n">take</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="err">}</span>
</code></pre></div></div>
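<p>For comparison with the C code used elsewhere on this blog, the same CAS retry loops can be sketched in C. This is an illustration under stated assumptions, not a port of the Rust code: it assumes the GCC/Clang <code>__atomic</code> builtins, uses hypothetical names (<code>cstack_t</code>, <code>cstack_push</code>, <code>cstack_pop</code>), represents the empty stack with a NULL top instead of a sentinel node, and never frees popped nodes because it has no reclamation scheme.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; GCC/Clang __atomic builtins are assumed. */
typedef struct cnode_s cnode_t;

struct cnode_s {
    cnode_t *next;
    int      value;
};

typedef struct {
    cnode_t *top;    /* NULL when the stack is empty */
} cstack_t;

/* Push a caller-owned node with a CAS retry loop. */
static void
cstack_push(cstack_t *s, cnode_t *node)
{
    cnode_t *top;

    assert(s != NULL && node != NULL);

    top = __atomic_load_n(&s->top, __ATOMIC_RELAXED);
    do {
        node->next = top;
        /* On failure the builtin refreshes `top` with the current value. */
    } while (!__atomic_compare_exchange_n(&s->top, &top, node, 0,
                                          __ATOMIC_RELEASE, __ATOMIC_RELAXED));
}

/*
 * Pop the top node, or return NULL if the stack is empty.
 * The node is NOT freed here: without a reclamation scheme, freeing or
 * reusing it while another thread still reads it invites the ABA problem.
 */
static cnode_t *
cstack_pop(cstack_t *s)
{
    cnode_t *top, *next;

    assert(s != NULL);

    top = __atomic_load_n(&s->top, __ATOMIC_ACQUIRE);
    while (top != NULL) {
        next = top->next;
        if (__atomic_compare_exchange_n(&s->top, &top, next, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE)) {
            return top;
        }
        /* CAS failed: `top` now holds the latest value, so retry. */
    }
    return NULL;
}
```

<p>The sketch is only safe while popped nodes are neither freed nor pushed again during concurrent pops. Deferring that reclamation until no thread can still hold the pointer is the job crossbeam-epoch does in the Rust version.</p>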
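<p>A link to the complete code is in the <strong>References</strong> section at the end of the article.</p>
<h3 id="性能测试">Benchmark</h3>
<p>The Stack in lib.rs is benchmarked against the standard library's LinkedList wrapped in a Mutex.</p>
<p>The laptop CPU is:</p>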
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>machdep.cpu.brand_string: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
machdep.cpu.core_count: 2
machdep.cpu.thread_count: 4
</code></pre></div></div>
<h4 id="压测描述">Benchmark description</h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stack_loop_n(n): one stack object; push and pop in a loop n times
stack_thread_n_m(n, m): one shared stack object; n threads push and pop, looping m times
</code></pre></div></div>
<h4 id="结果对比">Results</h4>
<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Total time</th>
<th>Average time</th>
</tr>
</thead>
<tbody>
<tr>
<td>stack_loop_n(100000)</td>
<td>56.523467ms</td>
<td>565ns</td>
</tr>
<tr>
<td>list_loop_n(100000)</td>
<td>67.573497ms</td>
<td>675ns</td>
</tr>
<tr>
<td>stack_thread_n_m(2, 100000)</td>
<td>115.590207ms</td>
<td>577ns</td>
</tr>
<tr>
<td>list_thread_n_m(2, 100000)</td>
<td>161.359683ms</td>
<td>806ns</td>
</tr>
<tr>
<td>stack_thread_n_m(4, 100000)</td>
<td>440.585874ms</td>
<td>1.101µs</td>
</tr>
<tr>
<td>list_thread_n_m(4, 100000)</td>
<td>562.439723ms</td>
<td>1.406µs</td>
</tr>
<tr>
<td>stack_thread_n_m(8, 100000)</td>
<td>1.886768172s</td>
<td>2.358µs</td>
</tr>
<tr>
<td>list_thread_n_m(8, 100000)</td>
<td>2.120945074s</td>
<td>2.651µs</td>
</tr>
</tbody>
</table>
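<p>The average column appears to be the total time divided by threads × iterations. This is an inference from the numbers in the table, not something the article states. A small sketch with a hypothetical helper <code>avg_ns</code> that reproduces the published averages under that assumption:</p>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: average cost of one loop iteration in nanoseconds,
 * assuming average = total time / (threads * iterations per thread).
 */
static uint64_t
avg_ns(uint64_t total_ns, uint64_t threads, uint64_t iterations)
{
    assert(threads > 0 && iterations > 0);
    return total_ns / (threads * iterations);   /* truncating division */
}
```

<p>For example, the stack_thread_n_m(2, 100000) row gives 115590207 ns / (2 × 100000), which truncates to 577 ns, matching the table.</p>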
<h3 id="参考">References</h3>
<p><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.8674">Implementing Lock-Free Queues (1994)</a></p>
<p><a href="https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html">https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html</a></p>
<p><a href="https://lib.rs/crates/crossbeam-epoch">https://lib.rs/crates/crossbeam-epoch</a></p>
<p><a href="https://github.com/cppcoffee/stack-rs">https://github.com/cppcoffee/stack-rs</a></p>Sharp LiuLock-Free Stack ImplementLock-Free Queues Implement2021-03-25T00:00:00+00:002021-03-25T00:00:00+00:00https://cppcoffee.github.io/datastructure/2021/03/25/lock-free-queues-implements<h2 id="lock-free-queues-implement">Lock-Free Queues Implement</h2>
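<p>A queue is a FIFO abstract data structure. The lock-free queue implemented here is the one proposed in the paper <code class="language-plaintext highlighter-rouge">Implementing Lock-Free Queues(1994)</code>.</p>
<p>Lock-free queue operations rely on the CPU's CAS (Compare And Swap) instruction. On Intel CPUs, CAS corresponds to <code class="language-plaintext highlighter-rouge">lock cmpxchg</code>; the <code class="language-plaintext highlighter-rouge">lock</code> prefix makes the instruction execute atomically.</p>
<p>Many newer languages ship with CAS functions, and low-level toolchains provide built-ins as well, for example the CAS built-ins provided by GCC:</p>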
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval, ...)
</code></pre></div></div>
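<h3 id="队列结构体">Queue structs</h3>
<p>The lock-free queue needs two pointers: a head pointer to the front of the queue and a tail pointer to the back.</p>
<p>Each node has a next pointer to the following node, forming a linked queue.</p>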
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Queue node</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">node_s</span> <span class="n">node_t</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">node_s</span> <span class="p">{</span>
<span class="n">node_t</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">value</span><span class="p">;</span>
<span class="p">};</span>
<span class="c1">// Queue</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">queue_s</span> <span class="n">queue_t</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">queue_s</span> <span class="p">{</span>
<span class="n">node_t</span> <span class="o">*</span><span class="n">head</span><span class="p">;</span>
<span class="n">node_t</span> <span class="o">*</span><span class="n">tail</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
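<h3 id="初始化">Initialization</h3>
<p>The paper creates a dummy node at initialization and uses it as the initial value of both head and tail.</p>
<p>The dummy node avoids boundary problems when the queue is empty or holds only one node.</p>
<p>Initialization is implemented as follows:</p>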
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">queue_init</span><span class="p">(</span><span class="n">queue_t</span> <span class="o">*</span><span class="n">q</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">node_t</span> <span class="o">*</span><span class="n">dummy</span> <span class="o">=</span> <span class="p">(</span><span class="n">node_t</span> <span class="o">*</span><span class="p">)</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">node_t</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="n">dummy</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memset</span><span class="p">(</span><span class="n">dummy</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">node_t</span><span class="p">));</span>
<span class="n">q</span><span class="o">-></span><span class="n">head</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">tail</span> <span class="o">=</span> <span class="n">dummy</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The function must check for memory allocation failure.</p>
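<p>To make the role of the dummy node concrete, the sketch below repeats the queue definitions and adds a hypothetical helper, <code>queue_is_empty</code>, that is not part of the original code. It uses <code>calloc</code> in place of <code>malloc</code> plus <code>memset</code>, and it is only a single-threaded snapshot, so the answer may be stale as soon as other threads are running.</p>

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct node_s node_t;
struct node_s {
    node_t *next;
    void   *value;
};

typedef struct queue_s queue_t;
struct queue_s {
    node_t *head;
    node_t *tail;
};

/* Same logic as queue_init above, with calloc zeroing the dummy node. */
static int
queue_init(queue_t *q)
{
    node_t *dummy;

    assert(q != NULL);

    dummy = calloc(1, sizeof(node_t));
    if (dummy == NULL) {
        return -ENOMEM;
    }

    q->head = q->tail = dummy;
    return 0;
}

/*
 * Hypothetical helper: thanks to the dummy node, head and tail are never
 * NULL, so "empty" means both point at the same node and nothing follows it.
 */
static int
queue_is_empty(const queue_t *q)
{
    return q->head == q->tail && q->head->next == NULL;
}
```

<p>Because head and tail always point at a valid node, enqueue and dequeue never need a special case for a NULL queue pointer.</p>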
<h3 id="入队列">Enqueue</h3>
<p>Following the paper's pseudocode, the implementation is:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="nf">enqueue</span><span class="p">(</span><span class="n">queue_t</span> <span class="o">*</span><span class="n">q</span><span class="p">,</span> <span class="kt">void</span> <span class="o">*</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">node_t</span> <span class="o">*</span><span class="n">node</span><span class="p">,</span> <span class="o">*</span><span class="n">tail</span><span class="p">,</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="n">node</span> <span class="o">=</span> <span class="p">(</span><span class="n">node_t</span> <span class="o">*</span><span class="p">)</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">node_t</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="n">node</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">-</span><span class="n">ENOMEM</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">node</span><span class="o">-></span><span class="n">value</span> <span class="o">=</span> <span class="n">x</span><span class="p">;</span>
<span class="n">node</span><span class="o">-></span><span class="n">next</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">tail</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">tail</span><span class="p">;</span>
<span class="n">next</span> <span class="o">=</span> <span class="n">tail</span><span class="o">-></span><span class="n">next</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">tail</span> <span class="o">!=</span> <span class="n">q</span><span class="o">-></span><span class="n">tail</span><span class="p">)</span> <span class="p">{</span>
<span class="k">continue</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">tail</span><span class="o">-></span><span class="n">next</span><span class="p">),</span> <span class="n">next</span><span class="p">,</span> <span class="n">node</span><span class="p">))</span> <span class="p">{</span>
<span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">q</span><span class="o">-></span><span class="n">tail</span><span class="p">),</span> <span class="n">tail</span><span class="p">,</span> <span class="n">node</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">q</span><span class="o">-></span><span class="n">tail</span><span class="p">),</span> <span class="n">tail</span><span class="p">,</span> <span class="n">next</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="出队列">Dequeue</h3>
<p>The dequeue implementation is comparatively simple:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="o">*</span><span class="nf">dequeue</span><span class="p">(</span><span class="n">queue_t</span> <span class="o">*</span><span class="n">q</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span> <span class="o">*</span><span class="n">v</span><span class="p">;</span>
<span class="n">node_t</span> <span class="o">*</span><span class="n">head</span><span class="p">,</span> <span class="o">*</span><span class="n">tail</span><span class="p">,</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span> <span class="p">;;</span> <span class="p">)</span> <span class="p">{</span>
<span class="n">head</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">head</span><span class="p">;</span>
<span class="n">tail</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">tail</span><span class="p">;</span>
<span class="n">next</span> <span class="o">=</span> <span class="n">head</span><span class="o">-></span><span class="n">next</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">head</span> <span class="o">!=</span> <span class="n">q</span><span class="o">-></span><span class="n">head</span><span class="p">)</span> <span class="p">{</span>
<span class="k">continue</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">head</span> <span class="o">==</span> <span class="n">tail</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">q</span><span class="o">-></span><span class="n">tail</span><span class="p">),</span> <span class="n">tail</span><span class="p">,</span> <span class="n">next</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">next</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="k">continue</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">next</span><span class="o">-></span><span class="n">value</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">__sync_bool_compare_and_swap</span><span class="p">(</span><span class="o">&</span><span class="p">(</span><span class="n">q</span><span class="o">-></span><span class="n">head</span><span class="p">),</span> <span class="n">head</span><span class="p">,</span> <span class="n">next</span><span class="p">))</span> <span class="p">{</span>
<span class="c1">// FIXME: freeing the node here triggers the classic ABA and memory reclamation problems of concurrent data structures</span>
<span class="c1">//free(head);</span>
<span class="k">return</span> <span class="n">v</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
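<p>Both loops above share one shape: read the shared state, attempt a CAS, and retry if another thread got there first. The following is a minimal Rust sketch of that shape, applied to a plain counter rather than to the queue and using only the standard library. The names <code>cas_increment</code> and <code>count_concurrently</code> are hypothetical and exist only for illustration.</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Adds 1 to `counter` with the read / compare-and-swap / retry pattern:
/// read the current value, try to install the new one, and start over
/// if another thread changed the value in between.
fn cas_increment(counter: &AtomicUsize) {
    let mut seen = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(seen, seen + 1, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return,
            // Lost the race (or failed spuriously): retry with the fresh value.
            Err(actual) => seen = actual,
        }
    }
}

/// Hypothetical helper: `threads` threads each increment `per_thread` times.
fn count_concurrently(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    cas_increment(&counter);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Acquire)
}

fn main() {
    println!("final count: {}", count_concurrently(4, 10_000));
}
```

<p>No increment is lost because a failed CAS never writes; the thread simply observes the newer value and tries again.</p>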
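<h3 id="aba-问题">The ABA problem</h3>
<p>In multithreaded code, the ABA problem arises during synchronization when a location is read twice, both reads return the same value, and "the value is the same" is taken to mean "nothing changed". However, another thread can run between the two reads, change the value, do other work, and then change the value back. The first thread is fooled into believing nothing changed, even though the work done by the second thread violates that assumption:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>T1 reads A from shared memory (A = Load(A)) and is then suspended
T2 is scheduled
T2 modifies shared memory: CAS(A, B) changes A to B, and before T2 is descheduled CAS(B, A) changes B back to A
T1 is scheduled again, sees A, and concludes that nothing has changed
</code></pre></div></div>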
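<p>Avoiding this requires that memory is neither freed immediately (other threads may still reference it) nor reused immediately. This is the most common pitfall of CAS-based lock-free structures. Real projects usually pair the pointer with a counter and update both with a 128-bit CAS to avoid ABA. Hardware support for 128-bit CAS is not universal, though, so the alternative is to compress the pointer and the counter into a single word (a tagged pointer).</p>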
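<p>The schedule above can be replayed deterministically on a single thread. The sketch below assumes plain integers as stand-ins for the values A and B; <code>aba_goes_unnoticed</code> is a hypothetical name. It shows only that a value-comparing CAS cannot tell "unchanged" from "changed and changed back".</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};

// Stand-in values; in a real queue these would be node addresses.
const A: usize = 0xA;
const B: usize = 0xB;
const C: usize = 0xC;

/// Replays the T1/T2 schedule in order and returns true if T1's final
/// CAS succeeds, i.e. if the intermediate A -> B -> A change went unnoticed.
fn aba_goes_unnoticed() -> bool {
    let shared = AtomicUsize::new(A);

    // T1: reads A, then is "suspended".
    let t1_seen = shared.load(SeqCst);

    // T2: changes A to B, then B back to A.
    shared.compare_exchange(A, B, SeqCst, SeqCst).unwrap();
    shared.compare_exchange(B, A, SeqCst, SeqCst).unwrap();

    // T1 resumes: the CAS compares values only, so it succeeds.
    shared.compare_exchange(t1_seen, C, SeqCst, SeqCst).is_ok()
}

fn main() {
    println!("ABA unnoticed: {}", aba_goes_unnoticed());
}
```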
<h4 id="tagged-pointer">Tagged Pointer</h4>
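<p>On x86_64, the upper part of the virtual address space is reserved for the kernel, so user-space code can use the high bits of a pointer as a tag.</p>
<p>A 64-bit address looks like this:</p>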
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0000 0000 0000 0000
</code></pre></div></div>
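<p>According to the Linux mm documentation, the virtual address range available to applications (with 4-level page tables) is 0000000000000000 - 00007fffffffffff</p>
<p>In other words, the high 16 bits of a user-space pointer are always zero and can be used as a tag.</p>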
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0000 FFFF FFFF FFFF
^^^^
Free Data!
</code></pre></div></div>
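<p>A rough sketch of how the free 16 bits could carry a version tag follows. It assumes 48-bit addresses as in the diagram above and uses <code>u64</code> values in place of real pointers. The names <code>pack</code>, <code>unpack</code>, <code>tagged_cas</code> and <code>aba_is_detected</code> are hypothetical. Every successful update bumps the tag, so a word that returns to the same address still differs in its tag.</p>

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const ADDR_BITS: u32 = 48;
const ADDR_MASK: u64 = (1 << ADDR_BITS) - 1;

/// Packs a (user-space) address and a 16-bit tag into one 64-bit word.
fn pack(addr: u64, tag: u16) -> u64 {
    debug_assert!(addr <= ADDR_MASK, "address must fit in the low 48 bits");
    ((tag as u64) << ADDR_BITS) | (addr & ADDR_MASK)
}

/// Splits a packed word back into (address, tag).
fn unpack(word: u64) -> (u64, u16) {
    (word & ADDR_MASK, (word >> ADDR_BITS) as u16)
}

/// CAS that installs `new_addr` and increments the tag taken from `expected`.
fn tagged_cas(slot: &AtomicU64, expected: u64, new_addr: u64) -> bool {
    let (_, tag) = unpack(expected);
    let new = pack(new_addr, tag.wrapping_add(1));
    slot.compare_exchange(expected, new, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

/// Replays the A -> B -> A schedule; returns true if T1's stale CAS is rejected.
fn aba_is_detected() -> bool {
    let slot = AtomicU64::new(pack(0xA000, 0));

    // T1: reads (A, tag 0), then is "suspended".
    let t1_seen = slot.load(Ordering::Acquire);

    // T2: A -> B, then B -> A. The tag goes 0 -> 1 -> 2.
    let current = slot.load(Ordering::Acquire);
    assert!(tagged_cas(&slot, current, 0xB000));
    let current = slot.load(Ordering::Acquire);
    assert!(tagged_cas(&slot, current, 0xA000));

    // T1 resumes: the address is A again but the tag is 2, so the CAS fails.
    !tagged_cas(&slot, t1_seen, 0xC000)
}

fn main() {
    println!("{:?}", unpack(pack(0x7fff_ffff_ffff, 0xBEEF)));
    println!("ABA detected: {}", aba_is_detected());
}
```

<p>A 16-bit tag wraps after 65536 updates, so this narrows the ABA window rather than closing it completely.</p>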
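<h3 id="内存回收问题">The memory reclamation problem</h3>
<p>In multithreaded code, memory cannot simply be freed, because other threads may still be accessing it. Freeing it anyway leads to a <strong>use-after-free</strong>:</p>
<blockquote>
<p>T1 is descheduled just as it reaches next = tail->next;.
T2 runs dequeue and frees the memory that tail points to.
T1 is scheduled again, and its access to tail->next now touches freed memory.</p>
</blockquote>
<p>Handling this safely requires a deferred reclamation scheme such as reference counting, hazard pointers, or epoch-based reclamation.</p>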
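<p>Of the schemes listed, reference counting is the easiest to illustrate with the standard library alone. The sketch below is not lock-free: a <code>Mutex</code> stands in for the shared head pointer. <code>Node</code>, its <code>freed</code> flag and <code>deferred_free_demo</code> are hypothetical names. It only shows that an unlinked node stays alive until the last reader drops its reference.</p>

```rust
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
use std::sync::{Arc, Mutex};

/// A node that records when it is actually freed.
struct Node {
    value: u32,
    freed: Arc<AtomicBool>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.freed.store(true, SeqCst);
    }
}

/// Returns (freed while T1 still holds it, freed after T1 lets go, value T1 read).
fn deferred_free_demo() -> (bool, bool, u32) {
    let freed = Arc::new(AtomicBool::new(false));
    let head = Mutex::new(Arc::new(Node { value: 1, freed: Arc::clone(&freed) }));

    // T1: takes a counted reference to the current head, then is "suspended".
    let t1_ref = head.lock().unwrap().clone();

    // T2: unlinks the node by replacing it, dropping the queue's own reference.
    *head.lock().unwrap() = Arc::new(Node {
        value: 2,
        freed: Arc::new(AtomicBool::new(false)),
    });

    // The old node is unlinked but not freed: T1 still holds a reference.
    let freed_early = freed.load(SeqCst);
    let value = t1_ref.value; // safe read, no use-after-free

    // Only when T1 drops its reference is the node reclaimed.
    drop(t1_ref);
    let freed_late = freed.load(SeqCst);

    (freed_early, freed_late, value)
}

fn main() {
    println!("{:?}", deferred_free_demo());
}
```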
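<h3 id="rust-实现">Rust implementation</h3>
<p>Finally, here is a complete lock-free queue written in Rust. It uses the <strong>crossbeam_epoch</strong> crate to address the ABA and memory reclamation problems. As listed, the code relies on epoch pinning to protect reads but never retires dequeued nodes (there is no <code class="language-plaintext highlighter-rouge">defer_destroy</code> call), so unlinked nodes are leaked rather than reclaimed.</p>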
<p>lib.rs</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">sync</span><span class="p">::</span><span class="nn">atomic</span><span class="p">::</span><span class="nn">Ordering</span><span class="p">::{</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">Release</span><span class="p">};</span>
<span class="k">use</span> <span class="nn">crossbeam_epoch</span><span class="p">::{</span><span class="k">self</span> <span class="k">as</span> <span class="n">epoch</span><span class="p">,</span> <span class="n">Atomic</span><span class="p">,</span> <span class="nb">Owned</span><span class="p">,</span> <span class="n">Shared</span><span class="p">};</span>
<span class="k">unsafe</span> <span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">></span> <span class="n">Sync</span> <span class="k">for</span> <span class="n">Queue</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{}</span>
<span class="k">struct</span> <span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">></span> <span class="p">{</span>
<span class="n">next</span><span class="p">:</span> <span class="n">Atomic</span><span class="o"><</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>></span><span class="p">,</span>
<span class="n">data</span><span class="p">:</span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">></span> <span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">fn</span> <span class="nf">new</span><span class="p">(</span><span class="n">v</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="k">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="n">Self</span> <span class="p">{</span>
<span class="n">next</span><span class="p">:</span> <span class="nn">Default</span><span class="p">::</span><span class="nf">default</span><span class="p">(),</span>
<span class="n">data</span><span class="p">:</span> <span class="nf">Some</span><span class="p">(</span><span class="n">v</span><span class="p">),</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">sentinel</span><span class="p">()</span> <span class="k">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="n">Self</span> <span class="p">{</span>
<span class="n">next</span><span class="p">:</span> <span class="nn">Atomic</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span>
<span class="n">data</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">pub</span> <span class="k">struct</span> <span class="n">Queue</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">></span> <span class="p">{</span>
<span class="n">head</span><span class="p">:</span> <span class="n">Atomic</span><span class="o"><</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>></span><span class="p">,</span>
<span class="n">tail</span><span class="p">:</span> <span class="n">Atomic</span><span class="o"><</span><span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">>></span><span class="p">,</span>
<span class="p">}</span>
<span class="k">impl</span><span class="o"><</span><span class="n">T</span><span class="p">:</span> <span class="nb">Send</span><span class="o">></span> <span class="n">Queue</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">new</span><span class="p">()</span> <span class="k">-></span> <span class="n">Self</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">q</span> <span class="o">=</span> <span class="n">Queue</span> <span class="p">{</span>
<span class="n">head</span><span class="p">:</span> <span class="nn">Atomic</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span>
<span class="n">tail</span><span class="p">:</span> <span class="nn">Atomic</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span>
<span class="p">};</span>
<span class="k">let</span> <span class="n">sentinel</span> <span class="o">=</span> <span class="nn">Owned</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Node</span><span class="p">::</span><span class="nf">sentinel</span><span class="p">());</span>
<span class="k">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&</span><span class="nn">epoch</span><span class="p">::</span><span class="nf">unprotected</span><span class="p">()</span> <span class="p">};</span>
<span class="k">let</span> <span class="n">sentinel</span> <span class="o">=</span> <span class="n">sentinel</span><span class="nf">.into_shared</span><span class="p">(</span><span class="n">guard</span><span class="p">);</span>
<span class="n">q</span><span class="py">.head</span><span class="nf">.store</span><span class="p">(</span><span class="n">sentinel</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">);</span>
<span class="n">q</span><span class="py">.tail</span><span class="nf">.store</span><span class="p">(</span><span class="n">sentinel</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">);</span>
<span class="n">q</span>
<span class="p">}</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">enq</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">v</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="nf">.try_enq</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">try_enq</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">,</span> <span class="n">v</span><span class="p">:</span> <span class="n">T</span><span class="p">)</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&</span><span class="nn">epoch</span><span class="p">::</span><span class="nf">pin</span><span class="p">();</span>
<span class="k">let</span> <span class="n">node</span> <span class="o">=</span> <span class="nn">Owned</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">Node</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="n">v</span><span class="p">))</span><span class="nf">.into_shared</span><span class="p">(</span><span class="n">guard</span><span class="p">);</span>
<span class="k">loop</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="k">self</span><span class="py">.tail</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span>
<span class="py">.next</span>
<span class="nf">.compare_exchange</span><span class="p">(</span><span class="nn">Shared</span><span class="p">::</span><span class="nf">null</span><span class="p">(),</span> <span class="n">node</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span>
<span class="nf">.is_ok</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="k">self</span><span class="py">.tail</span><span class="nf">.compare_exchange</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">Release</span><span class="p">,</span> <span class="n">Relaxed</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="k">self</span><span class="py">.tail</span><span class="nf">.compare_exchange</span><span class="p">(</span>
<span class="n">p</span><span class="p">,</span>
<span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">),</span>
<span class="n">Release</span><span class="p">,</span>
<span class="n">Relaxed</span><span class="p">,</span>
<span class="n">guard</span><span class="p">,</span>
<span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">pub</span> <span class="k">fn</span> <span class="nf">deq</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">unsafe</span> <span class="p">{</span> <span class="k">self</span><span class="nf">.try_deq</span><span class="p">()</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">unsafe</span> <span class="k">fn</span> <span class="nf">try_deq</span><span class="p">(</span><span class="o">&</span><span class="k">self</span><span class="p">)</span> <span class="k">-></span> <span class="nb">Option</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">guard</span> <span class="o">=</span> <span class="o">&</span><span class="nn">epoch</span><span class="p">::</span><span class="nf">pin</span><span class="p">();</span>
<span class="k">loop</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">p</span> <span class="o">=</span> <span class="k">self</span><span class="py">.head</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span><span class="nf">.is_null</span><span class="p">()</span> <span class="p">{</span>
<span class="k">return</span> <span class="nb">None</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="k">self</span>
<span class="py">.head</span>
<span class="nf">.compare_exchange</span><span class="p">(</span>
<span class="n">p</span><span class="p">,</span>
<span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">),</span>
<span class="n">Release</span><span class="p">,</span>
<span class="n">Relaxed</span><span class="p">,</span>
<span class="n">guard</span><span class="p">,</span>
<span class="p">)</span>
<span class="nf">.is_ok</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">let</span> <span class="n">next</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="nf">.as_raw</span><span class="p">())</span><span class="py">.next</span><span class="nf">.load</span><span class="p">(</span><span class="n">Acquire</span><span class="p">,</span> <span class="n">guard</span><span class="p">)</span><span class="nf">.as_raw</span><span class="p">()</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="n">Node</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="o">*</span><span class="n">next</span><span class="p">)</span><span class="py">.data</span><span class="nf">.take</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
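<h3 id="benchmark">Benchmark</h3>
<p>The Queue from lib.rs is benchmarked against the standard library's Mutex&lt;LinkedList&gt;.</p>
<h4 id="压测代码">Benchmark code</h4>
<p>The Queue benchmark and the Mutex&lt;LinkedList&gt; benchmark are nearly identical. The only difference is that enq corresponds to push_front and deq corresponds to pop_back.</p>
<p>Two of the Queue benchmark functions are shown here; see the queue-rs repository linked at the end of this article for the rest.</p>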
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// single thread: n enqueues followed by n dequeues</span>
<span class="k">fn</span> <span class="nf">queue_loop_n</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-></span> <span class="n">Duration</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">q</span> <span class="o">=</span> <span class="nn">Queue</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">let</span> <span class="n">earler</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>
<span class="k">for</span> <span class="n">i</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">n</span> <span class="p">{</span>
<span class="n">q</span><span class="nf">.enq</span><span class="p">(</span><span class="n">i</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">for</span> <span class="mi">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">n</span> <span class="p">{</span>
<span class="n">q</span><span class="nf">.deq</span><span class="p">();</span>
<span class="p">}</span>
<span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">()</span><span class="nf">.duration_since</span><span class="p">(</span><span class="n">earler</span><span class="p">)</span>
<span class="p">}</span>
<span class="c">// n threads, m operations each</span>
<span class="k">fn</span> <span class="nf">queue_thread_n_m</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">u32</span><span class="p">,</span> <span class="n">m</span><span class="p">:</span> <span class="nb">u32</span><span class="p">)</span> <span class="k">-></span> <span class="n">Duration</span> <span class="p">{</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">handles</span> <span class="o">=</span> <span class="nn">Vec</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">let</span> <span class="n">elapsed</span> <span class="o">=</span> <span class="nn">Arc</span><span class="p">::</span><span class="nf">new</span><span class="p">(</span><span class="nn">AtomicU64</span><span class="p">::</span><span class="nf">default</span><span class="p">());</span>
<span class="k">for</span> <span class="mi">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">n</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">q</span> <span class="o">=</span> <span class="nn">Queue</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">let</span> <span class="n">elapsed_clone</span> <span class="o">=</span> <span class="n">elapsed</span><span class="nf">.clone</span><span class="p">();</span>
<span class="n">handles</span><span class="nf">.push</span><span class="p">(</span><span class="nn">thread</span><span class="p">::</span><span class="nf">spawn</span><span class="p">(</span><span class="k">move</span> <span class="p">||</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">start</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">();</span>
<span class="k">for</span> <span class="n">i</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">m</span> <span class="p">{</span>
<span class="n">q</span><span class="nf">.enq</span><span class="p">(</span><span class="n">i</span> <span class="k">as</span> <span class="o">*</span><span class="k">mut</span> <span class="nb">u8</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">for</span> <span class="mi">_</span> <span class="n">in</span> <span class="mi">0</span><span class="o">..</span><span class="n">m</span> <span class="p">{</span>
<span class="n">q</span><span class="nf">.deq</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">let</span> <span class="n">nanos</span> <span class="o">=</span> <span class="nn">Instant</span><span class="p">::</span><span class="nf">now</span><span class="p">()</span><span class="nf">.duration_since</span><span class="p">(</span><span class="n">start</span><span class="p">)</span><span class="nf">.as_nanos</span><span class="p">();</span>
<span class="n">elapsed_clone</span><span class="nf">.fetch_add</span><span class="p">(</span><span class="n">nanos</span> <span class="k">as</span> <span class="nb">u64</span><span class="p">,</span> <span class="nn">Ordering</span><span class="p">::</span><span class="n">SeqCst</span><span class="p">);</span>
<span class="p">}));</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">handle</span> <span class="n">in</span> <span class="n">handles</span> <span class="p">{</span>
<span class="n">handle</span><span class="nf">.join</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="p">}</span>
<span class="nn">Duration</span><span class="p">::</span><span class="nf">from_nanos</span><span class="p">(</span><span class="nn">Arc</span><span class="p">::</span><span class="nf">try_unwrap</span><span class="p">(</span><span class="n">elapsed</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="nf">.into_inner</span><span class="p">())</span>
<span class="p">}</span>
</code></pre></div></div>
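<p>For comparison, the Mutex&lt;LinkedList&gt; counterpart described above might look roughly like the sketch below. This is an assumption about its shape, not the code from the repository. It stores <code>u32</code> values instead of raw pointers, and <code>drain_in_order</code> is a hypothetical helper added only to confirm that push_front plus pop_back gives FIFO order.</p>

```rust
use std::collections::LinkedList;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Assumed baseline: n push_front calls, then n pop_back calls,
/// taking the lock once per operation.
fn list_loop_n(n: u32) -> Duration {
    let list = Mutex::new(LinkedList::new());
    let start = Instant::now();
    for i in 0..n {
        list.lock().unwrap().push_front(i);
    }
    for _ in 0..n {
        let _ = list.lock().unwrap().pop_back();
    }
    start.elapsed()
}

/// Pushes 0..n at the front and pops from the back, returning the pop order.
fn drain_in_order(n: u32) -> Vec<u32> {
    let list = Mutex::new(LinkedList::new());
    for i in 0..n {
        list.lock().unwrap().push_front(i);
    }
    let mut out = Vec::new();
    while let Some(v) = list.lock().unwrap().pop_back() {
        out.push(v);
    }
    out
}

fn main() {
    println!("list_loop_n(100000): {:?}", list_loop_n(100_000));
    println!("{:?}", drain_in_order(4));
}
```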
<h4 id="结果对比">Results</h4>
<p>The laptop CPU used for the runs:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>machdep.cpu.brand_string: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
machdep.cpu.core_count: 2
machdep.cpu.thread_count: 4
</code></pre></div></div>
<p><strong>Note</strong>: the leading figure in each pair is shown in bold.</p>
<p>Output:</p>
<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Total time</th>
<th>Average time per element</th>
</tr>
</thead>
<tbody>
<tr>
<td>queue_loop_n(100000)</td>
<td><strong>17.843828ms</strong></td>
<td><strong>178ns</strong></td>
</tr>
<tr>
<td>list_loop_n(100000)</td>
<td>23.066353ms</td>
<td>230ns</td>
</tr>
<tr>
<td>queue_thread_n_m(2, 100000)</td>
<td><strong>64.018836ms</strong></td>
<td><strong>320ns</strong></td>
</tr>
<tr>
<td>list_thread_n_m(2, 100000)</td>
<td>74.660454ms</td>
<td>373ns</td>
</tr>
<tr>
<td>queue_thread_n_m(4, 100000)</td>
<td><strong>149.736868ms</strong></td>
<td><strong>374ns</strong></td>
</tr>
<tr>
<td>list_thread_n_m(4, 100000)</td>
<td>189.6352ms</td>
<td>474ns</td>
</tr>
<tr>
<td>queue_thread_n_m(8, 100000)</td>
<td><strong>544.476377ms</strong></td>
<td><strong>680ns</strong></td>
</tr>
<tr>
<td>list_thread_n_m(8, 100000)</td>
<td>980.688619ms</td>
<td>1225ns</td>
</tr>
</tbody>
</table>
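<p>The average column is consistent with dividing the total time by the number of elements processed (threads × operations per thread). A small sketch of that arithmetic follows; <code>average_nanos</code> is a hypothetical helper, and the single-threaded runs are treated as one thread.</p>

```rust
use std::time::Duration;

/// Average time per element in whole nanoseconds:
/// total elapsed time divided by threads * operations per thread.
fn average_nanos(total: Duration, threads: u32, per_thread: u32) -> u128 {
    total.as_nanos() / (threads as u128 * per_thread as u128)
}

fn main() {
    // Totals taken from the table above.
    let queue_1 = Duration::from_nanos(17_843_828);
    let queue_8 = Duration::from_nanos(544_476_377);
    let list_8 = Duration::from_nanos(980_688_619);

    println!("queue_loop_n(100000): {} ns", average_nanos(queue_1, 1, 100_000));
    println!("queue_thread_n_m(8, 100000): {} ns", average_nanos(queue_8, 8, 100_000));
    println!("list_thread_n_m(8, 100000): {} ns", average_nanos(list_8, 8, 100_000));
}
```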
<h3 id="参考">References</h3>
<p><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.8674">Implementing Lock-Free Queues (1994)</a></p>
<p><a href="https://en.wikipedia.org/wiki/ABA_problem">https://en.wikipedia.org/wiki/ABA_problem</a></p>
<p><a href="https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html">https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html</a></p>
<p><a href="https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-579.pdf">Keir Fraser’s epoch-based reclamation</a></p>
<p><a href="https://lib.rs/crates/crossbeam-epoch">crossbeam-epoch crate</a></p>
<p><a href="https://en.wikipedia.org/wiki/Tagged_pointer">https://en.wikipedia.org/wiki/Tagged_pointer</a></p>
<p><a href="https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt">https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt</a></p>
<p><a href="https://github.com/cppcoffee/queue-rs">https://github.com/cppcoffee/queue-rs</a></p>
<p>Sharp Liu | Lock-Free Queues Implement</p>
<p>Two Pitfalls of Thread Condition Signal (Thread Condition Signal 的两个陷阱) | 2021-02-27 | https://cppcoffee.github.io/system/program/2021/02/27/Thread-Condition-Signal-%E7%9A%84%E4%B8%A4%E4%B8%AA%E9%99%B7%E9%98%B1</p>
<h2 id="thread-condition-signal">Thread Condition Signal</h2>
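<p>Thread condition signals usually come up when implementing a producer/consumer setup. After reading the man pages, a natural question is why a condition variable depends on an external mutex.</p>
<p>The man pages offer no example to follow, so it is easy to write something like the following without much thought, and it contains a trap:</p>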
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// producer</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="n">pthread_cond_signal</span><span class="p">(</span><span class="o">&</span><span class="n">cond</span><span class="p">);</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="c1">// consumer</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="n">pthread_cond_wait</span><span class="p">(</span><span class="o">&</span><span class="n">cond</span><span class="p">,</span> <span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
</code></pre></div></div>
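<p>Code written this way falls into the <em>lost-signal pitfall</em>.</p>
<h3 id="信号丢失的陷阱">The lost-signal pitfall</h3>
<p>If the signal is sent before the consumer starts waiting, it is lost: a condition variable does not remember signals that had no waiter.</p>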
<pre><code class="language-flow"> + +----------+ +----------+
| | producer | | consumer |
| +----------+ +----------+
|
| +----------+
| | lock |
| +----------+
| +----------+
| | signal |
| +----------+
Time | +----------+
| | unlock |
| +----------+
|-------------------------------
| +----------+
| | lock |
| +----------+
| +----------+
| | wait |
| +----------+
| +----------+
| | unlock |
v +----------+
</code></pre>
<p>This is a scenario with one producer and one consumer.
The producer runs first, so the signal is lost and the consumer waits forever.</p>
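<p>The usual remedy is to pair the condition variable with a shared predicate protected by the same mutex. The producer sets the predicate before signalling, and the consumer checks it before waiting. A minimal Rust sketch using <code>std::sync::Condvar</code> in place of the pthread calls is given below; <code>signal_before_wait</code> is a hypothetical name. The producer deliberately finishes before the consumer even starts.</p>

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Runs the producer to completion first, then starts the consumer.
/// Returns the flag value the consumer observed when it stopped waiting.
fn signal_before_wait() -> bool {
    // The bool is the predicate ("data is ready"), guarded by the mutex.
    let pair = Arc::new((Mutex::new(false), Condvar::new()));

    // Producer: set the predicate under the lock, then signal.
    {
        let (ready, cond) = &*pair;
        *ready.lock().unwrap() = true;
        cond.notify_one(); // no one is waiting yet, so this wakes nobody
    }

    // Consumer: starts after the signal was already sent.
    let consumer = {
        let pair = Arc::clone(&pair);
        thread::spawn(move || {
            let (ready, cond) = &*pair;
            let mut guard = ready.lock().unwrap();
            // Check the predicate before waiting; it is already true,
            // so the consumer never blocks on the lost signal.
            while !*guard {
                guard = cond.wait(guard).unwrap();
            }
            *guard
        })
    };

    consumer.join().unwrap()
}

fn main() {
    println!("consumer saw ready = {}", signal_before_wait());
}
```

<p>This also suggests why the condition variable needs the external mutex: the predicate check and the transition into the wait must be atomic with respect to the producer's update.</p>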
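<h3 id="虚假唤醒的陷阱">The spurious-wakeup pitfall</h3>
<p>The <code class="language-plaintext highlighter-rouge">Multiple Awakenings by Condition Signal</code> section of the <code class="language-plaintext highlighter-rouge">pthread_cond_broadcast</code> man page describes the <code class="language-plaintext highlighter-rouge">spurious wakeup</code> problem.</p>
<p>Consider a scenario with one producer and several consumers:</p>
<p>One thread is about to wait on the condition variable, another thread is concurrently executing <code class="language-plaintext highlighter-rouge">pthread_cond_signal</code>, and a third thread is already waiting.</p>
<p>The pseudocode implementation below shows the order of execution (the number at the end of each line):</p>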
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pthread_cond_wait</span><span class="p">(</span><span class="n">mutex</span><span class="p">,</span> <span class="n">cond</span><span class="p">)</span><span class="o">:</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">cond</span><span class="o">-></span><span class="n">value</span><span class="p">;</span> <span class="cm">/* 1 */</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 2 */</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="n">cond</span><span class="o">-></span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 10 */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">value</span> <span class="o">==</span> <span class="n">cond</span><span class="o">-></span><span class="n">value</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* 11 */</span>
<span class="n">me</span><span class="o">-></span><span class="n">next_cond</span> <span class="o">=</span> <span class="n">cond</span><span class="o">-></span><span class="n">waiter</span><span class="p">;</span>
<span class="n">cond</span><span class="o">-></span><span class="n">waiter</span> <span class="o">=</span> <span class="n">me</span><span class="p">;</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="n">cond</span><span class="o">-></span><span class="n">mutex</span><span class="p">);</span>
<span class="n">unable_to_run</span><span class="p">(</span><span class="n">me</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span>
<span class="nf">pthread_mutex_unlock</span><span class="p">(</span><span class="n">cond</span><span class="o">-></span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 12 */</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 13 */</span>
<span class="n">pthread_cond_signal</span><span class="p">(</span><span class="n">cond</span><span class="p">)</span><span class="o">:</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="n">cond</span><span class="o">-></span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 3 */</span>
<span class="n">cond</span><span class="o">-></span><span class="n">value</span><span class="o">++</span><span class="p">;</span> <span class="cm">/* 4 */</span>
<span class="k">if</span> <span class="p">(</span><span class="n">cond</span><span class="o">-></span><span class="n">waiter</span><span class="p">)</span> <span class="p">{</span> <span class="cm">/* 5 */</span>
<span class="n">sleeper</span> <span class="o">=</span> <span class="n">cond</span><span class="o">-></span><span class="n">waiter</span><span class="p">;</span> <span class="cm">/* 6 */</span>
<span class="n">cond</span><span class="o">-></span><span class="n">waiter</span> <span class="o">=</span> <span class="n">sleeper</span><span class="o">-></span><span class="n">next_cond</span><span class="p">;</span> <span class="cm">/* 7 */</span>
<span class="n">able_to_run</span><span class="p">(</span><span class="n">sleeper</span><span class="p">);</span> <span class="cm">/* 8 */</span>
<span class="p">}</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="n">cond</span><span class="o">-></span><span class="n">mutex</span><span class="p">);</span> <span class="cm">/* 9 */</span>
</code></pre></div></div>
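<p>A single call to <code class="language-plaintext highlighter-rouge">pthread_cond_signal</code> can therefore cause more than one consumer thread to return from <code class="language-plaintext highlighter-rouge">pthread_cond_wait</code> or <code class="language-plaintext highlighter-rouge">pthread_cond_timedwait</code>. This phenomenon is called a <code class="language-plaintext highlighter-rouge">spurious wakeup</code>.</p>
<h3 id="解决方法">Solution</h3>
<p>When implementing Thread Condition Signal logic, the external mutex is there to guarantee correctness. In addition, keep a predicate (a shared condition flag protected by that mutex) and re-check it in a loop, so that a wakeup signal is never lost and a spurious wakeup is never mistaken for a real one.</p>
<p>The correct pattern looks like this:</p>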
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// producer</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="n">condition_</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="n">pthread_cond_signal</span><span class="p">(</span><span class="o">&</span><span class="n">cond</span><span class="p">);</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="c1">// consumer</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">condition_</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">pthread_cond_wait</span><span class="p">(</span><span class="o">&</span><span class="n">cond</span><span class="p">,</span> <span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">condition_</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">mutex</span><span class="p">);</span>
</code></pre></div></div>
<p>With multiple producers and multiple consumers, the boolean condition can be replaced by a counter.</p>
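<p>The counter variant can be sketched in Rust with <code class="language-plaintext highlighter-rouge">std::sync::Mutex</code> and <code class="language-plaintext highlighter-rouge">std::sync::Condvar</code>. This is only a minimal illustration, not code from any library: the names <code class="language-plaintext highlighter-rouge">ItemCounter</code>, <code class="language-plaintext highlighter-rouge">produce</code>, <code class="language-plaintext highlighter-rouge">consume</code> and <code class="language-plaintext highlighter-rouge">run_demo</code> are hypothetical. The important part is the <code class="language-plaintext highlighter-rouge">while</code> loop around the wait, which re-checks the counter after every wakeup.</p>

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical shared state: a counter of available items guarded by a
/// mutex, plus the condition variable consumers sleep on.
pub struct ItemCounter {
    count: Mutex<usize>,
    cond: Condvar,
}

impl ItemCounter {
    pub fn new() -> Self {
        ItemCounter {
            count: Mutex::new(0),
            cond: Condvar::new(),
        }
    }

    /// Producer side: publish one item while holding the lock, then signal.
    pub fn produce(&self) {
        let mut count = self.count.lock().unwrap();
        *count += 1;
        self.cond.notify_one();
    }

    /// Consumer side: the loop re-checks the predicate after every wakeup,
    /// so a spurious wakeup simply goes back to waiting.
    pub fn consume(&self) {
        let mut count = self.count.lock().unwrap();
        while *count == 0 {
            count = self.cond.wait(count).unwrap();
        }
        *count -= 1;
    }
}

/// Starts `n` consumers and `n` producers, waits for all of them, and
/// returns how many items are left over (expected to be zero).
pub fn run_demo(n: usize) -> usize {
    let shared = Arc::new(ItemCounter::new());
    let mut handles = Vec::new();
    for _ in 0..n {
        let s = Arc::clone(&shared);
        handles.push(thread::spawn(move || s.consume()));
    }
    for _ in 0..n {
        let s = Arc::clone(&shared);
        handles.push(thread::spawn(move || s.produce()));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let remaining = *shared.count.lock().unwrap();
    remaining
}
```

<p>Because every consumer takes exactly one item and every producer adds exactly one, calling <code class="language-plaintext highlighter-rouge">run_demo(4)</code> should finish with no items left, regardless of how the threads interleave.</p>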
<h3 id="参考">References</h3>
<p><a href="https://man7.org/linux/man-pages/man3/pthread_cond_broadcast.3p.html">https://man7.org/linux/man-pages/man3/pthread_cond_broadcast.3p.html</a></p>
<p><a href="https://code.woboq.org/userspace/glibc/nptl/pthread_cond_wait.c.html">https://code.woboq.org/userspace/glibc/nptl/pthread_cond_wait.c.html</a></p>
Sharp Liu | Thread Condition Signal
Linux file fragmentation top tool – Rust implementation | 2021-02-11T00:00:00+00:00 | https://cppcoffee.github.io/filesystem/rust/2021/02/11/Linux%E6%96%87%E4%BB%B6%E7%A2%8E%E7%89%87top%E5%B7%A5%E5%85%B7--Rust%E5%AE%9E%E7%8E%B0
<p>Linux file fragmentation top tool – Rust implementation</p>
<h2 id="fragtop-rs">fragtop-rs</h2>
<p>The previous post covered how the Linux <code class="language-plaintext highlighter-rouge">filefrag</code> tool is implemented. It can show the fragments of a single file, but it cannot scan a directory and rank files by fragment count, so let's build a tool that does, just for fun.</p>
<p>The project is named <code class="language-plaintext highlighter-rouge">fragtop-rs</code>, a nod to the top tool. <code class="language-plaintext highlighter-rouge">fragtop-rs</code> counts the fragments of every file matched by a glob pattern and prints them in top-style sorted order.</p>
<p><code class="language-plaintext highlighter-rouge">fragtop-rs</code> is written in Rust. The <code class="language-plaintext highlighter-rouge">fiemap</code> crate (<a href="https://docs.rs/fiemap/">https://docs.rs/fiemap/</a>) has a clean interface and can be used as-is.</p>
<p>Features: take a glob pattern, query the fragments of each matching file, and finally print a top-n list.</p>
<h3 id="cargotoml">Cargo.toml</h3>
<p>First create the project and change into its directory:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span> <span class="n">cargo</span> <span class="n">new</span> <span class="o">--</span><span class="n">bin</span> <span class="n">fragtop</span><span class="o">-</span><span class="n">rs</span>
<span class="n">Created</span> <span class="nf">binary</span> <span class="p">(</span><span class="n">application</span><span class="p">)</span> <span class="err">`</span><span class="n">fragtop</span><span class="o">-</span><span class="n">rs</span><span class="err">`</span> <span class="n">package</span>
<span class="o">%</span> <span class="n">cd</span> <span class="n">fragtop</span><span class="o">-</span><span class="n">rs</span>
</code></pre></div></div>
<p>The required crates are:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">clap</code>: command-line argument handling</li>
<li><code class="language-plaintext highlighter-rouge">glob</code>: file path matching</li>
<li><code class="language-plaintext highlighter-rouge">anyhow</code>: error handling</li>
<li><code class="language-plaintext highlighter-rouge">fiemap</code>: file fragment lookup on Linux</li>
</ul>
<p>Add the dependencies one by one:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">clap</span>
<span class="n">Updating</span> <span class="nv">'https</span><span class="p">:</span><span class="c">//github.com/rust-lang/crates.io-index' index</span>
<span class="n">Adding</span> <span class="n">clap</span> <span class="n">v2</span><span class="na">.33.3</span> <span class="n">to</span> <span class="n">dependencies</span>
<span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">glob</span>
<span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">anyhow</span>
<span class="o">%</span> <span class="n">cargo</span> <span class="n">add</span> <span class="n">fiemap</span>
</code></pre></div></div>
<p>After the crates are added, the <code class="language-plaintext highlighter-rouge">dependencies</code> section of <code class="language-plaintext highlighter-rouge">Cargo.toml</code> looks like this:</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[dependencies]</span>
<span class="py">clap</span> <span class="p">=</span> <span class="s">"2.33"</span>
<span class="py">glob</span> <span class="p">=</span> <span class="s">"0.3"</span>
<span class="py">anyhow</span> <span class="p">=</span> <span class="s">"1.0"</span>
<span class="py">fiemap</span> <span class="p">=</span> <span class="s">"0.1"</span>
</code></pre></div></div>
<h3 id="逻辑实现">Implementation</h3>
<p>Start with clap command-line handling. The tool takes <code class="language-plaintext highlighter-rouge">-p</code> to specify the glob pattern and <code class="language-plaintext highlighter-rouge">-n</code> to specify the top-n count.</p>
<p><code class="language-plaintext highlighter-rouge">-p</code> is required. <code class="language-plaintext highlighter-rouge">-n</code> defaults to 20, so when many files match, only the top 20 are shown.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">#[macro_use]</span>
<span class="k">extern</span> <span class="n">crate</span> <span class="n">clap</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">clap</span><span class="p">::</span><span class="n">Arg</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-></span> <span class="nn">anyhow</span><span class="p">::</span><span class="n">Result</span><span class="o"><</span><span class="p">()</span><span class="o">></span> <span class="p">{</span>
<span class="k">let</span> <span class="n">matches</span> <span class="o">=</span> <span class="nn">clap</span><span class="p">::</span><span class="nd">app_from_crate!</span><span class="p">()</span>
<span class="nf">.arg</span><span class="p">(</span>
<span class="nn">Arg</span><span class="p">::</span><span class="nf">with_name</span><span class="p">(</span><span class="s">"path"</span><span class="p">)</span>
<span class="nf">.short</span><span class="p">(</span><span class="s">"p"</span><span class="p">)</span>
<span class="nf">.help</span><span class="p">(</span><span class="s">"Set the glob file path"</span><span class="p">)</span>
<span class="nf">.required</span><span class="p">(</span><span class="k">true</span><span class="p">)</span>
<span class="nf">.takes_value</span><span class="p">(</span><span class="k">true</span><span class="p">),</span>
<span class="p">)</span>
<span class="nf">.arg</span><span class="p">(</span>
<span class="nn">Arg</span><span class="p">::</span><span class="nf">with_name</span><span class="p">(</span><span class="s">"top-n"</span><span class="p">)</span>
<span class="nf">.short</span><span class="p">(</span><span class="s">"n"</span><span class="p">)</span>
<span class="nf">.help</span><span class="p">(</span><span class="s">"Top fragment file"</span><span class="p">)</span>
<span class="nf">.default_value</span><span class="p">(</span><span class="s">"20"</span><span class="p">)</span>
<span class="nf">.takes_value</span><span class="p">(</span><span class="k">true</span><span class="p">),</span>
<span class="p">)</span>
<span class="nf">.get_matches</span><span class="p">();</span>
<span class="k">let</span> <span class="n">path</span> <span class="o">=</span> <span class="n">matches</span><span class="nf">.value_of</span><span class="p">(</span><span class="s">"path"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">();</span>
<span class="k">let</span> <span class="n">top_n</span> <span class="o">=</span> <span class="n">matches</span><span class="nf">.value_of</span><span class="p">(</span><span class="s">"top-n"</span><span class="p">)</span><span class="nf">.unwrap</span><span class="p">()</span><span class="py">.parse</span><span class="p">::</span><span class="o"><</span><span class="nb">usize</span><span class="o">></span><span class="p">()</span><span class="o">?</span><span class="p">;</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"path: {}, top: {}"</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">top_n</span><span class="p">);</span>
<span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Output when run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% cargo run -- -p /tmp/
path: /tmp/, top: 20
</code></pre></div></div>
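<p>The output looks right. The next step is to iterate over the files that match the glob pattern and record the fragment count of each one.</p>
<p>A <code class="language-plaintext highlighter-rouge">BTreeSet<Tuple(fragments, path)></code> records each file together with its fragment count. The advantage of a BTreeSet is that it iterates in ascending order (here by fragment count first, then by path), and with <code class="language-plaintext highlighter-rouge">rev()</code> it iterates in descending order.</p>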
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">...</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">collections</span><span class="p">::</span><span class="n">BTreeSet</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-></span> <span class="nn">anyhow</span><span class="p">::</span><span class="n">Result</span><span class="o"><</span><span class="p">()</span><span class="o">></span> <span class="p">{</span>
<span class="o">...</span>
<span class="k">let</span> <span class="k">mut</span> <span class="n">records</span> <span class="o">=</span> <span class="nn">BTreeSet</span><span class="p">::</span><span class="nf">new</span><span class="p">();</span>
<span class="k">for</span> <span class="n">entry</span> <span class="n">in</span> <span class="nn">glob</span><span class="p">::</span><span class="nf">glob</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">?</span> <span class="p">{</span>
<span class="k">let</span> <span class="n">entry</span> <span class="o">=</span> <span class="n">entry</span><span class="o">?</span><span class="p">;</span>
        <span class="c">// print the file currently being processed</span>
        <span class="c">// the leading \r rewrites the same terminal line in place</span>
<span class="nd">print!</span><span class="p">(</span><span class="s">"</span><span class="se">\r</span><span class="s">In progress: {}"</span><span class="p">,</span> <span class="n">entry</span><span class="nf">.display</span><span class="p">());</span>
        <span class="c">// query the file's fragments, then store the fragment count and the path</span>
<span class="k">let</span> <span class="n">count</span> <span class="o">=</span> <span class="nn">fiemap</span><span class="p">::</span><span class="nf">fiemap</span><span class="p">(</span><span class="o">&</span><span class="n">entry</span><span class="p">)</span><span class="o">?</span><span class="nf">.count</span><span class="p">();</span>
<span class="n">records</span><span class="nf">.insert</span><span class="p">((</span><span class="n">count</span><span class="p">,</span> <span class="n">entry</span><span class="p">));</span>
<span class="p">}</span>
<span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>
<p>With the file paths and their fragment counts collected, the last step is to iterate over the set and print a summary.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="k">-></span> <span class="nn">anyhow</span><span class="p">::</span><span class="n">Result</span><span class="o"><</span><span class="p">()</span><span class="o">></span> <span class="p">{</span>
<span class="o">...</span>
<span class="k">if</span> <span class="n">records</span><span class="nf">.len</span><span class="p">()</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span>
        <span class="k">return</span> <span class="nf">Err</span><span class="p">(</span><span class="nn">anyhow</span><span class="p">::</span><span class="nd">anyhow!</span><span class="p">(</span><span class="s">"no files are scanned."</span><span class="p">));</span>
<span class="p">}</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">Scan total file: {}"</span><span class="p">,</span> <span class="n">records</span><span class="nf">.len</span><span class="p">());</span>
    <span class="c">// rev() iterates in descending order; take(top_n) keeps only the first top_n entries</span>
<span class="k">for</span> <span class="p">(</span><span class="n">count</span><span class="p">,</span> <span class="n">entry</span><span class="p">)</span> <span class="n">in</span> <span class="n">records</span><span class="nf">.iter</span><span class="p">()</span><span class="nf">.rev</span><span class="p">()</span><span class="nf">.take</span><span class="p">(</span><span class="n">top_n</span><span class="p">)</span> <span class="p">{</span>
<span class="nd">println!</span><span class="p">(</span><span class="s">"{:<48} {}"</span><span class="p">,</span> <span class="n">entry</span><span class="nf">.display</span><span class="p">(),</span> <span class="n">count</span><span class="p">);</span>
<span class="p">}</span>
<span class="nf">Ok</span><span class="p">(())</span>
<span class="p">}</span>
</code></pre></div></div>
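<p>The ordering behavior that the loop above relies on can be checked in isolation with the standard library alone. The sketch below is illustrative only: <code class="language-plaintext highlighter-rouge">top_fragmented</code> and <code class="language-plaintext highlighter-rouge">sample_records</code> are hypothetical helpers, and the paths and counts are made up. Tuples compare by fragment count first and by path second, so files with the same count come out in reverse path order.</p>

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Returns up to `n` records with the highest fragment counts, largest first.
pub fn top_fragmented(
    records: &BTreeSet<(usize, PathBuf)>,
    n: usize,
) -> Vec<(usize, PathBuf)> {
    // BTreeSet iterates in ascending order, so rev() starts at the largest.
    records.iter().rev().take(n).cloned().collect()
}

/// Builds a small record set with invented values, for illustration only.
pub fn sample_records() -> BTreeSet<(usize, PathBuf)> {
    let mut records = BTreeSet::new();
    records.insert((3, PathBuf::from("/tmp/b.log")));
    records.insert((17, PathBuf::from("/tmp/a.log")));
    records.insert((3, PathBuf::from("/tmp/c.log")));
    records.insert((1, PathBuf::from("/tmp/d.log")));
    records
}
```

<p>One consequence of using a set of tuples is that two different files with the same fragment count are both kept, because their paths differ; only an exact duplicate of both count and path would be collapsed.</p>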
<p>That completes the code for <code class="language-plaintext highlighter-rouge">fragtop-rs</code>.</p>
<p>Use it to scan every log file under <code class="language-plaintext highlighter-rouge">/var/log/</code> and list the files by fragment count:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./target/debug/fragtop-rs <span class="nt">-p</span> <span class="s1">'/var/log/**/*'</span>
In progress: /var/log/yum.log-20210101
Scan total file: 657
/var/log/access.log 266
/var/log/wtmp 39
/var/log/messages-20210131 22
/var/log/messages-20210207 21
/var/log/messages-20210117 20
/var/log/audit/audit.log.2 20
/var/log/audit/audit.log.4 19
/var/log/messages-20210124 18
/var/log/audit/audit.log.1 18
/var/log/nginx/access.log 17
/var/log/audit/audit.log.3 17
/var/log/cron-20210207 14
/var/log/cron-20210131 14
/var/log/cron-20210124 14
/var/log/cron-20210117 14
/var/log/messages 13
/var/log/audit/audit.log 13
/var/log/yum.log-20200511 10
/var/log/tuned/tuned.log 10
/var/log/grubby 8
</code></pre></div></div>
<h3 id="参考">References</h3>
<p><a href="https://docs.rs/fiemap/0.1.1/fiemap/">https://docs.rs/fiemap/0.1.1/fiemap/</a></p>
<p><a href="https://github.com/cppcoffee/fragtop-rs">https://github.com/cppcoffee/fragtop-rs</a></p>
Sharp Liu | Linux file fragmentation top tool – Rust implementation