<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://pchaigno.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://pchaigno.github.io/" rel="alternate" type="text/html" /><updated>2026-05-13T10:45:36+00:00</updated><id>https://pchaigno.github.io/feed.xml</id><title type="html">pchaigno</title><subtitle>Paul Chaignon&apos;s blog</subtitle><entry><title type="html">BPF Selftests: Troubleshooting vmtest.sh</title><link href="https://pchaigno.github.io/ebpf/2026/05/05/bpf-selftests-troubleshooting-vmtest.sh.html" rel="alternate" type="text/html" title="BPF Selftests: Troubleshooting vmtest.sh" /><published>2026-05-05T08:26:10+00:00</published><updated>2026-05-05T08:26:10+00:00</updated><id>https://pchaigno.github.io/ebpf/2026/05/05/bpf-selftests-troubleshooting-vmtest.sh</id><content type="html" xml:base="https://pchaigno.github.io/ebpf/2026/05/05/bpf-selftests-troubleshooting-vmtest.sh.html"><![CDATA[<p>I recently had to reinstall my setup to contribute to Linux on a new laptop, and as usual, ran into a few issues with the BPF selftests.
This short blog post is my attempt to document these issues, and a couple of others I had run into previously, both for my future self and for anyone else interested.
If you run into something not documented here, please give me a shout! I’ll try to keep this up to date.</p>

<p>Running the BPF selftests is as simple as:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tools/testing/selftests/bpf/vmtest.sh
</code></pre></div></div>
<p>or, to run specific tests:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Run the mcpu=v4 version of the verifier_bounds selftests.</span>
tools/testing/selftests/bpf/vmtest.sh <span class="nt">--</span> ./test_progs-cpuv4 <span class="nt">-t</span> verifier_bounds
</code></pre></div></div>
<p>Unfortunately, because the selftests have a number of dependencies (LLVM, pahole, libelf, etc.) and things sometimes break, it is not that uncommon to run into compilation, linking, and runtime errors.</p>

<p><br /></p>

<h3 id="conflicting-kfunc-declarations">Conflicting kfunc declarations</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In file included from progs/stream.c:8:
bpf_arena_common.h:47:15: error: conflicting types for 'bpf_arena_alloc_pages'
47 | void __arena* bpf_arena_alloc_pages(void *map, void __arena *addr, __u32 page_cnt,
| ^
tools/include/vmlinux.h:152158:14: note: previous declaration is here
152158 | extern void *bpf_arena_alloc_pages(void *p__map, void *addr__ign, u32 page_cnt, int node_id, u64 flags) __weak __ksym;
| ^
</code></pre></div></div>
<p>In my case, LLVM was too old and generated prototypes that differ from the ones the test already declared.
The same sort of errors can also happen with older pahole versions.</p>

<p><strong>Solution:</strong> Upgrade <a href="https://apt.llvm.org/">LLVM</a> and <a href="https://git.kernel.org/pub/scm/devel/pahole/pahole.git">pahole</a> to the latest stable versions.
<br /><br /></p>

<h3 id="undeclared-kfuncs">Undeclared kfuncs</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>progs/bpf_iter_tasks.c:98:8: error: call to undeclared function 'bpf_copy_from_user_task_str'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
   98 |         ret = bpf_copy_from_user_task_str((char *)task_str1, sizeof(task_str1), ptr, task, 0);
      |               ^
1 error generated.
</code></pre></div></div>
<p>Pahole v1.27 or newer is needed to parse vmlinux and discover exported kfuncs.
With older versions, you’ll run into the above error because the kfunc declarations are missing.</p>
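<p>As a quick check before rebuilding, the sketch below compares the installed pahole version against v1.27. It assumes that <code class="language-plaintext highlighter-rouge">pahole --version</code> prints a string such as <code class="language-plaintext highlighter-rouge">v1.27</code> and that GNU <code class="language-plaintext highlighter-rouge">sort -V</code> is available; <code class="language-plaintext highlighter-rouge">version_ge</code> is a hypothetical helper, not something shipped with the selftests.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># version_ge A B: succeed if version A is greater than or equal to version B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumption: pahole --version prints something like "v1.27".
installed="$(pahole --version | sed 's/^v//')"
if version_ge "${installed:-0}" 1.27; then
    echo "pahole ${installed} is recent enough"
else
    echo "pahole ${installed:-not found} is too old, v1.27 or newer is needed"
fi
</code></pre></div></div>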

<p><strong>Solution:</strong> Upgrade <a href="https://git.kernel.org/pub/scm/devel/pahole/pahole.git">pahole</a>.
<br /><br /></p>

<h3 id="shared-object-file-not-found">Shared object file not found</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./test_progs: error while loading shared libraries: libpcap.so.0.8: cannot open shared object file: No such file or directory
</code></pre></div></div>
<p>That can happen if the selftest binaries (e.g., <code class="language-plaintext highlighter-rouge">test_progs</code> here) have been linked against a library that isn’t present in the selftests VM.</p>

<p><strong>Solution:</strong> Use static linking:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh
</code></pre></div></div>
<p><strong>Workaround:</strong> If the error is specific to libpcap, you can also uninstall libpcap-dev as that library is optional and only required to run some selftests.
<br /><br /></p>

<h3 id="incompatible-glibc-version">Incompatible glibc version</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./test_progs -t verifier_xdp
./test_progs: /usr/lib/libc.so.6: version `GLIBC_2.38' not found (required by ./test_progs)
</code></pre></div></div>
<p>That will happen if your host system has a newer glibc version than the selftests VM.</p>
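<p>To see which glibc version a binary actually requires, one option is to look at its versioned dynamic symbols on the host. This is a minimal sketch, assuming binutils’ <code class="language-plaintext highlighter-rouge">objdump</code> and GNU <code class="language-plaintext highlighter-rouge">sort -V</code> are installed and that it is run from the selftests directory; <code class="language-plaintext highlighter-rouge">highest_glibc_version</code> is a hypothetical helper name.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># highest_glibc_version: read symbol listings on stdin and print the
# highest GLIBC_x.y symbol version mentioned in them.
highest_glibc_version() {
    grep -o 'GLIBC_[0-9][0-9.]*' | sort -V -u | tail -n1
}

# Only inspect the binary if it has been built already.
if [ -f ./test_progs ]; then
    objdump -T ./test_progs | highest_glibc_version
fi
</code></pre></div></div>
<p>If the printed version is newer than the glibc in the selftests VM, the dynamically linked binary cannot start there.</p>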

<p><strong>Solution:</strong> Use static linking as above.
<br /><br /></p>

<h3 id="cannot-find-libsystemd">Cannot find libsystemd</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/bin/ld: cannot find -lsystemd: No such file or directory
</code></pre></div></div>
<p>Since Linux v6.12, the selftests may optionally rely on libpcap-dev.
When linking statically, that can cause the above error because libpcap-dev pulls in a lot of dependencies, including libsystemd.
On some distros<sup id="fnref:include-ubuntu24" role="doc-noteref"><a href="#fn:include-ubuntu24" class="footnote" rel="footnote">1</a></sup>, installing libsystemd-dev doesn’t fix it because its packaging is broken<sup id="fnref:libcap-dep-bug" role="doc-noteref"><a href="#fn:libcap-dep-bug" class="footnote" rel="footnote">2</a></sup>, causing the following errors:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/bin/ld: (.text.change_capability+0x71): undefined reference to `cap_set_flag'
/usr/bin/ld: (.text.change_capability+0x80): undefined reference to `cap_set_proc'
/usr/bin/ld: (.text.change_capability+0x9f): undefined reference to `cap_free'
</code></pre></div></div>

<p><strong>Solution:</strong> Install libsystemd-dev.<br />
<strong>Workaround:</strong> Uninstall libpcap-dev if libsystemd-dev is broken on your distro.
<br /><br /></p>

<h3 id="no-rule-to-make-target">No rule to make target</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make: *** No rule to make target 'bpf_arena_common.h', needed by 'tools/testing/selftests/bpf/arena_htab.test.o'.  Stop.
make: *** Waiting for unfinished jobs....
</code></pre></div></div>
<p>or</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make[5]: *** No rule to make target 'str_error.h', needed by 'tools/bpf/resolve_btfids/libbpf/staticobjs/libbpf.o'.  Stop.
make[4]: *** [Makefile:152: tools/bpf/resolve_btfids/libbpf/staticobjs/libbpf-in.o] Error 2
make[3]: *** [Makefile:62: tools/bpf/resolve_btfids//libbpf/libbpf.a] Error 2
</code></pre></div></div>
<p>These errors can happen after changing branches.
They happen because stale <code class="language-plaintext highlighter-rouge">.cmd</code> files remain, referencing header files that no longer exist.
For instance, in my second example, I tried to run the selftests on bpf-next after running them on v6.6.
It failed because <code class="language-plaintext highlighter-rouge">tools/bpf/resolve_btfids/libbpf/staticobjs/.libbpf.o.cmd</code> from my v6.6 run refers to <code class="language-plaintext highlighter-rouge">str_error.h</code>, but that header file was removed in v6.18.</p>
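<p>To confirm that a leftover <code class="language-plaintext highlighter-rouge">.cmd</code> file is the culprit, you can search for the header named in the error. The sketch below assumes GNU <code class="language-plaintext highlighter-rouge">grep</code> and is meant to be run from the root of the kernel tree; <code class="language-plaintext highlighter-rouge">cmd_files_mentioning</code> is a hypothetical helper, not part of the tree.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># cmd_files_mentioning HEADER DIR: list the hidden .cmd files under DIR
# that reference HEADER. Returns non-zero if none is found.
cmd_files_mentioning() {
    grep -rlF --include='.*.cmd' -- "$1" "$2"
}

# Example from above: leftovers from a v6.6 build still mention str_error.h.
if [ -d tools/bpf/resolve_btfids ]; then
    cmd_files_mentioning str_error.h tools/bpf/resolve_btfids || echo "no stale reference found"
fi
</code></pre></div></div>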

<p><strong>Solution:</strong> Run <code class="language-plaintext highlighter-rouge">make -C tools/testing/selftests/bpf clean &amp;&amp; make -C tools/bpf/resolve_btfids clean</code> to clean up stale object files.
<br /><br /></p>

<h3 id="undefined-references-to-zstd">Undefined references to zstd</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/13/../../../x86_64-linux-gnu/libelf.a(elf_compress.o): in function `__libelf_compress':
(.text+0x113): undefined reference to `ZSTD_createCCtx'
/usr/bin/ld: (.text+0x2a9): undefined reference to `ZSTD_compressStream2'
/usr/bin/ld: (.text+0x2b4): undefined reference to `ZSTD_isError'
/usr/bin/ld: (.text+0x2db): undefined reference to `ZSTD_freeCCtx'
</code></pre></div></div>
<p>If linking statically, you might hit this on kernels before v6.8 because newer versions of libelf require libzstd, but the kernel doesn’t include it in LD flags.</p>

<p><strong>Solution:</strong> Backport commit <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8998a479fd96"><code class="language-plaintext highlighter-rouge">8998a479fd96</code></a> (v6.8+).<br />
<strong>Workaround:</strong> If the above commit doesn’t apply cleanly, you can also just add <code class="language-plaintext highlighter-rouge">-lzstd</code> to <code class="language-plaintext highlighter-rouge">LDLIBS</code> manually.
<br /><br /></p>

<h3 id="llvm-not-detected">LLVM not detected</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Auto-detecting system features:
...                                    llvm: [ OFF ]
</code></pre></div></div>
<p>There are a number of reasons this can happen<sup id="fnref:llvm-off-reasons" role="doc-noteref"><a href="#fn:llvm-off-reasons" class="footnote" rel="footnote">3</a></sup>, but one I encountered recently is when the version of <code class="language-plaintext highlighter-rouge">llvm-config</code> doesn’t match the libllvm version you installed.</p>
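<p>A rough way to spot such a mismatch is to compare the major versions reported by the tools. The sketch below uses the clang version as a proxy for the installed libllvm, which is an assumption on my part; <code class="language-plaintext highlighter-rouge">first_major</code> is a hypothetical helper.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># first_major: print the major number of the first X.Y version found on stdin.
first_major() {
    sed -n 's/^[^0-9]*\([0-9][0-9]*\)\..*/\1/p' | head -n1
}

llvm_major="$(llvm-config --version | first_major)"
clang_major="$(clang --version | first_major)"

if [ -z "$llvm_major" ] || [ -z "$clang_major" ]; then
    echo "could not detect llvm-config or clang"
elif [ "$llvm_major" = "$clang_major" ]; then
    echo "llvm-config and clang agree on major version $llvm_major"
else
    echo "mismatch: llvm-config is $llvm_major, clang is $clang_major"
fi
</code></pre></div></div>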

<p><strong>Solution:</strong> Update <code class="language-plaintext highlighter-rouge">llvm-config</code> to match the libllvm &amp; clang versions.
<br /><br /></p>

<h3 id="unexpected-__counted_by-attribute">Unexpected <code class="language-plaintext highlighter-rouge">__counted_by</code> attribute</h3>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In file included from test_tag.c:18:
/usr/include/linux/if_alg.h:45:22: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘__counted_by’
   45 |         __u8    iv[] __counted_by(ivlen);
</code></pre></div></div>
<p>With commit <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dacbfc167808"><code class="language-plaintext highlighter-rouge">dacbfc167808</code></a> (v7.0), two selftests started indirectly relying on the <code class="language-plaintext highlighter-rouge">__counted_by</code> macro that wasn’t defined in the tools headers.
That is only an issue if the installed UAPI headers (e.g., <code class="language-plaintext highlighter-rouge">/usr/include/linux/if_alg.h</code> above) include commit <code class="language-plaintext highlighter-rouge">dacbfc167808</code>.
It’s also unlikely to affect many people because the issue was fixed shortly after.</p>

<p><strong>Solution:</strong> Apply commit <a href="https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=0c7ae130698e"><code class="language-plaintext highlighter-rouge">0c7ae130698e</code></a> (bpf tree).<br />
<strong>Workaround:</strong> Define <code class="language-plaintext highlighter-rouge">__counted_by</code> manually in <code class="language-plaintext highlighter-rouge">tools/include/uapi/linux/stddef.h</code> or install an older version of the UAPI headers via <code class="language-plaintext highlighter-rouge">make headers_install INSTALL_HDR_PATH=/usr</code>.
<br /><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:include-ubuntu24" role="doc-endnote">
      <p>Including Ubuntu 24.04. <a href="#fnref:include-ubuntu24" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:libcap-dep-bug" role="doc-endnote">
      <p>It doesn’t declare libcap-dev as a dependency, leading to linking errors. <a href="#fnref:libcap-dep-bug" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:llvm-off-reasons" role="doc-endnote">
      <p>I’m happy to document others if people run into them in the context of BPF selftests. <a href="#fnref:llvm-off-reasons" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ebpf" /><summary type="html"><![CDATA[This post covers the most common errors when trying to run the eBPF selftests using vmtest.sh, including compilation errors on kfuncs declarations, missing or mismatched libraries, and other miscellaneous issues.]]></summary></entry><entry><title type="html">BPF Verifier State Pruning: Timeline</title><link href="https://pchaigno.github.io/ebpf/2026/01/20/state-pruning-timeline.html" rel="alternate" type="text/html" title="BPF Verifier State Pruning: Timeline" /><published>2026-01-20T08:26:10+00:00</published><updated>2026-01-20T08:26:10+00:00</updated><id>https://pchaigno.github.io/ebpf/2026/01/20/state-pruning-timeline</id><content type="html" xml:base="https://pchaigno.github.io/ebpf/2026/01/20/state-pruning-timeline.html"><![CDATA[<script>
$(document).ready(function() {
  $('.timeline-item').each(function(i) {
    $(this).delay(i * 200).animate({ opacity: 1 }, 500);
  });
  $('.timeline-content a').each(function(i) {
    if ($(this).text().indexOf('*') != -1) {
      $(this).attr('title', 'This commit was backported to at least one LTS kernel.');
    }
  });
  $('li.up').each(function(i) {
    $(this).attr('title', 'This change mostly increased complexity.');
  })
  $('li.down').each(function(i) {
    $(this).attr('title', 'This change mostly decreased complexity.');
  })
  $('li.neutral').each(function(i) {
    $(this).attr('title', 'This change didn\'t have an impact on complexity.');
  })
});
</script>

<p><em>This article is part of a series of notes that Mahé Tardy and I wrote to prepare a <a href="https://lpc.events/event/19/contributions/2162/">presentation introducing the BPF verifier state pruning</a> for Linux Plumbers 2025 in Tokyo.
You can also find <a href="https://lpc.events/event/19/contributions/2162/attachments/1820/3904/LPC25_State_Pruning.pdf">the slides</a> and <a href="http://www.youtube.com/watch?v=EoEBkFJ3St4">the video recording</a> of the presentation.</em></p>

<p>State pruning is what helps the BPF verifier in Linux scale to larger programs.
It mitigates <a href="https://en.wikipedia.org/wiki/Path_explosion">the path explosion problem</a> by pruning paths that are equivalent to already-verified paths.
State pruning has evolved alongside the verifier over the past decade.
As we illustrated in our talk, it started as a simple optimization and grew into a more complex and efficient component of the verifier.</p>

<p>This timeline tracks the main changes state pruning went through.
Each commit is prefixed with a symbol to indicate the overall impact the change had on the complexity (the “cost” of verification).
For example, <code class="language-plaintext highlighter-rouge">↓</code> means the commit decreased complexity and therefore helped the verifier scale.
Commits suffixed with a <code class="language-plaintext highlighter-rouge">*</code> have been backported to at least one LTS kernel.</p>

<p>As Mahé and I publish more articles, I’ll add more “Read more” links for important changes.</p>

<!--  -->
<div class="timeline">
  <div class="timeline-item">
    <div class="timeline-date">v3.18, 2014</div>
    <ul class="timeline-content">
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=17a5267067f3">17a526</a>&nbsp;Initial verifier logic.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f1bca824dabb">f1bca8</a>&nbsp;Initial state pruning logic.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v4.7, 2016</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=07016151a446">070161</a>&nbsp;Add a pruning point after call instructions, in addition to existing pruning points on conditional jump targets and after <code class="language-plaintext highlighter-rouge">BPF_JA</code>.<br /><a href="https://mtardy.com/posts/prune-points/" target="_blank" class="read-more">Read more</a></span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v4.11, 2017</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f0318d01b694">f0318d</a>&nbsp;Track map value pointers with an offset while on the stack.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v4.12, 2017</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c2ce60bdd3d">3c2ce6*</a>&nbsp;Add a pruning point on conditional jump instructions.<br /><a href="https://mtardy.com/posts/prune-points/" target="_blank" class="read-more">Read more</a></span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v4.14, 2017</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dc503a8ad984">dc503a</a>&nbsp;Track register liveness.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v4.16, 2017</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f18f62ee164">2f18f6</a>&nbsp;Fix liveness propagation in case of register with a pointer filled from the stack.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3bf15921c58d">3bf159</a>&nbsp;Detect dead <code class="language-plaintext highlighter-rouge">BPF_JNE</code> branches.</span></li>
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4d7e40a5b71">f4d7e4</a>&nbsp;Verification of subprogs (BPF-to-BPF function calls).</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc2b14d51053">cc2b14</a>&nbsp;Recognize zero-initialized stack slots.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v4.20, 2018</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4f7b3e82589e">4f7b3e*</a>&nbsp;Extend dead-branch detection to all types of numeric conditions.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.0, 2018</div>
    <ul class="timeline-content">
      <li class="unknown"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e434b8cdf788">e434b8*</a>&nbsp;Track scalars through 32-bit assignments.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19e2dbb7dd97">19e2db</a>&nbsp;Don't assume a larger stack means verifier states aren't equivalent.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.2, 2019</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f4686c41bdf">9f4686</a>&nbsp;Least-recently used mechanism to limit number of saved states at pruning points.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.3, 2019</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7cf25b2026d">f7cf25*</a>&nbsp;Track constant scalar registers as they are spilled to the stack.</span></li>
      <li class="unknown"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2589726d12a1">258972</a>&nbsp;Support bounded loops, introduce heuristic to decide when to save the verifier state on a pruning point.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b5dc0163d8fd">b5dc01</a>&nbsp;Introduce precise tracking of scalar values.</span></li>
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3ce685dd01a">a3ce68</a>&nbsp;Fix precision propagation in case of pruned paths.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6754172c208d">675417</a>&nbsp;Fix precise tracking in case of subprogs.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.5, 2019</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc52d9140aa9">cc52d9</a>&nbsp;Require precise tracking of map lookup key for tail calls.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.6, 2020</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c39bb1d5d1">51c39b</a>&nbsp;Support function-by-function verification.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.10, 2020</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=75748837b7e5">757488</a>&nbsp;Allocate an ID for scalars to propagate inferred ranges to identical scalars.</span></li>
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5689d49b71ad">5689d4</a>&nbsp;Track 64-bit bounded scalar registers as they are spilled to the stack.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.15, 2021</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfc6bb74e4f1">bfc6bb</a>&nbsp;Add a pruning point on calls to asynchronous callback functions.<br /><a href="https://mtardy.com/posts/prune-points/" target="_blank" class="read-more">Read more</a></span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v5.16, 2021</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=354e8f1970f8">354e8f*</a>&nbsp;Track all bounded scalar registers as they are spilled to the stack.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.2, 2022</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3b666bfa9c9">a3b666*</a>&nbsp;Fix precision propagation in case of ALU operations.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be2ef8161572">be2ef8*</a>&nbsp;Do not completely disable precise tracking whenever subprogs are used.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f63181b6ae79">f63181*</a>&nbsp;Improve accuracy of precision propagation.</span></li>
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bffdeaa8a5af">bffdea*</a>&nbsp;Decouple jump history from pruning points.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a830b53c17b">7a830b*</a>&nbsp;Improve accuracy of precision propagation by actively forgetting precise marks.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.3, 2022–2023</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4633a0068258">4633a0</a>&nbsp;Fix register comparison in state pruning to take into account ID remapping between paths.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6715df8d5d24">6715df*</a>&nbsp;Relax stack slot equivalence when running with <code class="language-plaintext highlighter-rouge">CAP_PERFMON</code>.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.4, 2023</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4b5ce570dbef">4b5ce5</a>&nbsp;Introduce force pruning points for <code class="language-plaintext highlighter-rouge">iter_next</code> kfuncs.</span></li>
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=13fbcee55706">13fbce</a>&nbsp;Improve <code class="language-plaintext highlighter-rouge">BPF_JEQ</code> and <code class="language-plaintext highlighter-rouge">BPF_JNE</code> dead-branch detection.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.5, 2023</div>
    <ul class="timeline-content">
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=407958a0e980">407958</a>&nbsp;Introduce <code class="language-plaintext highlighter-rouge">struct backtrack_state</code> to track the precise marking through backtracking.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fde2a3882bd0">fde2a3</a>&nbsp;Support precise tracking for subprogs, including callback functions.</span></li>
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=904e6ddf4133">904e6d</a>&nbsp;Share precise mark between all scalars with the same ID.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.7, 2023</div>
    <ul class="timeline-content">
      <li class="unknown"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2793a8b015f7">2793a8*</a>&nbsp;For <code class="language-plaintext highlighter-rouge">iter_next</code> loops, require exact state match in state pruning and introduce widening of registers.</span></li>
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=42d31dd601fa">42d31d</a>&nbsp;Improve <code class="language-plaintext highlighter-rouge">BPF_JEQ</code> and <code class="language-plaintext highlighter-rouge">BPF_JNE</code> dead-branch detection by using signed ranges.</span></li>
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab5cfac139ab">ab5cfa*</a>&nbsp;Add a pruning point on calls to synchronous callback functions, fix callback function verification to verify all iterations.<br /><a href="https://mtardy.com/posts/prune-points/" target="_blank" class="read-more">Read more</a></span></li>
      <li class="unknown"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cafe2c21508a">cafe2c*</a>&nbsp;Extend use of register widening to synchronous callback functions.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.8, 2023</div>
    <ul class="timeline-content">
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0acd03a5bd18">0acd03*</a>&nbsp;Require precise tracking of R0 on callback function return.</span></li>
      <li class="up"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eabe518de533">eabe51</a>&nbsp;Require precise tracking of R0 when checking return code is within expected range.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=41f6f64e6999">41f6f6</a>&nbsp;Precise tracking on spill to the stack even if using non-R10 register.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18a433b62061">18a433</a>&nbsp;Don't trigger precise tracking whenever writing zero register to aligned stack slots.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.9, 2024</div>
    <ul class="timeline-content">
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a4c57f52b5e">9a4c57</a>&nbsp;Don't trigger precise tracking whenever writing zero immediate to aligned stack slots, a pattern common for <code class="language-plaintext highlighter-rouge">mcpu=v4</code>.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6efbde200bf3">6efbde</a>&nbsp;Improve state pruning when comparing unbounded spilled register to misc. stack slots.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.12, 2024</div>
    <ul class="timeline-content">
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4bf79f9be434">4bf79f</a>&nbsp;Improve precise tracking at conditional jumps in case of linked registers.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.15, 2025</div>
    <ul class="timeline-content">
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14c8552db644">14c855</a>&nbsp;Data-flow analysis for register liveness, before the actual program analysis.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.17, 2025</div>
    <ul class="timeline-content">
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=96c6aa4c63af">96c6aa</a>&nbsp;Compute Strongly Connected Components (SCCs) of control-flow graph.</span></li>
      <li class="unknown"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c9e31900b54c">c9e319</a>&nbsp;Use SCC to improve read and precise marks propagation in case of loops.</span></li>
    </ul>
  </div>
  <div class="timeline-item">
    <div class="timeline-date">v6.18, 2025</div>
    <ul class="timeline-content">
      <li class="neutral"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3698c356ad9">b3698c</a>&nbsp;Introduce path-insensitive data flow analysis for liveness tracking.</span></li>
      <li class="down"><span><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f41345f47fb2">f41345*</a>&nbsp;Use tnum information to improve <code class="language-plaintext highlighter-rouge">BPF_JEQ</code> and <code class="language-plaintext highlighter-rouge">BPF_JNE</code> dead-branch detection.</span></li>
    </ul>
  </div>
</div>
<!-- ) -->]]></content><author><name></name></author><category term="ebpf" /><summary type="html"><![CDATA[This blog post presents a timeline of the main changes to state pruning in the eBPF verifier. State pruning allows the verifier to scale to larger BPF programs by eliminating paths.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://pchaigno.github.io/assets/illustration-state-pruning-timeline.png" /><media:content medium="image" url="https://pchaigno.github.io/assets/illustration-state-pruning-timeline.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Test Verifier Changes on Cilium’s BPF Programs</title><link href="https://pchaigno.github.io/ebpf/2025/09/23/test-verifier-changes-on-cilium-bpf-programs.html" rel="alternate" type="text/html" title="Test Verifier Changes on Cilium’s BPF Programs" /><published>2025-09-23T08:26:10+00:00</published><updated>2025-09-23T08:26:10+00:00</updated><id>https://pchaigno.github.io/ebpf/2025/09/23/test-verifier-changes-on-cilium-bpf-programs</id><content type="html" xml:base="https://pchaigno.github.io/ebpf/2025/09/23/test-verifier-changes-on-cilium-bpf-programs.html"><![CDATA[<p>At <a href="https://conferences.sigcomm.org/sigcomm/2025/workshop/ebpf/">SIGCOMM’25</a>, I was asked on two occasions how to test verifier changes on Cilium’s BPF programs.
That has been a recurring ask for a while and it makes a lot of sense to want to do this.
Cilium probably still has the largest open-source BPF programs out there.
There are many heuristics in the Linux verifier, most notably around state pruning.
Thus, changes to the verifier can have hard-to-predict impacts, either on the ability to verify programs or on the <em>complexity</em>, i.e., the number of instructions the verifier has to walk to analyze a program.
Testing your changes on Cilium is one way to evaluate them.</p>

<p>Testing verifier changes on Cilium is also a well-established practice in the kernel community<sup id="fnref:testing-verifier-with-cilium" role="doc-noteref"><a href="#fn:testing-verifier-with-cilium" class="footnote" rel="footnote">1</a></sup>.
Yet, it can be difficult to know how to test on Cilium.
Its BPF programs can be compiled with many different configurations and only a few really maximize the size and complexity.</p>

<p>In this post, I’ll show how to run Cilium’s complexity test suite on your patched kernel.
The complexity test suite is built to try and maximize the complexity, in an effort to spot complexity issues before they reach users.</p>

<p>We’ll start by building a test VM with your changes, but you can also skip to <a href="#run-the-complexity-tests">Run the Complexity Tests</a> if you prefer to boot on your patched kernel directly.
The VM is mostly useful if you don’t want to boot on your kernel or if you want to run the full Cilium end-to-end tests.</p>

<!--  -->
<ul id="toc" class="section-nav">
<li class="toc-entry toc-h3"><a href="#download-repositories">Download Repositories</a></li>
<li class="toc-entry toc-h3"><a href="#cilium-test-vm">Cilium Test VM</a>
<ul>
  <li class="toc-entry toc-h4"><a href="#build-the-vm">Build the VM</a></li>
  <li class="toc-entry toc-h4"><a href="#boot-and-prepare-the-vm">Boot and Prepare the VM</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#cilium-complexity-tests">Cilium Complexity Tests</a>
<ul>
<li class="toc-entry toc-h4"><a href="#run-the-complexity-tests">Run the Complexity Tests</a></li>
<li class="toc-entry toc-h4"><a href="#parse-the-results">Parse the Results</a></li>
<li class="toc-entry toc-h4"><a href="#comparing-results-across-versions">Comparing Results Across Versions</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#run-ciliums-end-to-end-tests">Run Cilium’s End-to-End Tests</a></li>
</ul>
<!--  -->

<p><br /></p>

<h3 id="download-repositories">Download Repositories</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">WORKDIR</span><span class="o">=</span>/tmp
<span class="nb">cd</span> <span class="nv">$WORKDIR</span>
git clone <span class="nt">--depth</span> 1 <span class="nt">-b</span> v1.19.0-pre.0 https://github.com/cilium/cilium
<span class="c"># Only needed if building the test VM:</span>
git clone <span class="nt">--depth</span> 1 https://github.com/cilium/little-vm-helper-images
git clone <span class="nt">--depth</span> 1 <span class="nt">-b</span> v0.0.26 https://github.com/cilium/little-vm-helper
</code></pre></div></div>

<p><br /></p>

<h3 id="cilium-test-vm">Cilium Test VM</h3>

<p>This part is only required if you don’t want to boot on your patched kernel, or if you want to <a href="#run-ciliums-end-to-end-tests">run Cilium’s end-to-end tests</a> afterward.</p>

<h4 id="build-the-vm">Build the VM</h4>
<p>We will build the VM image using <a href="https://github.com/cilium/little-vm-helper">Little VM Helper</a> (LVH), the tool used to run end-to-end tests in Cilium’s CI.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$WORKDIR</span>/little-vm-helper-images/
vim _data/kernels.json
</code></pre></div></div>
<p>We first need to edit the configuration to add our patched kernel.
Note how the URL follows the format <code class="language-plaintext highlighter-rouge">your_repo_url?depth=1#your_branch</code>.</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my-kernel"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"url"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://github.com/pchaigno/linux?depth=1#change-pruning-point-heuristic"</span><span class="w">
</span><span class="p">}</span><span class="err">,</span><span class="w">
</span></code></pre></div></div>
<p>With that, we can build the kernel and extract the VM image:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">KERNEL_VERSIONS</span><span class="o">=</span><span class="s2">"my-kernel"</span> make complexity-test
<span class="nv">c</span><span class="o">=</span><span class="si">$(</span>docker create quay.io/lvh-images/complexity-test-ci:my-kernel<span class="si">)</span>
docker <span class="nb">cp</span> <span class="nv">$c</span>:/data/images/complexity-test_my-kernel.qcow2.zst /tmp
zstd <span class="nt">--decompress</span> /tmp/complexity-test_my-kernel.qcow2.zst
</code></pre></div></div>
<p>If you want to compare complexity numbers between different kernel versions, you may want to build multiple kernels at once.
For example, if your patch is based on bpf-next, you may want to run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">KERNEL_VERSIONS</span><span class="o">=</span><span class="s2">"bpf-next my-kernel"</span> make complexity-test
</code></pre></div></div>

<h4 id="boot-and-prepare-the-vm">Boot and Prepare the VM</h4>
<p>We can then run the VM image using LVH:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$WORKDIR</span>/little-vm-helper/
make
./lvh run <span class="nt">--host-mount</span> <span class="nv">$WORKDIR</span>/cilium <span class="nt">--image</span> /tmp/complexity-test_my-kernel.qcow2
</code></pre></div></div>
<p>Username is <code class="language-plaintext highlighter-rouge">root</code>. There’s no password.</p>

<p>Finally, we need to extract the LLVM version used by Cilium to compile its BPF programs:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/host/contrib/scripts/extract-llvm.sh /tmp/llvm
<span class="nb">mv</span> /tmp/llvm/usr/local/bin/<span class="o">{</span>clang,llc<span class="o">}</span> /bin/
<span class="nb">rm</span> <span class="nt">-r</span> /tmp/llvm
<span class="nb">mkdir</span> <span class="nt">-p</span> /host/datapath-verifier
</code></pre></div></div>

<p><br /></p>

<h3 id="cilium-complexity-tests">Cilium Complexity Tests</h3>

<h4 id="run-the-complexity-tests">Run the Complexity Tests</h4>
<p>If you’re running in the LVH VM, the following command will execute Cilium’s complexity test suite:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /host/
<span class="nb">export </span><span class="nv">PRIVILEGED_TESTS</span><span class="o">=</span><span class="nb">true
</span>go <span class="nb">test</span> <span class="nt">-v</span> <span class="nt">-timeout</span><span class="o">=</span>20m ./pkg/datapath/loader <span class="nt">-run</span> <span class="s2">"TestPrivilegedVerifier"</span> <span class="se">\</span>
  <span class="nt">--cilium-base-path</span> /host <span class="nt">--result-dir</span> /host/datapath-verifier <span class="se">\</span>
  <span class="nt">--kernel-version</span> netnext
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">/host</code> points to the base of the Cilium clone.
If you are running this on the host instead (i.e., you booted on your patched kernel), you’ll need to replace <code class="language-plaintext highlighter-rouge">/host</code> in the command with the path to your Cilium clone.</p>

<p>The argument <code class="language-plaintext highlighter-rouge">--kernel-version</code> points to the set of configurations used by Cilium and, unless testing an old kernel (&lt;= v6.1), it should remain set to <code class="language-plaintext highlighter-rouge">netnext</code>.</p>

<h4 id="parse-the-results">Parse the Results</h4>
<p>Results are found in the <code class="language-plaintext highlighter-rouge">datapath-verifier/</code> directory in the Cilium clone and take the form of a JSON file.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$WORKDIR</span>/cilium/datapath-verifier/
<span class="nb">cat </span>verifier-complexity.json
</code></pre></div></div>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
  </span><span class="p">{</span><span class="w">
    </span><span class="nl">"collection"</span><span class="p">:</span><span class="w"> </span><span class="s2">"lxc"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"build"</span><span class="p">:</span><span class="w"> </span><span class="s2">"5"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"load"</span><span class="p">:</span><span class="w"> </span><span class="s2">"0"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"program"</span><span class="p">:</span><span class="w"> </span><span class="s2">"cil_from_container"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"insns_processed"</span><span class="p">:</span><span class="w"> </span><span class="mi">748</span><span class="p">,</span><span class="w">
    </span><span class="nl">"insns_limit"</span><span class="p">:</span><span class="w"> </span><span class="mi">1000000</span><span class="p">,</span><span class="w">
    </span><span class="nl">"max_states_per_insn"</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w">
    </span><span class="nl">"total_states"</span><span class="p">:</span><span class="w"> </span><span class="mi">57</span><span class="p">,</span><span class="w">
    </span><span class="nl">"peak_states"</span><span class="p">:</span><span class="w"> </span><span class="mi">57</span><span class="p">,</span><span class="w">
    </span><span class="nl">"mark_read"</span><span class="p">:</span><span class="w"> </span><span class="mi">13</span><span class="p">,</span><span class="w">
    </span><span class="nl">"verification_time_microseconds"</span><span class="p">:</span><span class="w"> </span><span class="mi">306</span><span class="p">,</span><span class="w">
    </span><span class="nl">"stack_depth"</span><span class="p">:</span><span class="w"> </span><span class="mi">64</span><span class="w">
  </span><span class="p">},</span><span class="w">
  </span><span class="err">...</span><span class="w">
</span></code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">collection</code> refers to a set of BPF programs (typically one of Cilium’s <code class="language-plaintext highlighter-rouge">bpf/bpf_[collection].c</code> files). <code class="language-plaintext highlighter-rouge">program</code> is the name of the BPF program being tested. <code class="language-plaintext highlighter-rouge">build</code> and <code class="language-plaintext highlighter-rouge">load</code> are the IDs of the build-time and load-time configurations used for this test case. The combination of <code class="language-plaintext highlighter-rouge">program</code>, <code class="language-plaintext highlighter-rouge">build</code>, and <code class="language-plaintext highlighter-rouge">load</code> can serve as an index for the complexity results.</p>

<p><code class="language-plaintext highlighter-rouge">insns_processed</code> is the number of instructions the verifier had to walk to verify the program. It’s typically referred to as the complexity of the program for that kernel. <code class="language-plaintext highlighter-rouge">insns_limit</code> is the complexity limit for that kernel (1M on recent kernels). <code class="language-plaintext highlighter-rouge">verification_time_microseconds</code> is the total verification time in microseconds.</p>

<p><code class="language-plaintext highlighter-rouge">max_states_per_insn</code> is the maximum number of verifier states the verifier attached to an instruction in the program. A <code class="language-plaintext highlighter-rouge">max_states_per_insn</code> of 5 means that there is at least one instruction in the program for which the verifier saved 5 different states (for different paths). <code class="language-plaintext highlighter-rouge">total_states</code> is the number of verifier states that were allocated during verification. Given that states can also be freed during the analysis, <code class="language-plaintext highlighter-rouge">peak_states</code> gives the maximum number of verifier states that existed at any point in time; it is closely related to the verifier’s memory consumption.</p>

<p><code class="language-plaintext highlighter-rouge">mark_read</code> is the size of the longest parentage chain the verifier had to walk for liveness tracking. <code class="language-plaintext highlighter-rouge">stack_depth</code> is the maximum stack depth used by the BPF program.</p>
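
<p>To get a quick feel for these fields, the results file can be loaded with a few lines of Python. The following is only a minimal sketch: the helper names (<code class="language-plaintext highlighter-rouge">load_results</code>, <code class="language-plaintext highlighter-rouge">closest_to_limit</code>, <code class="language-plaintext highlighter-rouge">describe</code>) are made up for illustration and are not part of Cilium, and it assumes the file is a JSON array of objects shaped like the example above.</p>

```python
import json
import sys


def load_results(path):
    """Load the list of per-program result objects from a results file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def closest_to_limit(results, top=5):
    """Return the `top` entries with the highest insns_processed / insns_limit."""
    ranked = sorted(
        results,
        key=lambda r: r["insns_processed"] / r["insns_limit"],
        reverse=True,
    )
    return ranked[:top]


def describe(entry):
    """Format one entry, identified by collection, program, build, and load."""
    ratio = 100 * entry["insns_processed"] / entry["insns_limit"]
    return (
        f'{entry["collection"]}/{entry["program"]} '
        f'(build={entry["build"]}, load={entry["load"]}): '
        f'{entry["insns_processed"]} insns, {ratio:.2f}% of limit'
    )


# Usage (hypothetical script name): python closest.py verifier-complexity.json
if __name__ == "__main__" and len(sys.argv) == 2:
    for entry in closest_to_limit(load_results(sys.argv[1])):
        print(describe(entry))
```

<p>Sorting by the ratio rather than by the raw instruction count keeps the ranking meaningful if the results come from a kernel with a different complexity limit.</p>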

<h4 id="comparing-results-across-versions">Comparing Results Across Versions</h4>

<p>The largest BPF programs are typically found in the <code class="language-plaintext highlighter-rouge">lxc</code> and <code class="language-plaintext highlighter-rouge">host</code> collections.</p>

<p>The following command compares results between different kernels; here, the results file of each run is assumed to have been saved under a name that identifies its kernel.
It will emit a large number of plots, with comparisons for each program and each configuration.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$WORKDIR</span>/cilium/
python ./contrib/scripts/verifier_diff.py <span class="se">\</span>
  datapath-verifier/verifier-complexity-bpf-next.json <span class="se">\</span>
  datapath-verifier/verifier-complexity-my-kernel.json
</code></pre></div></div>

<p>The following image shows an example plot, for the patched kernel I’m using, in the case of the bpf_host Cilium program.</p>

<div style="text-align: center;">
<img src="/assets/illustration-cilium-test-verifier-changes.png" title="" alt="" style="width: 100%;" class="zoomable" />
</div>
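
<p>If you only want a quick textual comparison rather than plots, the two JSON files can also be diffed by hand. The sketch below is not the <code class="language-plaintext highlighter-rouge">verifier_diff.py</code> script; the function names are hypothetical, and it assumes both files follow the format shown earlier. It indexes each test case by <code class="language-plaintext highlighter-rouge">program</code>, <code class="language-plaintext highlighter-rouge">build</code>, and <code class="language-plaintext highlighter-rouge">load</code>, with <code class="language-plaintext highlighter-rouge">collection</code> added to the key as a precaution.</p>

```python
import json
import sys


def index_results(results):
    """Index result entries by (collection, program, build, load)."""
    return {
        (r["collection"], r["program"], r["build"], r["load"]): r
        for r in results
    }


def complexity_deltas(baseline, patched):
    """Return (key, before, after, delta) rows, largest absolute change first.

    Only test cases present in both runs are compared.
    """
    base = index_results(baseline)
    new = index_results(patched)
    rows = []
    for key, old in base.items():
        if key not in new:
            continue  # test case missing from the patched run
        before = old["insns_processed"]
        after = new[key]["insns_processed"]
        rows.append((key, before, after, after - before))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


# Usage (hypothetical script name): python deltas.py baseline.json patched.json
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as f:
        baseline = json.load(f)
    with open(sys.argv[2], encoding="utf-8") as f:
        patched = json.load(f)
    for key, before, after, delta in complexity_deltas(baseline, patched)[:10]:
        print("/".join(key), before, "to", after, f"({delta:+d})")
```

<p>Test cases that appear in only one of the two files are skipped silently here; a program that stops loading on the patched kernel deserves a separate look.</p>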

<p><br /></p>

<h3 id="run-ciliums-end-to-end-tests">Run Cilium’s End-to-End Tests</h3>

<p>You can also use LVH images with your patched kernel to run end-to-end tests in Cilium’s CI.
To that end, you will need to build additional images:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">KERNEL_VERSIONS</span><span class="o">=</span><span class="s2">"my-kernel"</span> make kind
</code></pre></div></div>
<p>This command will create a new image <code class="language-plaintext highlighter-rouge">quay.io/lvh-images/kind-ci:my-kernel@sha256:xxxxxxx</code>.
You will need to retag and push this image to a Docker repository.</p>

<p>To have Cilium’s CI run your kernel, we just need a few changes.
First, we have to edit the <code class="language-plaintext highlighter-rouge">kernel:</code> lines in <code class="language-plaintext highlighter-rouge">.github/actions/e2e/configs.yaml</code> and <code class="language-plaintext highlighter-rouge">.github/actions/e2e/ipsec_configs.yaml</code> to refer to your kernel (i.e., <code class="language-plaintext highlighter-rouge">kernel: "my-kernel"</code>).
Then, apply the following diff, with whatever Docker repository you used (<code class="language-plaintext highlighter-rouge">docker.io/pchaigno</code> in my case):</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh">diff --git a/.github/actions/lvh-kind/action.yaml b/.github/actions/lvh-kind/action.yaml
index ecc8896cd4..7a7925686e 100644
</span><span class="gd">--- a/.github/actions/lvh-kind/action.yaml
</span><span class="gi">+++ b/.github/actions/lvh-kind/action.yaml
</span><span class="p">@@ -58,6 +58,8 @@</span> runs:
       uses: cilium/little-vm-helper@01debd6cb7e5514cfdb4a33e776bdc647bc5306e # v0.0.27
       with:
         test-name: $
<span class="gi">+        image-repo: 'docker.io/pchaigno'
+        image: 'kind-ci'
</span>         image-version: $
         images-folder-parent: "/tmp"
         host-mount: ./
</code></pre></div></div>

<p>Commit, open a draft pull request against <a href="https://github.com/cilium/cilium">Cilium’s repository</a>, and ping your favorite Cilium committer to trigger the end-to-end tests.</p>

<p><br /></p>

<p>Thanks to Simone Magnani for the <code class="language-plaintext highlighter-rouge">verifier_diff.py</code> script, to Mahé Tardy for his help running LVH, and to all my colleagues who contributed to our complexity test suite over time!</p>

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:testing-verifier-with-cilium" role="doc-endnote">
      <p>See examples of using Cilium to test the complexity impact of verifier changes in commits <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18a433b62061"><code class="language-plaintext highlighter-rouge">18a433b62061</code></a>, <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6715df8d5d24"><code class="language-plaintext highlighter-rouge">6715df8d5d24</code></a>, <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3ce685dd01a"><code class="language-plaintext highlighter-rouge">a3ce685dd01a</code></a>, and <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=979d63d50c0c"><code class="language-plaintext highlighter-rouge">979d63d50c0c</code></a>. <a href="#fnref:testing-verifier-with-cilium" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ebpf" /><summary type="html"><![CDATA[This post describes how to use Cilium's large BPF programs to test and evaluate your changes to the Linux BPF verifier or to any other aspect of the kernel.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://pchaigno.github.io/assets/illustration-cilium-test-verifier-changes.png" /><media:content medium="image" url="https://pchaigno.github.io/assets/illustration-cilium-test-verifier-changes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">eBPF Workshops: Slides &amp;amp; Papers</title><link href="https://pchaigno.github.io/ebpf/2025/09/17/ebpf-workshops-slides-papers.html" rel="alternate" type="text/html" title="eBPF Workshops: Slides &amp;amp; Papers" /><published>2025-09-17T08:26:10+00:00</published><updated>2026-02-28T10:31:00+00:00</updated><id>https://pchaigno.github.io/ebpf/2025/09/17/ebpf-workshops-slides-papers</id><content type="html" xml:base="https://pchaigno.github.io/ebpf/2025/09/17/ebpf-workshops-slides-papers.html"><![CDATA[<script>
jqueryReady(function() {
	$('.abstract-dropdown').click(function() {
		$(this).parent().find('blockquote').toggle();
		return false;
	});
})
</script>

<p>Last week, the SIGCOMM conference hosted the third edition of the eBPF workshop in Coimbra.
The SIGCOMM website has <a href="https://conferences.sigcomm.org/sigcomm/2025/workshop/papers-workshop-info/">links to the papers</a>, but cannot link to the presentation slides, so I’m writing this short blog post just to have everything in one place.
<br /><br /></p>

<!--  -->
<ul id="toc" class="section-nav">
<li class="toc-entry toc-h3"><a href="#ebpf26-fourth-edition">eBPF'26: Fourth Edition</a></li>
<li class="toc-entry toc-h3"><a href="#ebpf25-third-edition">eBPF'25: Third Edition</a></li>
<li class="toc-entry toc-h3"><a href="#ebpf24-second-edition">eBPF'24: Second Edition</a></li>
<li class="toc-entry toc-h3"><a href="#ebpf23-first-edition">eBPF'23: First Edition</a></li>
</ul>
<!-- ) -->

<p><br /></p>

<h3 id="ebpf26-fourth-edition">eBPF’26: Fourth Edition</h3>

<p>The call for papers for the fourth eBPF workshop has opened: <a href="https://ebpf.github.io/2026/cfp.html">ebpf.github.io/2026/cfp.html</a>.
This year, the workshop will be hosted by <a href="https://sigops.org/s/conferences/sosp/2026/">the ACM SOSP 2026 conference</a>.
The workshop will take place on the 29th of September in Prague.</p>

<p><br /></p>

<h3 id="ebpf25-third-edition">eBPF’25: Third Edition</h3>

<div class="papers">

<div class="paper">
<h4>uXDP: Frictionless XDP Deployments in Userspace</h4>
<span class="authors">Yusheng Zheng (UC Santa Cruz), Panayiotis Gavriil (The D. E. Shaw Group), Marios Kogias (Imperial College London)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748360" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1xrGEn8AYAfqRt8hh2uCSJCnN0B_no96l" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">Modern network function (NF) deployments face a fundamental trade-off: kernel-based extended Berkeley Packet Filter (eBPF) NFs provide safety, portability, and an extensive tooling ecosystem, but are limited in performance, while kernel-bypass frameworks deliver high throughput but lack integrated verification and ease of deployment. We present uXDP, a new runtime that unifies these worlds by running unmodified, verified XDP programs in userspace. uXDP ensures compatibility and preserves the verification-driven safety, portability, and familiar workflows of eBPF while moving execution into the userspace, enabling more aggressive optimizations and flexibility. Without recompiling eBPF code, uXDP achieves throughput gains of up to 3.3× over in-kernel execution and improves Meta's Katran load balancer performance by 40%, all while retaining the trusted eBPF development model and deployment simplicity.</blockquote>
<br />
</div>

<div class="paper">
<h4>No Two Snowflakes Are Alike: Studying eBPF Libraries' Performance, Fidelity and Resource Usage</h4>
<span class="authors">Carlos Machado, Bruno Gião (INESC TEC &amp; U. Minho), Sebastião Amaro, Miguel Matos (IST Lisbon &amp; INESC-ID), João Paulo, Tânia Esteves (INESC TEC &amp; U. Minho)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748364" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1HpOKxNvEFhHlCwtnkA8lNs8jmukIGVQJ" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">As different eBPF libraries keep emerging, developers are left with the hard task of choosing the right one. Until now, this choice has been based on functional requirements (e.g., programming language support, development workflow), while quantitative metrics have been left out of the equation. In this paper, we argue that efficiency metrics such as performance, resource usage, and data collection fidelity also need to be considered for making an informed decision. We show it through an experimental study comparing five popular libraries: bpftrace, BCC, libbpf, ebpf-go, and Aya. For each, we implement three representative eBPF-based tools and evaluate them under different storage I/O workloads. Our results show that each library has its own strengths and weaknesses, as their specific features lead to distinct trade-offs across the selected efficiency metrics. These results further motivate experimental studies to increase the community's understanding of the eBPF ecosystem.</blockquote>
<br />
</div>

<div class="paper">
<h4>Performance Implications at the Intersection of AF_XDP and Programmable NICs</h4>
<span class="authors">Marco Molè, Farbod Shahinfar, Francesco Maria Tranquillo, Davide Zoni (Politecnico di Milano), Aurojit Panda (NYU), Gianni Antichi (Politecnico di Milano)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748359" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1Ec4Ycd_JMVZyFfcyewtkS0rpCMYoB-Hf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">AF_XDP is emerging as an easier way to implement zero-copy network bypass applications. This is because it allows mixed-mode deployments, where zero-copy and socket-based applications share the same NIC. However, AF_XDP relies on NIC hardware and driver features, but implementing these features on programmable NICs adds resource overheads and increases development complexity and thus might not be desirable. To address this, we examine the feasibility of using eBPF based kernel extensibility to implement the required features, and report on the tradeoff between an eBPF and a native NIC implementation. Our analysis involved updating the OpenNIC driver to support the loading of eBPF/XDP programs and zero-copy AF_XDP. Our implementation is of independent interest because it makes it easier to develop and evaluate alternate designs for mixed-mode zero-copy deployments, and new NIC accelerated applications. Our implementation is open-sourced.</blockquote>
<br />
</div>

<div class="paper">
<h4>Toward eBPF-Accelerated Pub-Sub Systems</h4>
<span class="authors">Beihao Zhou, Samer Al-Kiswany, Mina Tahmasbi Arashloo (University of Waterloo)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748365" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1GIUFAJ7HQzkQma-twqanw2otmpuypoX_" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">Publish-subscribe (pub-sub) systems are a fundamental building block for real-time distributed applications, where high throughput and low latency are critical. Existing brokers can suffer performance bottlenecks as they operate in user space and rely on the socket API and full kernel stack traversal for every message. We present BPF-Broker, a novel pub-sub broker that leverages eBPF to accelerate message dissemination by decoupling the broker's control and data paths. Subscriber management is handled in user space, while message forwarding is done early in the kernel using the TC ingress and XDP hooks. Our evaluation shows that BPF-Broker achieves up to 3× higher throughput compared to our Socket-based baseline broker under high subscriber counts, and up to 2-10× lower end-to-end latency. These results highlight the potential of eBPF in accelerating pub-sub systems.</blockquote>
<br />
</div>

<div class="paper">
<h4>A Memory Pool Allocator for eBPF Applications</h4>
<span class="authors">Gyuyeong Kim (Sungshin Women's University), Dongsu Han (KAIST)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748370" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1iOxo9OS0ostBq5mUmnjqel3KpuTfN6l1" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">eBPF enables high-performance kernel-level execution by eliminating networking stack traversal and context switching. Despite the advantages, eBPF applications face strict memory management constraints due to the eBPF verifier requirements that mandate static memory allocation. This limitation imposes a fundamental tradeoff between application performance and memory efficiency, ultimately restricting the potential of eBPF. We present Kerby, a dynamic memory pool allocator for eBPF that enables eBPF applications to dynamically manage pre-allocated memory by representing variable-length data as collections of fixed-size blocks. This allows applications to increase the amount of kernel-resident data while minimizing internal fragmentation. Our preliminary evaluation with key-value store implementations demonstrates that Kerby achieves significant improvements in both memory utilization and throughput.</blockquote>
<br />
</div>

<div class="paper">
<h4>SchedBPF - Scheduling BPF programs</h4>
<span class="authors">Kavya Shekar, Dan Williams (Virginia Tech)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748366" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/15IJ0XNkNIeNnMiKiCcUaWMohiX9BDcGj" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">The Linux BPF framework enables the execution of verified custom bytecode in the critical path of various Linux kernel routines, allowing for efficient in-kernel extensions. The safety properties and low execution overhead of BPF programs have led to advancements in kernel extension use-cases that can be broadly categorized into tracing, custom kernel policies, and application acceleration. However, BPF is fundamentally event-driven and lacks native support for periodic or continuous tasks such as background tracing, metric aggregation, or kernel housekeeping. Existing approaches such as kernel modules with kthreads, userspace daemons, or BPF timers fail to satisfy all the essential requirements for periodic kernel extensions such as fine-grained CPU control, kernel safety, and minimal overhead. To address this gap, we propose SchedBPF—a conceptual framework that enables periodic execution of BPF programs on kernel threads. SchedBPF program executions are sandboxed and preemptible, as governed by the existing BPF verifier and JIT engine. They also adopt time-slice semantics, cgroup-style CPU quotas, and nice-level priority control, similar to kernel threads. SchedBPF aims to enable low-overhead, periodic execution of safe BPF code with fine-grained CPU resource management.</blockquote>
<br />
</div>

<div class="paper">
<h4>ChainIO: Bridging Disk and Network Domains with eBPF</h4>
<span class="authors">Zheng Cao, He Xuhang (UC Merced), Yanpeng Hu (ShanghaiTech University), Yusheng Zheng, Yiwei Yang (UC Santa Cruz), Jianchang Su, Wei Zhang (University of Connecticut), Andi Quinn (UC Santa Cruz)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748371" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1Xh24x8qTNTVjMrelVHItCFUfFejFGToq" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">Modern data-driven services from analytical databases and key-value stores to stream processors suffer high tail-latencies because each disk read and subsequent packet send/recv incurs a separate user-kernel crossing and redundant buffer copy. While Linux's io_uring now supports both block and socket I/O with asynchronous, batched submissions, it does not provide zero-copy transfers between storage and network domains; AF_XDP delivers high-performance packet I/O but is siloed to the network stack. No existing framework transparently unifies these mechanisms end-to-end. We present ChainIO, an eBPF-based system that intercepts and rewrites I/O syscalls, uses ring buffers to pass data descriptors directly between io_uring and AF_XDP, and orchestrates in-kernel execution to chain disk reads into network sends (and vice versa) with full POSIX semantics, fallback safety for unsupported cases, and zero application changes. Our prototype works with unmodified binaries and improves ClickHouse's TPC-H query throughput by up to 39%. ChainIO thus offers a general, safe, and high-performance path for cross-domain I/O optimization in diverse data-intensive workloads.</blockquote>
<br />
</div>

<div class="paper">
<h4>bpfCP: Efficient and Extensible Process Checkpointing via eBPF</h4>
<span class="authors">Juntong Deng (King's College London), Stephen Kell (King's College London)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748373" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1S7VHoJMSCiUNZ01I6yjn_T7Qd5tHqW2J" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">Live migration, snapshotting, and accelerated startup of applications or containers have long been implemented using checkpoint and restore primitives. To save or 'checkpoint' a process, it is necessary to dump not only its userspace state, but also a large amount of state in the kernel. The current widely used implementation on Linux relies heavily on the /proc file system and special system call interfaces, but these suffer from poor performance and lack extensibility. In this paper, we propose bpfCP, a process checkpointing scheme that dumps in-kernel state via eBPF programs, which improves performance and extensibility. Our preliminary evaluation shows that bpfCP can achieve significant performance improvements in dumping multiple types of in-kernel state of processes.</blockquote>
<br />
</div>

<div class="paper">
<h4>Automatic Synthesis of Abstract Operators for eBPF</h4>
<span class="authors">Harishankar Vishwanathan, Matan Shachnai, Srinivas Narayana, Santosh Nagarakatte (Rutgers University)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748361" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1BbMrdjV7h4XZ_clO_FqqG6OPUC9No9EA" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">This paper proposes an approach to automatically synthesize sound and precise abstract operators for the static analyzer in the eBPF verifier. The eBPF verifier ensures that only safe user-defined programs are loaded into the kernel. An unsound operator can lead to unsafe programs being accepted, while an imprecise operator can cause safe programs to be rejected. Our approach starts by generating candidate operators using input-output examples tailored for the eBPF verifier's abstract operators and iteratively refines them for soundness and precision. Using this approach, we have generated more precise variants of existing operators. Our approach also generates numerous sound and unsound operators that can serve as test suites for existing eBPF verification and fuzzing frameworks.</blockquote>
<br />
</div>

<div class="paper">
<h4>Pairwise BPF Programs Should Be Optimized Together</h4>
<span class="authors">Milo Craun, Dan Williams (Virginia Tech)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748362" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1Li0O9MDSuuvRF3NFezzvtNU-2HSXgJYB" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">BPF programs are extensively used for tracing and observability in production systems where performance overheads matter. Many individual BPF programs do not incur serious performance degrading overhead on their own, but increasingly more than a single BPF program is used to understand production system performance. BPF deployments have begun to look more like distributed applications; however, this is a mismatch with the underlying Linux kernel, potentially leading to high overhead cost. In particular, we identify that many BPF programs follow a pattern based on pairwise program deployment where entry and exit probes will be attached to measure a single quantity. We find that the pairwise BPF program pattern results in unnecessary overheads. We identify three optimizations—BPF program inlining, context aware optimization, and intermediate state internalization—that apply to pairwise BPF programs. We show that applying these optimizations to an example pairwise BPF program can reduce overhead on random read throughput from 28.13% to 8.98% and on random write throughput from 26.97% to 8.60%. We then examine some key design questions that arise when seeking to integrate optimizations with the existing BPF system.</blockquote>
<br />
</div>

<div class="paper">
<h4>Kernel Extension DSLs Should Be Verifier-Safe!</h4>
<span class="authors">Franco Solleza, Justus Adam, Akshay Narayan, Malte Schwarzkopf (Brown University), Andrew Crotty (Northwestern University), Nesime Tatbul (Intel Labs and MIT)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748368" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1zmL2l_oSqL73kmACr25FP5AByQx4b3b2" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">eBPF allows developers to write safe operating system extensions, but writing these extensions remains challenging because it requires detailed knowledge of both the extension's domain and eBPF's programming interface. Most importantly, the extension must pass the eBPF verifier. This paper argues that DSLs for extensions should guarantee verifier-safety: valid DSL programs should result in eBPF code that always passes the verifier. This avoids complex debugging and the need for extension developers to be eBPF experts. We show that three existing DSLs for different domains are compatible with verifier-safety. Beyond verifier-safety, practical extension DSLs must also achieve good performance. Inspired by database query optimization, we sketch an approach to creating DSL-specific optimizers capable of maintaining verifier-safety. A preliminary evaluation shows that optimizing verifier-safe extension performance is feasible.</blockquote>
<br />
</div>

<div class="paper">
<h4>Offloading the Tedious Task of Writing eBPF Programs</h4>
<span class="authors">Xiangyu Gao, Xiangfeng Zhu (University of Washington), Bhavana Vannarth Shobhana (Rutgers University), Yiwei Yang (UC Santa Cruz), Arvind Krishnamurthy, Ratul Mahajan (University of Washington)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748369" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1CfV8BK53ZGBR9fdyONYQHDbP_2cET4cZ" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">eBPF offers a lightweight method to extend the Linux kernel without modifying the source code in existing modules. However, writing correct and efficient eBPF programs is hard due to its unique verifier constraints and cumbersome debugging processes specific to the kernel execution environment. To tackle such an obstacle, we present a system, SimpleBPF, aiming at offloading the tedious eBPF development task. Developers only need to express their intent in a high-level domain-specific language, while the underlying eBPF code generation is handled automatically. SimpleBPF integrates four key components: a concise DSL, an LLM-based generator, a semantic checker, and an LLM-based optimizer. We use few-shot prompting to build both the code generator and optimizer in SimpleBPF, and evaluate the system on programs written in a representative DSL. The preliminary evaluation result shows that SimpleBPF can generate valid eBPF programs that pass the kernel verifier and exhibit competitive runtime performance. We also outline future directions based on current findings.</blockquote>
<br />
</div>

<div class="paper">
<h4>Empowering machine-learning assisted kernel decisions with eBPF^ML</h4>
<span class="authors">Prabhpreet Singh Sodhi, Georgios Liargkovas, Kostis Kaffes (Columbia University)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748363" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1vQYckf00_sgAMjM__yner1tDs93E_6Y1" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">Machine-learning (ML) techniques can optimize core operating system paths—scheduling, I/O, power, and memory—yet practical deployments remain rare. Existing prototypes either (i) bake simple heuristics directly into the kernel or (ii) off-load inference to user space to exploit discrete accelerators, both of which incur unacceptable engineering or latency cost. We argue that eBPF, the Linux kernel's safe, hot-swappable byte-code runtime, is the missing substrate for moderately complex in-kernel ML. We present eBPFML, a design that (1) extends the eBPF instruction set with matrix-multiply helpers, (2) leverages upcoming CPU matrix engines such as Intel Advanced Matrix Extensions (AMX) through the eBPF JIT, and (3) retains verifier guarantees and CO-RE portability.</blockquote>
<br />
</div>

<div class="paper">
<h4>eInfer: Unlocking Fine-Grained Tracing for Distributed LLM Inference with eBPF</h4>
<span class="authors">Kexin Chu, Jianchang Su, Yifan Zhang (University of Connecticut), Chenxingyu Zhao (University of Washington), Yiwei Yang, Yusheng Zheng (UC Santa Cruz), Shengkai Lin, Shizhen Zhao (Shanghai Jiao Tong University), Wei Zhang (University of Connecticut)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748372" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1TOEShd7LJjyk0uLxwhw0FNX_xKxwuo1c" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">Modern large language model (LLM) inference workloads run on complex, heterogeneous distributed systems spanning CPUs, GPUs, multi-GPU setups, and network interconnects. Existing profiling tools either incur prohibitive overhead, provide limited visibility, or suffer from vendor lock-in, making real-time, fine-grained performance analysis impractical in production environments. We present eInfer, the first eBPF-based system enabling transparent, low-overhead end-to-end tracing of per-request performance across distributed LLM inference pipelines without requiring application modifications. eInfer uniquely correlates events across CPUs, accelerators, processes, and nodes, delivering unified, vendor-agnostic observability that approaches the accuracy of specialized GPU profiling tools. To address the challenges of scalability, dynamic workloads, and instrumentation gaps on accelerators, we design a runtime-adaptive tracing mechanism that maintains comprehensive visibility in real time. Our initial evaluation demonstrates that eInfer delivers precise, low-overhead profiling, enabling critical insights to optimize LLM serving performance in production environments.</blockquote>
<br />
</div>

<div class="paper">
<h4>InXpect: Lightweight XDP Profiling</h4>
<span class="authors">Vladimiro Paschali, Andrea Monterubbiano, Francesco Fazzari (University of Rome "La Sapienza"), Michael Swift (University of Wisconsin—Madison), Salvatore Pontarelli (University of Rome "La Sapienza")</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748367" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1qsBSaMQPW3L3xBSgJfYqcLGKGNXjLoqf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">The eBPF eXpress Data Path (XDP) allows high-speed packet processing applications. Achieving high throughput requires careful design and profiling of XDP applications. However, existing profiling tools lack eBPF support. We introduce InXpect, a lightweight monitoring framework that profiles eBPF programs with fine granularity and minimal overhead, making it suitable for XDP-based in-production systems. We demonstrate how InXpect outperforms existing tools in profiling overhead and capabilities. InXpect is the first XDP/eBPF profiling system that provides real-time statistics streaming, enabling immediate detection of changes in program behavior.</blockquote>
<br />
</div>

<div class="paper">
<h4>BPFflow - Preventing information leaks from eBPF</h4>
<span class="authors">Chinecherem Dimobi, Rahul Tiwari, Zhengjie Ji, Dan Williams (Virginia Tech)</span><br />
<a href="https://dl.acm.org/doi/pdf/10.1145/3748355.3748374" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://drive.google.com/file/d/1B9tlw9ShgaWBGurTtayx37zkpHr-I676" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display:none;">eBPF has seen major industry adoption by enterprises to enhance observability, tracing, and monitoring by hooking at different points in the kernel. However, since the kernel is a critical resource, eBPF can also pose a threat if misused, potentially leading to privilege escalation, information leaks, and more. While effective to some extent, existing mitigation strategies like interface filtering are coarse-grained and often over-restrictive. We propose BPFflow, a flexible framework for the system administrator to define policies that specify sensitive data sources, trusted sinks, and permitted flows between them. These policies are enforced by an Information Flow Control (IFC) system within the eBPF verifier to track the propagation of sensitive data to prevent unauthorized leakage to userspace or any other untrusted sinks without any runtime overhead.</blockquote>
<br />
</div>

</div>
<p><br /></p>

<h3 id="ebpf24-second-edition">eBPF’24: Second Edition</h3>

<div class="papers">

<div class="paper">
<h4>An Empirical Study on Challenges of eBPF Application Development</h4>
<span class="authors">Mugdha Deokar, Jingyang Men, Lucas Castanheira, Ayush Bhardwaj, Theophilus A. Benson</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673429" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-deokar.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">eBPF has become a crucial tool for the development of specialized and customized network functions, observability frameworks, and security tools. To support these growing use cases, the eBPF ecosystem (i.e., tool chains, set of language primitives, and kernel interfaces) has evolved at an extremely fast pace. Despite its rapid evolution, as a community, we understand very little about the challenges faced by developers in designing eBPF programs or the issues that hamper operators in managing them. This study aims to shed light on these challenges by analyzing eBPF issues on Stack Overflow along several eBPF-specific dimensions. We make several interesting observations that call attention to under-explored areas of the eBPF ecosystem, as well as highlight interesting research directions.</blockquote>
<br />
</div>

<div class="paper">
<h4>Understanding Performance of eBPF Maps</h4>
<span class="authors">Chang Liu, Byungchul Tak, Long Wang</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673430" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">The Linux community has witnessed the rapid development of eBPF technology that allows users to load custom programs into the Linux kernel to extend its capabilities. A key feature that makes eBPF powerful is eBPF maps, which provide data storage and communication capabilities for eBPF programs. However, despite being widely used in eBPF programs, the performance of eBPF maps has received little attention. To understand the performance characteristics of eBPF maps, we conduct a comprehensive benchmark on them. The benchmark results demonstrate the access overhead of different types of eBPF maps and reveal the impact of various factors on the access overhead. By analyzing the benchmark results, we derive some implications for eBPF users to use eBPF maps more efficiently.</blockquote>
<br />
</div>

<div class="paper">
<h4>Kgent: Kernel Extensions Large Language Model Agent</h4>
<span class="authors">Yusheng Zheng, Yiwei Yang, Maolin Chen, Andrew Quinn</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673434" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-zheng.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">The extended Berkeley Packet Filters (eBPF) ecosystem allows for the extension of Linux and Windows kernels, but writing eBPF programs is challenging due to the required knowledge of OS internals and programming limitations enforced by the eBPF verifier. These limitations ensure that only expert kernel developers can extend their kernels, making it difficult for junior sys admins, patch makers, and DevOps personnel to maintain extensions. This paper presents Kgent, an alternative framework that alleviates the difficulty of writing an eBPF program by allowing Kernel Extensions to be written in Natural language. Kgent uses recent advances in large language models (LLMs) to synthesize an eBPF program given a user's English language prompt. To ensure that LLM's output is semantically equivalent to the user's prompt, Kgent employs a combination of LLM-empowered program comprehension, symbolic execution, and a series of feedback loops. Kgent's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that allows it to combine the results of program synthesis and program comprehension and build on the recent success that LLMs have shown for each of these tasks individually.
To evaluate Kgent, we develop a new corpus of natural language prompts for eBPF programs. We show that Kgent produces correct eBPF programs on 80%—which is an improvement of a factor of 2.67 compared to GPT-4 program synthesis baseline. Moreover, we find that Kgent very rarely synthesizes "false positive" eBPF programs—i.e., eBPF programs that Kgent verifies as correct but manual inspection reveals to be semantically incorrect for the input prompt. The code for Kgent is publicly accessible at https://github.com/eunomia-bpf/KEN.</blockquote>
<br />
</div>

<div class="paper">
<h4>Eliminating eBPF Tracing Overhead on Untraced Processes</h4>
<span class="authors">Milo Craun, Khizar Hussain, Uddhav Gautam, Zhengjie Ji, Tanuj Rao, Dan Williams</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673431" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-craun.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Current eBPF-based kernel extensions affect entire systems, and are coarse-grained. For some use cases, like tracing, operators are more interested in tracing a subset of processes (e.g., belonging to a container) rather than all processes. While overhead from tracing is expected for targeted processes, we find untraced processes—those that are not the target of tracing—also incur performance overhead. To better understand this overhead, we identify and explore three techniques for per-process filtering for eBPF: post-eBPF, in-eBPF, and pre-eBPF filtering, finding that all three approaches result in excessive overhead on untraced processes. Finally, we propose a system that allows for zero-untraced-overhead per-process eBPF tracing by modifying kernel virtual memory mappings to present per-process kernel views, effectively enabling untraced processes to execute on the kernel as if no eBPF programs are attached.</blockquote>
<br />
</div>

<div class="paper">
<h4>Honey for the Ice Bear - Dynamic eBPF in P4</h4>
<span class="authors">Manuel Simon, Henning Stubbe, Sebastian Gallenmüller, Georg Carle</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673436" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-simon.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Software updates typically require system reboots, leading to service downtimes. We aim to solve this problem for network components allowing updates while avoiding service degradation. In this paper, we explore the integration of eBPF into the P4 pipeline for efficient packet processing. This way, we combine the flexibility and dynamic adaptability of eBPF with the efficiency of P4. The integration enhances the power of applications and enables the network operator to provide customizable data paths as a service. Our solution allows updating the data path at runtime and without downtime. We implement the approach for the P4 target T4P4S, discuss different performance models, and share implementation insights. The evaluation focuses on the overhead in terms of throughput and the costs of code updates expressed in the latency of the related packets. We show that eBPF execution is possible with reasonable costs, promising dynamic network functions within P4.</blockquote>
<br />
</div>

<div class="paper">
<h4>Towards Functional Verification of eBPF Programs</h4>
<span class="authors">Dana Lu, Boxuan Tang, Michael Paper, Marios Kogias</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673435" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-lu.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">eBPF is being used to implement increasingly critical pieces of system logic. eBPF's verifier raises the cost of adoption of the technology, as making programs pass the verifier can be very effortful. We observe that the guarantees provided by the verifier have only been used for the narrow objective of verifying these programs' safety, despite them also enabling the automatic verification of program functional correctness. We envision a framework allowing developers to easily specify and automatically verify their eBPF programs with very little extra cost compared to simply passing the verifier.
We showcase our implementation of DRACO, built on top of KLEE. DRACO allows developers to fully or partially specify eBPF programs, add verification-time assert statements, and reason about multiple eBPF programs interacting with each other and userspace, all at minimal additional cost to the developers. We use DRACO to either fully or partially verify the correctness of several real-world or experimental XDP programs.</blockquote>
<br />
</div>

<div class="paper">
<h4>Unsafe Kernel Extension Composition via BPF Program Nesting</h4>
<span class="authors">Siddharth Chintamaneni, Sai Roop Somaraju, Dan Williams</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673440" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-chintamaneni.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">BPF programs provide the ability to extend the kernel while ensuring safety. The safety guarantees are provided by the in-kernel verifier. However, the verification guarantees may not hold when multiple BPF programs interact with each other through helper functions. This is because, while verifying a BPF program, the verifier considers each BPF program as an individual unit rather than part of a composite system. One aspect affected by this unsafe composition is the kernel stack. In this paper, we highlight how different possible nesting scenarios can affect the safety of the kernel stack. To address this problem, we propose a helper-rooted callgraph-based approach, which enables the verifier to have a global view of the system. By using the callgraph and maximum stack depth information during verification, the verifier will either accept or reject a program by considering all the possible nesting scenarios, which ensures runtime stack safety.</blockquote>
<br />
</div>

<div class="paper">
<h4>µBPF: Using eBPF for Microcontroller Compartmentalization</h4>
<span class="authors">Szymon Kubica, Marios Kogias</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673433" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-kubica.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Although eBPF (Extended Berkeley Packet Filter) started as a virtualization technology used in the Linux kernel to allow for executing user code inside the kernel in a safe way, it is a general purpose software fault isolation technology. The specification of the eBPF instruction set is also suitable for use as a VM for low-end network-enabled embedded devices to achieve software isolation, compartmentalization and allow for updating deployed firmware over-the-air. Existing solutions for running eBPF programs on microcontrollers use bytecode interpreters, which incur execution time and code size overhead compared to native code execution. Additionally, they don't support data relocations, which limits the space of programs that can be executed. We implement μBPF - an eBPF virtual machine and a JIT compiler targeting ARMv7-eM architecture. μBPF is compatible with embedded operating systems capable of supporting SUIT firmware update protocol. We implement a secure program deployment pipeline for RIOT - an operating system commonly used in embedded IoT applications. Our evaluation shows that μBPF JIT achieves close-to-native performance and up to 50% code size reduction compared to the eBPF binaries.</blockquote>
<br />
</div>

<div class="paper">
<h4>BOAD: Optimizing Distributed Communication with In-Kernel Broadcast and Aggregation</h4>
<span class="authors">Jianchang Su, Yifan Zhang, Linpu Huang, Wei Zhang</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673438" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-su.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Efficient communication is crucial for the performance of big data and distributed computing systems. Two key communication patterns in these systems are broadcasting, which involves sending data from one to multiple nodes, and aggregation, which combines data from multiple nodes into a single result. Traditional methods using socket-based communication often suffer from significant latency due to frequent user-kernel crossing and network stack processing, limiting the scalability and efficiency of these systems.
To address this issue, we propose BOAD, a new system designed to enhance distributed communication by leveraging eBPF (extended Berkeley Packet Filter) and kernel hooks such as XDP (eXpress Data Path) and TC (Traffic Control). By offloading broadcasting and aggregation tasks to the kernel space, BOAD minimizes the overhead caused by user-kernel crossings and network stack traversals. This innovative approach streamlines data transmission and aggregation, bypassing conventional network layers and substantially reducing communication latency. Our evaluations demonstrate that BOAD significantly enhances the efficiency and scalability of distributed systems, achieving up to 84.5% reduction in broadcast latency compared to baseline implementations.</blockquote>
<br />
</div>

<div class="paper">
<h4>hyDNS: Acceleration of DNS Through Kernel Space Resolution</h4>
<span class="authors">Joshua Bardinelli, Yifan Zhang, Jianchang Su, Linpu Huang, Aidan Parilla, Rachel Jarvi, Sameer G. Kulkarni, Wei Zhang</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673439" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">The Domain Name System (DNS) is a core component of Internet infrastructure, mapping domain names to IP addresses. The recursive resolver plays a critical role in this process, requiring high performance due to multiple request-response exchanges. However, its performance is hindered by costly message copying, user-kernel space transitions, and kernel stack traversal. Kernel bypass techniques can mitigate these issues but often result in resource waste or deployment challenges.
To overcome these limitations, we present hyDNS, a hybrid recursive resolver that combines eBPF offloading in the kernel with a user-space resolver. The DNS kernel cache allows most requests to be served before reaching the kernel network stack. To manage limited DMA memory, excess requests are passed to user space once a threshold is reached, enabling the system to handle high query loads. hyDNS uses programmable NICs to create a scalable kernel cache, implementing a lockless per-core eBPF hash map. Filters on the NIC direct requests to each core. Preliminary results show significant performance improvements with eBPF offloading, achieving up to 4.4× the throughput and a 65% reduction in latency compared to user space implementations.</blockquote>
<br />
</div>

<div class="paper">
<h4>Unlocking Path Awareness for Legacy Applications through SCION-IP Translation in eBPF</h4>
<span class="authors">Lars-Christian Schulz, Florian Gallrein, David Hausheer</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673437" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-schulz.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Path-aware networking (PAN) is a novel network paradigm enabling hosts to control network path selection. PAN has been realized on Internet-scale by the SCION routing protocol. Despite the increasing adoption of SCION by ISPs, only few applications offer native SCION support. The SCION-IP Gateway (SIG) tunnels legacy IP traffic over SCION, but does not allow for interoperability with native applications. To unlock PAN for legacy IP applications while maintaining compatibility with native SCION, we introduce SCION-IP translation. We present a network stack component that uses IPv6 sockets for path aware SCION communication and implement a prototype in eBPF. The translator offers more than five times the throughput of the open-source SIG for UDP and achieves 75% of native single-threaded IP application performance.</blockquote>
<br />
</div>

<div class="paper">
<h4>Custom Page Fault Handling With eBPF</h4>
<span class="authors">Tal Zussman, Teng Jiang, Asaf Cidon</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3672197.3673432" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="/assets/ebpf24-slides/ebpf24_slides-zussman.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Traditionally, page faults have been handled by the kernel, with a fixed set of handling routines for different types of faults. However, some applications may benefit from custom page fault handling routines, allowing them to implement advanced functionality, such as more efficient live virtual machine migration and application checkpointing. To this end, Linux introduced the userfaultfd() syscall, which allows applications to handle their page faults in userspace. While userfaultfd() has proven useful in several applications, we identify some key scalability limitations in its design, which limit both performance and adoption. We propose a system that allows using eBPF programs to handle page faults in-kernel, yielding a simpler and more scalable implementation while also enabling novel use cases, such as accelerating the startup of large position-independent executables like browsers.</blockquote>
<br />
</div>

</div>
<p><br /></p>

<h3 id="ebpf23-first-edition">eBPF’23: First Edition</h3>

<div class="papers">
<div class="paper">
<h4>TCP's Third Eye: Leveraging eBPF for Telemetry-Powered Congestion Control</h4>
<span class="authors">Jörn-Thorben Hinz, Vamsi Addanki (TU Berlin), Csaba Györgyi (University of Vienna), Theo Jepsen (Intel), Stefan Schmid (TU Berlin)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609295" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/2-TCP-CC.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">For years, congestion control algorithms have been navigating in the dark, blind to the actual state of the network. They were limited to the coarse-grained signals that are visible from the OS kernel, which are measured locally (e.g., RTT) or hints of imminent congestion (e.g., packet loss and ECN). As applications and OSs are becoming ever more distributed, it is only natural that the kernel have visibility beyond the host, into the network fabric. Network switches already collect telemetry, but it has been impractical to export it for the end-host to react.
Although some telemetry-based solutions have been proposed, they require changes to the end-host, like custom hardware or new protocols and network stacks. We address the challenges of efficiency and protocol compatibility, showing that it is possible and practical to run telemetry-based congestion control algorithms in the kernel. We designed a framework that uses eBPF to run CCAs that can execute different control laws by selecting different types of telemetry. It can be deployed in brownfield environments, without requiring all switches be telemetry-enabled, or kernel recompilation at the end-hosts. When our eBPF program is deployed on hosts without hardware or OS changes, TCP incast workloads experience less queuing (thus lower latency), faster convergence and better fairness.</blockquote>
<br />
</div>

<div class="paper">
<h4>On Augmenting TCP/IP Stack via eBPF</h4>
<span class="authors">Sepehr Abbasi Zadeh (University of Toronto, Huawei Technologies Canada Co. Ltd), Ali Munir, Mahmoud Mohamed Bahnasy, Shiva Ketabi (Huawei Technologies Canada Co. Ltd), Yashar Ganjali (University of Toronto, Huawei Technologies Canada Co. Ltd)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609300" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/3-TCP-IP.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">As the data center networks' bandwidth-delay product is increasing and the applications are moving to nano services (with many small flows), managing flows in the network is becoming more challenging. Current TCP/IP stack faces fundamental limitations to meet these challenges. First, it lacks the ability to accurately estimate the network state under dynamic network settings. Second, the current stack is not flexible enough to be extended easily. In this work, we propose a framework, Augmenter, that augments (i.e., increases the network visibility of) the TCP/IP stack to address these challenges. Leveraging eBPF, Augmenter gathers the state of ongoing flows and uses this information to manage other flows that are currently active or arriving in the future. We present one specific use case of setting the initial congestion window of flows dynamically based on network conditions. Our initial tests show that Augmenter can improve the application performance by up to 1.4x compared to the fixed initial window-based solutions. Implementing Augmenter in the TCP/IP stack itself is not trivial. Augmenter employs eBPF to implement its desired functionality as it enables introducing such changes relatively easily. We discuss potential challenges and solutions in designing and implementing Augmenter applications.</blockquote>
<br />
</div>

<div class="paper">
<h4>Schooling NOOBs with eBPF</h4>
<span class="authors">Joel Sommers (Colgate University), Nolan Rudolph, Ramakrishnan Durairajan (University of Oregon)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609302" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/4-NOOBs.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">While networks have evolved in profound ways, the tools to measure them from end hosts have not kept pace. State-of-the-art tools are ill-suited for elucidating observed network performance impairments and path dynamics, and are susceptible to operational policies of the network. Consequently, the semantic gap between the application-view of network performance vs. actual conditions has resulted in network oblivious (NOOB) systems and applications.
To address this NOOB problem, we examine the Extended Berkeley Packet Filter (eBPF) as a new way to improve the practice of gathering fine-grained network telemetry from the edge. More specifically, by leveraging the safe and efficient in-kernel programming mechanism of eBPF, we design a high-performance telemetry framework called nooBpf with two tools—namely noobprobe and noobflow—to quantify the actual network performance from end hosts and offer unprecedented insights into the flow-level performance, including in-network queuing and routing-induced delays. We illustrate the potential of these two tools to address the NOOB problem through a variety of experiments. The results of our experiments strongly suggest eBPF as a promising foundation for high-performance telemetry and for addressing the NOOB problem.</blockquote>
<br />
</div>

<div class="paper">
<h4>Supercharge WebRTC: Accelerate TURN Services with eBPF/XDP</h4>
<span class="authors">Tamás Lévai (Budapest University of Technology and Economics, L7mp Technologies), Balázs Edvárd Kreith (Riverside.fm), Gábor Rétvári (Budapest University of Technology and Economics, L7mp Technologies)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609296" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/5-TURN.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Real-time communication (RTC) services, from videoconferencing to cloud gaming and remote rendering, are everywhere. WebRTC, an enabler technology for these applications, traditionally relies on a comprehensive NAT traversal protocol suite, most importantly, TURN, to interconnect clients and media servers behind NATs and firewalls. With the demise of residential public IP addresses, these massive-scale TURN services have become an indispensable component of WebRTC applications. Traditionally implemented as multi-protocol user-space packet relays, TURN servers are notoriously resource hungry. In this paper, we propose an eBPF/XDP offload engine to improve TURN server performance. We design a reusable eBPF/XDP TURN offload architecture, create a prototype on top of pion/turn, a popular WebRTC framework written in Go, and show on a fully functional WebRTC testbed that our offload significantly improves throughput and, more importantly, delay, by 2-3x compared to the state-of-the-art.</blockquote>
<br />
</div>

<div class="paper">
<h4>HEELS: A Host-Enabled eBPF-Based Load Balancing Scheme</h4>
<span class="authors">Rui Yang (EPFL), Marios Kogias (Imperial College London &amp; Azure Research)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609307" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/6-HEELS.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Layer 4 (L4) load balancing is crucial in cloud computing and elastic microservices. Existing L4 load balancer designs can be split into two main categories: centralized designs using a hardware or software middlebox, and decentralized designs in which every node can play the role of the load balancer. Centralized designs offer better scheduling policies and easier worker node management, but suffer from I/O and CPU limitations. Decentralized designs scale better, but are harder to manage. We introduce HEELS, a novel load balancing scheme designed for internal cloud workloads and microservices, achieving the best of both worlds. HEELS uses the load balancer only during the connection establishment and allows clients and servers to communicate directly after that. Supporting general L4 load balancers and requiring no kernel changes, HEELS is readily deployable on the public cloud. We implement HEELS as a set of eBPF programs split across the client and server. Our evaluation shows that HEELS introduces minimal overheads, works with off-the-shelf load balancers (e.g., Katran by Meta), and significantly reduces the costs of cloud load balancers.</blockquote>
<br />
</div>

<div class="paper">
<h4>eXpress Data Path Extensions for High-Capacity 5G User Plane Functions</h4>
<span class="authors">Christian Scheich, Marius Corici, Hauke Buhr, Thomas Magedanz (Fraunhofer FOKUS Institute)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609298" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/7-5G.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">In 5th Generation mobile networks, a dedicated User Plane Function (UPF) is responsible for connecting users in the Access Networks with the destination networks. In this work, we extend the UPF with eXpress Data Path enhancements to speed up the forwarding of user plane traffic in the GPRS Tunneling Protocol (GTP-U). Also, we develop a Receive Side Scaling method in XDP based on GTP-U header information to distribute incoming uplink traffic to the available CPUs.</blockquote>
<br />
</div>

<div class="paper">
<h4>PRAVEGA: Scaling Private 5G RAN via eBPF/XDP</h4>
<span class="authors">Udhaya Kumar Dayalan, Ziyan Wu, Gaurav Gautam, Feng Tian, Zhi-Li Zhang (University of Minnesota – Twin Cities, USA)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609303" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/8-PRAVEGA.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">We exploit eBPF+XDP to scale and accelerate software packet processing in (O-RAN compliant) disaggregated 5G RAN (Radio Access Network). We argue that the Central Unit User Plane (CU-UP) component is likely the bottleneck in the 5G RAN user plane data path and therefore focus on optimizing its performance. We propose an eBPF/XDP-based framework, PRAVEGA, and discuss additional options for further improvements.</blockquote>
<br />
</div>

<div class="paper">
<h4>Seeing the Invisible: Auditing eBPF Programs in Hypervisor with HyperBee</h4>
<span class="authors">Yutian Wang, Dan Li (Tsinghua University), Li Chen (Zhongguancun Laboratory)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609305" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/10-HyperBee.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">The flexibility of eBPF makes it widely used in performance, security, and monitoring. However, this flexibility is a double-edged sword, allowing attackers to use eBPF for malicious purposes. Security researchers have discovered multiple backdoors built by eBPF. Detecting malicious eBPF programs is challenging since eBPF exploits are almost invisible to inspection in both the user and kernel space. To defend against malicious eBPF programs, auditing an operating system's eBPF programs externally at load time is a more efficient approach. We propose HyperBee, a system integrated into the hypervisor that enables auditing of eBPF programs loaded in guests without performance impact during the execution. Guests relinquish their ability to load eBPF programs and must complete verification and JIT compilation of their eBPF programs through HyperBee. We implement a prototype of HyperBee on KVM and the HyperBee-aware guest based on Linux and evaluate its performance when loading eBPF programs. Our results show that HyperBee only brings overhead at load time: 9% extra load time when there is no security policy and 17% extra load time when using security policies against known eBPF malicious programs.</blockquote>
<br />
</div>

<div class="paper">
<h4>Comparing Security in eBPF and WebAssembly</h4>
<span class="authors">Jules Dejaeghere (University of Namur), Bolaji Gbadamosi, Tobias Pulls (Karlstad University), Florentin Rochet (University of Namur)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609306" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/11-Security.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">This paper examines the security of eBPF and WebAssembly (Wasm), two technologies that have gained widespread adoption in recent years, despite being designed for very different use cases and environments. While eBPF is a technology primarily used within operating system kernels such as Linux, Wasm is a binary instruction format designed for a stack-based virtual machine with use cases extending beyond the web. Recognizing the growth and expanding ambitions of eBPF, Wasm may provide instructive insights, given its design around securely executing arbitrary untrusted programs in complex and hostile environments such as web browsers and clouds. We analyze the security goals, community evolution, memory models, and execution models of both technologies, and conduct a comparative security assessment, exploring memory safety, control flow integrity, API access, and side-channels. Our results show that eBPF has a history of focusing on performance first and security second, while Wasm puts more emphasis on security at the cost of some runtime overheads. Considering language-based restrictions for eBPF and a security model for API access are fruitful directions for future work.</blockquote>
<br />
</div>

<div class="paper">
<h4>Enabling BPF Runtime policies for better BPF management</h4>
<span class="authors">Raj Sahu, Dan Williams (Virginia Tech)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609297" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/12-Runtime.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">As eBPF increasingly and rapidly gains popularity for observability, performance, troubleshooting, and security in production environments, a problem is emerging around how to manage the multitude of BPF programs installed into the kernel. Operators of distributed systems are already beginning to use BPF-orchestration frameworks with which they can set load and access policies for who can load BPF programs and access their resultant data. However, other than a guarantee of eventual termination, operators currently have little to no visibility into the runtime characteristics of BPF programs and thus cannot set policies that ensure their systems still meet crucial performance targets when instrumented with BPF programs. In this paper, we propose that having a runtime estimate will enable better policies that will govern the allowed latency in critical paths. Our key insight is to leverage the existing architecture within the verifier to statically track the runtime cost of all possible branches. Along with dynamically determined runtime estimates for helper functions and knowledge of loop-based helpers' effects on control flow, we generate an accurate—although broad—range estimate for making runtime policy decisions. We further discuss some of the limitations of this approach, particularly in the case of broad estimate ranges as well as complementary tools for BPF runtime management.</blockquote>
<br />
</div>

<div class="paper">
<h4>Enabling eBPF on Embedded Systems Through Decoupled Verification</h4>
<span class="authors">Milo Craun, Adam Oswald, Dan Williams (Virginia Tech)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609299" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/13-Embedded.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">eBPF (Extended Berkeley Packet Filter) is a Linux kernel subsystem that aims to allow developers to write safe and efficient kernel extensions by employing an in-kernel verifier and just-in-time compiler (JIT). We find that verification is prohibitively expensive for resource-constrained embedded systems. To solve this we describe a system that allows for verification to occur outside of the embedded kernel and before BPF program load time. The in-kernel verifier and JIT are coupled so they must be decoupled together. A designated verifier kernel accepts a BPF program, then verifies, compiles, and signs a native precompiled executable. The executable can then be loaded onto an embedded device without needing the verifier and JIT on the embedded device. Decoupling verification and JIT from load-time opens the door to much more than running BPF programs on embedded devices. It allows larger and more expressive BPF programs to be verified, provides a way for new approaches to verification to be used without extensive kernel modification and creates the possibility for BPF program verification as a service.</blockquote>
<br />
</div>

<div class="paper">
<h4>Network Profiles for Detecting Application-Characteristic Behavior Using Linux eBPF</h4>
<span class="authors">Lars Wüstrich, Markus Schacherbauer, Markus Budeus, Dominik Freiherr von Künßberg, Sebastian Gallenmüller (Technical University of Munich), Marc-Oliver Pahl (IMT Atlantique), Georg Carle (Technical University of Munich)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609294" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/14-NetProfiles.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Applications often show unique communication behavior. Knowledge about this behavior is beneficial in various use cases, such as anomaly or dependency detection. In this paper, we present network profiles that characterize typical application behavior. This requires a reliable and accurate association of processes and applications, which is challenging. We, therefore, introduce an eBPF-based matcher for this task that enables the creation of network profiles. In our evaluation we show that eBPF allows us to efficiently collect the relevant data to build application profiles, addressing issues of other data collection approaches. We further evaluate our work by using a network profile to identify emulated botnet activity masqueraded as a benign process.</blockquote>
<br />
</div>

<div class="paper">
<h4>RingGuard: Guard io_uring with eBPF</h4>
<span class="authors">Wanning He (Southern University of Science and Technology), Hongyi Lu (Southern University of Science and Technology (SUSTech)/Hong Kong University of Science and Technology (HKUST)), Fengwei Zhang (Southern University of Science and Technology (SUSTech)), Shuai Wang (HKUST)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609304" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/15-RingGuard.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">io_uring offers a flexible yet efficient asynchronous I/O paradigm for Linux. Despite a significant performance improvement, it also brings many security concerns to the kernel. Not only does io_uring itself contain multiple vulnerabilities, but it can also be used to bypass existing security mechanisms such as seccomp. To address these problems, this paper proposes a security mechanism named RingGuard that safeguards io_uring with eBPF programs. RingGuard is carefully designed to reduce the overhead of I/O request submission and to ensure the security of inserted eBPF programs. Our evaluation shows that RingGuard provides encouraging security benefits with moderate overhead. For instance, the overhead of RingGuard in file I/O scenarios is merely 7.8%.</blockquote>
<br />
</div>

<div class="paper">
<h4>Unleashing Unprivileged eBPF Potential with Dynamic Sandboxing</h4>
<span class="authors">Soo Yee Lim (University of British Columbia), Xueyuan Han (Wake Forest University), Thomas Pasquier (University of British Columbia)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609301" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2023/files/workshop-ebpf/16-Sandboxing.pdf" title="Link to presentation slides"><i class="fa-solid fa-file-pdf"></i>&nbsp;Slides</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">For safety reasons, unprivileged users today have only limited ways to customize the kernel through the extended Berkeley Packet Filter (eBPF). This is unfortunate, especially since the eBPF framework itself has seen an increase in scope over the years. We propose SandBPF, a software-based kernel isolation technique that dynamically sandboxes eBPF programs to allow unprivileged users to safely extend the kernel, unleashing eBPF's full potential. Our early proof-of-concept shows that SandBPF can effectively prevent exploits missed by eBPF's native safety mechanism (i.e., static verification) while incurring 0%-10% overhead on web server benchmarks.</blockquote>
<br />
</div>

<div class="paper">
<h4>Practical and Flexible Kernel CFI Enforcement using eBPF</h4>
<span class="authors">Jinghao Jia, Michael V. Le, Salman Ahmed (IBM Research), Dan Williams (Virginia Tech), Hani Jamjoom (IBM Research)</span><br />
<a href="https://dl.acm.org/doi/epdf/10.1145/3609021.3609293" title="Link to paper"><i class="fa-solid fa-file-lines"></i>&nbsp;Paper</a>&nbsp;
<a href="" class="abstract-dropdown" title="Show abstract"><i class="fa-solid fa-square-caret-down"></i>&nbsp;Abstract</a>&nbsp;
<br />
<blockquote style="display: none;">Enforcing control flow integrity (CFI) in the kernel (kCFI) can prevent control-flow hijack attacks. Unfortunately, current kCFI approaches have high overhead or are inflexible and cannot support complex context-sensitive policies. To overcome these limitations, we propose a kCFI approach that makes use of eBPF (eKCFI) as the enforcement mechanism. The focus of this work is to demonstrate through implementation optimizations how to overcome the enormous performance overhead of this approach, thereby enabling the potential benefits with only modest performance tradeoffs.</blockquote>
<br />
</div>
</div>]]></content><author><name></name></author><category term="ebpf" /><summary type="html"><![CDATA[This post presents the material (papers and presentation slides) from the three editions of the eBPF workshop at ACM SIGCOMM, from 2023 to 2025.]]></summary></entry><entry><title type="html">eBPF Research Papers</title><link href="https://pchaigno.github.io/bpf/2025/01/07/research-papers-bpf.html" rel="alternate" type="text/html" title="eBPF Research Papers" /><published>2025-01-07T08:22:10+00:00</published><updated>2026-05-13T10:31:00+00:00</updated><id>https://pchaigno.github.io/bpf/2025/01/07/research-papers-bpf</id><content type="html" xml:base="https://pchaigno.github.io/bpf/2025/01/07/research-papers-bpf.html"><![CDATA[<p>When I started reading about BPF, there weren’t many academic papers describing how it worked, how it didn’t, or how it was used.
There are many blog posts and informal articles out there, but it’s harder to find self-contained papers with references to older, sometimes unsuspected, related works.
They have become more frequent though, so I wanted to draw up a list with one-sentence summaries for anyone looking for related works or otherwise interested.</p>

<p>I expect this list to only grow with time.
If I want to keep things manageable, I need a way to select papers.
Except I’d rather not be the one having to decide which papers are the “best papers”<sup id="fnref:cant-escape-choice" role="doc-noteref"><a href="#fn:cant-escape-choice" class="footnote" rel="footnote">1</a></sup>.
So I opted to follow the selection from CSRankings: I will only list papers from <a href="https://csrankings.org/">conferences selected by CSRankings by default</a>.
CSRankings tends to put the bar fairly high, but I think there is at least consensus on the top conferences they selected.</p>

<p>I’ve sorted papers according to their <em>type</em> of contribution and the field or <em>area</em> they focus on.
For example, papers improving either the JIT compilers or the verifier of eBPF will have the <code class="language-plaintext highlighter-rouge">improving</code> type and the <code class="language-plaintext highlighter-rouge">jit</code> or <code class="language-plaintext highlighter-rouge">verifier</code> area
<!--  -->
(<a href="?sel=all-types_foundation_using_analysis_all-areas_offload_networking_storage_security_tracing_misc_" class="selectors-link">see those papers</a>).
<!--  --></p>

<p>If you notice any bug in the selectors, missing papers, or other opportunity for improvement, as usual, don’t hesitate to reach out via one of the contacts at the bottom of the page.</p>

<p><br /></p>

<script>
// Updates the display and state of buttons from a URL.
function applyURL() {
	var urlParams = new URLSearchParams(window.location.search);
	var sel = urlParams.get("sel");

	if (sel == null)
		return;

	$('.filter-button').each(function() {
		var id = this.id.replace('-switch', '');
		var isChecked = this.checked;
		var shouldBeChecked = sel.indexOf(id + '_') == -1;
		if (isChecked == shouldBeChecked)
			return;

		$(this).prop('checked', shouldBeChecked);
		toggleDisplay(this);
	});

	updateNbPapers();
}

// Updates the display based on a selector.
function toggleDisplay(selector) {
	var filterClass = $(selector).data('filter');
	var checked = $(selector).prop('checked');
	$('.' + filterClass).each(function() {
		if (typeof $(this).data('refcnt') === 'undefined') {
			// Type selector
			$(this).toggle(checked);
		} else {
			// Areas selector
			if (checked) {
				$(this).data('refcnt', $(this).data('refcnt') + 1);
				$(this).show();
			} else {
				$(this).data('refcnt', $(this).data('refcnt') - 1);
				if ($(this).data('refcnt') == 0) {
					$(this).hide();
				}
			}
		}
	});
}

function updateNbPapers() {
	$('#nb-papers').text($('.papers h4:visible').length);
}

// Updates the URL based on the state of buttons.
function updateURL() {
	var sel = '';
	$('.filter-button:not(:checked)').each(function() {
		sel += this.id.replace('-switch', '') + '_';
	});
	updateQueryString("sel", sel);
}

function updateQueryString(key, value) {
	var urlParams = new URLSearchParams(window.location.search);
	urlParams.set(key, value);
	var newUrl = `${window.location.pathname}?${urlParams.toString()}`;
	history.pushState({}, '', newUrl);
}

jqueryReady(function() {
	applyURL();

	$('.filter-button').click(function() {
		toggleDisplay(this);
		updateNbPapers();
		updateURL();
	});

	$('#all-types-switch').click(function() {
		var checked = this.checked;
		$('.type-button').each(function () {
			// Already in the expected state.
			if (this.checked == checked)
				return;
			$(this).prop('checked', checked);
			toggleDisplay(this);
		});
		updateNbPapers();
		updateURL();
	});
	$('#all-areas-switch').click(function() {
		var checked = this.checked;
		$('.area-button').each(function () {
			// Already in the expected state.
			if (this.checked == checked)
				return;
			$(this).prop('checked', checked);
			toggleDisplay(this);
		});
		updateNbPapers();
		updateURL();
	});

	$(".selectors-link").click(function(event) {
		// Need to remove "?sel=".
		var sel = $(this).attr('href').substring(5);
		updateQueryString("sel", sel);
		applyURL();
		// Don't follow the link: the state was already applied in place.
		event.preventDefault();
	});
})
</script>

<p><b>Type selector</b></p>
<div class="form-switch">
	<input type="checkbox" checked="" id="all-types-switch" class="filter-button" />
	<label class="slider" for="all-types-switch"></label>
	<label for="all-types-switch" title="All types">all</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="foundation-switch" class="filter-button type-button" data-filter="foundation" />
	<label class="slider" for="foundation-switch"></label>
	<label for="foundation-switch" title="Papers on the existing, upstream design of eBPF.">foundation</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="using-switch" class="filter-button type-button" data-filter="using" />
	<label class="slider" for="using-switch"></label>
	<label for="using-switch" title="Papers using eBPF as a key part of solving the research problem they focus on. Includes cases where the performance wouldn't be viable without BPF.">using</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="analysis-switch" class="filter-button type-button" data-filter="analysis" />
	<label class="slider" for="analysis-switch"></label>
	<label for="analysis-switch" title="Papers analyzing eBPF.">analysis</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="improving-switch" class="filter-button type-button" data-filter="improving" />
	<label class="slider" for="improving-switch"></label>
	<label for="improving-switch" title="Papers proposing improvements to the current design of eBPF or to the related tooling (ex., fuzzing tools).">improving</label>
</div>

<p><b>Areas selector</b></p>
<div class="form-switch">
	<input type="checkbox" checked="" id="all-areas-switch" class="filter-button" />
	<label class="slider" for="all-areas-switch"></label>
	<label for="all-areas-switch" title="All areas">all</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="verifier-switch" class="filter-button area-button" data-filter="verifier" />
	<label class="slider" for="verifier-switch"></label>
	<label for="verifier-switch" title="Papers about the verification of the eBPF bytecode.">verifier</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="offload-switch" class="filter-button area-button" data-filter="offload" />
	<label class="slider" for="offload-switch"></label>
	<label for="offload-switch" title="Papers offloading application logic to eBPF in the kernel.">offload</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="networking-switch" class="filter-button area-button" data-filter="networking" />
	<label class="slider" for="networking-switch"></label>
	<label for="networking-switch" title="Networking papers.">networking</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="storage-switch" class="filter-button area-button" data-filter="storage" />
	<label class="slider" for="storage-switch"></label>
	<label for="storage-switch" title="Storage papers (file systems, storage device drivers, IO stack in kernels, etc.).">storage</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="security-switch" class="filter-button area-button" data-filter="security" />
	<label class="slider" for="security-switch"></label>
	<label for="security-switch" title="Papers studying or defending against various attacks. Does not include all papers about the verifier; select 'verifier' if you want to see those as well.">security</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="tracing-switch" class="filter-button area-button" data-filter="tracing" />
	<label class="slider" for="tracing-switch"></label>
	<label for="tracing-switch" title="Tracing papers.">tracing</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="jit-switch" class="filter-button area-button" data-filter="jit" />
	<label class="slider" for="jit-switch"></label>
	<label for="jit-switch" title="Papers about the eBPF JIT compilers.">jit</label>
</div>
<div class="form-switch">
	<input type="checkbox" checked="" id="misc-switch" class="filter-button area-button" data-filter="misc" />
	<label class="slider" for="misc-switch"></label>
	<label for="misc-switch" title="Papers in other fields (virtualization, operating systems, etc.).">misc</label>
</div>

<p><br /></p>

<p><b>Selected <span id="nb-papers">82</span> papers.</b>
<br /><br /></p>

<div class="papers">

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper0">
<h4>Virtualizing eBPF with Late-Binding</h4>
<span class="venue">OSDI'26</span>&nbsp;
<a href="https://www.usenix.org/conference/osdi26/presentation/zhang-jing" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">J. Zhang, X. Song, D. Du, Y. Xia, B. Zang, H. Chen</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
N/A<br />
</div>
</div>
</div>

<div class="improving">
<div class="misc" data-refcnt="1">
<div id="paper1">
<h4>PeeR: First-Class Scheduling for Latency Critical eBPF Applications</h4>
<span class="venue">OSDI'26</span>&nbsp;
<a href="https://www.usenix.org/conference/osdi26/presentation/carin" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">J. Carin, B. Holmes, W. Wang, A. Bhardwaj, M. Ghobadi</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Makes BPF programs schedulable and preemptable using sched_ext and cooperative preemption at helper calls.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper2">
<h4>BeeQoS: A Cloud-Native QoS System for Adaptive and Scalable Multi-Priority Bandwidth Guarantees</h4>
<span class="venue">WWW'26</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3774904.3792487" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">J. Liu, S. Wu, H. Ma, C. Li, H. Yu, D. Jia, F. Li, P. Hu</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Implements a QoS system with multi-priority bandwidth guarantees using eBPF for traffic shaping and flow sampling.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper3">
<h4>Xkernel: Rethinking Performance Tunability of Operating System Kernels</h4>
<span class="venue">OSDI'26</span>&nbsp;
<a href="https://arxiv.org/pdf/2512.12530" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">Z. Chen, W. Zhang, Y. Tang, R. Shu, F. Ren, T. Xu, J. Liu</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Leverages eBPF kprobes to tune arbitrary performance constants in the kernel at runtime.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper4">
<h4>HybridMesh: A Hardware-software Hybrid Approach for Accelerating Service Mesh Ingress</h4>
<span class="venue">NSDI'26</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi26-you.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">M. You, J. Nam, M. Seo, T. Park, S. Shin</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Relies on BPF to improve the performance of their service mesh by redirecting traffic with tc-bpf and sk_msg, and to handle in-packet metadata.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper5">
<h4>Remote TCP Connection Offload and Applications</h4>
<span class="venue">NSDI'26</span>&nbsp;
<a href="https://micchie.net/files/xo-nsdi26.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">S. Li, S. W. D. Chien, T. Gao, M. Honda</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Relies on tc-bpf to implement flow steering while the tc-flower hardware offload is configured.<br />
</div>
</div>
</div>

<div class="improving">
<div class="security" data-refcnt="1">
<div id="paper6">
<h4>KRAKENGUARD: Towards Fine-Grained eBPF Isolation</h4>
<span class="venue">NSDI'26</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi26-patel.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">J. Patel, L. G. Buhl-Nielsen, A. Ghosn, M. Kogias</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Implements a userspace policy enforcement engine for BPF programs that handles cross-program interactions.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking misc" data-refcnt="2">
<div id="paper7">
<h4>Tux: Efficient Drop-in Networking for Database Systems</h4>
<span class="venue">VLDB'25</span>&nbsp;
<a href="https://www.vldb.org/pvldb/vol19/p334-zhou.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">X. Zhou, V. Leis, X. Yu, M. Stonebraker</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Leverages AF_XDP to build a high-performance networking stack for database systems.<br />
</div>
</div>
</div>

<div class="improving">
<div class="misc storage" data-refcnt="2">
<div id="paper8">
<h4>BPF-DB: A Kernel-Embedded Transactional Database Management System For eBPF Applications</h4>
<span class="venue">SIGMOD'25</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3725272" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">M. Butrovich, S. Arch, W. S. Lim, W. Zhang, J. M. Patel, A. Pavlo</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;<code class="language-plaintext highlighter-rouge">storage</code>&nbsp;
<br />
Introduces an in-kernel database management system (DBMS) in eBPF, with ACID properties.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper9">
<h4>Approximation Enforced Execution of Untrusted Linux Kernel Extensions</h4>
<span class="venue">Sec'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/usenixsecurity25-sun-hao.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">H. Sun, Z. Su</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Relies on state approximations from the Linux verifier to enforce runtime checks on BPF programs.<br />
</div>
</div>
</div>

<div class="using">
<div class="security misc" data-refcnt="2">
<div id="paper10">
<h4>Deprivileging Low-Level GPU Drivers Efficiently with User-Space Processes and CHERI Compartments</h4>
<span class="venue">CCS'25</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3719027.3765036" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">P. Metzger, A. T. Markettos, E. T. Napierała, M. Naylor, R. N. M. Watson, T. M. Jones</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Implements in-kernel interrupt handlers on a uBPF VM, for userspace GPU drivers.<br />
</div>
</div>
</div>

<div class="using">
<div class="security" data-refcnt="1">
<div id="paper11">
<h4>Rethinking Tamper-Evident Logging: A High-Performance, Co-Designed Auditing System</h4>
<span class="venue">CCS'25</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3719027.3765024" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">R. Zhao, M. Shoaib, V. T. Hoang, W. U. Hassan</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Designs a tamper-evident auditing system entirely in eBPF, including the authentication code computation.<br />
</div>
</div>
</div>

<div class="using">
<div class="security tracing misc" data-refcnt="3">
<div id="paper12">
<h4>Intent-aware Fuzzing for Android Hardened Application</h4>
<span class="venue">CCS'25</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3719027.3744858" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">S. Jeong, M. Choi, H. Cho, S. Choi, H. Kim, Y. Jeon</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;<code class="language-plaintext highlighter-rouge">tracing</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
While fuzzing Android applications, relies on eBPF to trigger scheduled application behaviors and collect coverage.<br />
</div>
</div>
</div>

<div class="using">
<div class="storage misc" data-refcnt="2">
<div id="paper13">
<h4>Aeolia: A Fast and Secure Userspace Interrupt-Based Storage Stack</h4>
<span class="venue">SOSP'25</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3731569.3764816" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">C. Li, R. Yi, Z. Zhang, J. Liu, C. Min, J. Zhang, Y. Luo, X. Wang, Z. Wang, D. Zhou</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">storage</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Leverages sched_ext to bridge the scheduling semantic gap between userspace and the kernel when using user interrupts.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper14">
<h4>Prove It to the Kernel: Precise Extension Analysis via Proof-Guided Abstraction Refinement</h4>
<span class="venue">SOSP'25</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3731569.3764796" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">H. Sun, Z. Su</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Improves the Linux verifier's precision by refining the abstract state in userspace when needed, and producing a formal proof for it.<br />
</div>
</div>
</div>

<div class="analysis">
<div class="security verifier" data-refcnt="2">
<div id="paper15">
<h4>SoK: Challenges and Paths Toward Memory Safety for eBPF</h4>
<span class="venue">S&amp;P'25</span>&nbsp;
<a href="https://nebelwelt.net/files/25Oakland.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">K. Huang, J. Sampson, M. Payer, G. Tan, Z. Qian, T. Jaeger</span><br />
<code class="language-plaintext highlighter-rouge">analysis</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Surveys and evaluates existing work on eBPF memory safety.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc security" data-refcnt="2">
<div id="paper16">
<h4>SwiftSweeper: Defeating Use-After-Free Bugs Using Memory Sweeper Without Stop-the-World</h4>
<span class="venue">S&amp;P'25</span>&nbsp;
<a href="https://drive.junhoahn.kr/index.php/s/rSitKKfcz79PX5t" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">J. Ahn, K. Lee, C. Park, H. Moon, Y. Kwon</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Designs a memory-sweeping allocator to prevent use-after-free bugs, with a BPF-based custom page fault handler to improve performance.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper17">
<h4>eBPF Misbehavior Detection: Fuzzing with a Specification-Based Oracle</h4>
<span class="venue">SOSP'25</span>&nbsp;
<a href="https://cs.uwaterloo.ca/~m285xu/assets/publication/ebpf_smt_fuzz-paper.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">T. Lyu, K. K. Dwivedi, T. Bourgeat, M. Payer, M. Xu, S. Kashyap</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Introduces a specification-based oracle to fuzz the BPF verifier.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper18">
<h4>FlexGuard: Fast Mutual Exclusion Independent of Subscription</h4>
<span class="venue">SOSP'25</span>&nbsp;
<a href="https://hal.science/hal-05241781v1/document" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">V. Laforet, S. Kashyap, C. Iorgulescu, J. Lawall, J.-P. Lozi</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Improves lock handover time by using eBPF to detect critical section preemptions.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper19">
<h4>cache_ext: Customizing the Page Cache with eBPF</h4>
<span class="venue">SOSP'25</span>&nbsp;
<a href="https://arxiv.org/pdf/2502.02750" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">T. Zussman, I. Zarkadas, J. Carin, A. Cheng, H. Franke, J. Pfefferle, A. Cidon</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Extends the kernel with new BPF hooks, kfuncs, and per-cgroup struct_ops to allow customizing page cache policies.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper22">
<h4>Hermes: Enhancing Layer-7 Cloud Load Balancers with Userspace-Directed I/O Event Notification</h4>
<span class="venue">SIGCOMM'25</span>&nbsp;
<a href="https://ng-95.github.io/files/Hermes_SIGCOMM25.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=J22-2PSf04M" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">T. Pan, E. Song, Y. Zuo, S. Zhang, Y. Song, J. Zhao, W. Hou, J. Lu, X. Sun, S. Zhang, Y. Yang, J. Zhang, T. Huang, B. Lyu, X. Li, R. Wen, Z. Zong, S. Zhu</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Improves the performance of their L7 load balancer by customizing the kernel's connection dispatch using sk_reuseport BPF programs.<br />
</div>
</div>
</div>

<div class="using">
<div class="tracing misc" data-refcnt="2">
<div id="paper23">
<h4>Extending Applications Safely and Efficiently</h4>
<span class="venue">OSDI'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/osdi25-zheng-yusheng.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=vhvFW_e_4I8" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Y. Zheng, T. Yu, Y. Yang, Y. Hu, X. Lai, D. Williams, A. Quinn</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">tracing</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Proposes to implement safe userspace extension mechanisms and observability tools using the bpftime userspace eBPF VM for better efficiency.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper24">
<h4>Rex: Closing the language-verifier gap with safe and usable kernel extensions</h4>
<span class="venue">ATC'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/atc25-jia.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=phJ-fb5lEA8" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">J. Jia, R. Qin, M. Craun, E. Lukiyanov, A. Bansal, M. Phan, M. V. Le, H. Franke, H. Jamjoom, T. Xu, D. Williams</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Reduces false positives by replacing the verifier with a language-based safety approach that relies on the Rust compiler and runtime checks.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper25">
<h4>Accelerating Nested Virtualization with HyperTurtle</h4>
<span class="venue">ATC'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/atc25-zur.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=78g1bG3BOFs" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">O. B. Zur, J. Krebs, S. A. Bergman, M. Silberstein</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Improves nested virtualization performance by offloading logic from the guest hypervisor to the host hypervisor using eBPF.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper26">
<h4>PageFlex: Flexible and Efficient User-space Delegation of Linux Paging Policies with eBPF</h4>
<span class="venue">ATC'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/atc25-yelam.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=ZyT1cB_8trw" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">A. Yelam, K. Wu, Z. Guo, S. Yang, R. Shashidhara, W. Xu, S. Novakovic, A. C. Snoeren, K. Keeton</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Extends the kernel, including with new writable tracepoints, to allow users to customize paging policies.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper27">
<h4>VEP: A Two-stage Verification Toolchain for Full eBPF Programmability</h4>
<span class="venue">NSDI'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi25-wu-xiwei.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=5xA6BIEWjeM" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">X. Wu, Y. Feng, T. Huang, X. Lu, S. Lin, L. Xie, S. Zhao, Q. Cao</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Implements a proof-carrying code process for the verification of eBPF using annotations on the C source code.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper28">
<h4>eTran: Extensible Kernel Transport with eBPF</h4>
<span class="venue">NSDI'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi25-chen-zhongjie.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=zb1lNTPxlrc" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Z. Chen, Q. Meng, C. Lao, Y. Liu, F. Ren, M. Yu, Y. Zhou</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Designs a framework based on AF_XDP and new networking hooks to allow users to implement custom transport protocols on top of Linux.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper29">
<h4>State-Compute Replication: Parallelizing High-Speed Stateful Packet Processing</h4>
<span class="venue">NSDI'25</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi25-xu-qiongwen.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=FgSEmeqM0s8" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Q. Xu, S. Miano, X. Gao, T. Wang, A. Murugadass, S. Zhang, A. Sivaraman, G. Antichi, S. Narayana</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Proposes a method to scale the processing of single-flow traffic on multiple cores using XDP.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking security" data-refcnt="2">
<div id="paper30">
<h4>P4Control: Line-Rate Cross-Host Attack Prevention via In-Network Information Flow Control Enabled by Programmable Switches and eBPF</h4>
<span class="venue">S&amp;P'24</span>&nbsp;
<a href="https://people.cs.vt.edu/penggao/papers/p4control-sp24.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=5Q4pIojkLy4" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">O. Bajaber, B. Ji, P. Gao</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Designs a network defense system that tracks lateral movement across hosts and processes using P4 and eBPF hooks.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc security" data-refcnt="2">
<div id="paper31">
<h4>BUDAlloc: Defeating Use-After-Free Bugs by Decoupling Virtual Address Management from Kernel</h4>
<span class="venue">Sec'24</span>&nbsp;
<a href="https://www.usenix.org/system/files/usenixsecurity24-ahn.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=Y21_QmmIcI0" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">J. Ahn, J. Lee, K. Lee, W. Gwak, M. Hwang, Y. Kwon</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Designs a one-time allocator to detect use-after-free bugs, with a BPF-based custom page fault handler to improve performance.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier security" data-refcnt="2">
<div id="paper32">
<h4>BeeBox: Hardening BPF Against Transient Execution Attacks</h4>
<span class="venue">Sec'24</span>&nbsp;
<a href="https://cs.brown.edu/~vpk/papers/beebox.sec24.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=rZ47dOX2snI" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">D. Jin, A. J. Gaidis, V. P. Kemerlis</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Combines the verifier's static analysis with SFI-like runtime checks and memory copies to mitigate transient execution attacks.<br />
</div>
</div>
</div>

<div class="using">
<div class="security" data-refcnt="1">
<div id="paper33">
<h4>eAudit: A Fast, Scalable and Deployable Audit Data Collection System</h4>
<span class="venue">S&amp;P'24</span>&nbsp;
<a href="https://www.seclab.cs.sunysb.edu/seclab/pubs/eaudit.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=nxBhIxIRI9w" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">R. Sekar, H. Kimm, R. Aich</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Implements a faster auditing system using eBPF at tracepoints.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier jit" data-refcnt="2">
<div id="paper34">
<h4>Toss a Fault to BpfChecker: Revealing Implementation Flaws for eBPF runtimes with Differential Fuzzing</h4>
<span class="venue">CCS'24</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3658644.3690237" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">C. Peng, M. Jiang, L. Wu, Y. Zhou</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;<code class="language-plaintext highlighter-rouge">jit</code>&nbsp;
<br />
Designs a fuzzer for userspace eBPF runtimes, including Windows's, using differential fuzzing, verifier logs, and an intermediate representation of the eBPF bytecode.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper35">
<h4>NetEdit: An Orchestration Platform for eBPF Network Functions at Scale</h4>
<span class="venue">SIGCOMM'24</span>&nbsp;
<a href="https://cs.stanford.edu/~keithw/sigcomm2024/sigcomm24-final159-acmpaginated.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=y-SFpf8hnYU" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">T. A. Benson, P. Kannan, P. Gupta, B. Madhavan, K. S. Arora, J. Meng, M. Lau, A. Dhamija, R. Krishnamurthy, S. Sundaresan, N. Spring, Y. Zhang</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Describes an orchestration system for eBPF programs designed to tune the network stack of Meta's services.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper36">
<h4>Merlin: Multi-tier Optimization of eBPF Code for Performance and Compactness</h4>
<span class="venue">ASPLOS'24</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3620666.3651387" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://youtu.be/YHvfHZaHkwY?t=1903" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">J. Mao, H. Ding, J. Zhai, S. Ma</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Proposes new compiler optimizations tailored to eBPF bytecode.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking offload" data-refcnt="2">
<div id="paper37">
<h4>DINT: Fast In-Kernel Distributed Transactions with eBPF</h4>
<span class="venue">NSDI'24</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi24-zhou-yang.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=EgdXrjwBdBA" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Y. Zhou, X. Xiang, M. Kiley, S. Dharanipragada, M. Yu</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;<code class="language-plaintext highlighter-rouge">offload</code>&nbsp;
<br />
Designs a new distributed transaction system that offloads common operations to tc and XDP.<br />
</div>
</div>
</div>

<div class="using">
<div class="security misc" data-refcnt="2">
<div id="paper38">
<h4>BlueSWAT: A Lightweight State-Aware Security Framework for Bluetooth Low Energy</h4>
<span class="venue">CCS'24</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3658644.3670397" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">X. Che, Y. He, X. Feng, K. Sun, K. Xu, Q. Li</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Proposes to use a userspace eBPF VM to facilitate the distribution of security patches to Bluetooth Low Energy (BLE) devices, to mitigate session-based attacks.<br />
</div>
</div>
</div>

<div class="using">
<div class="security misc" data-refcnt="2">
<div id="paper39">
<h4>SeaK: Rethinking the Design of a Secure Allocator for OS Kernel</h4>
<span class="venue">Sec'24</span>&nbsp;
<a href="https://www.usenix.org/system/files/sec24fall-prepub-1504-wang-zicheng.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=8P1IbL8L4oY" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Z. Wang, Y. Guang, Y. Chen, Z. Lin, M. Le, D. K. Le, D. Williams, X. Xing, Z. Gu, H. Jamjoom</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Builds a secure allocator for the kernel, to separate security-sensitive objects, using new BPF helpers.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper41">
<h4>Rethinking Process Management for Interactive Mobile Systems</h4>
<span class="venue">MobiCom'24</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3636534.3649357" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">J. Zheng, Z. Li, F. Qian, W. Liu, H. Lin, Y. Liu, T. Xu, N. Zhang, J. Wang, C. Zhang</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Leverages eBPF to measure the usage of hardware resources by Android applications and investigate slow UI responsiveness problems.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper42">
<h4>MegaTE: Extending WAN Traffic Engineering to Millions of Endpoints in Virtualized Cloud</h4>
<span class="venue">SIGCOMM'24</span>&nbsp;
<a href="https://cs.stanford.edu/~keithw/sigcomm2024/sigcomm24-final406-acmpaginated.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=rVhATFzqJ4Q" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">C. Miao, Z. Zhong, Y. Xiao, F. Yang, S. Zhang, Y. Jiang, Z. Bai, C. Lu, J. Geng, Z. He, Y. Wang, X. Zou, C. Yang</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Relies on eBPF to identify traffic sources and enforce traffic engineering via segment routing across the WAN.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper43">
<h4>FetchBPF: Customizable Prefetching Policies in Linux with eBPF</h4>
<span class="venue">ATC'24</span>&nbsp;
<a href="https://www.usenix.org/system/files/atc24-cao.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=0lo-7dDVFB0" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">X. Cao, S. Patel, S. Y. Lim, X. Han, T. Pasquier</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Extends the kernel with new BPF hooks and helpers to be able to customize memory prefetching policies.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper44">
<h4>Validating the eBPF Verifier via State Embedding</h4>
<span class="venue">OSDI'24</span>&nbsp;
<a href="https://www.usenix.org/system/files/osdi24-sun-hao.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=sASFr_9uZ7Y" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">H. Sun, Z. Su</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Devises a test oracle to fuzz the eBPF verifier by embedding correctness checks in the BPF program.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier security" data-refcnt="2">
<div id="paper45">
<h4>Hive: A Hardware-assisted Isolated Execution Environment for eBPF on AArch64</h4>
<span class="venue">Sec'24</span>&nbsp;
<a href="https://www.usenix.org/system/files/usenixsecurity24-zhang-peihua.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=8jhFEv_wK0A" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">P. Zhang, C. Wu, X. Meng, Y. Zhang, M. Peng, S. Zhang, B. Hu, M. Xie, Y. Lai, Y. Kang, Z. Wang</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Proposes to replace the static analysis of the verifier with a hardware-based runtime isolation for ARM64.<br />
</div>
</div>
</div>

<div class="improving">
<div class="jit" data-refcnt="1">
<div id="paper46">
<h4>End-to-End Mechanized Proof of a JIT-Accelerated eBPF Virtual Machine for IoT</h4>
<span class="venue">CAV'24</span>&nbsp;
<a href="https://www.irisa.fr/prive/talpin/papers/cav24.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">S. Yuan, F. Besson, J.-P. Talpin</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">jit</code>&nbsp;
<br />
Correctness proof for the eBPF JIT compiler used in the micro-controller RIOT kernel.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper47">
<h4>Fast, Flexible, and Practical Kernel Extensions</h4>
<span class="venue">SOSP'24</span>&nbsp;
<a href="https://rs3lab.github.io/assets/papers/2024/dwivedi:kflex.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">K. K. Dwivedi, R. Iyer, S. Kashyap</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Extends the Linux verifier with limited runtime checks in a backward-compatible way, significantly improving eBPF's expressiveness.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier security" data-refcnt="2">
<div id="paper49">
<h4>MOAT: Towards Safe BPF Kernel Extension</h4>
<span class="venue">Sec'24</span>&nbsp;
<a href="https://www.usenix.org/system/files/usenixsecurity24-lu-hongyi.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=__2WUqcTJjg" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">H. Lu, S. Wang, Y. Wu, W. He, F. Zhang</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Hardens eBPF in Linux by leveraging Intel MPK and adding runtime checks for helpers.<br />
</div>
</div>
</div>

<div class="analysis">
<div class="security" data-refcnt="1">
<div id="paper51">
<h4>Cross Container Attacks: The Bewildered eBPF on Clouds</h4>
<span class="venue">Sec'23</span>&nbsp;
<a href="https://www.usenix.org/system/files/usenixsecurity23-he.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=E7E8Y20q9ZU" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Y. He, R. Guo, Y. Xing, X. Che, K. Sun, Z. Liu, K. Xu, Q. Li</span><br />
<code class="language-plaintext highlighter-rouge">analysis</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Shows that eBPF tracing programs can be used to escape container boundaries and examines the impact on cloud and online coding platforms.<br />
</div>
</div>
</div>

<div class="using">
<div class="storage" data-refcnt="1">
<div id="paper52">
<h4>λ-IO: A Unified IO Stack for Computational Storage</h4>
<span class="venue">FAST'23</span>&nbsp;
<a href="https://www.usenix.org/system/files/fast23-yang-zhe.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=xYENKqthG40" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Z. Yang, Y. Lu, X. Liao, Y. Chen, J. Li, S. He, J. Shu</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">storage</code>&nbsp;
<br />
Modifies eBPF to implement a unified IO stack spanning the kernel and storage devices, in the context of in-storage computing.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper53">
<h4>eHDL: Turning eBPF/XDP Programs into Hardware Designs for the NIC</h4>
<span class="venue">ASPLOS'23</span>&nbsp;
<a href="https://pontarelli.di.uniroma1.it/publication/asplos23/asplos23.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=35OAuR1NbxA" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">A. Rivitti, R. Bifulco, A. Tulumello, M. Bonola, S. Pontarelli</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Introduces a synthesis tool that generates FPGA pipelines for NICs from unmodified XDP programs.<br />
</div>
</div>
</div>

<div class="improving">
<div class="security misc" data-refcnt="2">
<div id="paper54">
<h4>Fuzz on the Beach: Fuzzing Solana Smart Contracts</h4>
<span class="venue">CCS'23</span>&nbsp;
<a href="https://arxiv.org/pdf/2309.03006" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">S. Smolka, J.-R. Giesen, P. Winkler, O. Draissi, L. Davi, G. Karame, K. Pohl</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Fuzzes Solana smart contracts, including those compiled to eBPF bytecode, by extending Solana's userspace eBPF VM with six bug oracles and coverage feedback.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper55">
<h4>xBGP: Faster Innovation in Routing Protocols</h4>
<span class="venue">NSDI'23</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi23-wirtgen.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=d9azXXp-_O4" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">T. Wirtgen, T. Rousseaux, Q. De Coninck, N. Rybowski, R. Bush, L. Vanbever, A. Legay, O. Bonaventure</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Designs an extension mechanism for BGP using a userspace implementation of eBPF and several C verification tools to replace and extend the Linux verifier.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper56">
<h4>Taking 5G RAN Analytics and Control to a New Level</h4>
<span class="venue">MobiCom'23</span>&nbsp;
<a href="https://www.microsoft.com/en-us/research/uploads/prod/2022/12/JanusTechnicalReport.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">X. Foukas, B. Radunovic, M. Balkwill, Z. Lai</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Proposes to extend virtualized Radio Access Network (vRAN) functions using a userspace BPF implementation and the PREVAIL verifier, with a new runtime check to bound the execution time.<br />
</div>
</div>
</div>

<div class="using">
<div class="tracing" data-refcnt="1">
<div id="paper58">
<h4>Network-Centric Distributed Tracing with DeepFlow: Troubleshooting Your Microservices in Zero Code</h4>
<span class="venue">SIGCOMM'23</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3603269.3604823" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=Bz_HTqFYUYA" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">J. Shen, H. Zhang, Y. Xiang, X. Shi, X. Li, Y. Shen, Z. Zhang, Y. Wu, X. Yin, J. Wang, M. Xu, Y. Li, J. Yin, J. Song, Z. Li, R. Nie</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">tracing</code>&nbsp;
<br />
Presents a distributed tracing framework for troubleshooting microservices that leverages eBPF for data collection.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking offload" data-refcnt="2">
<div id="paper60">
<h4>Tigger: A Database Proxy That Bounces with User-Bypass</h4>
<span class="venue">VLDB'23</span>&nbsp;
<a href="https://www.vldb.org/pvldb/vol16/p3335-butrovich.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">M. Butrovich, K. Ramanathan, J. Rollinson, W. S. Lim, W. Zhang, J. Sherry, A. Pavlo</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;<code class="language-plaintext highlighter-rouge">offload</code>&nbsp;
<br />
Offloads PostgreSQL connection pooling and mirroring to the kernel using sockmap BPF programs.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking offload" data-refcnt="2">
<div id="paper61">
<h4>Electrode: Accelerating Distributed Protocols with eBPF</h4>
<span class="venue">NSDI'23</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi23-zhou.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=E6qcR2Lb1xE" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Y. Zhou, Z. Wang, S. Dharanipragada, M. Yu</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;<code class="language-plaintext highlighter-rouge">offload</code>&nbsp;
<br />
Offloads common Paxos networking operations to tc and XDP to improve performance.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper62">
<h4>Verifying the Verifier: eBPF Range Analysis Verification</h4>
<span class="venue">CAV'23</span>&nbsp;
<a href="https://people.cs.rutgers.edu/~sn349/papers/agni-cav2023.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">H. Vishwanathan, M. Shachnai, S. Narayana, S. Nagarakatte</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Automatically and formally verifies the range analysis of the Linux verifier.<br />
</div>
</div>
</div>

<div class="analysis">
<div class="security" data-refcnt="1">
<div id="paper63">
<h4>EPF: Evil Packet Filter</h4>
<span class="venue">ATC'23</span>&nbsp;
<a href="https://www.usenix.org/system/files/atc23-jin.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=pkR5kJkFGkc" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">D. Jin, V. Atlidakis, V. P. Kemerlis</span><br />
<code class="language-plaintext highlighter-rouge">analysis</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Presents an approach to bypass various kernel isolation techniques by abusing the cBPF infrastructure.<br />
</div>
</div>
</div>

<div class="using">
<div class="tracing misc" data-refcnt="2">
<div id="paper64">
<h4>Tastes Great! Less Filling! High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems</h4>
<span class="venue">SIGMOD'22</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3514221.3517845" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">M. Butrovich, W. S. Lim, L. Ma, J. Rollinson, W. Zhang, Y. Xia, A. Pavlo</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">tracing</code>&nbsp;<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Implements a BPF-based data collection framework for database management systems (DBMSes).<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper65">
<h4>Domain Specific Run Time Optimization for Software Data Planes</h4>
<span class="venue">ASPLOS'22</span>&nbsp;
<a href="https://sebymiano.github.io/publication/2022-morpheus/2022-morpheus_abstract.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=PDUVUv39CUQ" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">S. Miano, A. Sanaee, F. Risso, G. Rétvári, G. Antichi</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Optimizes datapath binaries, including eBPF bytecode, based on traffic patterns.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper66">
<h4>End-to-end Mechanized Proof of an eBPF Virtual Machine for Micro-controllers</h4>
<span class="venue">CAV'22</span>&nbsp;
<a href="https://www.irisa.fr/prive/talpin/papers/cav22.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">S. Yuan, F. Besson, J.-P. Talpin, S. Hym, K. Zandberg, E. Baccelli</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Correctness proof for the eBPF interpreter and verifier used in the micro-controller RIOT kernel.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper67">
<h4>Application-Informed Kernel Synchronization Primitives</h4>
<span class="venue">OSDI'22</span>&nbsp;
<a href="https://www.usenix.org/system/files/osdi22-park.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=R25mDERjYWI" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">S. Park, D. Zhou, Y. Qian, I. Calciu, T. Kim, S. Kashyap</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Allows Linux users to customize kernel lock policies using eBPF, according to applications' needs and hardware characteristics.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper68">
<h4>RapidPatch: Firmware Hotpatching for Real-Time Embedded Devices</h4>
<span class="venue">Sec'22</span>&nbsp;
<a href="https://www.usenix.org/system/files/sec22-he-yi.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=rrGBBDGicSo" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Y. He, Z. Zou, K. Sun, Z. Liu, K. Xu, Q. Wang, C. Shen, Z. Wang, Q. Li</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Implements a hotpatching mechanism for real-time OSes using eBPF, a modified verifier, and additional runtime checks.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper70">
<h4>SPRIGHT: Extracting the Server from Serverless Computing! High-Performance eBPF-Based Event-Driven, Shared-Memory Processing</h4>
<span class="venue">SIGCOMM'22</span>&nbsp;
<a href="https://dl.acm.org/doi/pdf/10.1145/3544216.3544259" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=YkErXPiy9cE" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">S. Qi, L. Monis, Z. Zeng, I.-C. Wang, K. K. Ramakrishnan</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Leverages various eBPF hooks to improve the performance of Knative, a container-based serverless platform.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper71">
<h4>Faster Software Packet Processing on FPGA NICs with eBPF Program Warping</h4>
<span class="venue">ATC'22</span>&nbsp;
<a href="https://www.usenix.org/system/files/atc22-bonola.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=ooWoaay4R-Q" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">M. Bonola, G. Belocchi, A. Tulumello, M. Spaziani Brunella, G. Siracusano, G. Bianchi, R. Bifulco</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Improves the performance of hXDP, an eBPF processor for FPGA NICs, via peephole optimization that replaces sequences of instructions with optimized hardware implementations.<br />
</div>
</div>
</div>

<div class="using">
<div class="storage offload" data-refcnt="2">
<div id="paper72">
<h4>XRP: In-Kernel Storage Functions with eBPF</h4>
<span class="venue">OSDI'22</span>&nbsp;
<a href="https://www.usenix.org/system/files/osdi22-zhong_1.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=n6_QaWATz2A" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Y. Zhong, H. Li, Y. J. Wu, I. Zarkadas, J. Tao, E. Mesterhazy, M. Makris, J. Yang, A. Tai, R. Stutsman, A. Cidon</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">storage</code>&nbsp;<code class="language-plaintext highlighter-rouge">offload</code>&nbsp;
<br />
Offloads processing to the NVMe drivers using BPF, to reduce kernel overhead in storage applications.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper73">
<h4>Sound, Precise, and Fast Abstract Interpretation with Tristate Numbers</h4>
<span class="venue">CGO'22</span>&nbsp;
<a href="https://people.cs.rutgers.edu/~sn349/papers/cgo-2022.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=RVMMtGJLOVw" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">H. Vishwanathan, M. Shachnai, S. Narayana, S. Nagarakatte</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Formally proves and improves the Linux verifier operations on tristate numbers for the range analysis.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper74">
<h4>Synthesizing Safe and Efficient Kernel Extensions for Packet Processing</h4>
<span class="venue">SIGCOMM'21</span>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2021/files/papers/3452296.3472929.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=8R7rHZw6V2o" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">Q. Xu, M. D. Wong, T. Wagle, S. Narayana, A. Sivaraman</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Proposes a synthesis-based compiler that optimizes eBPF programs while ensuring they still pass the Linux verifier.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking offload" data-refcnt="2">
<div id="paper75">
<h4>BMC: Accelerating Memcached using Safe In-Kernel Caching and Pre-Stack Processing</h4>
<span class="venue">NSDI'21</span>&nbsp;
<a href="https://www.usenix.org/system/files/nsdi21-ghigoff.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=VKHIcTLoryk" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<a href="/ebpf/2021/04/12/bmc-accelerating-memcached-using-bpf-and-xdp.html" title="My summary of the paper"><i class="fa-solid fa-comment"></i>&nbsp;Summary</a>&nbsp;<span class="authors">Y. Ghigoff, J. Sopena, K. Lazri, A. Blin, G. Muller</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;<code class="language-plaintext highlighter-rouge">offload</code>&nbsp;
<br />
Speeds up Memcached with an XDP-based, transparent, first-level cache.<br />
</div>
</div>
</div>

<div class="analysis">
<div class="security" data-refcnt="1">
<div id="paper76">
<h4>An Analysis of Speculative Type Confusion Vulnerabilities in the Wild</h4>
<span class="venue">Sec'21</span>&nbsp;
<a href="https://www.usenix.org/system/files/sec21-kirzner.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=Gxv6LcabKrg" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">O. Kirzner, A. Morrison</span><br />
<code class="language-plaintext highlighter-rouge">analysis</code>
<code class="language-plaintext highlighter-rouge">security</code>&nbsp;
<br />
Describes how eBPF can be leveraged to create speculative type confusion gadgets in the kernel.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper77">
<h4>Syrup: User-Defined Scheduling Across the Stack</h4>
<span class="venue">SOSP'21</span>&nbsp;
<a href="https://www.scs.stanford.edu/~dm/home/papers/kaffes:syrup.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=v21FN6_uhv0" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">K. Kaffes, J. Humphries, D. Mazières, C. Kozyrakis</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Proposes an eBPF-based framework to enable users to write application-specific scheduling policies for threads, network packets, and network connections.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper78">
<h4>Revisiting the Open vSwitch Dataplane Ten Years Later</h4>
<span class="venue">SIGCOMM'21</span>&nbsp;
<a href="https://conferences.sigcomm.org/sigcomm/2021/files/papers/3452296.3472914.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">W. Tu, Y.-H. Wei, G. Antichi, B. Pfaff</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Describes how production experience with Open vSwitch over a decade led to the development of its new AF_XDP-based datapath.<br />
</div>
</div>
</div>

<div class="improving">
<div class="jit" data-refcnt="1">
<div id="paper79">
<h4>Synthesizing JIT Compilers for In-Kernel DSLs</h4>
<span class="venue">CAV'20</span>&nbsp;
<a href="https://unsat.cs.washington.edu/papers/geffen-jitsynth.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">J. Van Geffen, L. Nelson, I. Dillig, X. Wang, E. Torlak</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">jit</code>&nbsp;
<br />
Synthesizes eBPF and cBPF JIT compilers, which are proven to be formally correct, from DSL interpreters.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper80">
<h4>hXDP: Efficient Software Packet Processing on FPGA NICs</h4>
<span class="venue">OSDI'20</span>&nbsp;
<a href="https://www.usenix.org/system/files/osdi20-brunella.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=v8tGmlB2Zks" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<a href="/ebpf/2020/11/04/hxdp-efficient-software-packet-processing-on-fpga-nics.html" title="My summary of the paper"><i class="fa-solid fa-comment"></i>&nbsp;Summary</a>&nbsp;<span class="authors">M. Spaziani Brunella, G. Belocchi, M. Bonola, S. Pontarelli, G. Siracusano, G. Bianchi, A. Cammarano, A. Palumbo, L. Petrucci, R. Bifulco</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Investigates the execution of XDP programs on FPGA NICs by implementing an interpreter.<br />
</div>
</div>
</div>

<div class="improving">
<div class="jit" data-refcnt="1">
<div id="paper81">
<h4>Specification and Verification in the Field: Applying Formal Methods to BPF Just-in-Time Compilers in the Linux Kernel</h4>
<span class="venue">OSDI'20</span>&nbsp;
<a href="https://unsat.cs.washington.edu/papers/nelson-jitterbug.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=2V3ts5-W_9g" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">L. Nelson, J. Van Geffen, E. Torlak, X. Wang</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">jit</code>&nbsp;
<br />
Applies formal verification techniques to the eBPF JIT compilers and implements a new formally-verified JIT compiler for 32-bit RISC-V.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper82">
<h4>Scaling Symbolic Evaluation for Automated Verification of Systems Code with Serval</h4>
<span class="venue">SOSP'19</span>&nbsp;
<a href="https://unsat.cs.washington.edu/papers/nelson-serval.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">L. Nelson, J. Bornholt, R. Gu, A. Baumann, E. Torlak, X. Wang</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Proposes a framework for developing verifiers for system software, including eBPF, by lifting existing interpreters under symbolic execution.<br />
</div>
</div>
</div>

<div class="using">
<div class="storage offload" data-refcnt="2">
<div id="paper83">
<h4>Extension Framework for File Systems in User Space</h4>
<span class="venue">ATC'19</span>&nbsp;
<a href="https://www.usenix.org/system/files/atc19-bijlani.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=QwcF8q9LejU" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">A. Bijlani, U. Ramachandran</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">storage</code>&nbsp;<code class="language-plaintext highlighter-rouge">offload</code>&nbsp;
<br />
Enables eBPF support in the FUSE interface to improve the performance of user-space file systems by offloading operations to the kernel.<br />
</div>
</div>
</div>

<div class="using">
<div class="networking" data-refcnt="1">
<div id="paper84">
<h4>Pluginizing QUIC</h4>
<span class="venue">SIGCOMM'19</span>&nbsp;
<a href="https://dial.uclouvain.be/pr/boreal/object/boreal:218857/datastream/PDF_01/view" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">Q. De Coninck, F. Michel, M. Piraux, F. Rochet, T. Given-Wilson, A. Legay, O. Pereira, O. Bonaventure</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
Designs an extension mechanism for QUIC using a userspace implementation of eBPF with SFI-like runtime checks.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper86">
<h4>Simple and Precise Static Analysis of Untrusted Linux Kernel Extensions</h4>
<span class="venue">PLDI'19</span>&nbsp;
<a href="https://seahorn.github.io/papers/ebpf-pldi19.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=MHqKJaQouu8" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<a href="/ebpf/2023/09/06/prevail-understanding-the-windows-ebpf-verifier.html" title="My summary of the paper"><i class="fa-solid fa-comment"></i>&nbsp;Summary</a>&nbsp;<span class="authors">E. Gershuni, N. Amit, A. Gurfinkel, N. Narodytska, J. A. Navas, N. Rinetzky, L. Ryzhyk, M. Sagiv</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Introduces PREVAIL, an alternative to the Linux eBPF verifier based on abstract interpretation and now used in Windows.<br />
</div>
</div>
</div>

<div class="using">
<div class="misc" data-refcnt="1">
<div id="paper87">
<h4>The Design and Implementation of Hyperupcalls</h4>
<span class="venue">ATC'18</span>&nbsp;
<a href="https://www.usenix.org/system/files/conference/atc18/atc18-amit.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">N. Amit, M. Wei</span><br />
<code class="language-plaintext highlighter-rouge">using</code>
<code class="language-plaintext highlighter-rouge">misc</code>&nbsp;
<br />
Leverages eBPF to bridge the semantic gap of virtualization, by letting hypervisors execute verified code from the guests.<br />
</div>
</div>
</div>

<div class="improving">
<div class="jit" data-refcnt="1">
<div id="paper90">
<h4>Jitk: A Trustworthy In-Kernel Interpreter Infrastructure</h4>
<span class="venue">OSDI'14</span>&nbsp;
<a href="https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-wang_xi.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;
<a href="https://www.youtube.com/watch?v=aOl-skWkzf0" title="Link to conference talk"><i class="fa-brands fa-youtube"></i>&nbsp;Talk</a>&nbsp;
<span class="authors">X. Wang, D. Lazar, N. Zeldovich, A. Chlipala, Z. Tatlock</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">jit</code>&nbsp;
<br />
Proposes a formally-verified infrastructure to compile high-level rules into cBPF bytecode and machine code.<br />
</div>
</div>
</div>

<div class="improving">
<div class="verifier" data-refcnt="1">
<div id="paper92">
<h4>Safe Kernel Extensions Without Run-Time Checking</h4>
<span class="venue">OSDI'96</span>&nbsp;
<a href="http://www.cs.cmu.edu/~15712/papers//necula96.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">G. C. Necula, P. Lee</span><br />
<code class="language-plaintext highlighter-rouge">improving</code>
<code class="language-plaintext highlighter-rouge">verifier</code>&nbsp;
<br />
Proposes kernel extensions in the form of <i>proof-carrying code</i> and compares it to cBPF.<br />
</div>
</div>
</div>

<div class="foundation">
<div class="networking" data-refcnt="1">
<div id="paper93">
<h4>The BSD Packet Filter: A New Architecture for User-level Packet Capture</h4>
<span class="venue">USENIX Winter'93</span>&nbsp;
<a href="https://www.tcpdump.org/papers/bpf-usenix93.pdf" title="Link to PDF"><i class="fa-solid fa-file-pdf"></i>&nbsp;Paper</a>&nbsp;

<span class="authors">S. McCanne, V. Jacobson</span><br />
<code class="language-plaintext highlighter-rouge">foundation</code>
<code class="language-plaintext highlighter-rouge">networking</code>&nbsp;
<br />
The original cBPF paper, describing a register-based packet filter for BSD.<br />
</div>
</div>
</div>

</div>

<p><br /></p>
<hr />

<p><br /></p>

<p>Thanks to Kahina for her reviews and for reporting multiple bugs with the early version of the selectors.</p>

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:cant-escape-choice" role="doc-endnote">
      <p>Of course, I can’t really escape choosing a method to select papers, so it’s not as if this is completely objective either. <a href="#fnref:cant-escape-choice" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="bpf" /><summary type="html"><![CDATA[Interactive list of eBPF research papers from top conferences according to CSRankings. The list can be filtered according to types of publications (ex., improving, using) and areas (ex., networking, verifier, offload, security).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://pchaigno.github.io/assets/illustration-list-papers.png" /><media:content medium="image" url="https://pchaigno.github.io/assets/illustration-list-papers.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">First Cilium Pull Request</title><link href="https://pchaigno.github.io/cilium/2024/12/05/first-cilium-pull-request.html" rel="alternate" type="text/html" title="First Cilium Pull Request" /><published>2024-12-05T08:26:10+00:00</published><updated>2025-01-07T10:31:00+00:00</updated><id>https://pchaigno.github.io/cilium/2024/12/05/first-cilium-pull-request</id><content type="html" xml:base="https://pchaigno.github.io/cilium/2024/12/05/first-cilium-pull-request.html"><![CDATA[<p>Cilium has a fairly large codebase, with many different features, implemented in two very different languages<sup id="fnref:c-and-go" role="doc-noteref"><a href="#fn:c-and-go" class="footnote" rel="footnote">1</a></sup>, and covered by an extensive CI.
Contributing for the first time can be a daunting task.
Nevertheless, Cilium has received code contributions from more than 800 people to date!</p>

<p>In every release cycle, many people are contributing for the first time.
I’m hoping this post can serve as a getting started guide for them, with some advice also useful to more experienced contributors.</p>

<p>This is not the official contribution guide, but my own advice. Especially when writing about Cilium, it’s worth restating that opinions expressed in this blog are my own and not the project’s or my employer’s.</p>

<p><br /></p>

<!--  -->
<ul id="toc" class="section-nav">
<li class="toc-entry toc-h3"><a href="#finding-what-to-work-on">Finding What to Work On</a></li>
<li class="toc-entry toc-h3"><a href="#larger-contributions">Larger Contributions</a>
<ul>
  <li class="toc-entry toc-h4"><a href="#cilium-feature-proposals">Cilium Feature Proposals</a></li>
  <li class="toc-entry toc-h4"><a href="#splitting-in-pull-requests">Splitting in Pull Requests</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#preparing-the-branch">Preparing the Branch</a></li>
<li class="toc-entry toc-h3"><a href="#opening-the-draft-pull-request">Opening the Draft Pull Request</a></li>
<li class="toc-entry toc-h3"><a href="#open-for-reviews">Open for Reviews</a></li>
<li class="toc-entry toc-h3"><a href="#review-rounds">Review Rounds</a></li>
<li class="toc-entry toc-h3"><a href="#passing-the-ci">Passing the CI</a></li>
<li class="toc-entry toc-h3"><a href="#merging">Merging!</a></li>
<li class="toc-entry toc-h3"><a href="#common-questions">Common Questions</a>
<ul>
  <li class="toc-entry toc-h4"><a href="#what-should-i-do-if-the-needs-rebase-label-is-added">What Should I Do If the `needs-rebase` Label is Added?</a></li>
  <li class="toc-entry toc-h4"><a href="#what-if-i-want-to-fix-something-that-is-broken-in-a-previous-release">What if I Want to Fix Something that is Broken in a Previous Release?</a></li>
  <li class="toc-entry toc-h4"><a href="#how-and-when-to-move-up-the-contributor-ladder">How and When to Move up the Contributor Ladder?</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#conclusion">Conclusion</a></li>
</ul>
<!--  -->

<p><br /></p>

<h3 id="finding-what-to-work-on">Finding What to Work On</h3>

<p>When searching for something to contribute, a good place to start is probably the <a href="https://github.com/cilium/cilium/issues?q=is%3Aissue%20state%3Aopen%20label%3Agood-first-issue%20no%3Aassignee"><code class="language-plaintext highlighter-rouge">good-first-issues</code></a>.
It’s best to select one where you have some idea how to make the changes.
Expecting other contributors to tell you which exact places to patch in the codebase is not a good solution.</p>

<p>Small documentation changes also make for excellent first contributions in my opinion because they are typically easier to get merged and still allow you to get familiar with the process.
That being said, a pull request fixing just a typo is probably not worth it.</p>

<p>Fixing something that you noticed was broken or less than ideal while running Cilium is usually the best option, mostly because you’ll be familiar with the issue and more motivated.</p>

<p><br /></p>

<h3 id="larger-contributions">Larger Contributions</h3>

<p>The process for making larger contributions is typically a bit different, as you’ll need to discuss it with the community first.
Submitting a large set of changes without first discussing with the community is unlikely to lead to any successful outcome.</p>

<h4 id="cilium-feature-proposals">Cilium Feature Proposals</h4>

<p>For larger contributions, especially new features, it’s best to go through <a href="https://github.com/cilium/design-cfps?tab=readme-ov-file#purpose-of-cfps">the Cilium Feature Proposal (CFP) process</a> first.
The usual steps are to start writing it out in a Google document (see examples), ask for reviews in <a href="https://docs.cilium.io/en/latest/community/community/#weekly-community-meeting">the community meeting</a>, then submit it to <a href="https://github.com/cilium/design-cfps/pulls">cilium/design-cfps</a> once it’s more stable.
You don’t need to wait for the CFP to be merged before submitting a first implementation as draft pull requests.</p>

<h4 id="splitting-in-pull-requests">Splitting in Pull Requests</h4>

<p>Prefer small pull requests.
Preparatory changes can be their own pull request if they make sense on their own.</p>

<p>People sometimes also split the feature changes themselves, between datapath and agent changes, ingress and egress, or whatever else makes sense.
In that case, it’s best to hide the feature from users (ex., via a hidden flag) until all pieces are in.</p>

<p><br /></p>

<h3 id="preparing-the-branch">Preparing the Branch</h3>

<p>Within a pull request, commits should be kept small, each with few changes.
Refactoring changes should be separated from functional changes.
Tests can be separate commits as well.</p>

<p>Commit descriptions should explain the <em>why</em>.
You will often also need to explain the <em>what</em>, if it’s not obvious from reading the code.
The commit title should tell the <em>what</em>.
For example, the following commit has a title that explains the <em>what</em> (we ignore a drop reason in the CLI) and a body that explains the <em>why</em> (because it should always be ignored so we might as well ignore it by default).</p>
<pre><code class="language-git">commit a92f8c3e0ac44f4d7ed7ee210c000da5ea93f9aa
Author: Paul Chaignon &lt;paul.chaignon@gmail.com&gt;
Date:   Tue Oct 29 11:12:23 2024 +0100

    cilium-cli: Ignore "No egress gateway found" drops
    
    Those drops currently need to be ignored in all tests involving the
    egress gateway, so we might as well ignore them by default in the
    connectivity tests.
    
    Signed-off-by: Paul Chaignon &lt;paul.chaignon@gmail.com&gt;
</code></pre>

<div class="note">
When describing what the commit changes, if you find yourself writing a list, it's usually a good indication that your commit is too big: each element of the list should be its own commit instead.
</div>

<p>Remember to sign off your commits, with <code class="language-plaintext highlighter-rouge">git commit -s</code>.
If you forget, a bot will come complain on your pull request, even if it’s still in draft.
I recommend adding <a href="https://stackoverflow.com/a/46536244/6884590">a Git hook</a> to never forget.</p>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/mlh-complain-signoff.png" title="The Maintainer's Little Helper bot complains about a commit missing its signed-off-by." alt="The Maintainer's Little Helper bot complains about a commit missing its signed-off-by." style="width: 100%;" /></p>
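<p>As a rough sketch of what such a hook could look like (this is my own assumption, not necessarily what the linked answer does), a <code class="language-plaintext highlighter-rouge">commit-msg</code> hook can append the trailer with <code class="language-plaintext highlighter-rouge">git interpret-trailers</code>.
The function name <code class="language-plaintext highlighter-rouge">add_signoff</code> is only illustrative.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
# Sketch of a .git/hooks/commit-msg hook.
# Git passes the path of the commit message file as the first argument.

add_signoff() {
    # Committer identity, with the trailing timestamp and timezone removed.
    ident=$(git var GIT_COMMITTER_IDENT | sed 's/ [0-9]* [+-][0-9]*$//')
    # Append the trailer unless an identical one is already present.
    git interpret-trailers --in-place --if-exists addIfDifferent \
        --trailer "Signed-off-by: $ident" "$1"
}

if [ -n "${1:-}" ]; then
    add_signoff "$1"
fi
</code></pre></div></div>
<p>To try it, save the script as <code class="language-plaintext highlighter-rouge">.git/hooks/commit-msg</code> in your clone and make it executable.
It leaves commits already signed off with <code class="language-plaintext highlighter-rouge">git commit -s</code> untouched.</p>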

<p>If you’re making functional changes, you should definitely test them locally by deploying Cilium.</p>

<p><br /></p>

<h3 id="opening-the-draft-pull-request">Opening the Draft Pull Request</h3>

<p>Summarize the changes in the pull request description.
It doesn’t have to be long; the main description is in commits.
I typically don’t write more than a sentence per commit and sometimes just a couple sentences to sum up the whole changeset.
If you have a single commit, the pull request description can be that commit’s description<sup id="fnref:github-pr-description" role="doc-noteref"><a href="#fn:github-pr-description" class="footnote" rel="footnote">2</a></sup>.</p>

<p>If you have rights to set labels, you should set the <code class="language-plaintext highlighter-rouge">release-note/{misc,minor,major,bug,ci}</code> label with one of its five values.
If not, one of the reviewers will set it.
This label determines where in the release notes your pull request will be announced.
You should only set <code class="language-plaintext highlighter-rouge">release-note/bug</code> if you are fixing a bug that was exposed to users (that is, a released bug).
<code class="language-plaintext highlighter-rouge">release-note/major</code> is for major changes, such as new features.
<code class="language-plaintext highlighter-rouge">release-note/minor</code> is for any other change with user-visible impacts (ex., a new metric).
Finally, <code class="language-plaintext highlighter-rouge">release-note/ci</code> is for tests and <code class="language-plaintext highlighter-rouge">release-note/misc</code> for everything else.
See <a href="https://github.com/cilium/cilium/releases">the existing release notes</a> for examples.</p>

<p>If you are making a user-visible change<sup id="fnref:user-visible-changes" role="doc-noteref"><a href="#fn:user-visible-changes" class="footnote" rel="footnote">3</a></sup>, you should also fill in the release note itself, at the bottom of the pull request description.
This is particularly important for bug fixes.
It will be used in release notes to describe how your changes affect users.
You should be specific so that users can understand if they are affected and how.
For example, for a bug fix:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>```release-note
Fix transient connectivity issue on upgrades when IPsec and IPv6 are enabled.
```
</code></pre></div></div>

<p>Always open in draft first!
That way you can run the CI before asking people for reviews.
There’s no point asking for reviews if the CI is surfacing bugs in your changes.</p>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/open-pr-button.png" title="The button to open the pull request, either in draft mode or directly awaiting reviews." alt="The button to open the pull request, either in draft mode or directly awaiting reviews." style="width: 50%;" /></p>
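<p>If you prefer the command line over the button, the same can probably be done with the GitHub CLI.
The following is only a sketch: it assumes <code class="language-plaintext highlighter-rouge">gh</code> is installed and authenticated, and the helper name <code class="language-plaintext highlighter-rouge">open_draft_pr</code> is made up for illustration.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: open a draft pull request against cilium/cilium
# from the current branch.
open_draft_pr() {
    # --draft opens the pull request in draft mode.
    # --fill reuses the commit title and description for the pull request.
    gh pr create --repo cilium/cilium --base main --draft --fill
}
</code></pre></div></div>
<p>With a single commit, <code class="language-plaintext highlighter-rouge">--fill</code> gives you a pull request description that matches the commit's description.
You would still need to add the release note block by hand.</p>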

<p>The CI consists of initial tests, triggered whenever you push, and end-to-end tests, which need to be manually triggered.</p>

<p>Wait for all initial tests to be finished.
It takes about 20 minutes.
Once they have all completed, ask an Organization Member to trigger the end-to-end tests.
If you don’t know one, ask in #development on <a href="https://slack.cilium.io">the Cilium Slack</a>.</p>

<p>End-to-end tests can take up to several hours to finish, though most finish in less than an hour.</p>

<p>Note that if you only have documentation changes, you probably don’t need to trigger end-to-end tests before making the pull request ready for reviews; the tests will all be skipped anyway.</p>

<p>See <a href="#passing-the-ci">Passing the CI</a> below if any tests are failing.</p>

<p><br /></p>

<h3 id="open-for-reviews">Open for Reviews</h3>

<p>Optional: If you have rights to assign reviewers, just before making ready for reviews, it may be worth selecting reviewers you know are familiar with your changes, for example people who reviewed the CFP.
Consider that you will need reviews covering each review team listed in Reviewers, so asking for reviews from people not on those teams won’t help you achieve that specific goal<sup id="fnref:more-reviews" role="doc-noteref"><a href="#fn:more-reviews" class="footnote" rel="footnote">4</a></sup>.
Don’t assign specific people without asking them!</p>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/reviewers.png" title="List of review teams and assigned reviewers on a Cilium pull request." alt="List of review teams and assigned reviewers on a Cilium pull request." style="width: 55%;" /></p>

<p>Make the pull request ready for reviews.</p>

<p>Wait for reviews.
After a few days without response, post a message in #development on <a href="https://slack.cilium.io">Slack</a>.
If you still don’t get the reviews, you can try to ping the assigned reviewers directly in Slack.</p>

<p><br /></p>

<h3 id="review-rounds">Review Rounds</h3>

<p>Try to address reviews quickly.
The faster you re-request a review, the more likely reviewers are to still have all the context.
For that same reason, I wouldn’t recommend making a pull request ready for reviews just before leaving on holiday.</p>

<div class="note">
Do not address reviews in separate commits.
You should fix issues in the commit where they were introduced.
Reviewers will typically re-review the whole set of commits anyway.
Of course, if you introduce new changes, independent of previous commits, it can be a new commit.
</div>
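<p>One way to fix an issue in the commit where it was introduced is Git's fixup and autosquash mechanism.
The sketch below is an assumption about your workflow rather than a Cilium requirement; <code class="language-plaintext highlighter-rouge">fix_in_place</code> is a hypothetical helper name.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: fold the staged changes into an earlier commit.
# $1: commit to amend, $2: base of the branch (ex., upstream/main).
fix_in_place() {
    # Record the staged changes as a "fixup!" commit targeting $1.
    git commit --fixup "$1" || return 1
    # Replay the branch on top of $2, squashing the fixup into its target.
    # GIT_SEQUENCE_EDITOR=true accepts the generated todo list as-is.
    GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash "$2"
}
</code></pre></div></div>
<p>Stage the fix with <code class="language-plaintext highlighter-rouge">git add</code> first, then call the helper with the hash of the commit to amend and the base of your branch.
Since history is rewritten, you will need to force-push afterwards.</p>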

<p>Whenever pushing a new version, make sure to also rebase so you don’t end up with failing tests or merge conflicts because your base is too old.
You can use the following commands to rebase:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout <span class="o">[</span>your_branch]
git remote add upstream git@github.com:cilium/cilium <span class="c"># Only needed once.</span>
git fetch upstream
git rebase upstream/main <span class="c"># Resolve conflicts if any.</span>
git log <span class="c"># Check everything looks alright.</span>
git push origin <span class="o">[</span>your_branch] <span class="nt">--force-with-lease</span>
</code></pre></div></div>

<p>Don’t forget to mark conversations in the pull request as resolved if you addressed them.
The pull request cannot be merged until all conversations have been resolved.</p>

<p>After addressing a review, always re-request a review from the reviewer.</p>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/request-review.png" title="Requesting a new review from someone who had rejected the changes." alt="Requesting a new review from someone who had rejected the changes." style="width: 65%;" /></p>

<p>Some reviewers will nitpick.
Most will indicate which requests for changes are nitpicks.
That’s fine and a good indication those requests are optional.
It probably helps to address them though, especially if you’re a new contributor.</p>

<p><br /></p>

<h3 id="passing-the-ci">Passing the CI</h3>

<p>For each failing CI job, first check the error and whether it could be related to your changes.
Is it in the same code area?
Are almost all tests failing?
Then, it’s likely related to your changes.</p>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/failing-ci.png" title="CI status on a pull request where the changes don't even build." alt="CI status on a pull request where the changes don't even build." style="width: 100%;" /></p>

<p>If it doesn’t look related, ask for someone to re-trigger.
Note reviewers may sometimes do this while reviewing, to help you out.</p>

<p>If the same tests fail again with the same errors, it’s either related to your changes or something is broken in main.
To check the second possibility, search for the error in <a href="https://github.com/cilium/cilium/issues?q=is%3Aissue%20state%3Aopen%20label%3Aci%2Fflake">GitHub issues</a>.
If you don’t find any issues, check if those same tests are passing on <code class="language-plaintext highlighter-rouge">main</code>.
To find the test runs on <code class="language-plaintext highlighter-rouge">main</code>, go to <a href="https://github.com/cilium/cilium/actions?query=branch%3Amain">Actions</a>, select the workflow on the left, then filter by <code class="language-plaintext highlighter-rouge">event:push</code> or <code class="language-plaintext highlighter-rouge">event:schedule</code><sup id="fnref:test-runs-main" role="doc-noteref"><a href="#fn:test-runs-main" class="footnote" rel="footnote">5</a></sup>.
If the <code class="language-plaintext highlighter-rouge">main</code> runs are consistently failing, ask for help in #development.</p>
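<p>The same lookup can likely be done from a terminal with the GitHub CLI.
This is a sketch under assumptions: <code class="language-plaintext highlighter-rouge">gh</code> must be installed and authenticated, <code class="language-plaintext highlighter-rouge">list_main_runs</code> is a made-up helper name, and you have to supply the workflow name yourself.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: list recent runs of a workflow on main.
# $1: workflow name or file name, $2: triggering event (push or schedule).
list_main_runs() {
    gh run list --repo cilium/cilium --branch main \
        --workflow "$1" --event "$2" --limit 10
}

# Example: list_main_runs "[workflow_name]" push
</code></pre></div></div>
<p>If most of the listed runs failed, the breakage is probably not coming from your changes.</p>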

<div class="note">
<p>Being an Organization Member helps a lot here because you can retrigger tests yourself without having to wait for someone else to do it.
You can do this using the trigger phrases indicated in the name of many CI tests (ex. <code class="language-plaintext highlighter-rouge">/ci-e2e-upgrade</code>).
Being a Reviewer is even better because then you can retrigger only failing jobs in the failing workflow (see screenshot below).
See the section below on <a href="#how-and-when-to-move-up-the-contributor-ladder">how to become an org member</a>.</p>

</div>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/rerun-only-failing-jobs.png" title="The two workflow rerun buttons available to committers, with one allowing us to rerun only jobs that are failing." alt="The two workflow rerun buttons available to committers, with one allowing us to rerun only jobs that are failing." style="width: 40%;" /></p>

<p>If you suspect your changes are causing the test failure, you might want to run the same tests locally.
That is unfortunately not trivial because Cilium has many different ways it runs its tests in CI.
Instead, I would recommend first checking the artifacts; Cilium sysdumps are attached for each failure<sup id="fnref:where-are-artifacts" role="doc-noteref"><a href="#fn:where-are-artifacts" class="footnote" rel="footnote">6</a></sup> and they contain a lot of debugging information.
If that isn’t enough, you can find how to reproduce locally by checking the sources for the workflow.
You can see the sources by clicking on “Workflow file” in the bottom left.</p>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/workflow-file.png" title="The &quot;Workflow file&quot; tab to see the sources for the workflow." alt="The &quot;Workflow file&quot; tab to see the sources for the workflow." style="width: 50%;" /></p>
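
<p>The artifacts mentioned above can also be fetched with the GitHub CLI, which is handy for large sysdumps.
A minimal sketch, assuming <code class="language-plaintext highlighter-rouge">gh</code> is installed and authenticated; the run ID is a placeholder you would copy from the URL of the failing workflow run (<code class="language-plaintext highlighter-rouge">.../actions/runs/&lt;run-id&gt;</code>).</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Placeholder: the ID of the failing workflow run.</span>
RUN_ID=1234567890
<span class="c"># Download all artifacts of that run into ./artifacts.</span>
gh run download "$RUN_ID" --repo cilium/cilium --dir ./artifacts
</code></pre></div></div>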

<p>Don’t rebase your pull request just because a couple CI jobs are failing.
When you rebase, you start from scratch in terms of CI.
There are almost always a few CI jobs failing so if you rebase each time, you will never reach a green CI.
That being said, if you haven’t rebased your pull request in a while (ex., your base is more than a week old), rebasing may help avoid CI failures.</p>

<div class="note">
Cilium's CI is in a bad state and basically always has been.
It is huge because it needs to cover many different features, environments, and kernels.
It is thus unsurprising that it requires a lot of maintenance.
My only advice is to be patient and ask for help if you can't figure out what's happening.
</div>

<p><br /></p>

<h3 id="merging">Merging!</h3>

<p>Once all review teams are covered with Approved reviews and CI is green, the pull request should be mergeable.
You may need to ping in #development to ask one of the committers to merge it.</p>

<p style="text-align: center;"><img src="/assets/cilium-first-pr/merging-button.jpeg" title="The pull request status once CI is green and Approved reviews are in. Only Committers can then merge." alt="The pull request status once CI is green and Approved reviews are in. Only Committers can then merge." style="width: 100%;" /></p>

<p><br /></p>

<h3 id="common-questions">Common Questions</h3>

<h4 id="what-should-i-do-if-the-needs-rebase-label-is-added">What Should I Do If the <code class="language-plaintext highlighter-rouge">needs-rebase</code> Label is Added?</h4>

<p>If the <code class="language-plaintext highlighter-rouge">needs-rebase</code> label was added, it can be because there are merge conflicts or because a reviewer thinks it would help reduce CI failures.</p>

<p>You can rebase with the following steps:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git checkout <span class="o">[</span>your_branch]
git remote add upstream git@github.com:cilium/cilium <span class="c"># Only needed once; fails if the remote already exists.</span>
git fetch upstream
git rebase upstream/main <span class="c"># Resolve conflicts if any.</span>
git log <span class="c"># Check everything looks alright.</span>
git push origin <span class="o">[</span>your_branch] <span class="nt">--force-with-lease</span>
</code></pre></div></div>
<p>If there are merge conflicts, git will complain when running <code class="language-plaintext highlighter-rouge">git rebase upstream/main</code>.
Follow its instructions, fix the conflicts, and finish the rebase before pushing your updated branch.</p>
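
<p>When the rebase stops on a conflict, the resolution loop typically looks like the sketch below.
The file path is a placeholder for whichever files git reports as conflicting.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git status <span class="c"># Lists the files with conflicts.</span>
<span class="c"># Edit the conflicting files, then mark them as resolved.</span>
git add path/to/conflicting_file
git rebase --continue <span class="c"># Repeat until the rebase completes.</span>
<span class="c"># If things go wrong, you can always return to the state before the rebase.</span>
git rebase --abort
</code></pre></div></div>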

<h4 id="what-if-i-want-to-fix-something-that-is-broken-in-a-previous-release">What If I Want to Fix Something that is Broken in a Previous Release?</h4>

<p>For bugs in stable branches, fixes should be sent to <code class="language-plaintext highlighter-rouge">main</code> first. After they are merged, they will be backported to the affected branches if they meet <a href="https://docs.cilium.io/en/stable/contributing/release/backports/">the backport criteria</a>.</p>

<p>If the bug is fixed in <code class="language-plaintext highlighter-rouge">main</code> by another pull request, you should check on that other pull request whether a backport was considered.
If it wasn’t and you think it matches the backport criteria, you can ask in the merged pull request to consider backporting it.
Don’t forget to explain why you think it should be backported.
For example: you reproduced the issue on that version and it matches backport criteria X and Y.</p>

<p>If the bug doesn’t exist in <code class="language-plaintext highlighter-rouge">main</code> but not because it was fixed by a pull request (ex. the feature was later removed or refactored), then you can send a fix directly to the affected branch.
Be sure to clearly explain why you’re sending the fix without going through <code class="language-plaintext highlighter-rouge">main</code>.
Note that it still needs to match the backport criteria even if it’s not actually being backported.</p>
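
<p>In that last case, you would base your branch on the stable branch instead of <code class="language-plaintext highlighter-rouge">main</code>.
A minimal sketch, assuming the <code class="language-plaintext highlighter-rouge">upstream</code> remote from the rebase steps above; the stable branch and local branch names are placeholders.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Placeholder: the affected stable branch.</span>
STABLE_BRANCH=v1.X
git fetch upstream
<span class="c"># Create the fix branch on top of the stable branch.</span>
git checkout -b my-stable-fix "upstream/$STABLE_BRANCH"
<span class="c"># Commit the fix, then push and open the pull request against the stable branch.</span>
git push origin my-stable-fix
</code></pre></div></div>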

<h4 id="how-and-when-to-move-up-the-contributor-ladder">How and When to Move up the Contributor Ladder?</h4>

<p>Cilium has a contributor ladder with multiple roles you can read about <a href="https://github.com/cilium/community/blob/main/CONTRIBUTOR-LADDER.md">here</a>. You can ask to move up the ladder on that same repository. The TL;DR of roles is as follows:</p>
<ul>
  <li><strong>Community Contributor</strong>: Everyone who contributes.</li>
  <li><strong>Organization Member</strong>:
    <ul>
      <li>Main power: Trigger CI tests by yourself with <code class="language-plaintext highlighter-rouge">/test</code>, <code class="language-plaintext highlighter-rouge">/ci-e2e-upgrade</code>, and similar comments.</li>
      <li>When: After making several pull requests.
I’d say minimum ~4, but the exact number isn’t defined and probably depends on the pull requests.</li>
    </ul>
  </li>
  <li><strong>Reviewer</strong>:
    <ul>
      <li>Main powers: Retrigger only failing jobs within a CI workflow.
Assign reviewers on pull requests.
Your reviews count for the review team(s) you asked to join.</li>
      <li>When: After leaving reviews for a few months.</li>
    </ul>
  </li>
  <li><strong>Committer</strong>:
    <ul>
      <li>Main powers: Click the merge button once green.
Vote on project matters.</li>
      <li>When: You can’t request this. Another Committer will have to nominate you.
I’ve usually nominated people after a year of semi-regular contributions and reviews, or sooner if the person contributes often.</li>
    </ul>
  </li>
</ul>

<p>If you’re planning to contribute to Cilium for a while, I’d recommend asking to become an Organization Member as soon as possible, as it helps a lot with the CI.</p>

<p><br /></p>

<h3 id="conclusion">Conclusion</h3>

<p>Welcome to the community of Cilium contributors!
I hope this guide will help you make many successful contributions.
With a relatively informal process like this one, it can be hard to cover everything.
So if you notice something I didn’t cover, please reach out!</p>

<p>And as usual in open source, don’t hesitate to reach out to fellow contributors in public channels.
The best place for that is probably <a href="https://slack.cilium.io">the Cilium Slack</a>, in the #development channel I mentioned several times above.</p>

<p><br /></p>

<p>Thanks to Simone for his help in understanding GitHub’s permission model.</p>

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:c-and-go" role="doc-endnote">
      <p>Golang for the userspace parts and C for the kernel/eBPF parts. <a href="#fnref:c-and-go" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:github-pr-description" role="doc-endnote">
      <p>In that case, GitHub will automatically copy your commit description into the pull request description. <a href="#fnref:github-pr-description" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:user-visible-changes" role="doc-endnote">
      <p>User-visible changes correspond to labels <code class="language-plaintext highlighter-rouge">release-note/major</code>, <code class="language-plaintext highlighter-rouge">release-note/minor</code>, and <code class="language-plaintext highlighter-rouge">release-note/bug</code>. You should write the release note even if you don’t have permissions to set the corresponding label yourself. <a href="#fnref:user-visible-changes" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:more-reviews" role="doc-endnote">
      <p>Of course, having more reviews can help improve the quality of the pull request. <a href="#fnref:more-reviews" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:test-runs-main" role="doc-endnote">
      <p>Test runs on <code class="language-plaintext highlighter-rouge">main</code> are always running either on push or on schedule. <a href="#fnref:test-runs-main" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:where-are-artifacts" role="doc-endnote">
      <p>Under the “Summary” tab on the left, at the very bottom of the page. <a href="#fnref:where-are-artifacts" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="cilium" /><summary type="html"><![CDATA[Getting started guide on contributing to the open source project Cilium, by making your first pull request. Includes many tips on how to prepare the pull request, go through reviews, or pass the CI.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://pchaigno.github.io/assets/cilium-first-pr/open-pr-button.png" /><media:content medium="image" url="https://pchaigno.github.io/assets/cilium-first-pr/open-pr-button.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Linux XFRM Reference Guide for IPsec</title><link href="https://pchaigno.github.io/xfrm/2024/10/30/linux-xfrm-ipsec-reference-guide.html" rel="alternate" type="text/html" title="Linux XFRM Reference Guide for IPsec" /><published>2024-10-30T08:26:10+00:00</published><updated>2025-02-13T10:31:00+00:00</updated><id>https://pchaigno.github.io/xfrm/2024/10/30/linux-xfrm-ipsec-reference-guide</id><content type="html" xml:base="https://pchaigno.github.io/xfrm/2024/10/30/linux-xfrm-ipsec-reference-guide.html"><![CDATA[<p>This post focuses on the XFRM building blocks Cilium uses to provide its IPsec support.
Therefore, only tunnel mode and ESP are discussed, XFRM devices are not described, and particular attention is given to the use of packet marks in XFRM policies and states.</p>

<p>Several others have written on XFRM, usually with a slightly different focus.
James Bottomley gave a <a href="https://blog.hansenpartnership.com/figuring-out-how-ipsec-transforms-work-in-linux/">quick introduction on his blog</a>, with examples of configurations.
<a href="https://thermalcircle.de/doku.php?id=blog:linux:nftables_ipsec_packet_flow">Andrej Stender’s blog</a> has a very detailed description of the typical packet paths for IPsec gateways.</p>

<p>If you find mistakes, you can report them by email or via the other contact methods listed at the bottom.</p>

<!--  -->
<ul id="toc" class="section-nav">
<li class="toc-entry toc-h3"><a href="#overview">Overview</a>
<ul>
<li class="toc-entry toc-h4"><a href="#xfrm-policies-and-states">XFRM Policies and States</a></li>
<li class="toc-entry toc-h4"><a href="#policy-templates">Policy Templates</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#xfrm-packet-flows">XFRM Packet Flows</a>
<ul>
<li class="toc-entry toc-h4"><a href="#egress-packet-flow">Egress Packet Flow</a></li>
<li class="toc-entry toc-h4"><a href="#ingress-packet-flow">Ingress Packet Flow</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#output-description-of-ip-xfrm">Output Description of `ip xfrm`</a></li>
<li class="toc-entry toc-h3"><a href="#updating-xfrm-states-and-policies">Updating XFRM States and Policies</a>
<ul>
<li class="toc-entry toc-h4"><a href="#identifying-fields-of-xfrm-states">Identifying Fields of XFRM States</a></li>
<li class="toc-entry toc-h4"><a href="#identifying-fields-of-xfrm-policies">Identifying Fields of XFRM Policies</a></li>
<li class="toc-entry toc-h4"><a href="#seamless-updates-of-xfrm-policies">Seamless Updates of XFRM Policies</a></li>
<li class="toc-entry toc-h4"><a href="#seamless-updates-of-xfrm-states">Seamless Updates of XFRM States</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#xfrm-errors">XFRM Errors</a></li>
<li class="toc-entry toc-h3"><a href="#performance-considerations">Performance Considerations</a>
<ul>
<li class="toc-entry toc-h4"><a href="#data-structure-for-xfrm-policies">Data Structure for XFRM Policies</a></li>
<li class="toc-entry toc-h4"><a href="#data-structure-for-xfrm-states">Data Structure for XFRM States</a></li>
</ul>
</li>
</ul>
<!--  -->

<h3 id="overview">Overview</h3>

<p>IPsec encryption in the Linux kernel relies on <a href="https://man7.org/linux/man-pages/man8/ip-xfrm.8.html">XFRM</a>.
XFRM is an IP framework intended for packet transformations, from encryption to compression.
It is configured via a set of <em>policy</em> and <em>state</em> objects, which for IPsec, correspond to <a href="https://datatracker.ietf.org/doc/html/rfc4301">Security Policies and Security Associations</a>.</p>

<h4 id="xfrm-policies-and-states">XFRM Policies and States</h4>

<p>At a high level, XFRM policies define what traffic to accept and reject, whereas states define how to perform the encryption and decryption.
Policies can match on the direction (<code class="language-plaintext highlighter-rouge">out</code>, <code class="language-plaintext highlighter-rouge">in</code>, or <code class="language-plaintext highlighter-rouge">fwd</code>), the source and destination IP addresses with CIDRs, and the packet mark.
As an example, the following policy matches egressing packets with any source IP address, 10.56.1.X destination IP addresses, and <code class="language-plaintext highlighter-rouge">0xcb93eXX</code> packet marks.
Policies default to allowing traffic as done here.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>src 0.0.0.0/0 dst 10.56.1.0/24 
	dir out priority 0 
	mark 0xcb93e00/0xffffff00 
	[...]
</code></pre></div></div>
<p>States are relatively similar, except that they are agnostic to the direction and can only match on exact IP addresses (or 0.0.0.0 to match all).
The following state will apply to packets with IP addresses 10.56.0.17 -&gt; 10.56.1.238 and the same packet marks as above.
In the case of tunnel-mode IPsec, these IP addresses correspond to the outer IP addresses.
For ingressing, encrypted packets, the SPI will also be used (discussed below).</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>src 10.56.0.17 dst 10.56.1.238
	proto esp spi 0x00000003 reqid 1 mode tunnel
	replay-window 0 
	mark 0xcb93e00/0xffffff00 output-mark 0xe00/0xffffff00
	aead rfc4106(gcm(aes)) 0x6254fced5f7a5ea9401b9015ecf10d65eac51a69 128
	anti-replay context: seq 0x0, oseq 0x36, bitmap 0x00000000
	sel src 0.0.0.0/0 dst 0.0.0.0/0
</code></pre></div></div>
<p>You may notice that nothing specifies if this state should perform encryption or decryption.
That’s because it can actually do both.
As said above, states are agnostic to the direction of traffic so the same state may theoretically be used for both encryption and decryption.
What to do will be determined based on where in the stack the state is matched (ex., decryption on ingress).</p>
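
<p>For reference, a state like the one above could be created roughly as follows.
This is a sketch to run as root, ideally in a throwaway network namespace; the key is randomly generated here (a 16-byte AES key followed by the 4-byte salt that <code class="language-plaintext highlighter-rouge">rfc4106(gcm(aes))</code> expects), whereas a real deployment would use negotiated or provisioned key material.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 20 random bytes: 16-byte AES-128 key + 4-byte salt.</span>
KEY="0x$(openssl rand -hex 20)"
ip xfrm state add src 10.56.0.17 dst 10.56.1.238 \
    proto esp spi 0x3 reqid 1 mode tunnel \
    mark 0xcb93e00 mask 0xffffff00 \
    output-mark 0xe00 mask 0xffffff00 \
    aead 'rfc4106(gcm(aes))' "$KEY" 128 \
    sel src 0.0.0.0/0 dst 0.0.0.0/0
<span class="c"># Check the result.</span>
ip xfrm state
</code></pre></div></div>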

<h4 id="policy-templates">Policy Templates</h4>

<p>XFRM policies also typically define a template, as below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>src 0.0.0.0/0 dst 10.56.1.0/24 
	dir out priority 0 
	mark 0xcb93e00/0xffffff00 
	tmpl src 10.56.0.17 dst 10.56.1.238
		proto esp spi 0x00000003 reqid 1 mode tunnel
</code></pre></div></div>
<p>How this template is used depends on the direction.
For egressing traffic, the template defines the encoding to perform.
For example, the above template will encapsulate packets with an IP header and an ESP header.
The IP header will have IP addresses 10.56.0.17 and 10.56.1.238.
The ESP header will have SPI 3.</p>
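
<p>The policy above, including its template, could be installed with a command along these lines.
It is a sketch to run as root, and it only has an effect on traffic once an XFRM state matching the template exists.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ip xfrm policy add src 0.0.0.0/0 dst 10.56.1.0/24 \
    dir out priority 0 \
    mark 0xcb93e00 mask 0xffffff00 \
    tmpl src 10.56.0.17 dst 10.56.1.238 \
        proto esp spi 0x3 reqid 1 mode tunnel
<span class="c"># List the egress policies to check the result.</span>
ip xfrm policy list dir out
</code></pre></div></div>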

<p>For ingressing and forwarded traffic however, the template acts as an additional filter.
The following XFRM policy for example will only allow packets if they are ESP packets with outer IP addresses 10.56.1.238 and 10.56.0.17, in addition to having a packet mark matching <code class="language-plaintext highlighter-rouge">0xd00/0xf00</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>src 0.0.0.0/0 dst 10.56.0.0/24 
	dir in priority 0 
	mark 0xd00/0xf00 
	tmpl src 10.56.1.238 dst 10.56.0.17
		proto esp reqid 1 mode tunnel
</code></pre></div></div>

<p>The template of XFRM OUT policies points to the XFRM state to use for encryption.
The IP addresses, the SPI, the protocol, the mode, and the reqid should all match between the XFRM state and the template.</p>

<h3 id="xfrm-packet-flows">XFRM Packet Flows</h3>

<p>IPsec and XFRM are represented in <a href="https://upload.wikimedia.org/wikipedia/commons/3/37/Netfilter-packet-flow.svg">the usual Linux networking diagram</a>.
There are however several errors in that diagram when it comes to XFRM<sup id="fnref:netfilter-diagram-errors" role="doc-noteref"><a href="#fn:netfilter-diagram-errors" class="footnote" rel="footnote">1</a></sup>, so I decided to write a new one.
It takes inspiration from <a href="https://thermalcircle.de/doku.php?id=blog:linux:nftables_ipsec_packet_flow">Andrej Stender’s diagrams</a> and simplifies the overall flow to focus on just what I want to explain here.
All pieces related to XFRM are in purple, routing decisions in orange, and the rest in yellow.</p>

<div style="text-align: center;">
<img src="/assets/netfilter-with-xfrm.png" alt="Packet flow in Netfilter and XFRM subsystems." class="zoomable" />
</div>

<p><br /></p>

<h4 id="egress-packet-flow">Egress Packet Flow</h4>

<p>On egress, packets will first hit one of the <code class="language-plaintext highlighter-rouge">XFRM OUT policy</code> blocks.
At this point, a lookup is performed against the XFRM OUT policies.
If a match is found, the packet goes to the <code class="language-plaintext highlighter-rouge">XFRM encode</code> block and the template is used to look up XFRM states.
If a state is found, its information is used to encrypt the packet.</p>

<p>The encrypted packet will then navigate again through the <code class="language-plaintext highlighter-rouge">OUTPUT</code> and <code class="language-plaintext highlighter-rouge">POSTROUTING</code> chains.</p>

<h4 id="ingress-packet-flow">Ingress Packet Flow</h4>

<p>On ingress, encrypted packets (ex., ESP packets) will hit the <code class="language-plaintext highlighter-rouge">XFRM decode</code> block after they navigate through the <code class="language-plaintext highlighter-rouge">INPUT</code> chain.</p>

<p>In tunnel mode, encrypted packets will typically have one of the server’s IP addresses as the outer destination address, so they should automatically be routed through the <code class="language-plaintext highlighter-rouge">INPUT</code> chain.
If not, it may be necessary to add IP routes to redirect packets to the <code class="language-plaintext highlighter-rouge">INPUT</code> chain.
As an example, Cilium identifies IPsec traffic on tc-bpf ingress and marks it with a special value, which is then used to reroute those packets to the <code class="language-plaintext highlighter-rouge">INPUT</code> chain.</p>
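
<p>Such mark-based rerouting is usually done with a policy routing rule and a <code class="language-plaintext highlighter-rouge">local</code> route.
The sketch below illustrates the general technique; the mark and the table number are assumptions chosen for the example, not necessarily what Cilium configures.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Send packets carrying the (assumed) mark to a dedicated routing table.</span>
ip rule add fwmark 0xd00/0xf00 lookup 200
<span class="c"># In that table, treat every destination as local so packets reach the INPUT chain.</span>
ip route add local 0.0.0.0/0 dev lo table 200
</code></pre></div></div>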

<p>At the <code class="language-plaintext highlighter-rouge">XFRM decode</code> block, if packets match an XFRM state, they will be decoded (i.e., decapsulated and decrypted) using the state’s information.
The match is based on the source &amp; destination addresses, the mark, the SPI, and the protocol.
In case of any decoding error (ex., wrong key), the packet is dropped and an error counter is increased.</p>
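
<p>The error counters can be read from <code class="language-plaintext highlighter-rouge">/proc/net/xfrm_stat</code>, assuming the kernel was built with <code class="language-plaintext highlighter-rouge">CONFIG_XFRM_STATISTICS</code>.
A quick way to spot drops is to print only the counters that are non-zero:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Each line is "CounterName value"; keep the lines with a non-zero value.</span>
awk '$2 != 0' /proc/net/xfrm_stat
</code></pre></div></div>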

<p>As illustrated on the diagram, an XFRM policy matching the packet isn’t required for the decoding to happen (it goes directly to <code class="language-plaintext highlighter-rouge">XFRM decode</code>), but it is required for the packet to proceed to a local process or through the <code class="language-plaintext highlighter-rouge">FORWARD</code> chain.
An XFRM policy with an optional template (i.e., <code class="language-plaintext highlighter-rouge">level use</code>) will allow all decrypted packets through.
Traffic that was never encrypted, and therefore does not come from <code class="language-plaintext highlighter-rouge">XFRM decode</code>, is allowed by default.</p>
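
<p>As an illustration, an ingress policy with an optional template could be added as sketched below (as root).
The addresses and mark are reused from the earlier ingress example; only <code class="language-plaintext highlighter-rouge">level use</code> is new.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ip xfrm policy add src 0.0.0.0/0 dst 10.56.0.0/24 \
    dir in priority 0 \
    mark 0xd00 mask 0xf00 \
    tmpl src 10.56.1.238 dst 10.56.0.17 \
        proto esp reqid 1 mode tunnel level use
</code></pre></div></div>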

<p>After a packet is decrypted, it is recirculated in the stack, as if coming from the interface it was initially received on.
More specifically, packets are recirculated before the tc layer, such that they are visible on the tc-bpf hook a second time (once before decryption, once after).
The packet mark is preserved when recirculated, so it’s possible to identify and trace packets that have been decrypted and recirculated.
The packet mark can also be modified during decryption, using the <code class="language-plaintext highlighter-rouge">output-mark</code> field of the XFRM states.</p>

<h3 id="output-description-of-ip-xfrm">Output Description of <code class="language-plaintext highlighter-rouge">ip xfrm</code></h3>

<p>The example outputs below are from iproute2-6.1.0.
More fields will likely appear in newer versions.
For example, XFRM states have a <code class="language-plaintext highlighter-rouge">dir</code> field in newer kernels (v6.10+), which will likely appear in the <code class="language-plaintext highlighter-rouge">ip xfrm state</code> output at some point.</p>

<p>In the <code class="language-plaintext highlighter-rouge">ip xfrm</code> output, policies are ordered by date of creation, with newer policies at the top.
This is important because, in case two policies match a packet and have the same priority, the newest one is used.</p>

<!--  -->
<div class="highlighter-rouge"><div class="highlight fake-pre">
$ ip -s xfrm policy<br />
<span class="field">src 0.0.0.0/0<span class="field-desc">The CIDR to match against the source IP address</span></span> <span class="field">dst 0.0.0.0/0<span class="field-desc">The CIDR to match against the destination IP address</span></span> uid 0<br />
&emsp;&emsp;<span class="field">dir fwd<span class="field-desc">States the direction. It defines where in the Linux stack this policy will be used, between ingress, egress, and forwarding.</span></span> <span class="field">action allow<span class="field-desc">The action to take on matching packets. Packets can only be allowed through (by default) or dropped.</span></span> <span class="field">index 18<span class="field-desc">Used to differentiate between different policies which might have the same or overlapping selectors. If not given or if it already exists, it is automatically (re-)generated (cf., `xfrm_gen_index`). The three LSBs encode the direction (ex., 1 for `XFRM_POLICY_OUT`). The MSBs are simply incremented by one (that is, the index is incremented by 8) until a free index is found.</span></span> <span class="field">priority 2975<span class="field-desc">States the priority for this policy in case multiple could match the packet. 0 is the highest priority.</span></span> <span class="field">share any<span class="field-desc">Always set to `any` and unused today.</span></span> <span class="field">flag  (0x00000000)<span class="field-desc">Set of flags for XFRM policies. Only `XFRM_POLICY_ICMP` (0x2) is supported at the moment; `XFRM_POLICY_LOCALOK` (0x1) is not implemented (anymore?). When `XFRM_POLICY_ICMP` is given, the policy will also apply to ICMP packets whose payload packet matches the policy's selector.</span></span><br />
&emsp;&emsp;lifetime config:<br />
&emsp;&emsp;<span class="field">
&emsp;&emsp;limit: soft (INF)(bytes), hard (INF)(bytes)<br />
&emsp;&emsp;limit: soft (INF)(packets), hard (INF)(packets)<span class="field-desc">Not implemented and not enforced.</span></span><br />
&emsp;&emsp;<span class="field">
&emsp;&emsp;expire add: soft 0(sec), hard 0(sec)<br />
&emsp;&emsp;expire use: soft 0(sec), hard 0(sec)<span class="field-desc">Various expiration times for the policy, based on the time since the policy was added or the time since the policy was last matched by a packet. When a soft expiration time is reached, a notification is sent to userspace via netlink (`struct xfrm_user_expire`). When a hard limit or expiration time is reached, the policy is deleted.</span></span><br />
&emsp;&emsp;lifetime current:<br />
&emsp;&emsp;&emsp;&emsp;<span class="field">0(bytes), 0(packets)<span class="field-desc">Not implemented; will always be 0.</span></span><br />
&emsp;&emsp;&emsp;&emsp;<span class="field">add 2024-06-17 11:24:49 use 2024-06-17 11:25:01<span class="field-desc">Timestamps for when the policy was added and when it was last matched by a packet, to be used if expiration times have been set.</span></span><br />
&emsp;&emsp;tmpl <span class="field">src 0.0.0.0<span class="field-desc">See Policy Templates for how this field is used.</span></span> <span class="field">dst 10.92.0.164<span class="field-desc">See Policy Templates for how this field is used.</span></span><br />
&emsp;&emsp;&emsp;&emsp;<span class="field">proto esp<span class="field-desc">See Policy Templates for how this field is used.</span></span> <span class="field">spi 0x00000000(0)<span class="field-desc">See Policy Templates for how this field is used.</span></span> <span class="field">reqid 1(0x00000001)<span class="field-desc">See Policy Templates for how this field is used.</span></span> <span class="field">mode tunnel<span class="field-desc">See Policy Templates for how this field is used.</span></span><br />
&emsp;&emsp;&emsp;&emsp;<span class="field">level use<span class="field-desc">The nonsensical way to indicate this template is optional, the alternative being `level required`. If no XFRM state matching the template is found, the template will be skipped if optional. Otherwise, the packet will be dropped with `XfrmInTmplMismatch`.</span></span> <span class="field">share any<span class="field-desc">Not implemented and will always be `any`.</span></span><br />
&emsp;&emsp;&emsp;&emsp;<span class="field">enc-mask ffffffff<span class="field-desc">Bit mask defining the list of allowed encryption algorithms. See Encryption algorithms in include/uapi/linux/pfkeyv2.h for the list of possible values.</span></span> <span class="field">auth-mask ffffffff<span class="field-desc">Bit mask defining the list of allowed authentication algorithms. See Authentication algorithms in include/uapi/linux/pfkeyv2.h for the list of possible values.</span></span> <span class="field">comp-mask ffffffff<span class="field-desc">Non-implemented bit mask (was probably defined for compression algorithms).</span></span>
</div></div>
<!--  -->
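
<p>As a worked example of the index encoding, the direction can be recovered from the three least significant bits of the index.
The sketch below uses the index 18 from the output above; the values 0 and 2 for the ingress and forward directions are taken from the <code class="language-plaintext highlighter-rouge">XFRM_POLICY_*</code> constants in <code class="language-plaintext highlighter-rouge">include/uapi/linux/xfrm.h</code>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>index=18
<span class="c"># Keep only the three LSBs, which encode the direction.</span>
dir=$((index &amp; 7))
case "$dir" in
    0) echo "in" ;;
    1) echo "out" ;;
    2) echo "fwd" ;;
esac
</code></pre></div></div>
<p>Here, 18 is 0b10010, so the three LSBs are 0b010 (2), which is consistent with the <code class="language-plaintext highlighter-rouge">dir fwd</code> shown for that policy.</p>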

<!--  -->
<div class="highlighter-rouge"><div class="highlight fake-pre">
$ ip -s xfrm state<br />
<span class="field">src 10.92.1.189<span class="field-desc">The IP address to match against the packets' source IP addresses.</span></span> <span class="field">dst 10.92.0.164<span class="field-desc">The IP address to match against the packets' destination IP addresses.</span></span><br />
&emsp;&emsp;<span class="field">proto esp<span class="field-desc">The IPsec protocol to use.</span></span> <span class="field">spi 0x00000003(3)<span class="field-desc">The Security Parameter Index. A tag to distinguish between multiple IPsec streams that may be using different algorithms and/or keys. Particularly useful during key rotations.</span></span> <span class="field">reqid 1(0x00000001)<span class="field-desc">An ID only used to ensure the XFRM policy template and the state match. It doesn't seem to be used for anything else in the kernel.</span></span> <span class="field">mode tunnel<span class="field-desc">States whether the packet is encapsulated (`tunnel`) or if the ESP header is simply added to the existing packet (`transport`).</span></span><br />
&emsp;&emsp;<span class="field">replay-window 0<span class="field-desc">Size of the replay window used for the anti-replay checks (i.e., toleration setting).</span></span> seq 0x00000000 <span class="field">flag  (0x00000000)<span class="field-desc">Holds various flags including `XFRM_STATE_ESN` (0x80) for ESN mode.</span></span><br />
&emsp;&emsp;<span class="field">mark 0x4db50d00/0xffff0f00<span class="field-desc">The value and mask used to match against the packets' marks.</span></span> <span class="field">output-mark 0xd00/0xffffff00<span class="field-desc">The value and mask to apply to the packets' marks after they have been encrypted or decrypted.</span></span><br />
&emsp;&emsp;<span class="field">aead rfc4106(gcm(aes))<span class="field-desc">The type and name of algorithm in use.</span></span> <span class="field">0x856f15d0ccabe682286b4286bccf5d595b88b168 (160 bits)<span class="field-desc">The key and its size. It's of course sensitive information that should be treated as such.</span></span> <span class="field">128<span class="field-desc">The ICV length. Which lengths are supported depends on the algorithm in use.</span></span><br />
&emsp;&emsp;anti-replay context: <span class="field">seq 0x0<span class="field-desc">Holds the current receive-side sequence number, for the anti-replay check.</span></span>, <span class="field">oseq 0x0<span class="field-desc">The last emitted sequence number. If this number overflows (on 32-bits), packets are dropped and the error counter `XfrmOutStateSeqError` is increased. In ESN mode, this sequence number is coded on 64-bits.</span></span>, <span class="field">bitmap 0x00000000<span class="field-desc">Tracks the sequence numbers that have already been seen in the replay window.</span></span><br />
&emsp;&emsp;<span class="field">sel src 0.0.0.0/0 dst 0.0.0.0/0<span class="field-desc">An additional filter applying to the decrypted packets, to ensure the inner packets are coming and going where you expect.</span></span> <span class="field">uid 0<span class="field-desc">This field appears to be unused (`user` in `struct xfrm_selector`).</span></span><br />
&emsp;&emsp;<span class="field">lifetime config:<br />
&emsp;&emsp;limit: soft (INF)(bytes), hard (INF)(bytes)<br />
&emsp;&emsp;limit: soft (INF)(packets), hard (INF)(packets)<br />
&emsp;&emsp;expire add: soft 0(sec), hard 0(sec)<br />
&emsp;&emsp;expire use: soft 0(sec), hard 0(sec)<span class="field-desc">Various limits and expiration times for the state, based on the number of bytes received, the number of packets received, the time since the state was added, or the time since the state was last used for a packet. When a soft limit or expiration time is reached, a notification is sent to userspace via netlink (`struct xfrm_user_expire`). When a hard limit or expiration time is reached, the state is deleted.</span></span><br />
&emsp;&emsp;lifetime current:<br />
&emsp;&emsp;&emsp;&emsp;<span class="field">20124(bytes), 83(packets)<span class="field-desc">Counters for bytes and packets matched by this state, to be used if limits have been set.</span></span><br />
&emsp;&emsp;&emsp;&emsp;<span class="field">add 2024-06-17 11:15:48 use 2024-06-17 11:16:02<span class="field-desc">Timestamps for when the state was added and when it was last matched by a packet, to be used if expiration times have been set.</span></span><br />
&emsp;&emsp;stats:<br />
&emsp;&emsp;&emsp;&emsp;<span class="field">replay-window 0<span class="field-desc">Incremented whenever a packet is received with a sequence number outside the window.</span></span> <span class="field">replay 0<span class="field-desc">Incremented whenever a packet is received with a sequence number in the replay window that was already observed.</span></span> <span class="field">failed 0<span class="field-desc">Incremented when the checksums for authentication or encryption headers are incorrect (full name `integrity_failed` on the kernel's side). `XfrmInStateProtoError` is always incremented when this counter is incremented.</span></span>
</div></div>
<!--  -->

<h3 id="updating-xfrm-states-and-policies">Updating XFRM States and Policies</h3>

<p>In Cilium, on several occasions, we had to make substantial changes to our XFRM states and policies.
In the process, we faced several conflicts: you try to add a new XFRM state and the kernel complains that it conflicts with an existing state.
These conflicts can be particularly non-obvious as they can depend on the order of additions for XFRM states.
With proper documentation that would be easy to resolve, but in its absence, you need to dig into the kernel sources to understand which fields matter to identify a state or a policy.</p>

<p>This section aims to document those aspects: which fields constitute the “key” of XFRM states and policies, how to avoid conflicts, and how to perform updates without dropping traffic.</p>

<h4 id="identifying-fields-of-xfrm-states">Identifying Fields of XFRM States</h4>

<!--  -->
<div class="highlighter-rouge"><div class="highlight fake-pre">
$ ip xfrm state<br />
src 10.36.98.139 dst <span style="color: #D17638; font-weight: bold;">10.36.1.178</span><br />
&emsp;&emsp;proto <span style="color: #D17638; font-weight: bold;">esp</span> spi <span style="color: #D17638; font-weight: bold;">0x00000003</span> reqid 1 mode tunnel<br />
&emsp;&emsp;replay-window 0<br />
&emsp;&emsp;mark <span style="color: #D17638; font-weight: bold;">0xc90a</span>0000/0xffff0000 output-mark 0xd00/0xffffff00<br />
&emsp;&emsp;aead rfc4106(gcm(aes)) 0xf83bd6832d552fa23e9ab5fdb742e1241b054f6c 128<br />
&emsp;&emsp;anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000<br />
&emsp;&emsp;sel src 0.0.0.0/0 dst 0.0.0.0/0<br />
</div></div>
<!--  -->

<p>XFRM states are identified by their destination IP address, the masked value of the mark, the SPI, and the protocol, as shown above, in bold orange.
The source IP address and the unmasked part of the mark are not considered when identifying XFRM states.
Thus, the “key” for XFRM states could be written as:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>key = (dst_ip, proto, spi, (mark_value &amp; mark_mask))
</code></pre></div></div>

<p>For the mark, the kernel checks whether the sanitized value (i.e., with the mask applied) of the new mark is matched by any of the existing marks. The new state conflicts with an existing one if:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>((new_mark_value &amp; new_mark_mask) &amp; existing_mark_mask) == existing_mark_value
</code></pre></div></div>
<p>For example, if the new mark is <code class="language-plaintext highlighter-rouge">0x12345600/0xffffff00</code> and mark <code class="language-plaintext highlighter-rouge">0x12340000/0xffff0000</code> already exists, the new mark will be rejected.
If however <code class="language-plaintext highlighter-rouge">0x12345600/0xffffff00</code> was added first and <code class="language-plaintext highlighter-rouge">0x12340000/0xffff0000</code> is the new mark, it will be accepted.
Hence, the order of addition of XFRM states can matter.</p>
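<p>To make the order dependence concrete, here is a minimal Python sketch of the mark check as described in this section. The function name <code class="language-plaintext highlighter-rouge">mark_conflicts</code> is made up for illustration, and the sketch assumes the other key fields (destination IP address, protocol, SPI) are already equal; it is not the kernel's code.</p>

```python
def mark_conflicts(new_value, new_mask, existing_value, existing_mask):
    """Return True if the new state's mark is matched by an existing state's mark.

    Hypothetical helper: assumes dst_ip, proto, and spi are already equal.
    """
    sanitized = new_value & new_mask
    return (sanitized & existing_mask) == existing_value


# The broad mark exists first, the narrower mark is added second: conflict.
print(mark_conflicts(0x12345600, 0xFFFFFF00, 0x12340000, 0xFFFF0000))  # True
# The narrow mark exists first, the broader mark is added second: accepted.
print(mark_conflicts(0x12340000, 0xFFFF0000, 0x12345600, 0xFFFFFF00))  # False
```

<p>The check is asymmetric because only the existing state's mask is applied to the new state's sanitized value, never the other way around.</p>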

<div class="note">
Note that if you use unsanitized mark values, you may run into unexpected behavior at runtime.
An unsanitized value is one with bits set that are not part of the mask, ex. 0xabcd0001/0xffff0000.
If using such mark values, the kernel will apply the mask to the packet's mark and then compare it to the unsanitized value.
Therefore, it won't match any packets at runtime.
This bug concerns the marks of both policies and states, for ingress and egress.
</div>
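<p>The runtime behavior described in the note can be reproduced with a few lines of Python. This is only a sketch of the comparison as described above; <code class="language-plaintext highlighter-rouge">mark_matches</code> and <code class="language-plaintext highlighter-rouge">sanitize</code> are hypothetical helper names, not kernel functions.</p>

```python
def mark_matches(packet_mark, mark_value, mark_mask):
    """Mask the packet's mark, then compare it to the stored (possibly unsanitized) value."""
    return (packet_mark & mark_mask) == mark_value


def sanitize(mark_value, mark_mask):
    """Clear the bits of the value that are not covered by the mask."""
    return mark_value & mark_mask


packet_mark = 0xABCD0001
# Unsanitized value: bit 0 is set but not covered by the mask, so nothing can match.
print(mark_matches(packet_mark, 0xABCD0001, 0xFFFF0000))  # False
# Sanitized value: the same packet now matches.
print(mark_matches(packet_mark, sanitize(0xABCD0001, 0xFFFF0000), 0xFFFF0000))  # True
```

<p>Sanitizing mark values in userspace before installing states and policies is one way to sidestep the issue.</p>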

<p>For XFRM state deletions, note that the command will complain if you pass any argument that is not part of the key… except for the source IP address.
But even if you give it a source IP address, that address will not be considered when matching for the deletion.</p>

<h4 id="identifying-fields-of-xfrm-policies">Identifying Fields of XFRM Policies</h4>

<!--  -->
<div class="highlighter-rouge"><div class="highlight fake-pre">
$ ip xfrm policy<br />
src <span style="color: #D17638; font-weight: bold;">10.0.0.0/8</span> dst <span style="color: #D17638; font-weight: bold;">10.36.1.0/24</span><br />
&emsp;&emsp;dir <span style="color: #D17638; font-weight: bold;">in</span> priority 0<br />
&emsp;&emsp;mark <span style="color: #D17638; font-weight: bold;">0x58d73e00/0xffffff00</span><br />
&emsp;&emsp;tmpl src 10.36.1.179 dst 10.36.2.60<br />
&emsp;&emsp;&emsp;&emsp;proto esp reqid 0 mode transport<br />
</div></div>
<!--  -->

<p>XFRM policies are identified by their direction, source IP address &amp; mask, destination IP address &amp; mask, and their mark &amp; mask, as shown above in bold orange.
However, contrary to XFRM states, the masks (ex., CIDR or mark masks) are not applied before using the related values (resp., CIDR IP addresses or mark values).
Thus, the “key” for XFRM policies could be written as:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>key = (dir, src_cidr_ip, src_cidr_mask, dst_cidr_ip, dst_cidr_mask,
       mark_value, mark_mask)
</code></pre></div></div>
<p>So XFRM policies with <code class="language-plaintext highlighter-rouge">dst 10.0.0.0/8</code> and <code class="language-plaintext highlighter-rouge">dst 10.1.1.1/8</code> will be considered two different policies!
For updates and deletions, the exact values must be used: a more generic policy won’t be considered a match.</p>
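<p>As an illustration, the policy key can be modeled as a plain tuple in which addresses and masks are stored exactly as given. The <code class="language-plaintext highlighter-rouge">PolicyKey</code> type and <code class="language-plaintext highlighter-rouge">policy_key</code> helper below are assumptions made for this sketch, not kernel or iproute2 names.</p>

```python
import ipaddress
from typing import NamedTuple


class PolicyKey(NamedTuple):
    """Hypothetical model of the identifying fields of an XFRM policy."""
    direction: str
    src_ip: str
    src_prefixlen: int
    dst_ip: str
    dst_prefixlen: int
    mark_value: int
    mark_mask: int


def policy_key(direction, src, dst, mark_value, mark_mask):
    """Build the key without applying any mask to the addresses or the mark."""
    src_ip, src_len = src.split("/")
    dst_ip, dst_len = dst.split("/")
    return PolicyKey(direction, src_ip, int(src_len), dst_ip, int(dst_len),
                     mark_value, mark_mask)


a = policy_key("in", "10.0.0.0/8", "10.0.0.0/8", 0x58D73E00, 0xFFFFFF00)
b = policy_key("in", "10.0.0.0/8", "10.1.1.1/8", 0x58D73E00, 0xFFFFFF00)
print(a == b)  # False: same network, but two distinct keys

# Normalizing CIDRs in userspace before installing policies avoids the surprise.
print(ipaddress.ip_network("10.1.1.1/8", strict=False))  # 10.0.0.0/8
```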

<h4 id="seamless-updates-of-xfrm-policies">Seamless Updates of XFRM Policies</h4>

<p>Updating XFRM policies without disrupting ongoing traffic is relatively easy.
If you only need to update non-identifying fields such as the priority or the template, you can simply run <code class="language-plaintext highlighter-rouge">ip xfrm policy update</code>.</p>

<p>If, however, you need to update identifying fields (ex., change the mask for marks), then you can first de-prioritize existing policies, before adding the new policies with a higher priority.
By default, policies are created with the highest priority, 0.
Thus, de-prioritizing a policy is a simple matter of running <code class="language-plaintext highlighter-rouge">ip xfrm policy update</code> to increase the priority value.
This guarantees that the old policies stay in place and traffic is still processed during the update.
Then, new policies can be added as usual, with a higher priority, for example 0.
Once all new policies are in place, old policies shouldn’t be used anymore and can be removed.</p>
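<p>The sequence can be simulated with a toy model to check that a matching policy exists at every step. This is a sketch under simplifying assumptions: policies are matched on the mark only, the class and function names are hypothetical, and ties between equal priorities are not modeled.</p>

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    mark_value: int
    mark_mask: int
    priority: int


def lookup(policies, packet_mark):
    """Return the matching policy with the lowest priority number (ties not modeled)."""
    matches = [p for p in policies if (packet_mark & p.mark_mask) == p.mark_value]
    return min(matches, key=lambda p: p.priority, default=None)


packet_mark = 0x58D73E00
old = Policy("old", 0x58D73E00, 0xFFFFFF00, priority=0)
policies = [old]
selected = []

# Step 1: de-prioritize the old policy by increasing its priority value.
old.priority = 100
selected.append(lookup(policies, packet_mark).name)

# Step 2: add the new policy, with a different mark mask, at priority 0.
policies.append(Policy("new", 0x58D70000, 0xFFFF0000, priority=0))
selected.append(lookup(policies, packet_mark).name)

# Step 3: remove the old policy, which is no longer selected.
policies.remove(old)
selected.append(lookup(policies, packet_mark).name)

print(selected)  # ['old', 'new', 'new']
```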

<h4 id="seamless-updates-of-xfrm-states">Seamless Updates of XFRM States</h4>

<p>Similarly to policies, updating non-identifying fields of existing states is a simple matter of running <code class="language-plaintext highlighter-rouge">ip xfrm state update</code>.
If you need to update identifying fields however, there is no priority mechanism to keep both sets of states, old and new, in place during the update.</p>

<p>Instead, the best approach I’m aware of to avoid disrupting operations during the update is to rely on SPIs.
You can distinguish the sets of old and new states by their SPIs, for example by reserving bits in the SPI for a version number.
As an example, if your existing states have SPIs <code class="language-plaintext highlighter-rouge">0x0000xxxx</code>, you could assign SPIs <code class="language-plaintext highlighter-rouge">0x0001xxxx</code> for the new states.
This approach obviously requires some planning beforehand, when assigning the SPIs.</p>
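<p>One possible SPI layout, sketched in Python: the upper 16 bits hold a version number and the lower 16 bits hold the identifier you were already using. The layout, constants, and helper names are assumptions chosen to match the <code class="language-plaintext highlighter-rouge">0x0000xxxx</code> / <code class="language-plaintext highlighter-rouge">0x0001xxxx</code> example above.</p>

```python
VERSION_SHIFT = 16  # hypothetical layout: upper 16 bits carry the version
ID_MASK = 0xFFFF    # lower 16 bits carry the pre-existing identifier


def make_spi(version, ident):
    """Combine a version number and an identifier into a 32-bit SPI."""
    if not 0 <= version <= 0xFFFF or not 0 <= ident <= ID_MASK:
        raise ValueError("version and ident must each fit in 16 bits")
    return (version << VERSION_SHIFT) | ident


def spi_version(spi):
    """Extract the version number from an SPI."""
    return spi >> VERSION_SHIFT


old_spi = make_spi(0, 0x0003)
new_spi = make_spi(1, 0x0003)
print(f"{old_spi:#010x} {new_spi:#010x}")  # 0x00000003 0x00010003
print(spi_version(new_spi))  # 1
```

<p>Because the SPI is part of the state key, the old and new states can coexist even if all their other identifying fields are equal.</p>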

<p>Then, you need some synchronization mechanism to only start encrypting traffic with the new SPIs once the receiver has installed XFRM states with the new SPIs as well.
Presumably, you already have such a synchronization mechanism to handle key rotations.</p>

<h3 id="xfrm-errors">XFRM Errors</h3>

<p>All XFRM errors correspond to packet drops.
Some of them may also be associated with per-state counters increasing.
<code class="language-plaintext highlighter-rouge">CONFIG_XFRM_STATISTICS</code> is required to see these error counters in <code class="language-plaintext highlighter-rouge">/proc/net/xfrm_stat</code>.</p>
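<p>When troubleshooting, it is often more useful to look at which counters increased over a short time window than at their absolute values. The following Python sketch assumes that each line of the file holds a counter name followed by its value, separated by whitespace; the helper names and the sample values are made up for illustration.</p>

```python
def parse_xfrm_stat(text):
    """Parse '<counter name> <value>' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            counters[fields[0]] = int(fields[1])
    return counters


def increased(before, after):
    """Return the counters whose value grew between two snapshots."""
    return {
        name: value - before.get(name, 0)
        for name, value in after.items()
        if value > before.get(name, 0)
    }


# Illustrative snapshots (made-up values), in the assumed file format.
before = parse_xfrm_stat("XfrmInError \t0\nXfrmInNoStates \t4\n")
after = parse_xfrm_stat("XfrmInError \t0\nXfrmInNoStates \t9\n")
print(increased(before, after))  # {'XfrmInNoStates': 5}

# On a live system, the text would come from the file itself:
# with open("/proc/net/xfrm_stat") as f:
#     counters = parse_xfrm_stat(f.read())
```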

<ul>
  <li><ins>XfrmInError</ins>: If the kernel fails to allocate memory during decryption.</li>
  <li><ins>XfrmInBufferError</ins>:
    <ul>
      <li>If a packet is going through too many XFRM states.
The maximum is set to <code class="language-plaintext highlighter-rouge">XFRM_MAX_DEPTH</code> (6).</li>
      <li>If too many XFRM policy templates apply to a packet.
The maximum is also set to <code class="language-plaintext highlighter-rouge">XFRM_MAX_DEPTH</code> (6).</li>
    </ul>
  </li>
  <li><ins>XfrmInHdrError</ins>:
    <ul>
      <li>If the SPI portion of the packet is malformed.</li>
      <li>If the outer IP header is malformed.</li>
    </ul>
  </li>
  <li><ins>XfrmInNoStates</ins>: If no XFRM IN state was found that matches the AH or ESP packet ingressing on the INPUT chain.</li>
  <li><ins>XfrmInStateProtoError</ins>:
    <ul>
      <li>If the AH or ESP checksum is incorrect.</li>
      <li>If the packet’s IPsec protocol (ex., AH, ESP) doesn’t match the protocol specified by the XFRM state.</li>
      <li>Also includes all protocol-specific errors (ex., from <code class="language-plaintext highlighter-rouge">esp_input</code>) listed below:</li>
      <li>If decryption/encryption fails (ex., because the key specified in the XFRM IN state doesn’t match the key with which the packet was encrypted).</li>
      <li>If the protocol headers (ex., ESP) or trailers are malformed.</li>
      <li>If there is not enough memory to perform encryption/decryption.</li>
    </ul>
  </li>
  <li><ins>XfrmInStateModeError</ins>: If the packet is in IPsec tunnel mode, but the matched XFRM state is in transport mode.</li>
  <li><ins>XfrmInStateSeqError</ins>: If the anti-replay check rejected the packet.
If the check failed because the sequence number was outside the window, the <code class="language-plaintext highlighter-rouge">replay-window</code> counter of the associated XFRM state will be incremented.
If it failed because the sequence number was seen already, the <code class="language-plaintext highlighter-rouge">replay</code> counter is incremented instead.</li>
  <li><ins>XfrmInStateExpired</ins>: There can be a delay between when a state expires (hard limits) and when it’s actually deleted.
During that time, matching packets are dropped with <code class="language-plaintext highlighter-rouge">XfrmInStateExpired</code> on ingress.</li>
  <li><ins>XfrmInStateMismatch</ins>:
    <ul>
      <li>If the encapsulation protocol of the XFRM state (ex., <code class="language-plaintext highlighter-rouge">espinudp</code> in <code class="language-plaintext highlighter-rouge">encap</code> field of <code class="language-plaintext highlighter-rouge">ip xfrm state</code>) doesn’t match the encapsulation protocol of the packet.</li>
      <li>If the decrypted packet doesn’t match the selector (<code class="language-plaintext highlighter-rouge">sel</code> field) of the used XFRM state.</li>
    </ul>
  </li>
  <li><ins>XfrmInStateInvalid</ins>: If the received packet matched an XFRM state that is being deleted or that has expired.</li>
  <li><ins>XfrmInTmplMismatch</ins>:
    <ul>
      <li>If a packet matches an XFRM policy with a non-optional template, but the template doesn’t match any of the XFRM states used to decrypt the packet (yes, a packet can be decoded multiple times).</li>
      <li>If an XFRM state with <code class="language-plaintext highlighter-rouge">mode tunnel</code> was used on the packet and it doesn’t match any XFRM policy template.</li>
    </ul>
  </li>
  <li><ins>XfrmInNoPols</ins>: If the ingressing packet doesn’t match any XFRM policy and the default action is set to <code class="language-plaintext highlighter-rouge">block</code>.
See <code class="language-plaintext highlighter-rouge">ip xfrm policy {get,set}default</code> to view and set the default XFRM policy actions.</li>
  <li><ins>XfrmInPolBlock</ins>: If the packet matches an XFRM IN policy with <code class="language-plaintext highlighter-rouge">action block</code>.</li>
  <li><ins>XfrmOutError</ins>:
    <ul>
      <li>If the kernel fails to allocate memory during encryption.</li>
      <li>In some cases, if the packet to encrypt is malformed.</li>
    </ul>
  </li>
  <li><ins>XfrmOutBundleCheckError</ins>: Unused.</li>
  <li><ins>XfrmOutNoStates</ins>: If the packet matched an XFRM OUT policy, but no XFRM state was found that matches the policy’s template.</li>
  <li><ins>XfrmOutStateProtoError</ins>: If a protocol-specific (ex., ESP) encryption error happens.</li>
  <li><ins>XfrmOutStateModeError</ins>: If the packet exceeds the MTU once encapsulated and it shouldn’t be fragmented.</li>
  <li><ins>XfrmOutStateSeqError</ins>: The output sequence number (<code class="language-plaintext highlighter-rouge">oseq</code>) of an XFRM state reached its maximum value, <code class="language-plaintext highlighter-rouge">UINT32_MAX</code> when not using ESN mode.</li>
  <li><ins>XfrmOutStateExpired</ins>: There can be a delay between when a state expires (hard limits) and when it’s actually deleted.
During that time, matching packets are dropped with <code class="language-plaintext highlighter-rouge">XfrmOutStateExpired</code> on egress.</li>
  <li><ins>XfrmOutPolBlock</ins>: If the packet matches an XFRM OUT policy with <code class="language-plaintext highlighter-rouge">action block</code>.</li>
  <li><ins>XfrmOutPolDead</ins>: Unused.
<code class="language-plaintext highlighter-rouge">XfrmOutStateInvalid</code> is reported instead for XFRM states that are in the process of being deleted.</li>
  <li><ins>XfrmOutPolError</ins>:
    <ul>
      <li>If too many XFRM policy templates apply to a packet.
The maximum is also set to <code class="language-plaintext highlighter-rouge">XFRM_MAX_DEPTH</code> (6).</li>
      <li>If no XFRM state is found for a non-optional template of the matching XFRM policy.</li>
    </ul>
  </li>
  <li><ins>XfrmFwdHdrError</ins>: If the packet is malformed when going through the FWD policy check.</li>
  <li><ins>XfrmOutStateInvalid</ins>: If the egressing packet matched an XFRM state that is being deleted or that has expired.</li>
  <li><ins>XfrmOutStateDirError</ins>: If the direction of the XFRM state found during the lookup is defined and isn’t <code class="language-plaintext highlighter-rouge">XFRM_SA_DIR_OUT</code>.
Only on kernels v6.10 and newer.</li>
  <li><ins>XfrmInStateDirError</ins>: If the direction of the XFRM state found during the lookup is defined and isn’t <code class="language-plaintext highlighter-rouge">XFRM_SA_DIR_IN</code>.
Only on kernels v6.10 and newer.</li>
</ul>
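<p>The two per-state counters mentioned under <code class="language-plaintext highlighter-rouge">XfrmInStateSeqError</code> correspond to two different ways of failing a sliding-window anti-replay check. The Python sketch below is a simplified model of such a check for 32-bit sequence numbers (no ESN) and a non-zero window; the function name and return values are hypothetical, and the kernel's implementation handles more cases.</p>

```python
def replay_check(last_seq, bitmap, window, seq):
    """Classify an incoming sequence number.

    Returns "ok", "replay-window" (older than the window allows), or
    "replay" (inside the window but already seen). Bit i of `bitmap`
    records whether sequence number `last_seq - i` was received.
    """
    if seq > last_seq:
        return "ok"  # newer than anything seen so far
    diff = last_seq - seq
    if diff >= window:
        return "replay-window"
    if bitmap & (1 << diff):
        return "replay"
    return "ok"


# State after receiving sequence numbers 98 and 100 (but not 99), window of 32.
last_seq, bitmap, window = 100, 0b101, 32
print(replay_check(last_seq, bitmap, window, 101))  # ok
print(replay_check(last_seq, bitmap, window, 99))   # ok
print(replay_check(last_seq, bitmap, window, 100))  # replay
print(replay_check(last_seq, bitmap, window, 60))   # replay-window
```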

<h3 id="performance-considerations">Performance Considerations</h3>

<p>This section describes the data structures used to hold the XFRM policies and states.
This is useful to understand when dealing with a large number of states and policies as the information they hold can help improve indexing and speed up the lookups.
When dealing with thousands of policies and states, the lookup cost can become non-negligible even when compared to the encryption/decryption cost.</p>

<h4 id="data-structure-for-xfrm-policies">Data Structure for XFRM Policies</h4>

<p>XFRM policies are stored in a rather complex data structure made of multiple red-black trees and hash tables.
At the root, everything is contained in a <a href="https://lwn.net/Articles/751974/">resizable hash table</a> indexed by network namespace, IP family, direction, and interface (in case XFRM interfaces are used).
Each entry in this resizable hash table contains several red-black trees, which themselves hold the XFRM policies.
Those entries are represented by the structure <code class="language-plaintext highlighter-rouge">xfrm_pol_inexact_bin</code>.</p>

<!--  -->
<div>
<div class="mermaid zoomable">graph LR
table(resizable<br />hashtable) --&gt; bin1(...)
table(resizable<br />hashtable) --&gt; bin(xfrm_pol_inexact_bin)
table(resizable<br />hashtable) --&gt; bin2(...)
bin --&gt; rbtree_dst(root_d)
bin --&gt; rbtree_src(root_s)
bin --&gt; list_anyany((any;any))
rbtree_src --&gt; node_src1(( ))
rbtree_src --&gt; node_src2(( ))
node_src1 --&gt; node_src3(( ))
node_src1 --&gt; node_src4(( ))
node_src2 --&gt; node_src5(( ))
node_src2 --&gt; node_src6(( ))
node_src6 --&gt; list_srcany((src;any))
rbtree_dst --&gt; node_dst1(( ))
rbtree_dst --&gt; node_dst2(( ))
node_dst1 --&gt; node_dst3(( ))
node_dst1 --&gt; node_dst4(( ))
node_dst2 --&gt; node_dst5(( ))
node_dst2 --&gt; node_dst6(( ))
node_dst5 --&gt; rbtree_dstsrc(root)
node_dst5 --&gt; list_dstany((any;dst))
rbtree_dstsrc --&gt; node_dstsrc1(( ))
rbtree_dstsrc --&gt; node_dstsrc2(( ))
node_dstsrc1 --&gt; node_dstsrc3(( ))
node_dstsrc1 --&gt; node_dstsrc4(( ))
node_dstsrc2 --&gt; node_dstsrc5(( ))
node_dstsrc2 --&gt; node_dstsrc6(( ))
node_dstsrc6 --&gt; list_dstsrc((src;dst))
</div>
</div>
<!--  -->

<p>Once <code class="language-plaintext highlighter-rouge">xfrm_pol_inexact_bin</code> has been retrieved (based on current IP family, namespace, and direction), each of its red-black trees is looked up using the source and destination IP addresses.
The <code class="language-plaintext highlighter-rouge">root_s</code> tree contains policies sorted by source IP addresses; the <code class="language-plaintext highlighter-rouge">root_d</code> tree contains policies sorted by destination IP addresses.
In addition, leaf nodes of the <code class="language-plaintext highlighter-rouge">root_d</code> tree also contain another tree with policies sorted by source IP addresses.
That allows the lookups into <code class="language-plaintext highlighter-rouge">root_s</code> and <code class="language-plaintext highlighter-rouge">root_d</code> to return three lists of candidate <code class="language-plaintext highlighter-rouge">(src_ip; dst_ip)</code> policies from the leaf nodes:</p>
<ul>
  <li>A list of <code class="language-plaintext highlighter-rouge">(src_ip; any)</code> candidates from <code class="language-plaintext highlighter-rouge">root_s</code>.</li>
  <li>A list of <code class="language-plaintext highlighter-rouge">(any; dst_ip)</code> candidates from <code class="language-plaintext highlighter-rouge">root_d</code>.</li>
  <li>A list of <code class="language-plaintext highlighter-rouge">(src_ip; dst_ip)</code> candidates from the trees pointed to by the leaf nodes of <code class="language-plaintext highlighter-rouge">root_d</code>.</li>
</ul>

<p>These three lists of candidate XFRM policies are completed by a list of <code class="language-plaintext highlighter-rouge">(any; any)</code> candidates directly stored in the <code class="language-plaintext highlighter-rouge">xfrm_pol_inexact_bin</code> entry.</p>

<p>Note that an XFRM policy will only be present in one of the four candidate lists, according to its source and destination CIDRs.</p>

<p>These four lists of candidate XFRM policies are then evaluated.
The kernel iterates through each list, looking for the highest-priority (lowest <code class="language-plaintext highlighter-rouge">priority</code> number) candidate that matches the packet.
If two policies match and have the same priority, the oldest one is preferred.
It’s also only during this linear evaluation of candidates that the packet mark is compared with the policy marks.</p>
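<p>The following Python sketch models the four candidate lists and the final linear evaluation. It makes several simplifying assumptions: a CIDR counts as “any” only when its prefix length is 0 (the kernel's actual criteria may differ), the red-black trees are replaced by a plain filter, and ties between equal priorities are not modeled. All names are hypothetical.</p>

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Policy:
    src: str  # source CIDR
    dst: str  # destination CIDR
    priority: int
    mark_value: int = 0
    mark_mask: int = 0


def bucket(policy):
    """Name the candidate list a policy lands in (simplified: 'any' means a /0 CIDR)."""
    src_any = ipaddress.ip_network(policy.src).prefixlen == 0
    dst_any = ipaddress.ip_network(policy.dst).prefixlen == 0
    if src_any and dst_any:
        return "any;any"
    if dst_any:
        return "src;any"
    if src_any:
        return "any;dst"
    return "src;dst"


def matches(policy, src_ip, dst_ip, mark):
    """Check the addresses, and only then the mark, as in the linear evaluation."""
    return (ipaddress.ip_address(src_ip) in ipaddress.ip_network(policy.src)
            and ipaddress.ip_address(dst_ip) in ipaddress.ip_network(policy.dst)
            and (mark & policy.mark_mask) == policy.mark_value)


def lookup(policies, src_ip, dst_ip, mark):
    """Pick the matching candidate with the lowest priority number (ties not modeled)."""
    candidates = [p for p in policies if matches(p, src_ip, dst_ip, mark)]
    return min(candidates, key=lambda p: p.priority, default=None)


policies = [
    Policy("0.0.0.0/0", "0.0.0.0/0", priority=10),
    Policy("10.0.0.0/8", "0.0.0.0/0", priority=5),
    Policy("0.0.0.0/0", "10.36.1.0/24", priority=3),
    Policy("10.0.0.0/8", "10.36.1.0/24", priority=1,
           mark_value=0x58D73E00, mark_mask=0xFFFFFF00),
]
print([bucket(p) for p in policies])
# ['any;any', 'src;any', 'any;dst', 'src;dst']
print(lookup(policies, "10.36.98.139", "10.36.1.178", 0x58D73E00).priority)  # 1
print(lookup(policies, "10.36.98.139", "10.36.1.178", 0x0).priority)  # 3
```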

<h4 id="data-structure-for-xfrm-states">Data Structure for XFRM States</h4>

<p>XFRM states are organized in four hash tables, with different XFRM fields used for indexing and different purposes:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_bydst</code> is indexed by source and destination IP addresses as well as reqid.</li>
  <li><code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_bysrc</code> is indexed only by source and destination IP addresses.</li>
  <li><code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_byspi</code> is indexed by destination IP address, SPI, and protocol.</li>
  <li><code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_byseq</code> is indexed by sequence number only.</li>
</ul>

<p><code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_byspi</code> is used when looking up an XFRM state for ingressing packets.
This makes sense to speed up the search as each XFRM state is encouraged to have its own SPI (cf., <a href="https://datatracker.ietf.org/doc/html/rfc4301">RFC4301</a>, section 4.1) and the encrypted packets carry the SPI.</p>

<p>When searching for the XFRM state that corresponds to an XFRM policy template (before encryption), <code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_bydst</code> is used.
That makes sense because the indexing information is what the XFRM policy template provides.
That hash table is typically also the one being used when iterating through all XFRM states (ex., when flushing them), but any hash table would do the job for that.</p>

<p><code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_bysrc</code> and <code class="language-plaintext highlighter-rouge">net-&gt;xfrm.state_byseq</code> are used for various other management tasks, such as looking up an XFRM state to update, answering a netlink query from the user, or checking for existing states before adding a new one.</p>
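<p>A toy model helps to see why the same state is indexed several times. In the Python sketch below, dictionaries stand in for three of the hash tables (the sequence-number table is omitted), each keyed by the fields listed above. The class and method names are hypothetical; the kernel hashes these fields and then walks a bucket, comparing further fields such as the mark.</p>

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    src: str
    dst: str
    proto: str
    spi: int
    reqid: int


class StateTables:
    """Hypothetical model: one state, indexed under several keys."""

    def __init__(self):
        self.bydst = defaultdict(list)
        self.bysrc = defaultdict(list)
        self.byspi = defaultdict(list)

    def add(self, state):
        self.bydst[(state.src, state.dst, state.reqid)].append(state)
        self.bysrc[(state.src, state.dst)].append(state)
        self.byspi[(state.dst, state.spi, state.proto)].append(state)

    def lookup_ingress(self, dst, spi, proto):
        """Ingress: the encrypted packet carries the SPI, so use byspi."""
        return self.byspi.get((dst, spi, proto), [])

    def lookup_for_template(self, src, dst, reqid):
        """Egress: the policy template provides these fields, so use bydst."""
        return self.bydst.get((src, dst, reqid), [])


tables = StateTables()
state = State("10.36.98.139", "10.36.1.178", "esp", 0x3, 1)
tables.add(state)
print(tables.lookup_ingress("10.36.1.178", 0x3, "esp") == [state])  # True
print(tables.lookup_for_template("10.36.98.139", "10.36.1.178", 1) == [state])  # True
print(tables.lookup_ingress("10.36.1.178", 0x4, "esp"))  # []
```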

<h3 id="conclusion">Conclusion</h3>

<p>There’s still a lot that would need to be covered for this guide to be a complete reference on XFRM/IPsec in Linux.
To cite a few, this guide didn’t cover <a href="https://docs.strongswan.org/docs/5.9/features/routeBasedVpn.html#_xfrm_interfaces_on_linux">XFRM interfaces</a>, <a href="https://docs.strongswan.org/docs/5.9/features/routeBasedVpn.html#_vti_devices_on_linux">VTIs</a>, <a href="https://datatracker.ietf.org/doc/html/rfc8229">ESP-in-TCP</a>, <a href="https://datatracker.ietf.org/doc/html/rfc3948">ESP-in-UDP</a>, <a href="https://docs.kernel.org/networking/xfrm_device.html">XFRM offloads</a>, or <a href="https://datatracker.ietf.org/doc/rfc9611/">per-resource child SAs</a>.
Nonetheless, it should constitute a good basis to configure and troubleshoot XFRM configurations for IPsec tunneling.</p>

<p><br /></p>

<p>Thanks to Louis, Gray, and Simone for identifying several mistakes and helping with wordsmithing.</p>

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:netfilter-diagram-errors" role="doc-endnote">
      <p>For example, after XFRM decode, packets actually hit the tc/qdisc of the ingress device again, as you can see if you hook into tc-bpf. <a href="#fnref:netfilter-diagram-errors" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="xfrm" /><summary type="html"><![CDATA[This post aims to be a relatively complete reference guide for the XFRM subsystem in the Linux kernel, when used for IPsec. It covers the basic configuration, the packet flows, the meaning of all state and policy fields, the impact of all XFRM errors, and some performance considerations.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://pchaigno.github.io/assets/netfilter-with-xfrm.png" /><media:content medium="image" url="https://pchaigno.github.io/assets/netfilter-with-xfrm.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Research Grant from the eBPF Foundation</title><link href="https://pchaigno.github.io/ebpf/2024/04/02/research-grant-ebpf-foundation.html" rel="alternate" type="text/html" title="Research Grant from the eBPF Foundation" /><published>2024-04-02T08:26:10+00:00</published><updated>2024-04-02T08:26:10+00:00</updated><id>https://pchaigno.github.io/ebpf/2024/04/02/research-grant-ebpf-foundation</id><content type="html" xml:base="https://pchaigno.github.io/ebpf/2024/04/02/research-grant-ebpf-foundation.html"><![CDATA[<p>Last year, the first workshop entirely dedicated to eBPF was hosted by the SIGCOMM conference.
Today, I’m happy to share the first eBPF research grant, from the eBPF Foundation!</p>

<p style="font-size: 110%; text-align: center;"><a href="https://ebpf.foundation/funding-opportunities">ebpf.foundation/funding-opportunities</a></p>

<p>Despite being developed and maintained in large part by industry, eBPF has always had strong ties with the academic community.
The eBPF ancestor, <a href="https://www.tcpdump.org/papers/bpf-usenix93.pdf">cBPF</a>, was first published at Usenix Winter 1993.
Subsequent work on eBPF, such as <a href="https://raw.githubusercontent.com/tohojo/xdp-paper/master/xdp-the-express-data-path.pdf">XDP</a> or <a href="/ebpf/2023/09/06/prevail-understanding-the-windows-ebpf-verifier.html">PREVAIL</a>, was also published at top academic conferences, often in the context of industry-academia partnerships.
In addition, on multiple occasions, Alexei Starovoitov stated his goal to build eBPF as an enabler for innovation.
I believe this was always well understood by the academic community, with papers at top conferences regularly building on eBPF<sup id="fnref:building-on-ebpf" role="doc-noteref"><a href="#fn:building-on-ebpf" class="footnote" rel="footnote">1</a></sup>.</p>

<p>This new grant comes as a $25–50k unrestricted gift and can for instance be used to cover part of a PhD student’s salary.
I hope it will serve as an additional incentive for the kernel and academic communities to collaborate.
I’m a strong believer that the kernel community would benefit from further research on eBPF and its verifier, particularly in formal verification, static analysis, and compiler theory.</p>

<p>Thanks a lot to Daniel Borkmann and Bill Mulligan for setting this up!</p>

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:building-on-ebpf" role="doc-endnote">
      <p>The <a href="https://www.usenix.org/system/files/conference/atc18/atc18-amit.pdf">Hyperupcalls</a> (ATC’18), <a href="/ebpf/2020/11/04/hxdp-efficient-software-packet-processing-on-fpga-nics.html">hXDP</a> (OSDI’20), <a href="/ebpf/2021/04/12/bmc-accelerating-memcached-using-bpf-and-xdp.html">BMC</a> (NSDI’21), and <a href="https://www.vldb.org/pvldb/vol16/p3335-butrovich.pdf">Tigger</a> (VLDB’23) papers come to mind, among many others. <a href="#fnref:building-on-ebpf" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ebpf" /><summary type="html"><![CDATA[This post discusses the publication of the first grant dedicated to eBPF research, with a short introduction on the links between industry and academia in the eBPF community.]]></summary></entry><entry><title type="html">PREVAIL: Understanding the Windows eBPF Verifier</title><link href="https://pchaigno.github.io/ebpf/2023/09/06/prevail-understanding-the-windows-ebpf-verifier.html" rel="alternate" type="text/html" title="PREVAIL: Understanding the Windows eBPF Verifier" /><published>2023-09-06T08:26:10+00:00</published><updated>2023-10-30T14:50:00+00:00</updated><id>https://pchaigno.github.io/ebpf/2023/09/06/prevail-understanding-the-windows-ebpf-verifier</id><content type="html" xml:base="https://pchaigno.github.io/ebpf/2023/09/06/prevail-understanding-the-windows-ebpf-verifier.html"><![CDATA[<p>In 2021, Microsoft open sourced their <a href="https://cloudblogs.microsoft.com/opensource/2021/05/10/making-ebpf-work-on-windows/">eBPF-for-Windows project</a>.
They rely on existing open-source projects to JIT-compile, interpret, and verify BPF programs.
Interestingly, <a href="https://github.com/vbpf/ebpf-verifier">PREVAIL</a>, the BPF verifier they use, originated from peer-reviewed academic work and contrasts significantly with the Linux verifier.</p>

<p>In this blog post, I’ll summarize <a href="https://vbpf.github.io/assets/prevail-paper.pdf">the PREVAIL paper</a> with a strong focus on its design.
I will also introduce its formalism and have a quick look at the evaluations.
The PREVAIL implementation has evolved a lot since the paper was published in 2019, yet the design has stayed the same.
Some of the limitations may have been removed and the evaluation numbers may have changed.</p>

<!--  -->
<ul id="toc" class="section-nav">
<li class="toc-entry toc-h3"><a href="#introduction">Introduction</a></li>
<li class="toc-entry toc-h3"><a href="#abstract-interpretation">Abstract Interpretation</a>
<ul>
<li class="toc-entry toc-h4"><a href="#introductory-example">Introductory Example</a></li>
<li class="toc-entry toc-h4"><a href="#abstract-domains">Abstract Domains</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#abstract-domain-requirements-for-prevail">Abstract Domain Requirements for PREVAIL</a></li>
<li class="toc-entry toc-h3"><a href="#formalism-of-prevail">Formalism of PREVAIL</a>
<ul>
<li class="toc-entry toc-h4"><a href="#formal-representation">Formal Representation</a></li>
<li class="toc-entry toc-h4"><a href="#formalizing-memory-writes">Formalizing Memory Writes</a></li>
</ul>
</li>
<li class="toc-entry toc-h3"><a href="#implementation-of-prevail-and-limitations">Implementation of PREVAIL and Limitations</a></li>
<li class="toc-entry toc-h3"><a href="#accuracy-and-cost-evaluations">Accuracy and Cost Evaluations</a></li>
<li class="toc-entry toc-h3"><a href="#conclusion">Conclusion</a></li>
<li class="toc-entry toc-h3"><a href="#addendum-false-positive-example">Addendum: False Positive Example</a></li>
</ul>
<!-- ) -->

<h3 id="introduction">Introduction</h3>

<p>In this paper, the authors introduce PREVAIL<sup id="fnref:prevail-stands-for" role="doc-noteref"><a href="#fn:prevail-stands-for" class="footnote" rel="footnote">1</a></sup>, an alternative static analyzer for eBPF bytecode, using abstract interpretation techniques.
As is customary, they introduce their results in the abstract:</p>

<blockquote>
  <p>Our evaluation, based on real-world eBPF programs, shows that [PREVAIL] generates no more false alarms than the existing Linux verifier, while it supports a wider class of programs (including programs with loops) and has better asymptotic complexity.</p>
</blockquote>

<p>Early in the paper, the authors make one important observation:</p>

<blockquote>
  <p><em>The need for a better verifier is widely recognized by eBPF developers</em>.</p>
</blockquote>

<p>That’s true and I’m glad to see it is also clear to the academic community.
They describe four aspects on which the verifier could be improved:</p>

<blockquote>
  <ul>
    <li>First, the verifier reports many false positives, forcing developers to heavily massage their code for the verifier to accept it, e.g., by inserting redundant checks and redundant accesses.</li>
    <li>Second, the verifier does not scale to programs with a large number of paths.</li>
    <li>Third, it does not currently support programs with loops.</li>
    <li>Finally, the verifier lacks a formal foundation.</li>
  </ul>
</blockquote>

<p>The first and second points are probably the main issues today.
Because the verifier runs on low-level bytecode, it doesn’t have all of the high-level information from the original C program<sup id="fnref:btf-revolution" role="doc-noteref"><a href="#fn:btf-revolution" class="footnote" rel="footnote">2</a></sup>.
As a result, it sometimes struggles to keep track of and verify code optimized by the compiler<sup id="fnref:example-clang-optimization" role="doc-noteref"><a href="#fn:example-clang-optimization" class="footnote" rel="footnote">3</a></sup>.</p>

<p>The second point only affects large BPF projects such as Cilium, but can be hard to resolve, as small changes in the code and compiler options can lead the verifier to reject programs. 
On newer kernels, support for function-by-function verification makes this a lot more manageable, by allowing developers to break programs into smaller pieces.</p>

<p>Support for bounded loops was merged in Linux v5.3.
It was then extended to support various loop structures of arbitrary sizes via BPF helpers (<code class="language-plaintext highlighter-rouge">bpf_loop</code>) and kfuncs (e.g., <code class="language-plaintext highlighter-rouge">bpf_iter_num_next</code> and the <code class="language-plaintext highlighter-rouge">bpf_for</code> macro).</p>

<p>Finally, I’m not sure the lack of formal foundations should be an argument in itself, but I guess the point is that formal foundations would allow us to reason about the correctness of the verifier.</p>

<p><br /></p>

<h3 id="abstract-interpretation">Abstract Interpretation</h3>

<p>This section aims to provide a short introduction to abstract interpretation, the static analysis technique used by PREVAIL.
I’ll focus on the minimal information needed to understand the paper.
For a more thorough introduction, you can refer to <a href="https://wiki.mozilla.org/Abstract_Interpretation">the Mozilla wiki</a>.</p>

<h4 id="introductory-example">Introductory Example</h4>

<p>Abstract interpretation is a technique for static program analysis, used to analyze a program’s behavior over all possible inputs.
Since finding all possible runtime errors in an arbitrary program is <a href="https://en.wikipedia.org/wiki/Undecidable_problem">undecidable</a>, static analysis covers all possible inputs at the cost of an approximate result (e.g., rejecting some safe programs).</p>

<p>Abstract interpretation achieves this by using <em>abstract values</em> for variables.
As an example, we will analyze the snippet of BPF bytecode below with integer intervals as abstract values for our variables.</p>

<p>This snippet of bytecode reads 16 bits from memory (instruction 4), at offset <code class="language-plaintext highlighter-rouge">r0 + r1</code>, with <code class="language-plaintext highlighter-rouge">r0</code> pointing to a BPF map value.
At instruction 1, we check that the value in <code class="language-plaintext highlighter-rouge">r1</code> is bounded. If it is not, we bound it with a bitmask at instruction 2.</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf">  <span class="c1">// r0 is a non-null pointer to a map value.
</span>  <span class="c1">// r1 initially can be any unsigned 64-bit value.
</span>  <span class="gl">0</span><span class="p">:</span><span class="w"> </span><span class="n">r6</span> <span class="o">=</span> <span class="n">r0</span>
  <span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">r1</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="nl">pc</span>+<span class="m">1</span><span class="w">  </span><span class="c1">// Jump to insn 3 if r1 is bounded.
</span>  <span class="gl">2</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">&amp;=</span> <span class="m">0xf</span><span class="w">             </span><span class="c1">// If it is not, bound it.
</span>  <span class="gl">3</span><span class="p">:</span><span class="w"> </span><span class="n">r6</span> <span class="o">+=</span> <span class="n">r1</span>
  <span class="gl">4</span><span class="p">:</span><span class="w"> </span><span class="n">r7</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">u16</span><span class="w"> </span><span class="o">*</span><span class="p">)(</span><span class="n">r6</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="c1">// Read map value.</span></code></pre></figure>

<p>We are interested in the value of <code class="language-plaintext highlighter-rouge">r1</code> at the entry of instruction 3, before it’s used for a memory access.
The initial abstract value for <code class="language-plaintext highlighter-rouge">r1</code> is <code class="language-plaintext highlighter-rouge">[0; MAX_UINT64]</code>.
It represents the set of possible concrete values <code class="language-plaintext highlighter-rouge">r1</code> can take at instruction 0.
When we reach the conditional jump, we analyze both paths.
If the condition is true, then we can update the abstract value to <code class="language-plaintext highlighter-rouge">[0; 13]</code>.
If false, we reach instruction 2 and can update <code class="language-plaintext highlighter-rouge">r1</code> to <code class="language-plaintext highlighter-rouge">[0; 15]</code>.</p>

<p>So far it looks very similar to what the Linux verifier would do.
That changes at instruction 3.
Instead of continuing to analyze the two paths independently, we will use the join operation<sup id="fnref:ai-operations" role="doc-noteref"><a href="#fn:ai-operations" class="footnote" rel="footnote">4</a></sup>, <code class="language-plaintext highlighter-rouge">⨆</code>.
In particular, we can define the abstract value of <code class="language-plaintext highlighter-rouge">r1</code> at instruction 3 as the join of <code class="language-plaintext highlighter-rouge">r1</code>’s abstract values after instructions 1 and 2, that is <code class="language-plaintext highlighter-rouge">[0; 13] ⨆ [0; 15] = [0; 15]</code>.</p>

<p>This analysis tells us that the memory access at instruction 4 is unsafe (out of bounds) if the map value is 16 bytes long or less (a 2-byte access at maximum offset 15 reaches byte 16).</p>
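
<p>As a rough sketch, the analysis above can be replayed in a few lines of Python. The <code class="language-plaintext highlighter-rouge">Interval</code> class and its method names are my own illustration, not PREVAIL's or the kernel's API, and the transfer function for the bitmask assumes non-negative values only.</p>

```python
from dataclasses import dataclass

MAX_UINT64 = 2**64 - 1


@dataclass(frozen=True)
class Interval:
    lo: int
    hi: int

    def join(self, other):
        # Smallest interval containing both operands.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def assume_lt(self, k):
        # Refine with the condition "value < k" (taken branch of insn 1).
        return Interval(self.lo, min(self.hi, k - 1))

    def and_mask(self, mask):
        # Sound bound for "value &= mask" on non-negative values.
        return Interval(0, min(self.hi, mask))


def access_is_safe(offset, size, region_size):
    # The whole access [offset, offset + size) must fit in the region.
    return offset.lo >= 0 and offset.hi + size <= region_size


r1 = Interval(0, MAX_UINT64)
taken = r1.assume_lt(14)         # insn 1 is true: jump to insn 3
fallthrough = r1.and_mask(0xF)   # insn 1 is false: execute insn 2
r1_at_3 = taken.join(fallthrough)

print(r1_at_3)                         # Interval(lo=0, hi=15)
print(access_is_safe(r1_at_3, 2, 16))  # False
print(access_is_safe(r1_at_3, 2, 17))  # True
```

<p>The join is what lets the analysis continue with a single abstract state after instruction 3 instead of one state per path.</p>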

<p><a id="fnref:addendum-false-positive-example"> </a>
See <a href="#addendum-false-positive-example">the addendum</a> for a second example in which the use of integer intervals leads to a loss of precision and a false positive.</p>

<p><br /></p>

<h4 id="abstract-domains">Abstract Domains</h4>

<p>The <em>Interval</em> abstract domain, which we’ve used above, is only one domain among many that can be used for abstract interpretation.
We can cite, for example, the <em>Parity</em> domain, to track odd and even numbers, or the <em>Polyhedra</em> domain, which can track linear relationships between variables.
The table below<sup id="fnref:domain-table-source" role="doc-noteref"><a href="#fn:domain-table-source" class="footnote" rel="footnote">5</a></sup> gives a few examples of abstract numerical domains, from least expressive to most expressive (<code class="language-plaintext highlighter-rouge">c</code> and <code class="language-plaintext highlighter-rouge">a</code> being constants, <code class="language-plaintext highlighter-rouge">x</code> variables).
The abstract domain to use depends on the application and is often a tradeoff between the computational cost and what can be analyzed.</p>

<table>
  <thead>
    <tr>
      <th>Numerical domain</th>
      <th>Representable constraints</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Parity</td>
      <td><code class="language-plaintext highlighter-rouge">x % 2 == c</code></td>
    </tr>
    <tr>
      <td>Interval</td>
      <td><code class="language-plaintext highlighter-rouge">±x<sub>i</sub> &lt;= c</code></td>
    </tr>
    <tr>
      <td>Zone</td>
      <td><code class="language-plaintext highlighter-rouge">(±x<sub>i</sub> &lt;= c)</code> and <code class="language-plaintext highlighter-rouge">(x<sub>i</sub> - x<sub>j</sub> &lt;= c)</code></td>
    </tr>
    <tr>
      <td>Octagon</td>
      <td><code class="language-plaintext highlighter-rouge">(±x<sub>i</sub> &lt;= c)</code> and <code class="language-plaintext highlighter-rouge">(±x<sub>i</sub> ± x<sub>j</sub> &lt;= c)</code></td>
    </tr>
    <tr>
      <td>Polyhedra</td>
      <td><code class="language-plaintext highlighter-rouge">a<sub>1</sub>x<sub>1</sub> + a<sub>2</sub>x<sub>2</sub> + ... +a<sub>n</sub>x<sub>n</sub> &lt;= c, a<sub>i</sub> ∈ Z</code></td>
    </tr>
  </tbody>
</table>

<p>So for example, with the Interval domain, you could imagine having constraints <code class="language-plaintext highlighter-rouge">x<sub>1</sub> &lt;= 2</code>, <code class="language-plaintext highlighter-rouge">-x<sub>1</sub> &lt;= 0</code>, and <code class="language-plaintext highlighter-rouge">x<sub>2</sub> &lt;= 0</code>.
In other words, <code class="language-plaintext highlighter-rouge">x<sub>1</sub> ∈ [0; 2]</code> and <code class="language-plaintext highlighter-rouge">x<sub>2</sub> ∈ ]-∞; 0]</code>.</p>

<p>More expressive abstract domains are also usually more expensive.
For example, while the join operation for the Interval domain has complexity <code class="language-plaintext highlighter-rouge">O(n)</code> (with <code class="language-plaintext highlighter-rouge">n</code> the number of variables), the same operation has complexity <code class="language-plaintext highlighter-rouge">O(n²)</code> in the Octagon domain.</p>

<p>One important aspect of a domain’s expressiveness is whether it is <em>relational</em>, meaning that it can express relations between variables.
Zone for example can preserve some relations between variables <code class="language-plaintext highlighter-rouge">x<sub>i</sub></code> and <code class="language-plaintext highlighter-rouge">x<sub>j</sub></code> with its second constraint type.
In the table above, we can see that Zone, Octagon, and Polyhedra are <em>relational</em> domains, while Parity and Interval are <em>non-relational</em>.</p>
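
<p>To make the difference concrete, here is a small sketch of what a join keeps in a non-relational domain versus a relational one. It is deliberately simplified: the dictionaries and function names are hypothetical, the Zone part only tracks difference constraints, and a real implementation would also normalize (close) the constraints before joining.</p>

```python
def join_intervals(a, b):
    # a, b: dict var -> (lo, hi). Keep the widest bounds per variable.
    return {v: (min(a[v][0], b[v][0]), max(a[v][1], b[v][1])) for v in a}


def join_zones(a, b):
    # a, b: dict (x, y) -> c, meaning "x - y <= c".
    # A constraint survives the join only if both branches have it,
    # and we keep the weaker (larger) bound.
    return {k: max(a[k], b[k]) for k in a.keys() & b.keys()}


# Branch A: x = y = 0. Branch B: x = y = 5.
intervals = join_intervals(
    {"x": (0, 0), "y": (0, 0)},
    {"x": (5, 5), "y": (5, 5)},
)
zones = join_zones(
    {("x", "y"): 0, ("y", "x"): 0},
    {("x", "y"): 0, ("y", "x"): 0},
)

print(intervals)  # x and y are both in [0; 5], but their equality is lost
print(zones)      # x - y <= 0 and y - x <= 0, i.e., x == y is preserved
```

<p>After the join, the Interval result can no longer rule out <code class="language-plaintext highlighter-rouge">x = 0, y = 5</code>, whereas the difference constraints still can.</p>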

<p>For more information on abstract domains, you can check <a href="http://elina.ethz.ch/slides/pldi2015-fast-octagon.pdf">these PLDI 2015 slides</a>, which include a walkthrough of a program analysis with Octagon (slides 14–30).
<a href="https://youtu.be/SdOaoIcVZAY?t=120">The POPL 2017 presentation</a> from the same author includes an example assertion that can be proven by Polyhedra but not by Octagon.</p>

<p>Let’s go back to our PREVAIL paper.</p>

<p><br /></p>

<h3 id="abstract-domain-requirements-for-prevail">Abstract Domain Requirements for PREVAIL</h3>

<p>Using a couple of example BPF programs, the authors make several observations that will drive the design of PREVAIL.</p>

<blockquote>
  <p>An eBPF program can access a fixed set of memory regions, known at compile time. […] The program can acquire access to additional regions via the maps API [5]. Such regions can be shared by multiple processes, as well as between kernel and user-space applications.</p>
</blockquote>

<p>This is a key observation for memory accesses.
BPF programs can access different memory regions including the stack, <em>context</em> (e.g., <em>__sk_buff</em>), packet data, and map values.
All of these regions except the packet have a static size, known at the time of verification.</p>

<p>Because the size of the packet is not known during verification, developers of BPF programs must implement bounds checks on the packet.
For example:</p>

<figure class="highlight"><pre><code class="language-c" data-lang="c">  <span class="k">if</span> <span class="p">(</span><span class="n">packet_ptr</span> <span class="o">+</span> <span class="n">access_size</span> <span class="o">&gt;</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">data_end</span><span class="p">)</span> <span class="k">return</span> <span class="n">TC_ACT_DROP</span><span class="p">;</span></code></pre></figure>

<p>This leads the authors to make the following observation:</p>

<blockquote>
  <p><strong>Observation 1.</strong> The analysis must track binary relations among registers.</p>
</blockquote>

<p>In other words, to understand the bounds of <code class="language-plaintext highlighter-rouge">packet_ptr</code>, the analysis must be able to track relations between variables (in our case, between <code class="language-plaintext highlighter-rouge">data_end</code> and <code class="language-plaintext highlighter-rouge">packet_ptr + access_size</code>).
That in turn limits the choice of abstract domain to <em>relational abstract domains</em>.</p>
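
<p>A minimal sketch of why a relation is needed: below, the only fact the analysis keeps is a difference constraint between a packet pointer and <code class="language-plaintext highlighter-rouge">data_end</code>. The function names and the dictionary representation are assumptions made for this example, and offsets are assumed to be relative to the start of the packet.</p>

```python
INF = float("inf")


def assume_bounds_check(constraints, ptr, access_size):
    # Fallthrough of: if (ptr + access_size > data_end) return ...;
    # We learn ptr + access_size <= data_end,
    # i.e., ptr - data_end <= -access_size.
    new = dict(constraints)
    key = (ptr, "data_end")
    new[key] = min(new.get(key, INF), -access_size)
    return new


def access_is_safe(constraints, ptr, extra_offset, size):
    # The access needs ptr + extra_offset + size <= data_end.
    bound = constraints.get((ptr, "data_end"), INF)
    return extra_offset >= 0 and bound <= -(extra_offset + size)


before = {}
after = assume_bounds_check(before, "packet_ptr", 14)

print(access_is_safe(before, "packet_ptr", 12, 2))  # False: nothing is known yet
print(access_is_safe(after, "packet_ptr", 12, 2))   # True: bytes 12 and 13 were checked
print(access_is_safe(after, "packet_ptr", 13, 2))   # False: byte 14 was not checked
```

<p>An Interval domain has no place to store this fact, since neither <code class="language-plaintext highlighter-rouge">packet_ptr</code> nor <code class="language-plaintext highlighter-rouge">data_end</code> has useful bounds on its own.</p>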

<blockquote>
  <p><strong>Observation 2.</strong> The analysis must track values in memory, including relations between different locations, as if they were registers.</p>
</blockquote>

<p>This second observation comes from the use of <em>register spilling</em>.
When all registers are in use, the compiler can move some of their contents to the stack, to load them back into registers at a later time.
If we don’t track those register contents while on the stack, we would lose all of their information.</p>

<blockquote>
  <p><strong>Observation 3.</strong> As eBPF programs are getting larger and more complex, verification via path enumeration is becoming infeasible.</p>
</blockquote>

<p>The number of paths through a program grows exponentially with the number of branches.
To scale to large programs, the Linux verifier makes use of state pruning, which allows it to recognize already-verified states.
Abstract interpretation is an interesting alternative as it was designed specifically to address this problem.</p>

<p><br /></p>

<h3 id="formalism-of-prevail">Formalism of PREVAIL</h3>

<p>I’ll now dive into the formalism of PREVAIL.
I will give pointers to understand the notations and some of their underlying intuitions.
If that aspect doesn’t interest you, you can skip ahead to <a href="#implementation-of-prevail-and-limitations">the implementation</a>.</p>

<h4 id="formal-representation">Formal Representation</h4>

<blockquote>
  <p>eBPF programs manipulate two kinds of regions: private regions, which can be accessed only by the program, and shared regions, which are used for intra-kernel inter-process communication.</p>
</blockquote>

<p>The authors distinguish between <em>private</em> (stack, context, packet) and <em>shared</em> (e.g., map values) memory regions.
Map values are shared memory regions because they may be modified at any time by another process or BPF program.
As such, they need special handling in the verifier.</p>

<blockquote>
  <p>We distinguish numerical values from pointers using tags: a value tagged <code class="language-plaintext highlighter-rouge">num</code> is a numerical value, while a value tagged <code class="language-plaintext highlighter-rouge">R</code> is a pointer offset into region <code class="language-plaintext highlighter-rouge">R</code>.</p>
</blockquote>

<p>PREVAIL models every variable with a tag and value.
Scalars are tagged <code class="language-plaintext highlighter-rouge">num</code>, stack pointers <code class="language-plaintext highlighter-rouge">stk</code>, packet pointers <code class="language-plaintext highlighter-rouge">pkt</code>, etc.
For pointers, the value represents the offset into the memory region represented by the tag.
Therefore, <code class="language-plaintext highlighter-rouge">(pkt, 4)</code> is a pointer at offset 4 into the packet, whereas <code class="language-plaintext highlighter-rouge">(num, 4)</code> represents the integer 4.</p>

<p>To represent the tags of shared memory regions (e.g., maps), the authors use the sizes of these regions:</p>

<blockquote>
  <p>First we abstract the tags of pointers to shared regions by the sizes of the regions they point to. This bounds the number of possible tags in any program <em>P</em>.</p>
</blockquote>

<p>The downside of this simple approach is that PREVAIL can’t tell two pointers to shared regions of the same size apart.
The authors therefore need to forbid subtractions and comparisons between such pointers.</p>

<blockquote>
  <p>as we can no longer tell whether two pointers to a shared region of size K point to the same region or not, we strengthen Safe() to forbid subtraction and less-than comparison between such pointers.</p>
</blockquote>

<p>Because of that change, PREVAIL can reject BPF programs that the Linux verifier would accept, but I doubt many programs are affected in practice.</p>

<p style="text-align: center;"><img src="/assets/prevail/prevail-primitive-commands.png" alt="Formalism for the eBPF operations supported by the PREVAIL verifier." title="Formalism for the eBPF operations supported by the PREVAIL verifier." style="width: 45%;" /></p>

<p>The grammar in the above figure formalizes the primitive eBPF operations that PREVAIL supports.
The first operation defines assignments and ALU operations, while the second and third define load and store instructions respectively.
<code class="language-plaintext highlighter-rouge">assume</code> is used to state the conditions of conditional jumps.
<code class="language-plaintext highlighter-rouge">shared K</code> returns a pointer to a shared memory region of size <code class="language-plaintext highlighter-rouge">K</code>, typically for a BPF map lookup.</p>

<p><br /></p>

<h4 id="formalizing-memory-writes">Formalizing Memory Writes</h4>

<p>In the following, I will focus on the formalism for the store operation, used to write to memory.
See the paper for other operations.</p>

<script type="text/javascript">
function showFormula1Subset1() {
  $('#formula1-subset1').show();
}
function hideFormula1Subset1() {
  $('#formula1-subset1').hide();
}
function showFormula1Subset2() {
  $('#formula1-subset2').show();
}
function hideFormula1Subset2() {
  $('#formula1-subset2').hide();
}
function showFormula1Subset3() {
  $('#formula1-subset3').show();
}
function hideFormula1Subset3() {
  $('#formula1-subset3').hide();
}
function showFormula1Subset4() {
  $('#formula1-subset4').show();
}
function hideFormula1Subset4() {
  $('#formula1-subset4').hide();
}
function showFormula2Subset1() {
  $('#formula2-subset1').show();
}
function hideFormula2Subset1() {
  $('#formula2-subset1').hide();
}
function showFormula2Subset2() {
  $('#formula2-subset2').show();
}
function hideFormula2Subset2() {
  $('#formula2-subset2').hide();
}
function showFormula2Subset3() {
  $('#formula2-subset3').show();
}
function hideFormula2Subset3() {
  $('#formula2-subset3').hide();
}
function showFormula2Subset4() {
  $('#formula2-subset4').show();
}
function hideFormula2Subset4() {
  $('#formula2-subset4').hide();
}
function showFormula2Subset5() {
  $('#formula2-subset5').show();
}
function hideFormula2Subset5() {
  $('#formula2-subset5').hide();
}

jqueryReady(function() {
  $('#ref-formula1-subset1').hover(showFormula1Subset1, hideFormula1Subset1);
  $('#ref-formula1-subset2').hover(showFormula1Subset2, hideFormula1Subset2);
  $('#ref-formula1-subset3').hover(showFormula1Subset3, hideFormula1Subset3);
  $('#ref-formula1-subset4').hover(showFormula1Subset4, hideFormula1Subset4);
  $('#ref-formula2-subset1').hover(showFormula2Subset1, hideFormula2Subset1);
  $('#ref-formula2-subset2').hover(showFormula2Subset2, hideFormula2Subset2);
  $('#ref-formula2-subset3').hover(showFormula2Subset3, hideFormula2Subset3);
  $('#ref-formula2-subset4').hover(showFormula2Subset4, hideFormula2Subset4);
  $('#ref-formula2-subset5').hover(showFormula2Subset5, hideFormula2Subset5);
});
</script>

<p style="text-align: center;"><img id="formula1-subset1" src="/assets/prevail/prevail-safe-store-command-subset1.png" alt="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode. The bounds check part of the formula is framed." title="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode." style="position: absolute; z-index: 1; width: 444px; display: none;" />
<img id="formula1-subset2" src="/assets/prevail/prevail-safe-store-command-subset2.png" alt="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode. The pointer check part of the formula is framed." title="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode." style="position: absolute; width: 444px; display: none;" />
<img id="formula1-subset3" src="/assets/prevail/prevail-safe-store-command-subset3.png" alt="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode. The check on the pointer type is framed in the formula." title="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode." style="position: absolute; width: 444px; display: none;" />
<img id="formula1-subset4" src="/assets/prevail/prevail-safe-store-command-subset4.png" alt="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode. The specific case for packet pointers is framed in the formula." title="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode." style="position: absolute; width: 444px; display: none;" />
<img src="/assets/prevail/prevail-safe-store-command.png" alt="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode." title="Formalism of the PREVAIL verifier for the safety of the store operation in eBPF bytecode." style="width: 444px;" /></p>

<p>PREVAIL deems a store of <code class="language-plaintext highlighter-rouge">sz</code> bytes to the memory pointed to by <code class="language-plaintext highlighter-rouge">p</code> safe if:</p>
<ol>
  <li><span id="ref-formula1-subset1">it is within the bounds of the memory region of <code class="language-plaintext highlighter-rouge">p</code>, denoted <code class="language-plaintext highlighter-rouge">e<sub>ρ</sub>(p)</code>, and</span></li>
  <li><span id="ref-formula1-subset2"><code class="language-plaintext highlighter-rouge">p</code> is a pointer (i.e., not tagged <code class="language-plaintext highlighter-rouge">num</code>), and</span></li>
  <li><span id="ref-formula1-subset3">in case the stored value <code class="language-plaintext highlighter-rouge">x</code> is a pointer, <code class="language-plaintext highlighter-rouge">p</code> points to the stack.</span></li>
</ol>

<p>The third condition is meant to ensure that pointers are never written to externally-visible memory locations (e.g., the packet) as that would lead to pointer leaks.
<span id="ref-formula1-subset4">You can also notice that in case <code class="language-plaintext highlighter-rouge">p</code> is a packet pointer, the upper-bound check is performed against the special variable <code class="language-plaintext highlighter-rouge">data_end</code> instead of the static region size, <code class="language-plaintext highlighter-rouge">sizeof(R)</code>.</span></p>
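
<p>The three conditions can be sketched as a small predicate. This is only an illustration on concrete values rather than abstract ones: the <code class="language-plaintext highlighter-rouge">Value</code> class, the <code class="language-plaintext highlighter-rouge">REGION_SIZES</code> table, and the context size are hypothetical, and shared regions are left out.</p>

```python
from dataclasses import dataclass

# Static region sizes in bytes. The context size is made up for the example.
REGION_SIZES = {"stk": 512, "ctx": 64}


@dataclass(frozen=True)
class Value:
    tag: str  # "num" for scalars, otherwise the name of a memory region
    val: int  # the number itself, or the offset into the region


def store_is_safe(p, x, sz, data_end=0):
    if p.tag == "num":
        return False  # condition 2: p must be a pointer
    if x.tag != "num" and p.tag != "stk":
        return False  # condition 3: pointers may only be stored to the stack
    # Condition 1: bounds check, against data_end for packet pointers.
    upper = data_end if p.tag == "pkt" else REGION_SIZES[p.tag]
    return 0 <= p.val and p.val + sz <= upper


print(store_is_safe(Value("stk", 504), Value("num", 7), 8))              # True
print(store_is_safe(Value("stk", 508), Value("num", 7), 8))              # False
print(store_is_safe(Value("pkt", 0), Value("stk", 0), 8, data_end=64))   # False
print(store_is_safe(Value("pkt", 12), Value("num", 0), 2, data_end=14))  # True
```

<p>The third call is the pointer-leak case: the store is in bounds, but it would write a stack pointer into the packet.</p>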

<p>The authors then define how the different eBPF operations impact the verification state.
The verification state is defined by the triple <code class="language-plaintext highlighter-rouge">σ = (e, μ, ζ)</code>, with <code class="language-plaintext highlighter-rouge">e</code> being the set of registers, <code class="language-plaintext highlighter-rouge">μ</code> the set of memory cells on the stack, and <code class="language-plaintext highlighter-rouge">ζ</code> the set of stack addresses holding scalars.
The example for an assignment of immediate value <code class="language-plaintext highlighter-rouge">K</code> to register <code class="language-plaintext highlighter-rouge">w</code> is trivial; it simply associates register <code class="language-plaintext highlighter-rouge">w</code> to state <code class="language-plaintext highlighter-rouge">(num, K)</code> in <code class="language-plaintext highlighter-rouge">e</code>:</p>

<p style="text-align: center;"><img src="/assets/prevail/prevail-meaning-safe-assign-command.png" alt="Formula in the PREVAIL verifier for how an immediate value assignment to a register impacts the formal verifier state." title="Formula in the PREVAIL verifier for how an immediate value assignment to a register impacts the formal verifier state." style="width: 40%;" /></p>

<p>As shown below, the store operation is a bit more involved to track.
<span id="ref-formula2-subset1">First, if the store is to a region other than the stack, the verification state can be left as is; it doesn’t need to be tracked.</span>
<span id="ref-formula2-subset2">Otherwise, both <code class="language-plaintext highlighter-rouge">μ</code> and <code class="language-plaintext highlighter-rouge">ζ</code> need to be updated.</span>
<span id="ref-formula2-subset3">In <code class="language-plaintext highlighter-rouge">μ</code>, the register <code class="language-plaintext highlighter-rouge">e(x)=(R,n)</code> is associated with the memory cell defined by its position <code class="language-plaintext highlighter-rouge">e<sub>n</sub>(p)</code> and size <code class="language-plaintext highlighter-rouge">sz</code>.</span>
<span id="ref-formula2-subset4">Any other memory cell overlapping with this store is removed from <code class="language-plaintext highlighter-rouge">μ</code>.</span>
<span id="ref-formula2-subset5">Finally, addresses overwritten by the store are added to or removed from <code class="language-plaintext highlighter-rouge">ζ</code> depending on whether the stored register <code class="language-plaintext highlighter-rouge">x</code> holds a scalar or not.</span></p>

<p style="text-align: center;"><img id="formula2-subset1" src="/assets/prevail/prevail-meaning-safe-store-command-subset1.png" alt="Formula in the PREVAIL verifier for how a store operation impacts the verifier state. The case for store operations to regions other than the stack is framed." title="Formula in the PREVAIL verifier for how a store operation impacts the verifier state." style="position: absolute; width: 444px; display: none;" />
<img id="formula2-subset2" src="/assets/prevail/prevail-meaning-safe-store-command-subset2.png" alt="Formula in the PREVAIL verifier for how a store operation impacts the verifier state. The case for store operations to the stack is framed." title="Formula in the PREVAIL verifier for how a store operation impacts the verifier state." style="position: absolute; width: 444px; display: none;" />
<img src="/assets/prevail/prevail-meaning-safe-store-command.png" alt="Formula in the PREVAIL verifier for how a store operation impacts the verifier state." title="Formula in the PREVAIL verifier for how a store operation impacts the verifier state." style="width: 444px;" /></p>

<p style="text-align: center;"><img id="formula2-subset3" src="/assets/prevail/prevail-store-helper-function-subset3.png" alt="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF. The update of memory cells on the stack is framed in the formula." title="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF." style="position: absolute; width: 370px; display: none;" />
<img id="formula2-subset4" src="/assets/prevail/prevail-store-helper-function-subset4.png" alt="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF. The update of overlapping memory cells on the stack is framed in the formula." title="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF." style="position: absolute; width: 370px; display: none;" />
<img id="formula2-subset5" src="/assets/prevail/prevail-store-helper-function-subset5.png" alt="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF. The update of the set of stack addresses holding scalars is framed in the formula." title="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF." style="position: absolute; width: 370px; display: none;" />
<img src="/assets/prevail/prevail-store-helper-function.png" alt="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF." title="Helper function in the PREVAIL verifier for the formalism of store operations in eBPF." style="width: 370px;" /></p>
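
<p>Read as code, the state update for a store could look like the sketch below. It follows my reading of the formulas rather than PREVAIL's implementation: values are plain <code class="language-plaintext highlighter-rouge">(tag, value)</code> tuples, <code class="language-plaintext highlighter-rouge">mu</code> maps <code class="language-plaintext highlighter-rouge">(offset, size)</code> cells to values, and <code class="language-plaintext highlighter-rouge">zeta</code> is a set of byte addresses on the stack.</p>

```python
def store(state, p, x, sz):
    e, mu, zeta = state
    if p[0] != "stk":
        return state  # stores outside the stack are not tracked
    lo, hi = p[1], p[1] + sz
    # Drop every tracked cell overlapping [lo, hi), then record the new cell.
    new_mu = {
        (off, size): v
        for (off, size), v in mu.items()
        if off + size <= lo or hi <= off
    }
    new_mu[(lo, sz)] = x
    # The overwritten bytes hold a scalar only if x is a scalar.
    written = set(range(lo, hi))
    new_zeta = zeta | written if x[0] == "num" else zeta - written
    return (e, new_mu, new_zeta)


s0 = ({}, {}, set())
s1 = store(s0, ("stk", 0), ("num", 42), 8)  # spill a scalar at offset 0
s2 = store(s1, ("stk", 4), ("pkt", 0), 8)   # overlapping store of a pointer
s3 = store(s2, ("pkt", 0), ("num", 1), 4)   # store to the packet

print(s1[1], sorted(s1[2]))
print(s2[1], sorted(s2[2]))
print(s3 == s2)  # True
```

<p>After the second store, the first cell is gone from <code class="language-plaintext highlighter-rouge">mu</code>, but bytes 0 to 3 remain in <code class="language-plaintext highlighter-rouge">zeta</code>: their exact value is no longer tracked, yet they are still known to hold scalar data.</p>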

<p><br /></p>

<h3 id="implementation-of-prevail-and-limitations">Implementation of PREVAIL and Limitations</h3>

<p>The implementation section helps to understand the main limitations of PREVAIL.
Most of those limitations are simply gaps in the initial implementation and are not caused by the use of abstract interpretation.</p>

<blockquote>
  <p>[PREVAIL] maintains a variable for every one of (the finite number of) possible cells in the memory, and instantiates the underlying domains to track the values as if every cell is a (syntactic) analysis variable</p>
</blockquote>

<p>Each stack slot (memory cells <code class="language-plaintext highlighter-rouge">μ</code>, if you’ve read the formalism section) is tracked as a separate variable.
As we’ve seen in <a href="#abstract-domains">the introduction to abstract domains</a>, the complexity of abstract domain operations usually grows with the number of variables.
So PREVAIL is likely to consume significantly more resources for BPF programs using a lot of stack slots.</p>

<blockquote>
  <p>PREVAIL translates eBPF binaries into a CFG-based language understood by Crab <a href="https://seahorn.github.io">[30]</a>—a parametric framework for modular construction of abstract interpreters.</p>
</blockquote>

<p>Note this intermediate Crab language was later removed (cf. <a href="https://github.com/vbpf/ebpf-verifier/pull/87">vbpf/ebpf-verifier#87</a>).
That led to a significant reduction of the memory consumption.</p>

<blockquote>
  <p>We encode abstract tags as constant numbers and used the same abstract domain to track values and tags together. […] We handle null checks by tracking absolute values of pointers in addition to offsets</p>
</blockquote>

<p>I was a bit surprised by these choices.
I would have thought tags could be encoded with a much simpler abstract domain than values.
I would also have thought the null checks could be handled with additional tags, as in the Linux verifier<sup id="fnref:linux-tags-null-values" role="doc-noteref"><a href="#fn:linux-tags-null-values" class="footnote" rel="footnote">6</a></sup>, to avoid having to track absolute values of pointers.</p>

<blockquote>
  <p>Bitwise operations are not tracked precisely. Instead we use efficient over-approximations, e.g., we approximate <code class="language-plaintext highlighter-rouge">w &amp;= x</code> (bitwise and) when <code class="language-plaintext highlighter-rouge">x &gt; 0</code> with <code class="language-plaintext highlighter-rouge">assume(w&lt;x)</code>.</p>
</blockquote>

<p>PREVAIL over-approximates bitwise operations, potentially leading to false positives.
The Linux kernel does the same, but with what looks like a much more precise over-approximation, using <a href="https://arxiv.org/pdf/2105.05398.pdf">tristate numbers</a> (<code class="language-plaintext highlighter-rouge">tnums</code>).</p>
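
<p>For intuition, here is a Python sketch of a tristate number and its bitwise AND, modeled on my understanding of the kernel's <code class="language-plaintext highlighter-rouge">tnum_and</code>. The Python class and helper names are my own; only the value/mask encoding is taken from the kernel.</p>

```python
from dataclasses import dataclass

U64 = 2**64 - 1


@dataclass(frozen=True)
class Tnum:
    value: int  # bits known to be 1
    mask: int   # bits whose value is unknown


def tnum_const(k):
    # A constant has no unknown bits.
    return Tnum(k & U64, 0)


def tnum_and(a, b):
    alpha = a.value | a.mask  # bits that may be 1 in a
    beta = b.value | b.mask   # bits that may be 1 in b
    v = a.value & b.value     # bits known to be 1 in the result
    return Tnum(v, alpha & beta & ~v & U64)


def tnum_range(t):
    # Smallest and largest concrete values represented by t.
    return (t.value, t.value | t.mask)


unknown = Tnum(0, U64)
masked = tnum_and(unknown, tnum_const(0xF))

print(masked)              # Tnum(value=0, mask=15)
print(tnum_range(masked))  # (0, 15)
```

<p>Because each bit is tracked separately, masking a fully unknown register with <code class="language-plaintext highlighter-rouge">0xf</code> yields exactly the range <code class="language-plaintext highlighter-rouge">[0; 15]</code>, and partially known operands keep their known bits.</p>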

<p>The initial PREVAIL implementation also doesn’t support a lot of the more advanced eBPF features, such as BPF function calls, packet resizing, map-in-maps, and most helpers.
This lack of support would clearly prevent the use of PREVAIL for the largest BPF users out there (e.g., Cilium and Katran), but there do not seem to be any real blockers to their implementation.</p>

<p>Support for 32-bit arithmetic was also missing, which means programs compiled with <code class="language-plaintext highlighter-rouge">mcpu=v3</code> would likely be rejected<sup id="fnref:mcpuv3" role="doc-noteref"><a href="#fn:mcpuv3" class="footnote" rel="footnote">7</a></sup>.
That was covered last year (cf. <a href="https://github.com/vbpf/ebpf-verifier/pull/419">vbpf/ebpf-verifier#419</a>).</p>

<blockquote>
  <p>Our verifier does not currently implement termination check.</p>
</blockquote>

<p>Finally, at the time the paper was written, PREVAIL didn’t ensure programs terminate.
That was fixed in 2021 (cf. <a href="https://github.com/vbpf/ebpf-verifier/pull/139">vbpf/ebpf-verifier#139</a>) with a new abstract value <code class="language-plaintext highlighter-rouge">max_instructions</code>.
The constraint <code class="language-plaintext highlighter-rouge">max_instructions &lt; 100000</code> is added such that the longest path through the program can have at most 100k instructions<sup id="fnref:100k-insns-limit" role="doc-noteref"><a href="#fn:100k-insns-limit" class="footnote" rel="footnote">8</a></sup>.</p>

<p><br /></p>

<h3 id="accuracy-and-cost-evaluations">Accuracy and Cost Evaluations</h3>

<p>The authors evaluate the accuracy (number of false positives) and runtime cost (duration and memory consumption of the analysis) of PREVAIL.
To that end, they rely on <a href="https://github.com/vbpf/ebpf-samples/tree/v0.1-alpha">a corpus of 192 BPF programs</a> from six open source projects including Linux, Open vSwitch, Suricata, and Cilium.
BPF programs in the corpus are either of a small size (e.g., Linux samples) or networking-related; it doesn’t include any large tracing program for example.
The Cilium samples are also quite old and appear to have been generated with options that don’t maximize the programs’ size and complexity.
Nevertheless, the corpus includes a good variety of programs, with some in the thousands of instructions.</p>

<p>The authors first measure the accuracy of PREVAIL when using different abstract domains.
The Interval domain is clearly not adapted to verify BPF programs and serves more as a reference.
This evaluation is useful to guide a choice between the other, more expressive domains.
Since, as we will see, the accuracy also depends on the implementation of those domains, the choice is not a priori obvious.</p>

<blockquote>
  <p>the numerical abstract domains used in our final evaluation are:</p>
  <ul>
    <li>interval: classical Intervals <a href="https://www.di.ens.fr/~cousot/COUSOTpapers/publications.www/CousotCousot-ISOP-76-Dunod-p106--130-1976.pdf">[21]</a>.</li>
    <li>zone-crab: Zone using sparse representation and Split Normal Form <a href="https://seahorn.github.io/papers/split-dbm-sas16.pdf">[29]</a>.</li>
    <li>zone-elina: Zone using online decomposition <a href="https://dl.acm.org/doi/pdf/10.1145/3158143">[50]</a>.</li>
    <li>oct-elina: Octagon using online decomposition <a href="https://files.sri.inf.ethz.ch/website/papers/PLDI15-OptOctagon.pdf">[48]</a>.</li>
    <li>poly-elina: Polyhedra using online decomposition <a href="http://elina.ethz.ch/papers/POPL17-Polyhedra.pdf">[49]</a>.</li>
  </ul>
</blockquote>

<p>I mentioned all of these domains before and, here again, they are ordered from least expressive to most expressive.
<a href="http://elina.ethz.ch">Elina</a> and <a href="https://seahorn.github.io">Crab</a> refer to the libraries used to implement those abstract domains.
<a href="http://elina.ethz.ch/slides/pldi2015-fast-octagon.pdf">Online decomposition</a> is an optimization that partitions the set of program variables into disjoint subsets maintained throughout the analysis.
Since the cost of abstract domain operations grows quickly with the number of variables (cubically for Zone and Octagon, exponentially for Polyhedra in the worst case), this optimization helps limit that growth.</p>
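
<p>Here is a rough Python sketch of the idea behind that partitioning, under simplifying assumptions: the constraints are hypothetical, the partition is computed once with a union-find instead of being maintained during the analysis, and a cubic cost per operation is used only as a stand-in.
It is not how Elina implements it.</p>

```python
def partition_variables(variables, constraints):
    """Group variables into disjoint blocks: two variables share a block
    only if some chain of constraints relates them (union-find)."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for related in constraints:
        first, *rest = related
        for other in rest:
            parent[find(other)] = find(first)

    blocks = {}
    for v in variables:
        blocks.setdefault(find(v), []).append(v)
    return sorted(blocks.values())


# Hypothetical constraints, each listed as the variables it relates.
variables = ["r1", "r2", "r3", "r4", "r5", "r6"]
constraints = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
blocks = partition_variables(variables, constraints)

cubic_whole = len(variables) ** 3                # one domain over all variables
cubic_blocks = sum(len(b) ** 3 for b in blocks)  # one small domain per block
print(blocks, cubic_whole, cubic_blocks)
```

<p>With six variables split into blocks of three, two, and one, the stand-in cost drops from 216 to 36.</p>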

<table>
  <thead>
    <tr>
      <th>Abstract domain</th>
      <th>Number of programs for which verification failed</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>interval</td>
      <td>64/192</td>
    </tr>
    <tr>
      <td>zone-crab</td>
      <td>2/192</td>
    </tr>
    <tr>
      <td>zone-elina</td>
      <td>2/192</td>
    </tr>
    <tr>
      <td>oct-elina</td>
      <td>2/192</td>
    </tr>
    <tr>
      <td>poly-elina</td>
      <td>23/192</td>
    </tr>
  </tbody>
</table>

<p>The above table shows the number of programs that each abstract domain was unable to verify among the corpus.
As expected, Interval was only able to verify two thirds of all programs, probably because it can’t track relations between variables.
Conversely, the result for poly-elina is surprisingly bad given Polyhedra is the most expressive domain in the set.
The authors however explain that this is due only to a limitation of the Elina implementation of that domain:</p>

<blockquote>
  <p>The implementation uses 64-bit integers for representing the coefficients, and falls back to top when the coefficients cannot be represented precisely using 64 bit.</p>
</blockquote>

<p>Of course, more expressive abstract domains come at a price.
The following two figures represent the verification time in seconds (left) and the memory consumption in GB (right) for each abstract domain.</p>

<p style="text-align: center;"><img src="/assets/prevail/prevail-analysis-times.png" alt="Scatter plot of the PREVAIL runtime for eBPF programs of various sizes (in number of instructions), shown for several abstract domains." title="Scatter plot of the PREVAIL runtime for eBPF programs of various sizes (in number of instructions), shown for several abstract domains." style="width: 47%;" />
<img src="/assets/prevail/prevail-memory-usages.png" alt="Scatter plot of the PREVAIL memory usage for eBPF programs of various sizes (in number of instructions), shown for several abstract domains." title="Scatter plot of the PREVAIL memory usage for eBPF programs of various sizes (in number of instructions), shown for several abstract domains." style="width: 47%;" /></p>

<p>The Interval domain has the lowest costs.
All other domains have much larger costs, except maybe for zone-crab which still requires around 5s and 1.5GB of memory to verify the largest programs.
Given that 1.5GB of memory is still too much for the Linux kernel, the authors suggest running PREVAIL in userspace<sup id="fnref:ebpf-for-windows" role="doc-noteref"><a href="#fn:ebpf-for-windows" class="footnote" rel="footnote">9</a></sup>.</p>

<p>As a point of comparison, the Linux kernel takes <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c04c0d2b968ac45d6ef020316808ef6c82325a82">about a second</a> and consumes <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c3997601d51069ec08d7d06cf31a17884056cc2">only MBs of memory</a> in the worst case.
That makes it faster than even the Interval domain.
Of course, as the authors note, the current corpus is biased toward the Linux verifier since all its programs were successfully loaded on Linux.</p>

<blockquote>
  <p>The actual runtime of zone-crab is roughly linear in the number of instructions, despite its cubic worst-case asymptotic complexity.</p>
</blockquote>

<p>As the authors note, although zone-crab behaves well in practice, its worst-case runtime is actually cubic.
It would be interesting to see if it’s possible to craft a BPF program that exhausts the verifier’s resources in this way.
The Linux verifier faces the same threat and currently mitigates it by enforcing various complexity limits on the input programs (e.g., <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ceefbc96fa5c5b975d87bf8e89ba8416f6b764d9"><code class="language-plaintext highlighter-rouge">BPF_COMPLEXITY_LIMIT_STATES</code></a>).</p>
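
<p>To see where a cubic bound can come from, here is a sketch of the textbook closure operation for the Zone domain, on a dense difference-bound matrix.
This is the classic Floyd-Warshall formulation, not Crab’s sparse Split Normal Form implementation, and the three constraints are made up for the example.</p>

```python
import itertools

INF = float("inf")


def close(dbm):
    """Tighten a difference-bound matrix in place, where dbm[i][j] = c
    encodes the constraint v_i - v_j <= c. This is Floyd-Warshall: three
    nested loops over the variables, hence the cubic cost."""
    n = len(dbm)
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in itertools.product(range(n), repeat=3):
        if dbm[i][k] + dbm[k][j] < dbm[i][j]:
            dbm[i][j] = dbm[i][k] + dbm[k][j]
    return dbm


x, y, z = 0, 1, 2
dbm = [[0 if i == j else INF for j in range(3)] for i in range(3)]
dbm[x][y] = 3  # x - y <= 3
dbm[y][z] = 2  # y - z <= 2
close(dbm)
print(dbm[x][z])  # derived bound on x - z
```

<p>The closure derives <code class="language-plaintext highlighter-rouge">x - z ≤ 5</code> from the two input constraints.</p>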

<p>It’s a bit disappointing that the paper doesn’t include any comparison with the Linux verifier on the same corpus of BPF programs.
The authors also mention PREVAIL was able to verify nine programs rejected by the Linux verifier, but without providing more details.</p>

<p><br /></p>

<h3 id="conclusion">Conclusion</h3>

<p>I’m always super excited to read about alternatives to the Linux BPF verifier, and this paper is no exception!
If like me you don’t have a background in formal methods, the paper can be a bit hard to understand.
Hopefully, I gave enough pointers in this blog post to help with that.
Definitely worth a read!</p>

<p>This academic project is also one of the lucky few that already had a “real-life” application two years after its publication.
The implementation evolved a lot during those two years and continues to.
It would therefore be interesting to see how the performance compares to two years ago—and maybe how it now compares to the Linux verifier.</p>

<p><br /></p>

<p>Thanks to Aditi for her review and suggestions on an earlier version of this post!</p>

<p><br /></p>
<hr />

<p><br /></p>

<h3 id="addendum-false-positive-example">Addendum: False Positive Example</h3>

<script>
jqueryReady(function() {
  $('#addendum-false-positive-example').append(' <a href="#fnref:addendum-false-positive-example">↩</a>');
});
</script>

<p>Using integer intervals to track the possible values of variables can be imprecise, even if your variables are indeed integers.</p>

<p>Consider the example bytecode below.
We bound check <code class="language-plaintext highlighter-rouge">r1</code> and <code class="language-plaintext highlighter-rouge">r2</code>, then multiply them together, and use the result to decide whether to execute a division by zero.
We want to check with abstract interpretation if the division by zero will ever be executed.</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf">  <span class="gl">0</span><span class="p">:</span><span class="w"> </span><span class="n">r0</span> <span class="o">=</span> <span class="m">0</span><span class="w">
  </span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">r1</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="nl">pc</span>+<span class="m">4</span><span class="w">  </span><span class="c1">// r1 ∈ [0; 10]
</span>  <span class="gl">2</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">r2</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="nl">pc</span>+<span class="m">3</span><span class="w">  </span><span class="c1">// r2 ∈ [0; 10]
</span>  <span class="gl">3</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">*=</span> <span class="n">r2</span>              <span class="c1">// r1 ∈ [0; 100]
</span>  <span class="gl">4</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">r1</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="nl">pc</span>+<span class="m">1</span><span class="w">
  </span><span class="gl">5</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">/=</span> <span class="n">r0</span>              <span class="c1">// Division by zero!
</span>  <span class="gl">6</span><span class="p">:</span><span class="w"> </span><span class="k">exit</span></code></pre></figure>

<p>After instruction 2, both <code class="language-plaintext highlighter-rouge">r1</code> and <code class="language-plaintext highlighter-rouge">r2</code> have abstract value <code class="language-plaintext highlighter-rouge">[0; 10]</code>.
After instruction 3, <code class="language-plaintext highlighter-rouge">r1</code> holds the multiplication of <code class="language-plaintext highlighter-rouge">r1</code> and <code class="language-plaintext highlighter-rouge">r2</code> and therefore has abstract value <code class="language-plaintext highlighter-rouge">[0; 100]</code>.
When considering the condition at instruction 4, because <code class="language-plaintext highlighter-rouge">11 ∈ [0; 100]</code>, we will walk both paths and hit the division by zero.</p>

<p>Except we know that <code class="language-plaintext highlighter-rouge">r1</code> can never take value 11.
No two integers between 0 and 10 give 11 when multiplied together: 11 is a prime number, so its only factorization is 1 × 11, and 11 is out of range.
When using integer intervals as abstract values, we will lose that information during the multiplication.
That loss of precision can lead to false positives, such as rejecting a program because of a never-executed division by zero in our example.
<a href="#fnref:addendum-false-positive-example">↩</a></p>
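
<p>The same reasoning can be replayed in a few lines of Python.
This is only a sketch of interval arithmetic for this one example, with made-up helper names; it is not taken from PREVAIL or the Linux verifier.</p>

```python
from itertools import product


def interval_mul(a, b):
    """Multiply two integer intervals given as (low, high) pairs."""
    corners = [x * y for x, y in product(a, b)]
    return (min(corners), max(corners))


r1 = r2 = (0, 10)
abstract = interval_mul(r1, r2)

# Concrete products actually reachable with r1 and r2 in [0; 10].
concrete = {x * y for x in range(11) for y in range(11)}

lo, hi = abstract
print(abstract, lo <= 11 <= hi, 11 in concrete)
```

<p>The interval contains 11, so the analysis has to explore the division by zero, even though 11 is not among the concrete products.</p>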

<p><br /></p>

<h3 id="footnotes">Footnotes</h3>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:prevail-stands-for" role="doc-endnote">
      <p>PREVAIL stands for “Polynomial-Runtime eBPF Verifier using an Abstract Interpretation Layer”. <a href="#fnref:prevail-stands-for" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:btf-revolution" role="doc-endnote">
      <p>That is quickly changing with BTF, which can preserve type information from the C program. <a href="#fnref:btf-revolution" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:example-clang-optimization" role="doc-endnote">
      <p>For example, Clang 11.0.0 sometimes moves NULL checks after pointer arithmetic on map values, which causes the verifier to error with “<code class="language-plaintext highlighter-rouge">pointer arithmetic on map_value_or_null prohibited, null-check it first</code>”. <a href="#fnref:example-clang-optimization" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:ai-operations" role="doc-endnote">
      <p>Abstract interpretation defines other operations on abstract values, such as widening and narrowing, which can be used to analyze loops without walking each iteration. <a href="#fnref:ai-operations" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:domain-table-source" role="doc-endnote">
      <p>Taken from <a href="https://youtu.be/t_ht1p67tOA?t=51">the POPL 2018 presentation</a> by Gagandeep Singh. <a href="#fnref:domain-table-source" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:linux-tags-null-values" role="doc-endnote">
      <p>For example, Linux has both <code class="language-plaintext highlighter-rouge">PTR_TO_MAP_VALUE_OR_NULL</code> and <code class="language-plaintext highlighter-rouge">PTR_TO_MAP_VALUE</code>, the former becoming the latter after a null check. <a href="#fnref:linux-tags-null-values" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:mcpuv3" role="doc-endnote">
      <p>See my <a href="https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html">introduction to eBPF instruction sets</a> for details. <a href="#fnref:mcpuv3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:100k-insns-limit" role="doc-endnote">
      <p>There don’t seem to be any blockers to increase this limit and <a href="https://github.com/vbpf/ebpf-verifier/pull/139#discussion_r552216612">Dave Thaler suggested it could be configurable</a>. <a href="#fnref:100k-insns-limit" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:ebpf-for-windows" role="doc-endnote">
      <p>That is how the eBPF-for-Windows project ended up using PREVAIL. <a href="#fnref:ebpf-for-windows" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="ebpf" /><summary type="html"><![CDATA[This post summarizes the PREVAIL paper from PLDI 2019. PREVAIL is a new verifier for eBPF based on abstract interpretation and now used by Microsoft in eBPF-for-Windows.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://pchaigno.github.io/assets/illustration-prevail-beginning.png" /><media:content medium="image" url="https://pchaigno.github.io/assets/illustration-prevail-beginning.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">eBPF Instruction Set Extensions</title><link href="https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html" rel="alternate" type="text/html" title="eBPF Instruction Set Extensions" /><published>2021-10-20T16:00:10+00:00</published><updated>2025-01-23T14:31:00+00:00</updated><id>https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets</id><content type="html" xml:base="https://pchaigno.github.io/bpf/2021/10/20/ebpf-instruction-sets.html"><![CDATA[<p>Not everyone who develops BPF programs knows that several versions of the instruction set exist.
This isn’t really surprising given documentation on the subject is scarce.
So let’s go through the different eBPF instruction sets, why they exist, and why their choice matters.</p>

<p><br /></p>

<h3 id="llvms-backend-selector">LLVM’s Backend Selector</h3>

<p>If you’ve been using <code class="language-plaintext highlighter-rouge">llc</code> to compile your BPF program, you might have noticed an <code class="language-plaintext highlighter-rouge">-mcpu</code> parameter.
The help output gives us the following information:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ llc -march=bpf -mcpu=help
Available CPUs for this target:

  generic - Select the generic processor.
  probe   - Select the probe processor.
  v1      - Select the v1 processor.
  v2      - Select the v2 processor.
  v3      - Select the v3 processor.
  v4      - Select the v4 processor.

Available features for this target:

  alu32    - Enable ALU32 instructions.
  dummy    - unused feature.
  dwarfris - Disable MCAsmInfo DwarfUsesRelocationsAcrossSections.

Use +feature to enable a feature, or -feature to disable it.
For example, llc -mcpu=mycpu -mattr=+feature1,-feature2
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">-mcpu</code> parameter is used as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ clang -O2 -Wall -target bpf -emit-llvm -c example.c -o example.bc
$ llc example.bc -march=bpf -mcpu=probe -filetype=obj -o example.o
</code></pre></div></div>

<p>That parameter allows us to tell LLVM which eBPF instruction set to use.
It defaults to <code class="language-plaintext highlighter-rouge">generic</code>, an alias for <code class="language-plaintext highlighter-rouge">v1</code>, the oldest instruction set.
<code class="language-plaintext highlighter-rouge">probe</code> will select the newest instruction set your kernel supports.
We will see below that selecting newer versions allows LLVM to generate smaller and more efficient bytecode.</p>

<p><br /></p>

<h3 id="descriptions">Descriptions</h3>

<p>The first two extensions of the base instruction set, v2 and v3, add support for new jump instructions.
The third extension, v4, adds a whole set of new instructions, for the most part related to signed operations.</p>

<p>Each new extension includes all instructions from previous extensions.</p>

<h4 id="version-two">Version Two</h4>

<p>v2 adds support for lower-than jumps where only greater-than jumps were previously available.
Of course, the first kind of jumps can be rewritten into the second, but that requires an additional register load:</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf"><span class="c1">// Using mcpu=v1:
</span><span class="gl">0</span><span class="p">:</span><span class="w"> </span><span class="n">r2</span> <span class="o">=</span> <span class="m">7</span><span class="w">
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">r2</span><span class="w"> </span><span class="o">s&gt;</span><span class="w"> </span><span class="n">r1</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="nl">pc</span>+<span class="m">1</span><span class="w">
</span><span class="c1">// Using mcpu=v2's BPF_JSLT:
</span><span class="gl">0</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">r1</span><span class="w"> </span><span class="o">s&lt;</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="nl">pc</span>+<span class="m">1</span></code></pre></figure>

<h4 id="version-three">Version Three</h4>

<p>The second extension, v3, adds 32-bit variants of the existing conditional 64-bit jumps.
Again, you can work around the lack of 32-bit conditional jumps by first extending the lower 32 bits to the full 64-bit register (the example below sign-extends them with a pair of shifts).
But using a 32-bit conditional jump is shorter:</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf"><span class="gl">0</span><span class="p">:</span><span class="w"> </span><span class="n">ca</span><span class="kt">ll</span> bpf_skb_load_bytes
<span class="c1">// Using mcpu=v2's 64-bit jumps:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="n">r0</span> <span class="o">&lt;&lt;=</span> <span class="m">32</span><span class="w">
</span><span class="gl">2</span><span class="p">:</span><span class="w"> </span><span class="n">r0</span> <span class="o">s&gt;&gt;=</span> <span class="m">32</span><span class="w">
</span><span class="gl">3</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">r0</span><span class="w"> </span><span class="o">s&lt;</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="mi">+1785</span><span class="w"> </span><span class="nl">&lt;LBB10_90&gt;</span>
<span class="c1">// Using mcpu=v3's 32-bit jumps:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">w0</span><span class="w"> </span><span class="o">s&lt;</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="mi">+1689</span><span class="w"> </span><span class="nl">&lt;LBB10_90&gt;</span></code></pre></figure>

<p><code class="language-plaintext highlighter-rouge">w0</code> is the 32-bit subregister of <code class="language-plaintext highlighter-rouge">r0</code>.</p>
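
<p>To convince ourselves that the two sequences agree, here is a small Python emulation of both.
The helper names are mine and the emulation only models the three instructions shown above, on a few sample register values.</p>

```python
U64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def jslt64_after_shifts(r0):
    """mcpu=v2 sequence: r0 <<= 32; r0 s>>= 32; if r0 s< 0."""
    r0 = (r0 << 32) & U64                 # r0 <<= 32
    r0 = (to_signed(r0, 64) >> 32) & U64  # r0 s>>= 32 (arithmetic shift)
    return to_signed(r0, 64) < 0


def jslt32(r0):
    """mcpu=v3 instruction: if w0 s< 0."""
    return to_signed(r0, 32) < 0


# Sample register values, some with garbage in the upper 32 bits.
samples = [0xFFFFFFFF, 0x00000001_FFFFFFFF, 0x7FFFFFFF, 0xDEADBEEF_00000005]
results = [(jslt64_after_shifts(s), jslt32(s)) for s in samples]
print(results)
```

<p>Both versions take the branch for the first two samples only, regardless of what the upper 32 bits contain.</p>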

<h4 id="version-four">Version Four</h4>

<p>The latest extension, v4, adds seven new instructions.
Most are related to signed operations, though there is also a new unconditional jump instruction and a new byte-swapping instruction.
Their format is described in detail in <a href="https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com">the original proposal by Yonghong Song</a>.</p>

<p>(Syntax highlighting is a bit off on the examples here because I haven’t yet updated <a href="/ebpf/2019/10/11/bpf-syntax-highlighting.html">the BPF lexer</a>.)</p>

<p>The v4 extension adds support for signed division and modulo operations.
Those were simply not supported before.
Trying to compile a signed division with <code class="language-plaintext highlighter-rouge">mcpu=v3</code> will error with <code class="language-plaintext highlighter-rouge">Unsupport [sic] signed division for DAG</code>.</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf"><span class="c1">// C code: s64 y = x / -5;
// Using mcpu=v4's signed division:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">s/=</span> <span class="m">-0</span>x<span class="m">5</span></code></pre></figure>
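
<p>To illustrate why the existing unsigned division can’t simply stand in for a signed one, here is a Python sketch of both on 64-bit two’s-complement values.
I’m assuming C-style truncation toward zero for the signed variant and ignoring division by zero; the helper names are made up.</p>

```python
U64 = (1 << 64) - 1


def to_signed64(v):
    """Interpret a 64-bit register value as a two's-complement integer."""
    v &= U64
    return v - (1 << 64) if v >> 63 else v


def udiv64(dst, src):
    """Unsigned 64-bit division, as done by the pre-v4 `/=`."""
    return (dst & U64) // (src & U64)


def sdiv64(dst, src):
    """Signed 64-bit division, truncating toward zero (assumed, as in C)."""
    a, b = to_signed64(dst), to_signed64(src)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & U64


x = -10 & U64  # register holding -10
print(to_signed64(udiv64(x, -5)), to_signed64(sdiv64(x, -5)))
```

<p>The unsigned division sees two huge positive numbers and returns 0, whereas the signed one returns the expected 2.</p>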

<p>The v4 extension also added support for sign-extended load, store, and mov operations.
Those were already possible, but required two more instructions to extend the sign to the expected size.</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf"><span class="c1">// C code: s64 y = e-&gt;pid;
// Using mcpu=v3:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">=</span> <span class="o">*</span><span class="p">(</span><span class="kt">u32</span><span class="w"> </span><span class="o">*</span><span class="p">)(</span><span class="n">r6</span><span class="w"> </span><span class="o">+</span><span class="mi">0</span><span class="p">)</span>
<span class="gl">2</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">&lt;&lt;=</span> <span class="m">0x20</span><span class="w">
</span><span class="gl">3</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">s&gt;&gt;=</span> <span class="m">0x20</span><span class="w">
</span><span class="c1">// Using mcpu=v4:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">=</span> *(s<span class="m">32</span><span class="w"> </span>*)(<span class="n">r6</span> +<span class="m">0</span>)</code></pre></figure>

<p>Not everything in the v4 extension is about signed operations!
It also added a new unconditional jump with a 32-bit jump offset.
The existing unconditional jump only supported 16-bit offsets.
When programs have a lot of instructions (e.g., unrolled loops), it’s easy to end up needing a longer jump.
In such cases, the compiler would simply error out with <code class="language-plaintext highlighter-rouge">LLVM ERROR: Branch target out of insn range</code>.</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf"><span class="c1">// Largest-possible unconditional jump with mcpu=v3:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="mi">+0</span><span class="nl">x7fff</span> &lt;LBB<span class="m">0</span>_<span class="m">2</span>&gt;
<span class="c1">// Largest-possible unconditional jump with mcpu=v4:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="k">goto</span><span class="nl">l</span> +<span class="m">0x7fffffff</span><span class="w"> </span>&lt;LBB<span class="m">0</span>_<span class="m">2</span>&gt;</code></pre></figure>
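
<p>The two limits above are simply the largest signed 16-bit and 32-bit integers.
As a sketch, a hypothetical <code class="language-plaintext highlighter-rouge">pick_jump</code> helper (not LLVM’s actual logic) could choose between the two instructions like this:</p>

```python
def fits_signed(offset, bits):
    """True if offset is representable as a signed integer of that width."""
    return -(1 << (bits - 1)) <= offset <= (1 << (bits - 1)) - 1


def pick_jump(offset):
    """Pick the unconditional jump able to encode the given offset."""
    if fits_signed(offset, 16):
        return "goto"   # 16-bit offset, available in all versions
    if fits_signed(offset, 32):
        return "gotol"  # 32-bit offset, needs mcpu=v4
    raise ValueError("branch target out of range")


print(pick_jump(0x7FFF), pick_jump(0x8000), pick_jump(0x7FFFFFFF))
```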

<p>Finally, <code class="language-plaintext highlighter-rouge">mcpu=v4</code> added support for unconditional byte-swapping instructions.
Until now, the <code class="language-plaintext highlighter-rouge">be{16,32,64}</code> and <code class="language-plaintext highlighter-rouge">le{16,32,64}</code> instructions would only swap bytes when the host endianness differed from the one they target: <code class="language-plaintext highlighter-rouge">le16</code> only swaps the two bytes on big-endian CPUs, and <code class="language-plaintext highlighter-rouge">be16</code> only on little-endian CPUs.</p>

<figure class="highlight"><pre><code class="language-bpf" data-lang="bpf"><span class="c1">// C code: u64 y = __builtin_bswap32(x)
// Using mcpu=v3 on little endian:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">=</span> <span class="k">be32</span> <span class="n">r1</span>
<span class="c1">// Using mcpu=v3 on big endian:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">=</span> le<span class="m">32</span><span class="w"> </span><span class="n">r1</span>
<span class="c1">// Using mcpu=v4 on any CPU:
</span><span class="gl">1</span><span class="p">:</span><span class="w"> </span><span class="n">r1</span> <span class="o">=</span> bswap<span class="m">32</span><span class="w"> </span><span class="n">r1</span></code></pre></figure>
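
<p>The difference between the three families can be emulated in Python as follows.
This is a sketch with made-up function names and an explicit <code class="language-plaintext highlighter-rouge">host</code> parameter standing in for the CPU’s endianness.</p>

```python
import sys


def bswap(value, bits):
    """bswap{16,32,64}: always reverse the byte order."""
    value &= (1 << bits) - 1
    return int.from_bytes(value.to_bytes(bits // 8, "little"), "big")


def to_be(value, bits, host=sys.byteorder):
    """be{16,32,64}: swap only on little-endian hosts."""
    return bswap(value, bits) if host == "little" else value & ((1 << bits) - 1)


def to_le(value, bits, host=sys.byteorder):
    """le{16,32,64}: swap only on big-endian hosts."""
    return bswap(value, bits) if host == "big" else value & ((1 << bits) - 1)


x = 0x11223344
print(hex(bswap(x, 32)),
      hex(to_be(x, 32, host="little")), hex(to_be(x, 32, host="big")),
      hex(to_le(x, 32, host="little")), hex(to_le(x, 32, host="big")))
```

<p>Only <code class="language-plaintext highlighter-rouge">bswap</code> gives the same result on both kinds of hosts, which is why the compiler previously had to pick <code class="language-plaintext highlighter-rouge">be32</code> or <code class="language-plaintext highlighter-rouge">le32</code> depending on the target.</p>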

<p><br /></p>

<h3 id="prerequisites">Prerequisites</h3>

<p>You need recent-enough versions of Linux and LLVM to use the instruction set extensions.
The following table sums it up.</p>

<table>
  <thead>
    <tr>
      <th>BPF ISA version</th>
      <th>New instructions</th>
      <th>Linux version</th>
      <th>LLVM version</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>v1 (generic)</td>
      <td>-</td>
      <td><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=daedfb22451dd02b35c0549566cbb7cc06bdd53b">v3.18</a></td>
      <td><a href="https://reviews.llvm.org/rL227008">v3.7</a></td>
    </tr>
    <tr>
      <td>v2</td>
      <td><code class="language-plaintext highlighter-rouge">BPF_J{LT,LE,SLT,SLE}</code></td>
      <td><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=92b31a9af73b3a3fc801899335d6c47966351830">v4.14</a></td>
      <td><a href="https://reviews.llvm.org/rL311522">v6.0</a></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">mattr=+alu32</code></td>
      <td>32-bit calling convention</td>
      <td><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2dc6b100f928aac8d7532bf7112d3f8d3f952bad">v5.0</a><sup id="fnref:alu32-support" role="doc-noteref"><a href="#fn:alu32-support" class="footnote" rel="footnote">1</a></sup></td>
      <td><a href="https://reviews.llvm.org/rL325983">v7.0</a></td>
    </tr>
    <tr>
      <td>v3</td>
      <td>32-bit variants of all jumps</td>
      <td><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=092ed0968bb648cd18e8a0430cd0a8a71727315c">v5.1</a></td>
      <td><a href="https://reviews.llvm.org/rL353384">v9.0</a>, with <code class="language-plaintext highlighter-rouge">mattr=+alu32</code></td>
    </tr>
    <tr>
      <td>v4</td>
      <td><code class="language-plaintext highlighter-rouge">BPF_{SDIV,SMOD}</code>, sign-extended <code class="language-plaintext highlighter-rouge">BPF_{LD,LDX,ST,STX,MOV}</code>, 32-bit <code class="language-plaintext highlighter-rouge">BPF_JA</code>, and target-independent byte swapping</td>
      <td><a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f9a1ea821ff25353a0e80d971e7958cd55b47a3">v6.6</a><sup id="fnref:sign-extended-store" role="doc-noteref"><a href="#fn:sign-extended-store" class="footnote" rel="footnote">2</a></sup></td>
      <td><a href="https://github.com/llvm/llvm-project/commit/6c412b6c6faa2dabd8602d35d3f5e796fb1daf80">v18.0</a>, with <code class="language-plaintext highlighter-rouge">mattr=+alu32</code></td>
    </tr>
  </tbody>
</table>

<p><a href="https://github.com/torvalds/linux/blob/28806e4d9b97865b450d72156e9ad229f2067f0b/Documentation/bpf/bpf_design_QA.rst#q-why-bpf-jlt-and-bpf-jle-instructions-were-not-introduced-in-the-beginning">The BPF FAQ</a> also gives good insight into why these instruction set extensions exist:</p>

<blockquote>
  <p><strong>Why BPF_JLT and BPF_JLE instructions were not introduced in the beginning?</strong></p>

  <p>A: Because classic BPF didn’t have them and BPF authors felt that compiler workaround would be acceptable.
Turned out that programs lose performance due to lack of these compare instructions and they were added.
These two instructions are a perfect example of the kind of new BPF instructions that are acceptable and can be added in the future.
These two already had equivalent instructions in native CPUs. New instructions that don’t have one-to-one mapping to HW instructions will not be accepted.</p>
</blockquote>

<p><br /></p>

<h3 id="impact-on-program-size-and-complexity">Impact on Program Size and Complexity</h3>

<p>Why does all this matter?
Is it so bad to use the default v1 instruction set?
Can we just set <code class="language-plaintext highlighter-rouge">mcpu=probe</code>?</p>

<p>Let’s first have a look at the impact on the program sizes.
To that end, we can use <a href="https://github.com/cilium/cilium/tree/master/bpf">Cilium’s BPF programs</a>.
They are open source, of heterogeneous sizes, and used in production systems.
In the latest stable Cilium version, v1.16.5, the largest BPF programs are in <code class="language-plaintext highlighter-rouge">bpf_host.c</code>.
Cilium’s <code class="language-plaintext highlighter-rouge">TestVerifier</code> Go test loads the programs in the kernel and retrieves various statistics.
In the following, I’m using it with LLVM 18.1.8.</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>git checkout v1.16.5
<span class="nv">$ </span><span class="k">for </span>v <span class="k">in </span>v1 v2 v3 v4<span class="p">;</span> <span class="k">do</span>       <span class="se">\</span>
    <span class="nb">sed</span> <span class="nt">-i</span> <span class="s2">"s/mcpu=v[1-4].*/mcpu=</span><span class="nv">$v</span><span class="s2">/"</span> bpf/Makefile.bpf <span class="o">&amp;&amp;</span>               <span class="se">\</span>
    make <span class="nt">-C</span> bpf <span class="nv">KERNEL</span><span class="o">=</span>netnext <span class="o">&amp;&amp;</span>                                       <span class="se">\</span>
    <span class="nv">f</span><span class="o">=</span><span class="k">${</span><span class="nv">v</span><span class="p">/ /-</span><span class="k">}</span><span class="nt">-bpf_host</span>.txt<span class="p">;</span>                                            <span class="se">\</span>
    go <span class="nb">test</span> <span class="nt">-exec</span> <span class="nb">sudo</span> <span class="nt">-v</span> <span class="nt">-run</span> TestVerifier/bpf_host/1 ./test/verifier/ <span class="se">\</span>
      <span class="nt">-cilium-base-path</span> <span class="si">$(</span><span class="nb">pwd</span><span class="si">)</span> <span class="nt">-ci-kernel-version</span> netnext <span class="o">&gt;</span> <span class="nv">$f</span><span class="p">;</span>         <span class="se">\</span>
<span class="k">done</span></code></pre></figure>

<div id="plot-isa-versions-program-sizes"></div>
<script>
var host_programs = ['cil_from_host', 'cil_from_netdev', 'cil_to_host', 'cil_to_netdev', 'handle_lxc_traffic', 'tail_handle_ipv4_cont_from_host', 'tail_handle_ipv4_cont_from_netdev', 'tail_handle_ipv4_from_host', 'tail_handle_ipv4_from_netdev', 'tail_handle_ipv6_cont_from_host', 'tail_handle_ipv6_cont_from_netdev', 'tail_handle_ipv6_from_host', 'tail_handle_ipv6_from_netdev', 'tail_handle_nat_fwd_ipv4', 'tail_handle_nat_fwd_ipv6', 'tail_handle_snat_fwd_ipv4', 'tail_handle_snat_fwd_ipv6', 'tail_ipv4_host_policy_ingress', 'tail_ipv6_host_policy_ingress', 'tail_nodeport_nat_egress_ipv4', 'tail_nodeport_nat_egress_ipv6', 'tail_nodeport_nat_ingress_ipv4', 'tail_nodeport_nat_ingress_ipv6', 'tail_nodeport_rev_dnat_ingress_ipv4', 'tail_nodeport_rev_dnat_ingress_ipv6'];
var programs = host_programs.map(p => 'bpf_host ' + p);
var v1ProgramSizes = {
    x: programs,
    y: [674, 696, 866, 3337, 2100, 1208, 655, 633, 2399, 1240, 799, 791, 2673, 892, 840, 3879, 3520, 1125, 1139, 2919, 3254, 1826, 2068, 928, 976],
    type: 'bar',
    mode: 'markers',
    name: 'v1',
    marker: {
        color: 'rgb(241, 115, 35)',
    }
};
var v2ProgramSizes = {
    x: programs,
    y: [669, 690, 850, 3295, 2105, 1193, 644, 629, 2400, 1225, 787, 780, 2642, 891, 833, 3850, 3510, 1113, 1128, 2916, 3240, 1807, 2016, 923, 960],
    type: 'bar',
    mode: 'markers',
    name: 'v2',
    marker: {
        color: 'rgb(248, 197, 23)',
    }
};
var v3ProgramSizes = {
    x: programs,
    y: [643, 673, 782, 3119, 1975, 1151, 617, 580, 2217, 1183, 751, 717, 2467, 818, 761, 3492, 3189, 1054, 1062, 2607, 2935, 1655, 1896, 844, 872],
    type: 'bar',
    mode: 'markers',
    name: 'v3',
    marker: {
        color: 'rgb(98, 138, 198)',
    }
};
var v4ProgramSizes = {
    x: programs,
    y: [599, 634, 746, 2956, 1897, 1068, 600, 563, 2164, 1095, 735, 698, 2415, 793, 748, 3436, 3161, 1014, 1036, 2591, 2911, 1615, 1865, 819, 859],
    type: 'bar',
    mode: 'markers',
    name: 'v4',
    marker: {
        color: 'rgb(152, 201, 62)',
    }
};

plotlyReady(function() {
    var data = [v1ProgramSizes, v2ProgramSizes, v3ProgramSizes, v4ProgramSizes];
    var layout = {
        title: '<b>Program Sizes for Each Instruction Set Version</b>',
        barmode: 'group',
        legend: {x: 0.02, y: 0.95},
        margin: {t: 80, l: 30, r: 0, b: 80},
        xaxis: {
            title: "Cilium's BPF programs",
            showticklabels: false
        },
        yaxis: {tickformat: 's'},
        height: 450,
    };
    if (jQuery.browser && jQuery.browser.mobile) {
        layout.dragmode = false;
    }
    Plotly.newPlot('plot-isa-versions-program-sizes', data, layout);
});
</script>

<p>As expected, newer instruction set versions generally produce smaller BPF programs, with only two marginal exceptions between v1 and v2.
Since the new instructions have a one-to-one mapping to x86 instructions, we can expect a similar impact on the size of the JIT-compiled programs.
You can therefore expect a small performance benefit in most cases when using newer instruction sets.</p>
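
<p>To put rough numbers on the plot, here is a minimal sketch that sums the plotted instruction counts for each version and derives the reduction relative to v1.
The arrays are copied from the plot’s data; the <code class="language-plaintext highlighter-rouge">sizes</code>, <code class="language-plaintext highlighter-rouge">totals</code>, and <code class="language-plaintext highlighter-rouge">reductionVsV1</code> names are mine and only serve the illustration.</p>

```javascript
// Instruction counts per bpf_host program, copied from the plot above.
const sizes = {
    v1: [674, 696, 866, 3337, 2100, 1208, 655, 633, 2399, 1240, 799, 791, 2673, 892, 840, 3879, 3520, 1125, 1139, 2919, 3254, 1826, 2068, 928, 976],
    v2: [669, 690, 850, 3295, 2105, 1193, 644, 629, 2400, 1225, 787, 780, 2642, 891, 833, 3850, 3510, 1113, 1128, 2916, 3240, 1807, 2016, 923, 960],
    v3: [643, 673, 782, 3119, 1975, 1151, 617, 580, 2217, 1183, 751, 717, 2467, 818, 761, 3492, 3189, 1054, 1062, 2607, 2935, 1655, 1896, 844, 872],
    v4: [599, 634, 746, 2956, 1897, 1068, 600, 563, 2164, 1095, 735, 698, 2415, 793, 748, 3436, 3161, 1014, 1036, 2591, 2911, 1615, 1865, 819, 859],
};

// Total number of instructions across all programs, per version.
const totals = {};
for (const version of Object.keys(sizes)) {
    totals[version] = sizes[version].reduce((sum, n) => sum + n, 0);
}

// Size reduction relative to v1, as a percentage of the v1 total.
const reductionVsV1 = {};
for (const version of Object.keys(totals)) {
    reductionVsV1[version] = 100 * (totals.v1 - totals[version]) / totals.v1;
}

for (const version of Object.keys(totals)) {
    console.log(version, totals[version], reductionVsV1[version].toFixed(1) + '%');
}
```

<p>On this data, the totals shrink with every version, and the largest drop is between v2 and v3.</p>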

<p>For larger programs and kernels before v5.2<sup id="fnref:4k-limit" role="doc-noteref"><a href="#fn:4k-limit" class="footnote" rel="footnote">3</a></sup>, the v2 and v3 instruction sets may also allow you to reduce your program size below the 4096 instruction limit imposed by the verifier. 
It is however not the only limit imposed by the verifier.
A more common source of problems for large programs is the limit on the number of instructions analyzed by the verifier.</p>

<p>As the verifier analyzes all paths through a program, it counts how many instructions it has already analyzed and stops after a given limit (e.g., 1 million on Linux 5.2+).
We’ll refer to the number of instructions analyzed by the verifier as the <em>complexity</em> of the BPF program.
In the worst case, the complexity grows exponentially with the number of conditions in the program<sup id="fnref:state-pruning" role="doc-noteref"><a href="#fn:state-pruning" class="footnote" rel="footnote">4</a></sup>.</p>
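
<p>To get a feel for that exponential worst case, here is a toy model, not the kernel’s actual algorithm.
It assumes a program made of a chain of two-way branches, each followed by a fixed-size block of instructions, and compares a walk without any state pruning to an idealized walk where pruning always succeeds.
The function names, <code class="language-plaintext highlighter-rouge">TOY_LIMIT</code>, and the block size of 10 are all hypothetical.</p>

```javascript
// Toy model, NOT the kernel's algorithm: the program is a chain of
// `conditions` two-way branches, each followed by a block of `blockLen`
// instructions. All names here are hypothetical.

// Without pruning, every branch doubles the number of states to walk,
// so 2^i states reach the block that follows the i-th branch.
function analyzedWithoutPruning(conditions, blockLen) {
    let total = 0;
    for (let i = 0; i <= conditions; i++) {
        total += Math.pow(2, i) * blockLen;
    }
    return total;
}

// Idealized best case: all states reaching a block are found equivalent
// and pruned, so each block is analyzed only once.
function analyzedWithPerfectPruning(conditions, blockLen) {
    return (conditions + 1) * blockLen;
}

// Complexity limit on Linux 5.2+, as mentioned above.
const TOY_LIMIT = 1000000;

for (const conditions of [5, 10, 20]) {
    const worst = analyzedWithoutPruning(conditions, 10);
    const best = analyzedWithPerfectPruning(conditions, 10);
    console.log(conditions, worst, best, worst > TOY_LIMIT ? 'over limit' : 'ok');
}
```

<p>In this model, 20 conditions are enough to blow past the 1 million limit without pruning, whereas the idealized pruned walk stays at a few hundred instructions.
Real programs land somewhere in between, depending on how often the verifier can prove two states equivalent.</p>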

<p><code class="language-plaintext highlighter-rouge">TestVerifier</code> also reports the complexity of each loaded BPF program.
I ran it on Linux 6.8 and report the results in the following plot.</p>

<div id="plot-isa-versions-complexity"></div>
<script>
var host_programs = ['cil_from_host', 'cil_from_netdev', 'cil_to_host', 'cil_to_netdev', 'handle_lxc_traffic', 'tail_handle_ipv4_cont_from_host', 'tail_handle_ipv4_cont_from_netdev', 'tail_handle_ipv4_from_host', 'tail_handle_ipv4_from_netdev', 'tail_handle_ipv6_cont_from_host', 'tail_handle_ipv6_cont_from_netdev', 'tail_handle_ipv6_from_host', 'tail_handle_ipv6_from_netdev', 'tail_handle_nat_fwd_ipv4', 'tail_handle_nat_fwd_ipv6', 'tail_handle_snat_fwd_ipv4', 'tail_handle_snat_fwd_ipv6', 'tail_ipv4_host_policy_ingress', 'tail_ipv6_host_policy_ingress', 'tail_nodeport_nat_egress_ipv4', 'tail_nodeport_nat_egress_ipv6', 'tail_nodeport_nat_ingress_ipv4', 'tail_nodeport_nat_ingress_ipv6', 'tail_nodeport_rev_dnat_ingress_ipv4', 'tail_nodeport_rev_dnat_ingress_ipv6'];
var programs = host_programs.map(p => 'bpf_host ' + p);
var v1Complexity = {
    x: programs,
    y: [1215, 1084, 1683, 41960, 27178, 30836, 16600, 2918, 15519, 25935, 13078, 2278, 10936, 6777, 4728, 13820, 14891, 21013, 9524, 8081, 5307, 52619, 11575, 4497, 2711],
    type: 'bar',
    mode: 'markers',
    name: 'v1',
    marker: {
        color: 'rgb(241, 115, 35)',
    }
};
var v2Complexity = {
    x: programs,
    y: [1206, 1078, 1695, 42242, 28163, 31582, 17001, 2907, 15092, 26070, 13486, 2304, 10648, 6387, 4412, 13598, 14732, 21438, 9601, 7999, 5176, 51500, 11367, 4502, 2905],
    type: 'bar',
    mode: 'markers',
    name: 'v2',
    marker: {
        color: 'rgb(248, 197, 23)',
    }
};
var v3Complexity = {
    x: programs,
    y: [1004, 1078, 1479, 58125, 40379, 32498, 18275, 2874, 16961, 38127, 14433, 2181, 9950, 7233, 4404, 12983, 12984, 22290, 9354, 7236, 8267, 140516, 11182, 4924, 2642],
    type: 'bar',
    mode: 'markers',
    name: 'v3',
    marker: {
        color: 'rgb(98, 138, 198)',
    }
};
var v4Complexity = {
    x: programs,
    y: [970, 1015, 1437, 47327, 42495, 40794, 23456, 2767, 16767, 31760, 20027, 2678, 10141, 6972, 4408, 17101, 13863, 20652, 9136, 7333, 8451, 80649, 12454, 5010, 2493],
    type: 'bar',
    mode: 'markers',
    name: 'v4',
    marker: {
        color: 'rgb(152, 201, 62)',
    }
};



plotlyReady(function() {
    var data = [v1Complexity, v2Complexity, v3Complexity, v4Complexity];
    var layout = {
        title: '<b>Complexity for Each Instruction Set Version</b>',
        barmode: 'group',
        legend: {x: 0.02, y: 0.95},
        margin: {t: 80, l: 50, r: 0, b: 80},
        xaxis: {
            title: "Cilium's BPF programs",
            showticklabels: false
        },
        yaxis: {
            tickformat: 's',
            // type: 'log',
            // autorange: true,
        },
        height: 450,
    };
    if (jQuery.browser && jQuery.browser.mobile) {
        layout.dragmode = false;
    }
    Plotly.newPlot('plot-isa-versions-complexity', data, layout);
});
</script>

<p>By clicking on the legend to hide v3 and v4, we can see that v1 and v2 are fairly close.
There are however striking differences between the first two versions and the third one.
The v3 instruction set sometimes reduces complexity and other times exacerbates it.
The v4 instruction set has a similar impact to v3, though not to the same extent.</p>

<p>It’s unclear why the newer instruction sets sometimes increase complexity even though they reduce the number of instructions.
Given that they don’t significantly modify the control flow, it could be that they reduce the efficiency of <a href="/ebpf/2021/04/12/bmc-accelerating-memcached-using-bpf-and-xdp.html#bpfs-complexity-constraint">the verifier’s state pruning</a>.</p>

<p>To sum up, if you are having complexity issues (i.e., hitting the verifier’s threshold), you need to carefully test the impact of each instruction set before making the switch.</p>
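
<p>As a starting point for that kind of testing, the following sketch reuses the complexity numbers from the plot to find, for each version, the most complex program and the share of the verifier’s budget it consumes, and to list the programs that regress when switching versions.
The <code class="language-plaintext highlighter-rouge">worstCase</code> and <code class="language-plaintext highlighter-rouge">regressions</code> helpers and the 1.25 threshold are hypothetical; in practice you would feed in your own programs’ statistics.</p>

```javascript
// Complexity per bpf_host program, copied from the plot above.
const complexity = {
    v1: [1215, 1084, 1683, 41960, 27178, 30836, 16600, 2918, 15519, 25935, 13078, 2278, 10936, 6777, 4728, 13820, 14891, 21013, 9524, 8081, 5307, 52619, 11575, 4497, 2711],
    v2: [1206, 1078, 1695, 42242, 28163, 31582, 17001, 2907, 15092, 26070, 13486, 2304, 10648, 6387, 4412, 13598, 14732, 21438, 9601, 7999, 5176, 51500, 11367, 4502, 2905],
    v3: [1004, 1078, 1479, 58125, 40379, 32498, 18275, 2874, 16961, 38127, 14433, 2181, 9950, 7233, 4404, 12983, 12984, 22290, 9354, 7236, 8267, 140516, 11182, 4924, 2642],
    v4: [970, 1015, 1437, 47327, 42495, 40794, 23456, 2767, 16767, 31760, 20027, 2678, 10141, 6972, 4408, 17101, 13863, 20652, 9136, 7333, 8451, 80649, 12454, 5010, 2493],
};

// Verifier limit on analyzed instructions for Linux 5.2+, as stated above.
const COMPLEXITY_LIMIT = 1000000;

// For each version, the most complex program (by index in the arrays)
// and the fraction of the verifier's budget it uses.
const worstCase = {};
for (const version of Object.keys(complexity)) {
    const max = Math.max(...complexity[version]);
    worstCase[version] = {
        index: complexity[version].indexOf(max),
        complexity: max,
        budgetUsed: max / COMPLEXITY_LIMIT,
    };
}

// Programs whose complexity grows by more than `threshold` (a hypothetical
// cutoff, expressed as a ratio) when switching from one version to another.
function regressions(from, to, threshold) {
    return complexity[from]
        .map((c, i) => ({index: i, ratio: complexity[to][i] / c}))
        .filter(r => r.ratio > threshold);
}

console.log(worstCase);
console.log(regressions('v1', 'v3', 1.25));
```

<p>On this data, the worst case for every version is the program at index 21, <code class="language-plaintext highlighter-rouge">tail_nodeport_nat_ingress_ipv4</code>, which stays far below the limit but more than doubles in complexity between v1 and v3.
Programs that already sit close to the limit are the ones to watch for such jumps.</p>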

<p><br /></p>

<h3 id="conclusion">Conclusion</h3>

<p>We have seen that the Linux kernel supports not one but four eBPF instruction sets!
These instruction sets have an impact on program size and performance, and in most cases, you’re better off setting <code class="language-plaintext highlighter-rouge">mcpu=probe</code> to use the newest supported version.
If you have very large BPF programs, a version switch can lead to a rejection by the kernel’s verifier if you hit the complexity limit, so you should test thoroughly before making the switch.</p>

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:alu32-support" role="doc-endnote">
      <p>As far as I can see, it should be supported since v3.19 with <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0003ec01c667b731c139e23de3306a8b328ccf5">the first helper calls</a>, but most programs break before v5.0 due to the lack of <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2dc6b100f928aac8d7532bf7112d3f8d3f952bad">support for 32-bit signed right shifts</a>. <a href="#fnref:alu32-support" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:sign-extended-store" role="doc-endnote">
      <p>Sign-extended stores were already supported in the kernel, so only LLVM support was needed. It came in commit <a href="https://github.com/llvm/llvm-project/commit/8f28e8069c4ba1110daee8bddc4d5049b6d4646e">8f28e8069c4b</a>. <a href="#fnref:sign-extended-store" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4k-limit" role="doc-endnote">
      <p>The 4096-instruction limit on the program size was removed in Linux 5.2 for privileged users. <a href="#fnref:4k-limit" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:state-pruning" role="doc-endnote">
      <p>In practice, the verifier uses state pruning to recognize equivalent paths and reduce the number of instructions to analyze. <a href="#fnref:state-pruning" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="bpf" /><summary type="html"><![CDATA[This post details the different versions of the eBPF instruction set. Their impact on program size and eBPF complexity is then evaluated.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://pchaigno.github.io/assets/illustration-ebpf-instruction-sets.png" /><media:content medium="image" url="https://pchaigno.github.io/assets/illustration-ebpf-instruction-sets.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>