Commit 74bff5c ("Add kernelCTF CVE-2025-39946", parent 7297a9e)
11 files changed: +10490 -0 lines

Lines changed: 143 additions & 0 deletions

# CVE-2025-39946

Exploit documentation for `CVE-2025-39946` against `mitigation-v3b-6.1.55`.

As stated in the `vulnerability.md` documentation, the bug behind
`CVE-2025-39946` causes use of uninitialized data and potentially out-of-bounds
accesses. For exploitation we focus on the uninitialized data in the
`struct skb_shared_info.frags[]` array.
TLS manages the first 5 fragments for internal use; fragments after that,
however, are accessible to us because of the bug described.
To exploit this, we first groom the heap so that the next fragment holds a
controlled value. We then try to reuse this fragment page so that we can
trigger a page write corrupting kernel data.

## Page Write Targets

With the primitive outlined above (essentially a one-shot use-after-free page
write primitive), we need to find a useful page to write to.
There are two obvious choices:

- Page tables
- Slab backing pages

At the time of working on the exploit, page tables seemed too unstable due to
the one-shot nature of the write, which is why we continue with the slab
backing pages. In hindsight, page tables were probably a good fit too.

Which slab do we target? Ideally the slab would contain objects that allow
trivial code execution or other memory write primitives. Additionally, the
objects for the slab should be allocatable without too much noise in other
slabs, because we do not want to accidentally corrupt another slab.
Finally, we need to ensure that the same pages used for the slab can also be
allocated for skb fragments.

Considering all of the above, I went for `struct file` objects:

- They can be allocated rather easily by opening files, and we can allocate a
  large number of them.
- Files are allocated from a dedicated `kmem_cache`, so we are sure to only
  corrupt file objects, which aids stability.
- Files contain an `f_op` vtable, allowing direct RIP control.
- File slabs are backed by order-0 pages, which can be allocated easily from
  userspace using pipes.

One downside of files is the fact that we cannot allocate files without
allocating inodes too. This is a problem because every file allocation also
results in the allocation of a `struct dentry`, which means our page write
might accidentally hit a different slab.

## Heap Grooming

To get a fragment at the right position, we want skbs with 6 fragments, so
that the last fragment can be picked up by the file slab.
To get the controlled fragments into an skb, we create pipes and fill exactly
5 pages. Pipe buffers are backed by order-0 pages, which matches the file slab
`kmem_cache` order. After that we add another, partially filled page, which
will be the page used for triggering the overwrite.
We then splice those pages onto an skb for the expected fragment layout.

The final page needs to be partially filled so that there is some space left
to write. We fill the page exactly to the alignment of the `struct file`
objects in the slab. Thus the next write starts at the next `struct file`
object and corrupts all the files in the rest of the slab.

For an increased chance of hitting the right pages, we repeat the above for
N (= 16) pipes. We fill all of them and then release the skbs one by one,
immediately picking each up with a new `struct tls_strparser`. Since the last
freed object will be the first on the freelist, it is very likely that the
TLS socket picks up the prepared skb.

## File Slab Spray

Now that each TLS socket is readily equipped with a prepared skb, we want to
spray file slabs so that new slabs pick up pages that were released from the
pipe buffers earlier.
As the files to allocate we choose `signalfd`s. They are a decent choice
because they are rather simple, with a small context, such that we do not
allocate from new slabs except for the `file` and the `dentry` slab.
Furthermore, `signalfd`s provide an easy-to-use oracle ([1]) allowing us to
check whether we corrupted the file structure.

```c
static int do_signalfd4(int ufd, sigset_t *mask, int flags)
{
	struct signalfd_ctx *ctx;

	/* ... */

	if (ufd == -1) {
		/* ... */
	} else {
		struct fd f = fdget(ufd);
		if (!f.file)
			return -EBADF;
		ctx = f.file->private_data;
		if (f.file->f_op != &signalfd_fops) { // [1]
			fdput(f);
			return -EINVAL;
		}
		/* ... */
	}
```

As mentioned earlier, we cannot prevent allocation from the `dentry` slab when
allocating `signalfd`s. To prevent kernel panics because of corrupted
`dentry`s, we spray the `signalfd`s in a dedicated forked process which lives
forever in case we fail to find a corrupted file. This way we prevent any
accidental oops during cleanup.

## Triggering the Bug for the Page Write

Now that we hopefully have a `signalfd` whose file lives in a slab backed by
the page we placed into one of the skb fragments, we trigger the bug as
described in the `vulnerability.md` document and write our payload for each
skb we set up.

As the payload we opt for a simple, mostly empty file that has nothing but an
`f_op` table with a populated `flush` method and a reference count of 1. When
we close the file via `close()` we reach `filp_close()`, which gives us RIP
control.
We do not really need a reference count of exactly 1; anything greater than
zero bypasses the checks in `filp_close()`. It is actually better to choose a
greater reference count to prevent the file destructor from running. Since we
will block the kernel in an infinite loop in our flush primitive, we do not
need to worry about that too much, though.

As a RIP gadget we utilize the "one gadget" technique described in great
detail in the [CVE-2025-21700 writeup](https://raw.githubusercontent.com/google/security-research/refs/heads/master/pocs/linux/kernelctf/CVE-2025-21700_lts_cos_mitigation/docs/novel-techniques.md).
Also note that this gadget does not need a KASLR bypass.

To create the `struct file_operations` pointer, we resort to the previously
documented, deterministically known location of the exception stacks in the
CPU entry area. This issue has been documented several times (CVE-2023-0597).

After each write completes, we check each `signalfd` using the oracle
described above. If any of them got corrupted, we trigger our payload by
closing the file descriptor.

## Stability Notes

Special care was taken to make the exploit repeatable if the page reclaim
fails. It should be close to 80% stable.
As a side note, the use of the "one gadget" actually helps with the page
reclaim because it causes the per-CPU page (PCP) lists to drain, giving us
more reliability in the page allocation.

Lines changed: 72 additions & 0 deletions

# CVE-2025-39946

- Requirements:
  - Kernel configuration: `CONFIG_TLS`
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84c61fe1a75b4255df1e1e7c054c9e6d048da417
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0aeb54ac4cd5cf8f60131b4d9ec0b6dc9c27b20d
- Affected Versions: 6.0-rc1 - 6.17-rc7
- URL: https://www.cve.org/CVERecord?id=CVE-2025-39946

In the kernel TLS implementation, an issue was found when processing invalid
TLS records under network pressure. This behavior can be triggered
deterministically by forcing short reads via out-of-band data. The kernel
test case demonstrates this:

```c
TEST_F(tls_err, oob_pressure)
{
	char buf[1<<16];
	int i;

	memrnd(buf, sizeof(buf));

	EXPECT_EQ(send(self->fd2, buf, 5, MSG_OOB), 5);
	EXPECT_EQ(send(self->fd2, buf, sizeof(buf), 0), sizeof(buf));
	for (i = 0; i < 64; i++)
		EXPECT_EQ(send(self->fd2, buf, 5, MSG_OOB), 5);
}
```


The problem manifests in the `tls_strp_copyin_frag` function. After entering
copy mode due to the initial short read and partially receiving the large
buffer, we continue to copy out chunks from said large buffer. The problem is
that TLS pre-allocated the `skb_shinfo(skb)->frags[]` array for only a fixed
(small) TLS record and fails to check whether the available fragments are
already exhausted ([1]). It then continues to copy the incoming data ([2]).
Finally, parsing the TLS header in `tls_rx_msg_size` is made to fail,
returning an invalid size. This aborts the copy loop but fails to abort the
full message ([3]). A following read, triggered by other incoming OOB
messages, forces re-entry into `tls_strp_copyin_frag`, eventually exhausting
the available fragments and causing reads of uninitialized data or
out-of-bounds accesses on the `skb_shared_info` structure.

```c
static int tls_strp_copyin_frag(struct tls_strparser *strp, struct sk_buff *skb,
				struct sk_buff *in_skb, unsigned int offset,
				size_t in_len)
{
	size_t len, chunk;
	skb_frag_t *frag;
	int sz;

	frag = &skb_shinfo(skb)->frags[skb->len / PAGE_SIZE]; // [1]

	len = in_len;
	/* First make sure we got the header */
	if (!strp->stm.full_len) {
		/* Assume one page is more than enough for headers */
		chunk = min_t(size_t, len, PAGE_SIZE - skb_frag_size(frag));
		WARN_ON_ONCE(skb_copy_bits(in_skb, offset,
					   skb_frag_address(frag) +
					   skb_frag_size(frag),
					   chunk)); // [2]

		skb->len += chunk;
		skb->data_len += chunk;
		skb_frag_size_add(frag, chunk);

		sz = tls_rx_msg_size(strp, skb);
		if (sz < 0)
			return sz; // [3]
	/*...*/
```
Lines changed: 12 additions & 0 deletions

```make
SRC := exploit.c

exploit: $(SRC)
	$(CC) -O2 -static -s -Wall -o $@ $^ -lpthread

rip: rip.c
# needs clang to compile
	clang -O3 -o $@ $<

# apparently this is needed for the CI
prerequisites:
```