112 changes: 112 additions & 0 deletions pocs/linux/kernelctf/CVE-2025-40216_mitigation/docs/exploit.md
@@ -0,0 +1,112 @@
# Vulnerability Overview
This vulnerability exists within the io_uring subsystem, specifically in how fixed buffers are registered and subsequently imported.

The root cause is an incorrect offset calculation when handling user pointers that are not aligned to the folio size.

This logic error results in an incorrect bv_len (the length field of a bio_vec segment), which in turn triggers an Out-of-Bounds (OOB) access and the use of uninitialized memory during I/O operations.

# Root Cause Analysis


## Incorrect Offset Calculation (io_sqe_buffer_register)
In io_sqe_buffer_register, the kernel attempts to calculate the offset of the first page.

However, the bitmask logic used assumes specific alignment guarantees for user pointers that do not exist.

When iov->iov_base is not aligned as expected relative to imu->folio_shift, the calculated off variable is incorrect.

```C
static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
                                  struct io_mapped_ubuf **pimu,
                                  struct page **last_hpage)
{
    // ...
    /*
     * VULNERABILITY:
     * The bitwise AND logic here produces an incorrect offset if the
     * user pointer (iov_base) alignment does not match the folio logic.
     */
    off = (unsigned long) iov->iov_base & ((1UL << imu->folio_shift) - 1);
    *pimu = imu;
    ret = 0;

    for (i = 0; i < nr_pages; i++) {
        size_t vec_len;

        /*
         * Because 'off' is potentially wrong, 'vec_len' (the length of
         * this segment) becomes smaller than it should be.
         */
        vec_len = min_t(size_t, size, (1UL << imu->folio_shift) - off);
        bvec_set_page(&imu->bvec[i], pages[i], vec_len, off);
        off = 0;
        // ...
```

The bvec->bv_len stored in the `io_mapped_ubuf` is smaller than the correct value required to represent the data.

## OOB Access (io_import_fixed)

Later, when io_import_fixed is called to perform I/O using the registered buffer, it iterates over the buffer vectors.

The logic attempts to skip segments (seg_skip) based on the requested offset.

Because the stored bv_len is artificially small (due to the bug above), the function believes it needs to skip more segments than actually exist to reach the requested offset.

```C
int io_import_fixed(int ddir, struct iov_iter *iter,
                    struct io_mapped_ubuf *imu,
                    u64 buf_addr, size_t len)
{
    // ...
    if (offset < bvec->bv_len) {
        iter->bvec = bvec;
        iter->count -= offset;
        iter->iov_offset = offset;
    } else {
        unsigned long seg_skip;

        /* skip first vec */
        offset -= bvec->bv_len;

        /*
         * VULNERABILITY:
         * Because 'bvec->bv_len' was too small, the remaining 'offset' is too large.
         * This causes 'seg_skip' to calculate a value larger than the bvec array length.
         */
        seg_skip = 1 + (offset >> imu->folio_shift);

        /*
         * 'iter->bvec' now points Out-Of-Bounds (OOB) past the end of the array.
         * This results in the usage of uninitialized memory for io_read/io_write operations.
         */
        iter->bvec = bvec + seg_skip;
        iter->nr_segs -= seg_skip;
        iter->count -= bvec->bv_len + offset;
        iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1);
    }
    // ...
    return 0;
}
```

# Exploit Strategy

The uninitialized memory usage in io_import_fixed can be weaponized to achieve a container escape.

1. Achieve Page UAF

By triggering the OOB condition, the iter->bvec pointer is made to point to uninitialized memory.

Heap Spray/Grooming: We manipulate the kernel heap to ensure that this uninitialized memory region contains a pointer to a page we control or have recently freed.

UAF Primitive: This allows us to perform a Use-After-Free (UAF) read or write on the target page.

2. Leak PTEs

Page Reclaiming: We spray a large number of page tables to reclaim the physical memory of the freed page.

Leak: By reading from our UAF pointer (which now aliases the newly allocated Page Table), we can leak the Page Table Entry (PTE) of an empty page. This reveals kernel physical address mappings.

3. Container Escape via core_pattern

Using the leaked information, we calculate the PTE address corresponding to core_pattern.

We then use the UAF write primitive to overwrite core_pattern, which achieves the container escape.
@@ -0,0 +1,12 @@
Requirements:
Capabilities: None
Kernel configuration: CONFIG_IO_URING
User namespaces required: no
Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a8edbb424b1391b077407c75d8f5d2ede77aa70d
Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3a3c6d61577dbb23c09df3e21f6f9eda1ecd634b
Affected kernel versions: v6.11 - v6.15
Affected component: io_uring
Cause: Out-of-bounds read
Syscall to disable:
Description:
Out-of-bounds read in io_uring
@@ -0,0 +1,9 @@
all: exploit

exploit: exploit.c
	gcc exploit.c -static -o exploit

install: exploit
	scp exploit vm:
	ssh vm ./exploit p

clean:
	rm -f exploit

Binary file not shown.