linux-cachyos 7.0.1: NULL deref in select_task_rq_fair() from pollwake/__wake_up_sync_key → kernel zombie task → full system freeze #828

@Evil-Overlord-666

Description

Kernel NULL deref in select_task_rq_fair from pollwake path → unrecoverable system freeze

Full kernel trace (gist): https://gist.github.com/Evil-Overlord-666/cb5f046c88894ea16d0473f0b38d1834


Summary

On 2026-04-30 at 19:51:31, a "BUG: kernel NULL pointer dereference, address: 0x0000000000000044"
fired on CPU 4 inside select_task_rq_fair(), reached via the wake-up path
unix_stream_sendmsg → sock_def_readable → __wake_up_sync_key → pollwake → try_to_wake_up → select_task_rq.
The process context was Xwayland (PID 2504, UID 1000), executing a writev(2) syscall on a Unix-domain socket.
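The wake-up path above can be driven from userspace with a few syscalls. The sketch below only illustrates that path and is not a reproducer (the crash could not be reproduced). One thread sleeps in poll(2) on one end of an AF_UNIX stream socketpair, so the kernel queues it with pollwake() as its wake callback. A writev(2) on the other end then goes through unix_stream_sendmsg() and sock_def_readable() to wake it. The names exercise_writev_pollwake and struct poll_result are hypothetical, and the 100 ms sleep only makes it likely, not certain, that the reader is already blocked when writev() runs.

```c
/* Illustration only: drives writev(2) -> unix_stream_sendmsg -> sock_def_readable
 * -> pollwake from userspace. Not a reproducer for the Oops. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

struct poll_result {
    int fd;        /* socket end the reader thread polls */
    int nready;    /* return value of poll() */
    short revents; /* events poll() reported for fd */
};

/* Reader: blocks in poll(), which parks it on the socket's wait queue
 * with pollwake() as the wake callback. */
static void *poll_reader(void *arg)
{
    struct poll_result *res = arg;
    struct pollfd pfd = { .fd = res->fd, .events = POLLIN };

    res->nready = poll(&pfd, 1, 5000); /* 5 s timeout: a lost wakeup cannot hang us */
    res->revents = pfd.revents;
    return NULL;
}

/* Returns the number of bytes read back from the polled end (payload stored
 * NUL-terminated in buf), or -1 on error. */
ssize_t exercise_writev_pollwake(char *buf, size_t buflen, struct poll_result *res)
{
    int sv[2];
    pthread_t tid;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 100000000L }; /* 100 ms */

    assert(buf != NULL && buflen > 1 && res != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    res->fd = sv[1];
    if (pthread_create(&tid, NULL, poll_reader, res) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    nanosleep(&delay, NULL); /* best effort: let the reader block in poll() first */

    /* Writer side, as in the trace: writev() on one end wakes the poller on the other. */
    char part1[] = "hello, ";
    char part2[] = "peer";
    struct iovec iov[2] = {
        { .iov_base = part1, .iov_len = strlen(part1) },
        { .iov_base = part2, .iov_len = strlen(part2) },
    };
    ssize_t written = writev(sv[0], iov, 2);
    pthread_join(tid, NULL);

    ssize_t got = -1;
    if (written >= 0) {
        got = read(sv[1], buf, buflen - 1);
        if (got >= 0)
            buf[got] = '\0';
    }
    close(sv[0]);
    close(sv[1]);
    return got;
}
```

With this setup, exercise_writev_pollwake(buf, sizeof buf, &res) returns 11, leaves "hello, peer" in buf, and reports POLLIN in res.revents. The 5-second poll() timeout keeps a lost wakeup from hanging the program.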

After the Oops, the kernel printed "note: Xwayland[2504] exited with irqs disabled" and
"note: Xwayland[2504] exited with preempt_count 3". Both notes mean the task was killed
while still in atomic context, in the middle of the wake-up and still holding that path's
spinlocks, and it never finished exiting. From 19:52:31 onward
the same task remained pinned on CPU 13, spinning in
native_queued_spin_lock_slowpath inside do_exit → __fput → sock_close → unix_release_sock → sock_def_wakeup → __wake_up. RCU stall warnings
("rcu_preempt detected stalls on CPUs/tasks ... P2504") repeated every ~3 minutes
until I hard-rebooted at 23:24, after roughly 3.5 hours of escalating freeze.

There appear to be two distinct bugs here:

  1. Primary: the NULL deref in select_task_rq_fair reachable from the
    pollwake path on a normal writev to a unix socket. CR2 = 0x44 suggests a
    small offset into a NULL task_struct / rq / cfs_rq pointer.
  2. Secondary (recovery): Oops handler does not safely tear down a task that
    crashed with irqs_disabled() && preempt_count > 0. The task's outstanding
    spinlock is never released, so the entire system grinds to a halt instead of
    just losing Xwayland.
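A toy illustration of the CR2 reasoning: for a load of p->member, the faulting address is the base pointer plus the member's offset, so with p == NULL the reported CR2 is just the offset. struct fake_sched_obj and fault_address() are hypothetical and do not correspond to any real kernel structure; the example cannot tell which of the candidate pointers was NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real task_struct / rq / cfs_rq. It only places
 * one field at offset 0x44 so the address arithmetic can be shown. */
struct fake_sched_obj {
    unsigned char pad[0x44]; /* stand-in for whatever precedes the field */
    int field_at_0x44;       /* hypothetical member read by the faulting load */
};

static_assert(offsetof(struct fake_sched_obj, field_at_0x44) == 0x44,
              "field must sit at offset 0x44");

/* Address a load of base->field_at_0x44 would touch, computed without
 * dereferencing anything. With base == 0 this is the value CR2 reports. */
uintptr_t fault_address(uintptr_t base)
{
    return base + offsetof(struct fake_sched_obj, field_at_0x44);
}
```

fault_address(0) evaluates to 0x44, the CR2 value in the Oops, while a valid base such as 0x1000 would give 0x1044.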

Reproducibility

Observed once, in the wild; I have not been able to reproduce it deliberately. The
system had been up for ~22 hours with a Plasma 6.6.4 Wayland session logged in. I was
away from the machine and there was no special workload at the moment of the crash.

System

| Field | Value |
| --- | --- |
| Distro | CachyOS (rolling) |
| Kernel | 7.0.1-1-cachyos #1 SMP PREEMPT Thu, 23 Apr 2026 21:04:50 +0000 x86_64 |
| Build hash | 064ce857db72d62f7ca6e6781b81b6ace6735267 |
| Compiler | clang 22.1.3 (kernel built with LLVM/clang) |
| Cmdline | quiet nowatchdog splash rw rootflags=subvol=/@ root=UUID=... |
| Init | systemd v260 |
| DE | KDE Plasma 6.6.4, kwin_wayland, Xwayland 24.1.10 |
| CPU | Intel Core i9-13900KF (8P+16E, 32 threads) |
| Board | MSI MEG Z790 ACE (MS-7D86), BIOS 1.F0 (2025-08-07) |
| RAM | 64 GiB DDR5 |
| GPU | NVIDIA RTX 4090, proprietary driver 595.58.03 |
| Out-of-tree modules | nvidia(O), nvidia_drm(O), nvidia_modeset(O), nvidia_uvm(O), razerkbd(OE), razermouse(OE) |
| Root FS | btrfs on /dev/sdd2 (CachyOS install) |

Kernel taint at time of Oops: G OE (out-of-tree + unsigned modules).
Note: the fault site itself is in core scheduler code (select_task_rq_fair), not in any of the
out-of-tree modules. The NVIDIA and Razer modules are loaded but do not appear in the call stack.
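The "G OE" taint string maps onto bits of the kernel's taint mask, which can also be read as a number from /proc/sys/kernel/tainted. As a sketch, the decoder below handles only the three flags relevant here. The bit numbers (P = 0, O = 12, E = 13) follow the kernel's documented taint table, and decode_taint() is a hypothetical helper that produces a compact string rather than the kernel's fixed-column layout.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the kernel's documented taint bits. */
#define TAINT_BIT_PROPRIETARY 0 /* 'P' if set, 'G' if clear */
#define TAINT_BIT_OOT_MODULE 12 /* 'O': out-of-tree module loaded */
#define TAINT_BIT_UNSIGNED 13   /* 'E': unsigned module loaded */

/* Writes a compact flag string such as "G OE" for the bits above.
 * Needs room for at most 4 characters plus the terminating NUL. */
void decode_taint(unsigned long mask, char *out, size_t outlen)
{
    size_t n = 0;

    assert(out != NULL && outlen >= 5);
    out[n++] = (mask & (1UL << TAINT_BIT_PROPRIETARY)) ? 'P' : 'G';
    out[n++] = ' ';
    if (mask & (1UL << TAINT_BIT_OOT_MODULE))
        out[n++] = 'O';
    if (mask & (1UL << TAINT_BIT_UNSIGNED))
        out[n++] = 'E';
    out[n] = '\0';
}
```

A mask of 0x3000 (bits 12 and 13 set) decodes to "G OE", matching the taint shown in the Oops header.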

Primary Oops (verbatim, abbreviated — full trace in the linked gist above)

Apr 30 19:51:31 ArchEnemy kernel: BUG: kernel NULL pointer dereference, address: 0000000000000044
Apr 30 19:51:31 ArchEnemy kernel: #PF: supervisor read access in kernel mode
Apr 30 19:51:31 ArchEnemy kernel: #PF: error_code(0x0000) - not-present page
Apr 30 19:51:31 ArchEnemy kernel: PGD 1dd8dc067 P4D 1dd8dc067 PUD 0
Apr 30 19:51:31 ArchEnemy kernel: Oops: Oops: 0000 [#1] SMP NOPTI
Apr 30 19:51:31 ArchEnemy kernel: CPU: 4 UID: 1000 PID: 2504 Comm: Xwayland Tainted: G           OE       7.0.1-1-cachyos #1 PREEMPT  064ce857db72d62f7ca6e6781b81b6ace6735267
Apr 30 19:51:31 ArchEnemy kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Apr 30 19:51:31 ArchEnemy kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D86/MEG Z790 ACE (MS-7D86), BIOS 1.F0 08/07/2025
Apr 30 19:51:31 ArchEnemy kernel: RIP: 0010:select_task_rq_fair.llvm.18029127130511906342+0x2cdd/0x2d30
Apr 30 19:51:31 ArchEnemy kernel: RSP: 0018:ffffcbb785787758 EFLAGS: 00010002
Apr 30 19:51:31 ArchEnemy kernel: RAX: 0000000000000001 RBX: ffff8afaca0efe00 RCX: 0000000000000001
Apr 30 19:51:31 ArchEnemy kernel: RDX: 0000000000000002 RSI: 0000000000000020 RDI: ffff8afad4155280
Apr 30 19:51:31 ArchEnemy kernel: RBP: 0000000000000018 R08: 0000000000000000 R09: ffff8afac0dd5280
Apr 30 19:51:31 ArchEnemy kernel: R10: 0000000000004000 R11: 0000000000000000 R12: 0000000000000004
Apr 30 19:51:31 ArchEnemy kernel: R13: 0000000000000004 R14: ffffcbb785787800 R15: 0000000000000000
Apr 30 19:51:31 ArchEnemy kernel: FS:  00007fd821c9ba00(0000) GS:ffff8b0a81f91000(0000) knlGS:0000000000000000
Apr 30 19:51:31 ArchEnemy kernel: CR2: 0000000000000044 CR3: 0000000120956002 CR4: 0000000000f72ef0
Apr 30 19:51:31 ArchEnemy kernel: Call Trace:
Apr 30 19:51:31 ArchEnemy kernel:  <TASK>
Apr 30 19:51:31 ArchEnemy kernel:  ? obj_cgroup_charge_account.llvm.8602441956471480031+0x131/0x150
Apr 30 19:51:31 ArchEnemy kernel:  ? __memcg_slab_post_alloc_hook+0x304/0x3a0
Apr 30 19:51:31 ArchEnemy kernel:  select_task_rq+0x81/0xe0
Apr 30 19:51:31 ArchEnemy kernel:  try_to_wake_up+0x258/0x6a0
Apr 30 19:51:31 ArchEnemy kernel:  pollwake+0xa1/0xd0
Apr 30 19:51:31 ArchEnemy kernel:  ? __pfx_default_wake_function+0x10/0x10
Apr 30 19:51:31 ArchEnemy kernel:  __wake_up_sync_key+0x65/0xa0
Apr 30 19:51:31 ArchEnemy kernel:  sock_def_readable+0x44/0xd0
Apr 30 19:51:31 ArchEnemy kernel:  unix_stream_sendmsg+0x1d1/0x7e0
Apr 30 19:51:31 ArchEnemy kernel:  __sock_sendmsg+0x6f/0x90
Apr 30 19:51:31 ArchEnemy kernel:  sock_write_iter+0xee/0x140
Apr 30 19:51:31 ArchEnemy kernel:  do_iter_readv_writev+0x18e/0x1f0
Apr 30 19:51:31 ArchEnemy kernel:  vfs_writev+0x1db/0x410
Apr 30 19:51:31 ArchEnemy kernel:  do_writev+0x76/0x110
Apr 30 19:51:31 ArchEnemy kernel:  do_syscall_64+0x111/0xa50
Apr 30 19:51:31 ArchEnemy kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 30 19:51:31 ArchEnemy kernel: ---[ end trace 0000000000000000 ]---
Apr 30 19:51:31 ArchEnemy kernel: note: Xwayland[2504] exited with irqs disabled
Apr 30 19:51:31 ArchEnemy kernel: note: Xwayland[2504] exited with preempt_count 3

Secondary failure — RCU stall / queued-spinlock deadlock during reap

After the Oops, the task did not finish dying. Every ~3 minutes:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:         13-...0: (1 ticks this GP) idle=b2e4/1/0x4000000000000000 softirq=180214/180214 fqs=...
rcu:         Tasks blocked on level-1 rcu_node (CPUs 0-15): P2504
rcu:         (detected by 27, t=12300406 jiffies, g=6504897, q=119869 ncpus=32)
Sending NMI from CPU 27 to CPUs 13:
RIP: 0010:native_queued_spin_lock_slowpath+0x9d/0x2d0
Call Trace:
  _raw_spin_lock_irqsave+0x3e/0x50
  __wake_up+0x27/0xb0
  sock_def_wakeup+0x3f/0x50
  unix_release_sock+0x225/0x400
  unix_release+0x34/0x50
  sock_close+0x47/0xd0
  __fput+0xf9/0x280
  task_work_run+0x9d/0xc0
  do_exit+0x32a/0xaa0
  make_task_dead+0x80/0x150
  rewind_stack_and_make_dead+0x16/0x20

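Hours later, the same PID (2504) is still spinning on CPU 13 in the queued-spinlock
slowpath. It appears to be waiting for a wait-queue spinlock that was acquired before
the original Oops and never released: the task was killed mid-wakeup with
irqs_disabled() && preempt_count == 3, and nothing on the Oops/exit path releases
locks held by the code that faulted.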

Assuming CONFIG_HZ=1000 (1 jiffy = 1 ms), the stall counter t=12300406 jiffies ≈ 12,300
seconds ≈ 3 h 25 min, i.e. essentially the whole time I was away from the machine.
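The conversion above depends on CONFIG_HZ. A small sketch of the arithmetic, using a hypothetical helper jiffies_to_hms():

```c
#include <assert.h>

/* Hours/minutes/seconds broken out of a duration. */
struct hms {
    unsigned long h, m, s;
};

/* Converts a jiffies count to h/m/s for a given CONFIG_HZ value.
 * Any sub-second remainder is truncated. */
struct hms jiffies_to_hms(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    unsigned long secs = jiffies / hz;
    struct hms t = { secs / 3600, (secs % 3600) / 60, secs % 60 };
    return t;
}
```

jiffies_to_hms(12300406, 1000) yields 3 h 25 min 0 s. The same count at HZ=250 would be about 13 h 40 min, and HZ=100 or 300 would be longer still. All of those exceed the roughly 3.5 hours between the Oops (19:51:31) and the reboot (23:24), which points to HZ=1000 on this kernel.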

What I think is going on (best-effort, not authoritative)

  • pollwake() calls try_to_wake_up() for the task blocked in poll(2)/select(2)
    on the receiving socket's wait queue. try_to_wake_up() calls select_task_rq(),
    which dispatches to select_task_rq_fair() for SCHED_NORMAL tasks.
  • The fault is at select_task_rq_fair+0x2cdd/0x2d30, only 0x53 bytes before the
    end of the function, reading address 0x44. CR2 = 0x44 is consistent with
    reading a small structure offset off a NULL base, most likely a task_struct /
    sched_entity / cfs_rq pointer that became NULL (use-after-free on the
    poll wait entry's task pointer? task exited concurrently?).
  • The Oops occurs while holding the wait queue's spinlock (taken in
    __wake_up_sync_key). Because Xwayland[2504] died without releasing it, that
    spinlock is leaked, and any later attempt, on any CPU, to take it spins
    forever in the queued-spinlock slowpath, which is exactly what we see
    later.
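The leaked-lock hypothesis can be modelled in userspace with a POSIX spinlock, which, like the kernel's spinlocks, is not recursive: once the owner "dies" without unlocking, no later acquisition can succeed, whether from the same thread or another one. The sketch uses pthread_spin_trylock() so it reports EBUSY instead of spinning forever. simulate_leaked_lock() is a hypothetical name, and the model covers only the locking behaviour, not the kernel's wait-queue code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Models the hypothesis: the wait-queue lock is taken, the wakeup faults
 * before the unlock, and a later path tries to take the lock again.
 * Returns the result of that later trylock (EBUSY if the lock leaked). */
int simulate_leaked_lock(int crash_mid_wakeup)
{
    pthread_spinlock_t wq_lock;
    int rc = pthread_spin_init(&wq_lock, PTHREAD_PROCESS_PRIVATE);
    assert(rc == 0);

    /* __wake_up_sync_key(): take the wait-queue lock */
    rc = pthread_spin_lock(&wq_lock);
    assert(rc == 0);

    if (!crash_mid_wakeup)
        pthread_spin_unlock(&wq_lock); /* normal path: wake waiter, drop lock */
    /* else: the Oops kills the task here and nothing unwinds the lock */

    /* Later acquisition (in the trace, sock_def_wakeup() -> __wake_up() during
     * exit, if it targets the same lock). A plain lock would spin forever on a
     * leaked lock, so trylock is used to observe the failure instead. */
    rc = pthread_spin_trylock(&wq_lock);
    if (rc == 0)
        pthread_spin_unlock(&wq_lock);

    /* Userspace-only cleanup so the lock can be destroyed; the crashed
     * kernel task has no equivalent step. */
    if (crash_mid_wakeup)
        pthread_spin_unlock(&wq_lock);
    pthread_spin_destroy(&wq_lock);
    return rc;
}
```

simulate_leaked_lock(0) returns 0 (the lock was released normally), while simulate_leaked_lock(1) returns EBUSY: the in-kernel equivalent of that failed attempt is a CPU stuck in native_queued_spin_lock_slowpath.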

The CachyOS kernel adds the BORE patches on top of the upstream EEVDF scheduler, has
sched-ext support, and is built with clang LTO. The fault is in select_task_rq_fair,
which is fair-class code, so this could be:

  • a generic upstream bug also present in mainline,
  • specific to a CachyOS scheduler patch,
  • or an interaction with the LLVM-built kernel layout (the .llvm.<hash> symbol
    suffix is what clang's ThinLTO appends when it promotes a file-local static
    symbol for cross-module use).

I have no way to disambiguate without testing on a linux-cachyos-lts (6.18.22)
or a vanilla mainline kernel. I'm staying on 7.0.1 for now, so reproduction
data may follow if it recurs.

Asks

  1. Has anyone else hit select_task_rq_fair NULL derefs on 7.0.1?
  2. Does CachyOS apply patches to select_task_rq_fair / EEVDF / BORE that
    could put this fault offset (+0x2cdd) in a known region?
  3. The "exited with irqs disabled / preempt_count 3" -> RCU stall pattern is
    really the user-visible bug. Even if the primary deref is rare, the recovery
    path turning a single-task crash into a system-wide freeze is a separate
    robustness issue worth flagging.

Reporter: @Evil-Overlord-666 (will update if/when the issue is opened)
