Fix RealMutableStore write-queue data race (inverted lock polarity) by MartinStrambach · Pull Request #735 · MobileNativeFoundation/Store

MartinStrambach · 2026-06-01T13:01:00Z

Fix `RealMutableStore` write-queue data race (inverted lock polarity)

Summary

RealMutableStore's per-key write-request queue is a non-thread-safe ArrayDeque
(internal/definition/WriteRequestQueue.kt), but the lock guarding it has inverted polarity:
the mutators take a shared lock while the only exclusive holder is a pure read. Concurrent
writes to the same key therefore mutate the deque while another coroutine iterates it, corrupting
the backing array.

- JVM: the deque's fail-fast iterator throws ConcurrentModificationException, which
  RealMutableStore catches and surfaces as StoreWriteResponse.Error.Exception.
- Kotlin/Native: the racing structural add reallocates the backing array under an active
  iteration → freed-pointer dereference → EXC_BAD_ACCESS (a hard process crash).

This reproduces in production on a multithreaded native dispatcher under sustained same-key writes
(~every few seconds across concurrent streams).
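The fail-fast behaviour described above can be sketched on a single thread, where one loop plays both roles: it iterates the deque and performs the structural add that a second coroutine would perform in the real race. This is only an illustration, not code from the PR; `mutateWhileIterating` is a hypothetical helper, and whether the iterator detects the modification depends on the platform and stdlib version, so the sketch reports what happened rather than assuming an outcome.

```kotlin
// Hypothetical helper (not part of Store): mutate a kotlin.collections.ArrayDeque
// while a for-loop is iterating it, and return whatever exception that raises.
fun mutateWhileIterating(): Throwable? {
    val queue = ArrayDeque(listOf(1, 2, 3))
    return runCatching {
        for (entry in queue) {
            // Structural modification during iteration. In the real bug this add
            // comes from another coroutine holding the same shared lock.
            if (entry == 1) queue.addLast(99)
        }
    }.exceptionOrNull()
}

fun main() {
    // On the JVM a fail-fast iterator is expected to surface a
    // ConcurrentModificationException here; null means nothing was detected.
    println(mutateWhileIterating() ?: "no exception")
}
```

On Kotlin/Native the concern in the description is different: there the racing add happens on another thread, so the failure is a memory-safety problem rather than a catchable exception.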

Root cause

withWriteRequestQueueLock guards the queue with a Lightswitch — a shared/reader lock that lets
multiple holders run concurrently:

threadSafety.writeRequests.lightswitch.lock(threadSafety.writeRequests.mutex)
try {
    getQueue(key).block()
} finally {
    threadSafety.writeRequests.lightswitch.unlock(threadSafety.writeRequests.mutex)
}

All mutating access flows through this method:

addWriteRequestToQueue → queue.add(writeRequest)
updateWriteRequestQueue → for (writeRequest in this) { ... } (iterate + rebuild)

Because both take the shared lightswitch, they do not exclude each other. The only operation
taking the exclusive writeRequests.mutex is getLatestWriteRequest → queue.last(), a pure
read. The polarity is backwards: the mutators share, the reader is exclusive.
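To make the "shared versus exclusive" distinction concrete, here is a minimal sketch of a lightswitch built on kotlinx.coroutines `Mutex`, following the classic pattern where the first holder locks the room and the last one unlocks it. `LightswitchSketch` and `maxConcurrentHolders` are illustrative names and are not claimed to match Store's internal class. The probe counts how many coroutines are inside the guarded section at once.

```kotlin
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay
import kotlinx.coroutines.joinAll
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Illustrative lightswitch: the first holder locks the room, the last one unlocks it.
class LightswitchSketch {
    private var counter = 0
    private val counterMutex = Mutex()

    suspend fun lock(room: Mutex) = counterMutex.withLock {
        counter += 1
        if (counter == 1) room.lock()
    }

    suspend fun unlock(room: Mutex) = counterMutex.withLock {
        counter -= 1
        if (counter == 0) room.unlock()
    }
}

// Runs two coroutines through `enter` and reports how many were inside at once.
// Intended to be called from runBlocking, so the plain counters are only touched
// by a single thread.
suspend fun maxConcurrentHolders(enter: suspend (suspend () -> Unit) -> Unit): Int =
    coroutineScope {
        var inside = 0
        var maxInside = 0
        val jobs = List(2) {
            launch {
                enter {
                    inside += 1
                    maxInside = maxOf(maxInside, inside)
                    delay(10) // stay inside long enough for the other coroutine to try
                    inside -= 1
                }
            }
        }
        jobs.joinAll()
        maxInside
    }

fun main() = runBlocking {
    val room = Mutex()
    val lightswitch = LightswitchSketch()
    val shared = maxConcurrentHolders { body ->
        lightswitch.lock(room)
        try { body() } finally { lightswitch.unlock(room) }
    }
    val exclusive = maxConcurrentHolders { body -> room.withLock { body() } }
    println("lightswitch: $shared, mutex: $exclusive") // lightswitch: 2, mutex: 1
}
```

Two holders inside the lightswitch at once is exactly the condition under which an add and an iteration can touch the same deque.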

Fix

Guard all write-queue access with the per-key exclusive mutex, so add / iterate / rebuild
mutually exclude each other (and exclude getLatestWriteRequest, which already uses
writeRequests.mutex):

return threadSafety.writeRequests.mutex.withLock {
    val queue = getQueue(key)
    queue.block()
}
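As a rough, self-contained illustration of the same idea outside the store, the sketch below routes both the add and the iterate-and-rebuild step through one exclusive `Mutex`. `GuardedWriteQueues` and its methods are hypothetical stand-ins, and for brevity a single mutex covers every key, whereas the PR describes a per-key mutex.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withContext

// Hypothetical stand-in for the store's write queues; not the real class.
class GuardedWriteQueues<K, W> {
    private val mutex = Mutex() // simplification: one mutex for all keys
    private val queues = mutableMapOf<K, ArrayDeque<W>>()

    // Every access to a queue goes through the exclusive mutex.
    suspend fun <R> withQueueLock(key: K, block: ArrayDeque<W>.() -> R): R =
        mutex.withLock { queues.getOrPut(key) { ArrayDeque() }.block() }

    suspend fun add(key: K, write: W) = withQueueLock(key) { addLast(write) }

    // Iterate + rebuild: drop completed entries, keep the rest, return how many were dropped.
    suspend fun removeCompleted(key: K, isCompleted: (W) -> Boolean): Int =
        withQueueLock(key) {
            val remaining = ArrayDeque<W>()
            var completed = 0
            for (write in this) {
                if (isCompleted(write)) completed += 1 else remaining.addLast(write)
            }
            clear()
            addAll(remaining)
            completed
        }
}

fun main() = runBlocking {
    val queues = GuardedWriteQueues<String, Int>()
    withContext(Dispatchers.Default) {
        repeat(64) { i ->
            launch { queues.add("key", i) }
            launch { queues.removeCompleted("key") { it % 2 == 0 } }
        }
    }
    queues.removeCompleted("key") { it % 2 == 0 } // final sweep of even entries
    val leftover = queues.withQueueLock("key") { sorted() }
    println(leftover == (1..63 step 2).toList()) // true: odd entries are never dropped
}
```

Because adds and rebuilds never overlap, every odd entry survives regardless of how the coroutines interleave.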

Regression test

MutableStoreConcurrencyTest (commonTest) fires 64 concurrent writes to the same key over 50
rounds on Dispatchers.Default, plus a sequential baseline. It asserts no corruption-class error
(ConcurrentModificationException / NullPointerException / IndexOutOfBoundsException); on Native,
reaching the assertion at all proves the process didn't crash.

Verified:

- JVM (:store:jvmTest): RED pre-fix (NullPointerException, 64/64 in round 0) → GREEN post-fix.
- Native (:store:iosSimulatorArm64Test): RED pre-fix (kotlin.ConcurrentModificationException,
  round 19) → GREEN post-fix.
- Full JVM suite: 101 tests, 0 failures (no regression).

Note: concurrent writes to the same key can still legitimately fail with
IllegalArgumentException("No writes found ...") — a separate, pre-existing logical race where
one write drains another's queue entry. That is not memory corruption and is out of scope here; the
test tolerates it and documents it inline.
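The shape of such a test can be sketched as follows. This is not the PR's MutableStoreConcurrencyTest; `collectWriteFailures` and `isCorruptionClass` are hypothetical helpers, and the `write` lambda stands in for a store write. The harness gathers every failure, and the assertion is applied only to the corruption-class ones, so other errors (such as the tolerated IllegalArgumentException) do not fail the run.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withContext

// Errors that indicate a corrupted collection rather than a logical failure.
fun isCorruptionClass(error: Throwable): Boolean =
    error is ConcurrentModificationException ||
        error is NullPointerException ||
        error is IndexOutOfBoundsException

// Fires `writersPerRound` concurrent writes per round and collects every failure.
suspend fun collectWriteFailures(
    rounds: Int,
    writersPerRound: Int,
    write: suspend (Int) -> Unit,
): List<Throwable> {
    val failures = mutableListOf<Throwable>()
    val failuresMutex = Mutex()
    repeat(rounds) {
        // withContext returns only after all writers of this round have finished.
        withContext(Dispatchers.Default) {
            repeat(writersPerRound) { writer ->
                launch {
                    try {
                        write(writer)
                    } catch (e: CancellationException) {
                        throw e
                    } catch (e: Exception) {
                        failuresMutex.withLock { failures += e }
                    }
                }
            }
        }
    }
    return failures
}

fun main() = runBlocking {
    // Stand-in for the store: a deque guarded by an exclusive mutex.
    val queue = ArrayDeque<Int>()
    val queueMutex = Mutex()
    val failures = collectWriteFailures(rounds = 50, writersPerRound = 64) { writer ->
        queueMutex.withLock {
            queue.addLast(writer)
            for (entry in queue) require(entry >= 0) // iterate under the same lock
        }
    }
    val corruption = failures.filter(::isCorruptionClass)
    check(corruption.isEmpty()) { "corruption-class failures: $corruption" }
    println("writes=${queue.size}, corruption-class failures=${corruption.size}")
}
```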

Notes for reviewers

- Lightswitch removed (2nd commit). Once withWriteRequestQueueLock stopped using it, nothing
  acquired it anywhere: readCompletions already locked via its own mutex, so the lightswitch
  field on StoreThreadSafety was read by nothing. The 2nd commit drops the field and deletes the
  class. Lightswitch/StoreThreadSafety/ThreadSafety are all internal and absent from the
  binary-compatibility-validator dumps, so this changes no public API/ABI; jvmApiCheck and
  klibApiCheck both pass unchanged. (Happy to split this into a follow-up PR if you'd prefer the
  fix land in isolation.)
- Re-entrancy: kotlinx.coroutines Mutex is non-reentrant. The call graph has no nested
  acquisition of the same writeRequests.mutex (getLatestWriteRequest unlocks in finally before
  tryUpdateServer → updateWriteRequestQueue; addWriteRequestToQueue runs in onEach before the
  collect block), so this is safe. One caveat worth a maintainer's eye: updateWriteRequestQueue
  invokes user onCompletion(s) callbacks while holding the queue lock, so a callback that re-enters
  the same key's write path would now deadlock instead of (unsafely) racing. Re-entrant store writes
  from a completion callback are an unusual pattern, but flagging it explicitly.
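The re-entrancy caveat can be seen in isolation with a small sketch: a coroutine that already holds a kotlinx.coroutines `Mutex` suspends forever when it tries to take the same mutex again. `reenter` is a hypothetical helper, and the timeout exists only so the demonstration terminates instead of hanging.

```kotlin
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withTimeoutOrNull

// Tries to acquire `mutex` while already holding it. The inner withLock can never
// succeed, so the timeout fires and the result is null.
suspend fun reenter(mutex: Mutex, timeoutMillis: Long = 100): String? =
    withTimeoutOrNull(timeoutMillis) {
        mutex.withLock {
            mutex.withLock { "re-entered" } // suspends: the mutex is non-reentrant
        }
    }

fun main() = runBlocking {
    val mutex = Mutex()
    println(reenter(mutex))   // null
    println(mutex.isLocked)   // false: the outer lock was released on cancellation
}
```

A completion callback that writes to the same key while the queue lock is held would be in the position of the inner `withLock` here, without a timeout to rescue it.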

Base

Branched off main (5.1.0-alpha08). The same methods exist unchanged back through 5.1.0-alpha06,
so the fix backports cleanly.

MartinStrambach · 2026-06-01T13:14:02Z

@matt-ramotar would you be able to take a look at this when you get a chance? We're hitting this race in production — it manifests as a hard EXC_BAD_ACCESS crash on Kotlin/Native (iOS) under sustained concurrent same-key writes, so we'd love to get it fixed ASAP.

Root cause is the inverted lock polarity in RealMutableStore.withWriteRequestQueueLock (mutators take the shared Lightswitch, only a read takes the exclusive mutex), letting add() race the queue iteration on a non-thread-safe ArrayDeque. The fix is small and comes with a regression test that's red pre-fix / green post-fix on both JVM and native; jvmApiCheck + klibApiCheck confirm no public API/ABI change. Happy to adjust anything you'd like. Thanks!

matt-ramotar · 2026-06-07T10:35:57Z

@MartinStrambach Thanks for reporting! Looking

The `build-and-test` checkout pinned `ref: ${{ github.head_ref || github.ref }}` without a matching `repository`, so for cross-repository (fork) PRs Actions looked for the head branch in the base repo and failed at checkout in ~7s with "a branch or tag with the name '<head-branch>' could not be found", before any build or test ran. This affected all external/fork contributions (e.g. #735).

- Checkout: resolve `repository` and `ref` from the PR head when present (`github.event.pull_request.head.*`), falling back to `github.repository` / `github.ref` for push builds. The `ref` uses the head branch *name* (not the head SHA) so HEAD stays attached to a branch: the KMMBridge plugin runs `git pull --tags`, which fails on a detached HEAD ("you are not currently on a branch"). Fork PRs now check out the contributor's head branch; same-repo PRs and pushes to main are unchanged.
- Codecov: skip the upload on fork PRs, where `CODECOV_TOKEN` is unavailable and `fail_ci_if_error: true` would otherwise fail the job. Coverage is still uploaded and enforced for same-repo PRs and pushes to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

matt-ramotar · 2026-06-07T11:24:32Z

@MartinStrambach can you rebase?

The per-key write-request queue is a non-thread-safe ArrayDeque, but withWriteRequestQueueLock guarded it with a Lightswitch, a shared/reader lock that lets multiple holders run concurrently. So addWriteRequestToQueue (add) and updateWriteRequestQueue (iterate + rebuild) could run on the same deque at once. A structural add during iteration corrupts the backing array: ConcurrentModificationException on the JVM, EXC_BAD_ACCESS on Kotlin/Native. The only exclusive (mutex) holder was a pure read (getLatestWriteRequest), so the polarity was inverted.

Guard all write-queue access with the per-key exclusive mutex instead, so add/iterate/rebuild mutually exclude each other and getLatestWriteRequest.

Adds MutableStoreConcurrencyTest reproducing the race (red pre-fix on both JVM and native, green after). Lightswitch is now unused in RealMutableStore; left in place to keep the diff focused (can be removed as a follow-up).

Signed-off-by: Martin Strambach <martin.strambach@gmail.com>

After the previous commit nothing acquires the per-key Lightswitch: withWriteRequestQueueLock uses writeRequests.mutex directly, and readCompletions already used readCompletions.mutex. The lightswitch field on StoreThreadSafety was therefore read by nothing. Drop the field and delete the class.

All three types (Lightswitch, StoreThreadSafety, ThreadSafety) are internal and absent from the binary-compatibility-validator dumps, so this changes no public API/ABI: jvmApiCheck and klibApiCheck both pass unchanged.

Signed-off-by: Martin Strambach <martin.strambach@gmail.com>

MartinStrambach · 2026-06-07T14:01:13Z

Rebased onto main. Both commits applied cleanly with no changes.

matt-ramotar · 2026-06-07T19:51:10Z

@MartinStrambach looks like you just need to run ktlintFormat

MartinStrambach · 2026-06-07T20:09:57Z

@matt-ramotar should be fixed

github-project-automation Bot added this to Store Roadmap Jun 1, 2026

github-project-automation Bot moved this to 🆕 Triage in Store Roadmap Jun 1, 2026

MartinStrambach force-pushed the fix/mutable-store-write-queue-race branch 2 times, most recently from 6a974db to d84ded3 Compare June 1, 2026 13:12

matt-ramotar mentioned this pull request Jun 7, 2026

Fix CI checkout and coverage upload for fork PRs #737

Merged

MartinStrambach added 2 commits June 7, 2026 15:55

MartinStrambach force-pushed the fix/mutable-store-write-queue-race branch from d84ded3 to d75ed05 Compare June 7, 2026 13:56

Apply ktlintFormat

4e65786

Signed-off-by: Martin Strambach <martin.strambach@gmail.com>

MartinStrambach wants to merge 3 commits into MobileNativeFoundation:main from MartinStrambach:fix/mutable-store-write-queue-race
