Skip to content

zephyr/machine_timer: Race condition calling machine_timer_deinit() from k_timer ISR context #18728

@calinfaja

Description

@calinfaja

Port, board and/or hardware

Zephyr port (all boards). The bug is in ports/zephyr/machine_timer.c which is shared across all Zephyr targets.

MicroPython version

MicroPython v1.27.0 (tag: v1.27.0)

Bug introduced in commit 3823aeb0f (Feb 2025) — "zephyr/machine_timer: Add machine.Timer class implementation."
Present in master as of v1.27.0.

Reproduction

machine_timer_callback() is called from Zephyr k_timer ISR context. For ONE_SHOT mode (or when mp_irq_dispatch fails), it calls machine_timer_deinit(), which traverses and modifies the MP_STATE_PORT(machine_timer_obj_head) linked list. The main thread may concurrently access the same list (GC, scheduler, other timer operations).

Minimal reproduction (run via mpremote exec, repeat if needed — the race is intermittent):

from machine import Timer
import time

def callback(t):
    print("fired")

t = Timer(-1)
t.init(period=100, mode=Timer.ONE_SHOT, callback=callback)
time.sleep_ms(200)
print("done")

Stress test (more reliable trigger):

from machine import Timer
import time

for i in range(20):
    t = Timer(-1)
    t.init(period=10, mode=Timer.ONE_SHOT, callback=lambda x: None)
    time.sleep_ms(50)
print("stress test passed")

Failure sequence:

  1. Timer fires after timeout in Zephyr k_timer ISR context
  2. ISR calls machine_timer_deinit() which traverses/modifies the linked list
  3. Main thread in time.sleep_ms() may be accessing the same list (GC, scheduler)
  4. Race condition or deadlock occurs
  5. Board hangs, mpremote exec times out

The buggy code in ports/zephyr/machine_timer.c (lines 65–76):

static void machine_timer_callback(struct k_timer *timer) {
    machine_timer_obj_t *self = (machine_timer_obj_t *)k_timer_user_data_get(timer);

    if (mp_irq_dispatch(self->callback, MP_OBJ_FROM_PTR(self), self->ishard) < 0) {
        self->mode = TIMER_MODE_ONE_SHOT;
    }

    if (self->mode == TIMER_MODE_ONE_SHOT) {
        machine_timer_deinit(self);  // BUG: linked list mutation from ISR
    }
}

machine_timer_deinit() (lines 158–180) traverses and modifies MP_STATE_PORT(machine_timer_obj_head) — a linked list not protected against concurrent access from ISR and thread contexts.

Expected behaviour

The one-shot timer fires, the callback executes, and the program continues normally. Timer cleanup from ISR context should only perform ISR-safe operations (e.g. k_timer_stop()), not linked list manipulation.

Observed behaviour

The board intermittently hangs (deadlock/race condition). When run via mpremote exec, the command times out waiting for a response. The hang occurs because machine_timer_deinit() modifies the machine_timer_obj_head linked list from ISR context while the main thread may be traversing the same list.

Additional Information

Proposed fix: Replace machine_timer_deinit(self) with k_timer_stop(&self->my_timer) in the ISR callback. k_timer_stop() is ISR-safe per Zephyr's API guarantees. Linked list cleanup is deferred to thread context (GC, explicit deinit(), or machine_timer_deinit_all() on soft reset).

static void machine_timer_callback(struct k_timer *timer) {
    machine_timer_obj_t *self = (machine_timer_obj_t *)k_timer_user_data_get(timer);

    if (self->mode == TIMER_MODE_ONE_SHOT) {
        k_timer_stop(&self->my_timer);
    }

    if (mp_irq_dispatch(self->callback, MP_OBJ_FROM_PTR(self), self->ishard) < 0) {
        self->mode = TIMER_MODE_ONE_SHOT;
        k_timer_stop(&self->my_timer);
    }
}

I can submit a PR with this fix if the approach is acceptable.

Code of Conduct

  • Yes, I agree to follow the MicroPython Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions