Skip to content

Conversation

@dpgeorge
Copy link
Member

@dpgeorge dpgeorge commented Jul 7, 2025

Summary

When mDNS is active on a netif it registers a lot of timeouts, namely:

    mdns_probe_and_announce
    mdns_handle_tc_question

    mdns_multicast_probe_timeout_reset_ipv4
    mdns_multicast_timeout_25ttl_reset_ipv4
    mdns_multicast_timeout_reset_ipv4
    mdns_send_multicast_msg_delayed_ipv4
    mdns_send_unicast_msg_delayed_ipv4

    mdns_multicast_probe_timeout_reset_ipv6
    mdns_multicast_timeout_25ttl_reset_ipv6
    mdns_multicast_timeout_reset_ipv6
    mdns_send_multicast_msg_delayed_ipv6
    mdns_send_unicast_msg_delayed_ipv6

These may still be active after a netif is removed, and if they are called they will find that the mDNS state pointer in the netif is NULL and they will crash.

These functions could be explicitly removed using sys_untimeout(), but mdns_handle_tc_question() is static so it's not possible to access it. Instead use the new sys_untimeout_all_with_arg() helper to deregister all timeout callbacks when a netif is removed.

Fixes issue #17621.

Testing

Tested with RPI_PICO_W firmware, running the test from #17621. It's been running for over 100 cycles without any issue (prior to the fix it would crash within 10 cycles).

Trade-offs and Alternatives

There's also LWIP_NETIF_REMOVE_CALLBACK which is a more lightweight way to do something when a netif is removed. But LWIP_NETIF_EXT_STATUS_CALLBACK is already enabled so easier to use that.

This fix needs to be generalised to all ports that use lwIP, but for now I just wanted something minimal that works and is easy to review / reason about.

Note that we could probably use LWIP_NETIF_EXT_STATUS_CALLBACK instead of the hooks CYW43_CB_TCPIP_INIT_EXTRA and CYW43_CB_TCPIP_DEINIT_EXTRA, but that's also for follow-up work.

@dpgeorge dpgeorge added extmod Relates to extmod/ directory in source port-rp2 labels Jul 7, 2025
@dpgeorge dpgeorge added this to the release-1.26.0 milestone Jul 7, 2025
@codecov
Copy link

codecov bot commented Jul 7, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.44%. Comparing base (8504391) to head (0b698b8).
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #17624   +/-   ##
=======================================
  Coverage   98.44%   98.44%           
=======================================
  Files         171      171           
  Lines       22208    22208           
=======================================
  Hits        21863    21863           
  Misses        345      345           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link

github-actions bot commented Jul 7, 2025

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:   +64 +0.007% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@dpgeorge dpgeorge requested a review from projectgus July 7, 2025 03:31
@dpgeorge
Copy link
Member Author

dpgeorge commented Jul 8, 2025

A possible alternative to this PR would be to never remove netif's only add them once. I'm not sure how easy that would be to do, but it could simplify things quite a bit, to be able to delete all the code that removes netifs.

Copy link
Contributor

@projectgus projectgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to fix the bug for now. The idea of never unregistering interfaces from lwip seems like a good long-term goal, though (less code size, etc).

@andrewleech
Copy link
Contributor

Yes I agree that never de-registering the netif is a worthwhile future change; this is a change I've made in #17613 (review) for just the stm32 eth driver.

Doing this has a few benefits; avoiding issues like this one certainly but also it means you can set / adjust the network interface config regardless of whether the interface is active.

If/when the stm32 pr there is finalised and the architecture agreed on I'd be happy to refactor all the other network interfaces in the same way to make them all consistent.

dpgeorge added 2 commits July 17, 2025 13:38
Really lwIP should provide this, to deregister all callbacks on the given
netif.

Signed-off-by: Damien George <damien@micropython.org>
When mDNS is active on a netif it registers a lot of timeouts, namely:

    mdns_probe_and_announce
    mdns_handle_tc_question

    mdns_multicast_probe_timeout_reset_ipv4
    mdns_multicast_timeout_25ttl_reset_ipv4
    mdns_multicast_timeout_reset_ipv4
    mdns_send_multicast_msg_delayed_ipv4
    mdns_send_unicast_msg_delayed_ipv4

    mdns_multicast_probe_timeout_reset_ipv6
    mdns_multicast_timeout_25ttl_reset_ipv6
    mdns_multicast_timeout_reset_ipv6
    mdns_send_multicast_msg_delayed_ipv6
    mdns_send_unicast_msg_delayed_ipv6

These may still be active after a netif is removed, and if they are called
they will find that the mDNS state pointer in the netif is NULL and they
will crash.

These functions could be explicitly removed using `sys_untimeout()`, but
`mdns_handle_tc_question()` is static so it's not possible to access it.
Instead use the new `sys_untimeout_all_with_arg()` helper to deregister all
timeout callbacks when a netif is removed.

Fixes issue micropython#17621.

Signed-off-by: Damien George <damien@micropython.org>
@dpgeorge dpgeorge force-pushed the rp2-fix-lwip-mdns-crash-on-netif-remove branch from c409b3c to 0b698b8 Compare July 17, 2025 03:41
@dpgeorge dpgeorge merged commit 0b698b8 into micropython:master Jul 17, 2025
67 of 68 checks passed
@dpgeorge dpgeorge deleted the rp2-fix-lwip-mdns-crash-on-netif-remove branch July 17, 2025 04:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extmod Relates to extmod/ directory in source port-rp2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants