@andrewleech andrewleech commented Jul 4, 2025

Summary

This PR implements background link detection, hot-plug support, and multiple minor bug fixes for the STM32 Ethernet driver. The changes eliminate blocking timeouts, enable static IP configuration workflows, and fix reliability issues discovered during testing and code review.

Key Features:

  • Static IP configuration before LAN.active(True) with proper preservation
  • Automatic DHCP restart on cable replug
  • Background PHY polling for cable connect/disconnect detection
  • Non-blocking interface activation (56× faster with cable, no timeout without)
  • Dynamic MAC speed/duplex configuration matching PHY autonegotiation
  • Network initialization before boot.py execution

Critical Fixes:

  • Fixed static IP being overwritten after MAC reconfiguration
  • Fixed DHCP restart logic to work in all states
  • Eliminated 30ms IRQ blackout during MAC reconfiguration
  • Fixed IEEE 802.3 PHY_BSR latched-low bit handling for reliable hot-plug
  • Fixed race conditions in link detection and MAC configuration

Breaking Change:
LAN.active() now returns interface enabled state instead of link status. Use LAN.status() or LAN.isconnected() to check cable connection.

Testing

Hardware Tested:

  • NUCLEO_H563ZI (STM32H563ZI) - Primary development board
  • NUCLEO_F429ZI (STM32F429ZI) - Regression testing

Test Coverage - 11 Scenarios:

Basic Functionality:

  1. Fresh boot with cable → DHCP obtains IP in 2-3 seconds
  2. active(True) without cable → Returns in 28ms, no timeout
  3. Link status before active() → Returns current state via direct PHY read
  4. Fresh boot without cable → active(True) succeeds immediately

Static IP Configuration:

  5. Static IP before active() → IP preserved, DHCP not started
  6. Static IP after DHCP → Static IP replaces DHCP
  7. Static IP through toggle → IP survives active(False)/active(True)
  8. Static IP during hot-plug → IP preserved during cable unplug/replug

Hot-Plug Detection:

  9. Boot without cable + plug-in → Link detected, DHCP starts immediately
  10. Hot-unplug detection → Status 3→0 within 1 second
  11. Hot-plug DHCP recovery → Status 0→3, DHCP completes in 2-4 seconds

Performance Improvements:

| Operation | Before | After | Improvement |
|---|---|---|---|
| active(True) with cable | 1586ms | 28ms | 56× faster |
| active(True) without cable | 10s timeout | 28ms | No blocking |
| DHCP acquisition | 6 seconds | 2-3 seconds | 2× faster |
| IRQ blackout during reconfig | 30ms | 0ms | Eliminated |
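As a quick sanity check on the activation figure, the speed-up follows directly from the two measured times for active(True) with cable. The variable names here are illustrative only.

```python
# Measured active(True) times with cable, as reported in this PR.
before_ms = 1586
after_ms = 28

speedup = before_ms / after_ms
# The ratio is a little over 56, which the PR quotes as "56x faster".
print(f"active(True) speed-up: {speedup:.1f}x")
```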

Board-Specific Issues Found & Fixed:

  • H563ZI: ETH clock stopped during CPU WFI → Enabled clock continuation during sleep
  • F429ZI: MAC hardcoded 100Mbps, PHY negotiated different → Dynamic MAC reconfiguration

Detailed Changes

1. Network Initialization Order (1307179)

Problem: network.LAN() couldn't be instantiated in boot.py due to initialization order.

Solution: Moved mod_network_init() before boot.py execution. Safe because LWIP initialization already occurs earlier; mod_network_init only sets up the NIC list.

Impact: Enables network configuration in boot.py.

Files: stm32/main.c, stm32/mpnetworkport.c


2. Background Link Detection (90297bc)

Problem: No mechanism to detect cable unplug/replug events. Blocking PHY initialization caused multi-second delays.

Architectural Changes:

  • Split LWIP netif initialization into early phase (eth_init) and late phase (eth_start)
  • Added eth_phy_link_status_poll() called every ~250ms from background timer
  • Removed blocking PHY autonegotiation loops from initialization path
  • Integrated with LWIP via netif_set_link_up/down()

Rationale for Polling: STM32 Nucleo boards don't wire PHY interrupt lines to MCU. Polling is the only method to detect cable state changes without hardware modifications.
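The polling approach can be pictured with a small Python model. This is only a sketch of edge-triggered polling, not the driver's C code: `LinkPoller`, `read_link`, `on_up` and `on_down` are hypothetical names, with the two callbacks standing in for netif_set_link_up() and netif_set_link_down().

```python
class LinkPoller:
    """Toy model of edge-triggered link polling (not the driver's actual code)."""

    def __init__(self, read_link, on_up, on_down):
        self.read_link = read_link  # callable returning True when the PHY reports link
        self.on_up = on_up          # stands in for netif_set_link_up()
        self.on_down = on_down      # stands in for netif_set_link_down()
        self.last = False           # assume link down until the first poll

    def poll(self):
        """Call periodically (the PR polls roughly every 250 ms)."""
        current = bool(self.read_link())
        if current != self.last:
            # Only a change of state triggers a callback.
            self.last = current
            (self.on_up if current else self.on_down)()
        return current


# Simulated cable state over four polls: down, up, up, down.
samples = iter([False, True, True, False])
events = []
poller = LinkPoller(lambda: next(samples),
                    lambda: events.append("up"),
                    lambda: events.append("down"))
for _ in range(4):
    poller.poll()
print(events)  # ['up', 'down']
```

Because only transitions fire a callback, a cable that stays plugged in generates no repeated link-up events between polls.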

Functional Changes:

  • active(True) returns immediately (~28ms) regardless of cable state
  • Static IP can be configured before calling active(True)
  • PHY autonegotiation happens asynchronously in background
  • active() now returns interface enabled state (not link status) - BREAKING CHANGE

Performance: 56× faster activation, 2× faster DHCP

Files: stm32/eth.c (+262 lines), stm32/eth.h, stm32/network_lan.c, stm32/mpnetworkport.c


3. MAC Speed/Duplex and DHCP Hot-Plug (e75650e)

Problem: MAC speed/duplex hardcoded at initialization. DHCP didn't restart after cable replug.

MAC Speed/Duplex:

  • Poll for PHY autonegotiation completion in background
  • Reconfigure MAC speed/duplex to match negotiated values
  • 5-second timeout with fallback to 10Mbps Half-Duplex
  • Protected MAC reconfiguration with IRQ disable (later improved in Fix 1)
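A minimal sketch of the timeout-with-fallback decision described in the list above, written as a pure Python function. The function name, argument names and tuple representation are assumptions made for illustration; only the 5-second timeout and the 10Mbps Half-Duplex fallback come from this PR.

```python
AUTONEG_TIMEOUT_MS = 5000   # 5-second timeout from the PR description
FALLBACK = (10, "half")     # 10Mbps Half-Duplex fallback


def resolve_mac_config(autoneg_done, negotiated, elapsed_ms):
    """Return the (speed_mbps, duplex) the MAC should use, or None to keep waiting.

    negotiated is whatever the PHY reported, e.g. (100, "full"); it is only
    trusted once autoneg_done is True.
    """
    if autoneg_done:
        return negotiated
    if elapsed_ms >= AUTONEG_TIMEOUT_MS:
        return FALLBACK
    return None


print(resolve_mac_config(False, None, 250))           # None
print(resolve_mac_config(True, (100, "full"), 1800))  # (100, 'full')
print(resolve_mac_config(False, None, 5000))          # (10, 'half')
```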

DHCP Hot-Plug:

  • Detect cable replug via link status change
  • Restart DHCP using dhcp_stop() + dhcp_start() pattern
  • Preserve static IP by checking DHCP state
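The hot-plug rule in the list above can be sketched as follows. `FakeDhcp` and `on_link_change` are hypothetical stand-ins written for this illustration; the real driver calls lwIP's dhcp_stop() and dhcp_start() from C. The sketch assumes a static IP configuration corresponds to a DHCP client that is not running.

```python
class FakeDhcp:
    """Stand-in for a DHCP client; only tracks whether it is running."""

    def __init__(self, running):
        self.running = running
        self.calls = []

    def stop(self):
        self.running = False
        self.calls.append("stop")

    def start(self):
        self.running = True
        self.calls.append("start")


def on_link_change(was_up, is_up, dhcp):
    """On a down->up edge, restart DHCP only if a client was in use."""
    if is_up and not was_up and dhcp is not None and dhcp.running:
        dhcp.stop()
        dhcp.start()


# DHCP in use: replug restarts the client.
dhcp_client = FakeDhcp(running=True)
on_link_change(False, True, dhcp_client)
print(dhcp_client.calls)  # ['stop', 'start']

# Static IP (client not running): replug leaves the configuration alone.
static_cfg = FakeDhcp(running=False)
on_link_change(False, True, static_cfg)
print(static_cfg.calls)  # []
```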

Issue Fixed: F429ZI regression where MAC/PHY speed mismatch prevented DHCP.

Files: stm32/eth.c (+100 lines), stm32/eth_phy.h


Migration Guide

Breaking Change: active() Method

Old Behavior (v1.26.1 and earlier):

lan = network.LAN()
lan.active(True)
if lan.active():  # Returned True only if cable connected
    print("Cable is plugged in")

New Behavior (this PR):

lan = network.LAN()
lan.active(True)
if lan.active():  # Returns True if interface is enabled
    print("Interface is enabled")

# To check cable connection, use:
if lan.isconnected():  # or lan.status() == 3
    print("Cable is plugged in and IP acquired")

Rationale: This separates the interface's administrative state from the physical link state, which is consistent with other network stacks, and it allows active(True) to succeed without blocking on cable presence.
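For code that needs all three pieces of information, a small helper along these lines may ease migration. This is a sketch: `lan_state` and `FakeLAN` are hypothetical names, and the status codes assumed are the ones used in this PR's examples (0=down, 1=link, 2=link+no IP, 3=connected).

```python
def lan_state(lan):
    """Summarise a LAN object using the post-PR semantics."""
    status = lan.status()
    return {
        "enabled": bool(lan.active()),  # administrative state only
        "link": status >= 1,            # cable detected
        "ip": status == 3,              # link up and address assigned
    }


class FakeLAN:
    """Minimal stand-in so the helper can be tried off-target."""

    def __init__(self, active, status):
        self._active, self._status = active, status

    def active(self):
        return self._active

    def status(self):
        return self._status


# Interface enabled, cable unplugged.
print(lan_state(FakeLAN(True, 0)))  # {'enabled': True, 'link': False, 'ip': False}
```

On a board, the same helper would be called as `lan_state(network.LAN())`.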


Example Usage

Static IP Before Activation

import network

lan = network.LAN()
lan.active(False)

# Configure static IP while interface is down
lan.ifconfig(('192.168.1.100', '255.255.255.0', '192.168.1.1', '8.8.8.8'))

# Activate interface - IP is preserved
lan.active(True)

# Check status
print(f"Status: {lan.status()}")  # 0=down, 1=link, 2=link+no IP, 3=connected
print(f"IP: {lan.ifconfig()[0]}")  # 192.168.1.100

Boot.py Network Configuration

# In boot.py - now works!
import network

network.country('US')
network.hostname('my-device')

lan = network.LAN()
lan.active(True)

# Wait for connection (cable may not be plugged yet)
import time
for _ in range(30):  # 30 second timeout
    if lan.isconnected():
        print(f'Connected: {lan.ifconfig()[0]}')
        break
    time.sleep(1)

Hot-Plug Detection

import network, time

lan = network.LAN()
lan.active(True)

while True:
    status = lan.status()
    if status == 3:
        print(f"Connected: {lan.ifconfig()[0]}")
    elif status == 0:
        print("Cable unplugged")
    time.sleep(1)
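The loop above prints on every iteration. A variant that reports only when the status changes might look like the sketch below; `watch_status`, `STATUS_NAMES` and `_ReplayLAN` are hypothetical names, and the status meanings are assumed from the comment in the static IP example.

```python
import time

STATUS_NAMES = {0: "link down", 1: "link up", 2: "link up, no IP", 3: "connected"}


def watch_status(lan, polls, delay_s=1.0, report=print):
    """Poll lan.status() and report only when the value changes."""
    last = None
    for _ in range(polls):
        status = lan.status()
        if status != last:
            report(STATUS_NAMES.get(status, "unknown ({})".format(status)))
            last = status
        time.sleep(delay_s)


class _ReplayLAN:
    """Replays a fixed sequence of status values for off-target testing."""

    def __init__(self, statuses):
        self._it = iter(statuses)

    def status(self):
        return next(self._it)


seen = []
watch_status(_ReplayLAN([0, 0, 2, 3, 3, 0]), polls=6, delay_s=0, report=seen.append)
print(seen)  # ['link down', 'link up, no IP', 'connected', 'link down']
```

On hardware, `watch_status(lan, polls=60)` would watch the interface for about a minute.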

github-actions bot commented Jul 4, 2025

Code size report:

Reference:  stm32/usb: Add VBUS sensing configuration for TinyUSB on F4/F7. [27b7bf3]
Comparison: stm32/eth: Restore CLK_SLEEP_ENABLE for packet reception during CPU sleep. [merge of 5c2c306]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

panuph commented Jul 7, 2025

Thanks @andrewleech for this PR. I've done some testing based on my use case, which relies on network.LAN().isconnected().

This is the test result from the MicroPython version (branch master, commit 35f15cf) that I am using:

# Disconnect the Ethernet cable and wait for 10 seconds

import network

network.LAN().active(False)

network.LAN().ifconfig()  # ('0.0.0.0', '0.0.0.0', '0.0.0.0', '0.0.0.0')
network.LAN().active()  # False
network.LAN().isconnected()  # False

# Connect the Ethernet cable and wait for 10 seconds

network.LAN().ifconfig()  # ('0.0.0.0', '0.0.0.0', '0.0.0.0', '0.0.0.0')
network.LAN().active()  # True
network.LAN().isconnected()  # False

network.LAN().active(True)

network.LAN().ifconfig()  # ('0.0.0.0', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # False

network.LAN().ipconfig(addr4=("192.168.1.1", "255.255.255.0"))

network.LAN().ifconfig()  # ('192.168.1.1', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # True

# Disconnect the Ethernet cable and wait for 10 seconds

network.LAN().ifconfig()  # ('192.168.1.1', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # True (this is most likely a bug!)

# Connect the Ethernet cable and wait for 10 seconds

network.LAN().ifconfig()  # ('192.168.1.1', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # True

network.LAN().active(False)

network.LAN().ifconfig()  # ('0.0.0.0', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # False
network.LAN().isconnected()  # False

# Disconnect the Ethernet cable and wait for 10 seconds

network.LAN().active(True)  # OSError: [Errno 110] ETIMEDOUT

This is the test result from your PR:

# Disconnect the Ethernet cable and wait for 10 seconds

import network

network.LAN().active(False)

network.LAN().ifconfig()  # ('0.0.0.0', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # False
network.LAN().isconnected()  # False

# Connect the Ethernet cable and wait for 10 seconds

network.LAN().ifconfig()  # ('0.0.0.0', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # False
network.LAN().isconnected()  # False

network.LAN().active(True)

network.LAN().ifconfig()  # ('0.0.0.0', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # False

network.LAN().ipconfig(addr4=("192.168.1.1", "255.255.255.0"))

network.LAN().ifconfig()  # ('192.168.1.1', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # True

# Disconnect the Ethernet cable and wait for 10 seconds

network.LAN().ifconfig()  # ('192.168.1.1', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # False

# Connect the Ethernet cable and wait for 10 seconds

network.LAN().ifconfig()  # ('192.168.1.1', '255.255.255.0', '192.168.0.1', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # True

network.LAN().active(False)

network.LAN().ifconfig()  # ('0.0.0.0', '0.0.0.0', '0.0.0.0', '8.8.8.8')
network.LAN().active()  # False
network.LAN().isconnected()  # False

# Disconnect the Ethernet cable and wait for 10 seconds

network.LAN().ifconfig()  # ('0.0.0.0', '0.0.0.0', '0.0.0.0', '8.8.8.8')
network.LAN().active()  # True
network.LAN().isconnected()  # False

network.LAN().isconnected() works correctly with your fix. Please note OSError: [Errno 110] ETIMEDOUT is no longer raised with your code when activating the network with no ethernet cable connected.

@mattytrentini

Should this still be in draft? It's looking pretty complete from our testing!

#define PHY_SPEED_100FULL (6)
#define PHY_DUPLEX (4)

// PHY interrupt registers (common for LAN87xx and DP838xx)
Member

These constants aren't used anywhere.

Contributor Author

Ah yes, I'd initially thought I'd be able to use PHY interrupts to trigger the link up/down events. However, in many/most cases (including the H5 Nucleo) the PHY shares a single pin for both reference clock out and interrupt out, so if you're using the reference clock (like we are) you can't get interrupts from the PHY.

__HAL_RCC_ETHRX_CLK_SLEEP_ENABLE();
__HAL_RCC_ETH_CLK_SLEEP_DISABLE();
__HAL_RCC_ETHTX_CLK_SLEEP_DISABLE();
__HAL_RCC_ETHRX_CLK_SLEEP_DISABLE();
Member

Is this change really necessary? I would think that it's OK to just enable the clock sleep feature at the start, and not need to disable/enable it during reset.

Contributor Author

@andrewleech andrewleech Jul 11, 2025

Having the ENABLE lines the way they were previously is actually what I found prevented .active(True) from working when the ethernet cable was connected (on the H5 at least). The reset just after this would time out unless the cable was connected.

I'm not sure if explicitly disabling it here is necessary, though I thought it might be a good idea because I don't really know what state the system is in at this point.

Member

OK, I see. Maybe it has to do with the WFI hiding in the mp_hal_delay_ms(2) (because the loop below that line is a busy loop and does not do WFI).

Well, probably it's OK to just disable clock-sleep altogether, because we do want ETH to run "in the background" when the CPU does WFI. All other peripherals do that (disable clock-sleep), eg USB.

#if defined(STM32H5)
__HAL_RCC_ETH_RELEASE_RESET();

__HAL_RCC_ETH_CLK_SLEEP_ENABLE();
Contributor Author

I'll test again with __HAL_RCC_ETH_CLK_SLEEP_DISABLE(); for these to be explicit, with a comment explaining why.

@andrewleech
Contributor Author

CLK_SLEEP Configuration Investigation and Fix

During code review, we discovered that the restructure commit (90297bc) inadvertently removed all CLK_SLEEP_ENABLE/DISABLE calls, leaving the driver dependent on the HAL default state.

Investigation: CLK_SLEEP Semantics

The macro naming is counter-intuitive:

  • __HAL_RCC_ETH_CLK_SLEEP_ENABLE() → Clocks stay ON during CPU sleep (WFI)
  • __HAL_RCC_ETH_CLK_SLEEP_DISABLE() → Clocks turn OFF during CPU sleep

From STM32H5 HAL documentation:

"Enable or disable the AHB1 peripheral clock during Low Power (Sleep) mode.
By default, all peripheral clocks are enabled during SLEEP mode."

Why clocks must stay enabled: The ETH peripheral needs clocks during CPU sleep to receive packets and generate interrupts, which is necessary for DHCP and other network traffic.

Fix Applied

Restored explicit CLK_SLEEP_ENABLE calls in eth_mac_init() for all STM32 platforms (H5, H7, F4) with clarifying comment:

// Release ETH peripheral from reset and enable clocks during CPU sleep.
// Note: CLK_SLEEP_ENABLE means clocks stay ON during sleep (not OFF).
// Clocks must continue during sleep to allow the ETH peripheral to receive
// packets and generate interrupts when the CPU enters sleep mode (WFI),
// which is necessary for DHCP and other network traffic.

Testing on NUCLEO_H563ZI

Test 1: Interface activation without cable

  • active(True) completed in 28ms
  • Status: 0 (link down, as expected)
  • No timeout, immediate return

Test 2: Hot-plug cable detection (2-hour monitoring)

  • ✅ Cable plugged at 536s
  • Link detected immediately: Status 0 → 2
  • DHCP acquired in 4 seconds: IP 192.168.0.10
  • Background polling and DHCP working correctly

Confirmed: ETH peripheral receiving packets and processing DHCP during CPU sleep as expected. No regressions from explicit CLK_SLEEP configuration.

Commit: 095f4db

Move mod_network_init() to run before boot.py execution, allowing
network interfaces like network.LAN() to be instantiated in boot.py.

The previous order caused failures when users tried to create network
interfaces in boot.py because the network subsystem wasn't initialized
until after boot.py completed. This change is safe because LWIP is
already initialized earlier in main() and mod_network_init() only
initializes the NIC list.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
@andrewleech andrewleech force-pushed the stm32_eth branch 4 times, most recently from 907d820 to ae87211 Compare October 30, 2025 09:23
This restructures LWIP initialization and adds background PHY link
status polling to support static IP configuration before active(True),
eliminate timeouts without cable, and enable hot-plug detection.

Background polling rationale: STM32 Nucleo boards do not wire PHY
interrupt lines to the MCU, making polling the only method to detect
cable unplug/replug events during runtime.

Changes:
- Split netif init into early and late phases for IP config before active()
- Remove blocking PHY autonegotiation loops from initialization
- Add background link status polling (needed due to lack of PHY interrupt)
- Move PHY init to eth_start() for cleaner lifecycle
- Optimize eth_link_status() with on-demand polling
- Fix active() to return interface state vs link state

Benefits:
- Static IP can be configured before active(True)
- active(True) succeeds immediately without cable (~28ms vs 10s timeout)
- Hot-plug detection working (unplug/replug cable)
- 56x faster interface activation (1586ms → 28ms)

Tested on NUCLEO_H563ZI and NUCLEO_F429ZI.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add dynamic MAC speed/duplex reconfiguration via autonegotiation
polling and fix DHCP restart after cable hot-plug.

MAC Speed/Duplex:
- Enable non-blocking PHY autonegotiation at initialization
- Poll for autoneg completion in background
- Update MAC configuration to match negotiated PHY speed/duplex
- Add 5-second autoneg timeout with fallback to 10Mbps Half Duplex
- Protect MAC reconfiguration with IRQ disable/enable

DHCP Hot-Plug:
- Fix DHCP not restarting after cable replug (was stuck at 0.0.0.0)
- Root cause: Called dhcp_renew() on stopped client (ineffective)
- Solution: Use dhcp_stop() + dhcp_start() to restart DHCP
- Preserve static IP during hot-plug (check dhcp->state != DHCP_STATE_OFF)

Tested on NUCLEO_H563ZI and NUCLEO_F429ZI with cable hot-plug scenarios.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Replace HAL_NVIC_DisableIRQ with software flag to prevent 30ms IRQ
blackout during MAC speed/duplex reconfiguration.

Issue: During MAC reconfiguration, code disabled all ETH interrupts at
NVIC level for 30ms (3x 10ms delays), blocking systick, USB, and other
peripheral interrupts.

Solution:
- Add mac_reconfig_in_progress flag to eth_t struct
- IRQ handler checks flag and returns early if set
- Replace blind 10ms delays with DMA descriptor polling (max 100ms timeout)
- IRQ handler can run but exits immediately via flag check

Impact: IRQ blackout reduced from 30ms to 0ms (handler responsive).

Tested on NUCLEO_H563ZI.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Per IEEE 802.3 spec, PHY_BSR link status bit latches low on link loss
and requires two consecutive reads to get current state.

Issue: After cable unplug→replug, first PHY_BSR read returns 0 (latched)
even though physical link is restored, causing missed link-up events and
unreliable hot-plug detection.

Solution: Double-read PHY_BSR in all link status checks:
- First read: Clears the latched-low state
- Second read: Returns true current link status

Locations:
- eth_phy_link_status_poll(): Background polling (line 779)
- eth_link_status(): Direct PHY read fallback (line 945)

Impact: Reliable hot-plug detection after cable unplug/replug cycles.

Tested on NUCLEO_H563ZI and NUCLEO_F429ZI.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
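The latched-low behaviour described in the commit message above can be illustrated with a toy register model. This is a sketch only: `LatchedLinkBit` and `link_is_up` are hypothetical names and the model is not a PHY driver. It shows why a single read after an unplug/replug cycle returns a stale 0 while the second read reflects the current link state.

```python
class LatchedLinkBit:
    """Toy model of a latching-low link status bit (illustrative, not a PHY driver)."""

    def __init__(self):
        self.link = False    # true physical link state
        self.latched = True  # a past link loss has not been read out yet

    def set_link(self, up):
        self.link = up
        if not up:
            self.latched = True  # link loss is remembered until the next read

    def read(self):
        # A read reports 0 if a loss was latched, then clears the latch.
        value = self.link and not self.latched
        self.latched = False
        return value


def link_is_up(reg):
    reg.read()         # first read: clears any latched-low state
    return reg.read()  # second read: current link state


reg = LatchedLinkBit()
reg.set_link(False)  # unplug
reg.set_link(True)   # replug
print(reg.read())    # False: stale latched value despite restored link

reg.set_link(False)  # unplug again
reg.set_link(True)   # replug again
print(link_is_up(reg))  # True: double read sees the restored link
```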
Add inline comment clarifying that active() returns interface enabled
state, not link/cable connection status.

Context: Commit 90297bc changed active() semantics to support static
IP configuration before active(True) and eliminate cable-wait timeouts.

Previous behavior:
- active() returned True only if cable physically connected

Current behavior:
- active() returns True if interface enabled (regardless of cable)
- status() should be used to check physical link/cable state

This is a breaking API change for code that relied on active() to check
cable connection. Users should migrate to status() for link detection.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add link status verification before MAC reconfiguration to prevent
reconfiguring on dead link if cable unplugged between checks.

Issue: eth_phy_link_status_poll() reads PHY_BSR at start, then enters
MAC reconfiguration block. If cable unplugged between these two points,
code attempts MAC speed/duplex reconfiguration on dead link.

Scenario:
1. Line 779: PHY_BSR read shows link up
2. Lines 784-818: Link state change handling
3. Cable unplugged here
4. Line 820: MAC reconfiguration begins (link now down)

Solution: Re-check last_link_status before MAC reconfiguration. Abort
if link went down since initial check.

Impact: Prevents wasted CPU cycles and transient errors from configuring
MAC on non-existent link.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Replace dhcp_renew() with dhcp_stop() + dhcp_start() after MAC speed/
duplex reconfiguration to handle all DHCP states reliably.

Issue: After MAC reconfiguration, code called dhcp_renew() to restart
DHCP. However, dhcp_renew() only works when DHCP is in BOUND state.
If MAC reconfiguration happens during DHCP negotiation (SELECTING,
REQUESTING, etc.), dhcp_renew() fails silently.

Solution: Use dhcp_stop() + dhcp_start() pattern (same as cable replug
handling in lines 792-795). This works correctly regardless of current
DHCP state.

Impact: DHCP reliably restarts after MAC reconfiguration in all states,
not just BOUND.

Matches existing hot-plug behavior for consistency.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Print warning message when autonegotiation times out and code falls
back to 10Mbps Half-Duplex mode.

Issue: After 5-second autonegotiation timeout (PHY_AUTONEG_TIMEOUT_MS),
code silently falls back to 10Mbps Half-Duplex - 10x slower than typical
100Mbps Full-Duplex. Users experience slow network with no indication of
root cause.

Solution: Print warning message to console when fallback occurs:
"ETH: Autonegotiation timeout, using 10Mbps Half-Duplex"

Impact: Users can diagnose slow network performance and investigate
autonegotiation failures (bad cable, switch issues, etc.).

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Static IP configured before active(True) was being overwritten when MAC
speed/duplex reconfiguration occurred (~2s after link up). The MAC
reconfiguration code restarted DHCP whenever a DHCP struct existed,
without checking if a static IP was configured.

Created eth_dhcp_restart_if_needed() helper function that checks if IP
is 0.0.0.0 before restarting DHCP. Consolidated 3 locations with
duplicated DHCP restart logic to use this helper, reducing code size
by 30 lines while fixing the bug.

This ensures static IP is preserved through MAC reconfiguration, cable
unplug/replug, and interface activation.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
…leep.

The CLK_SLEEP_ENABLE calls were removed during the restructure (commit
90297bc), leaving the driver dependent on the HAL default state.
While the default enables clocks during sleep, explicit configuration
is more robust and documents the requirement.

ETH clocks must remain enabled during CPU sleep (WFI) to allow the
peripheral to receive packets and generate interrupts, which is
necessary for DHCP and other network traffic.

Tested on NUCLEO_F429ZI: DHCP acquisition successful.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>