tools/mpy_ld.py: Optimise entry trampoline for x86/x64.#19309

Open
agatti wants to merge 1 commit into micropython:master from agatti:natmod-x86-x86-optimised-trampoline
@agatti agatti commented Jun 4, 2026

Contributor

Summary

This PR lets natmod entry trampolines be smaller, depending on the offset of mpy_init from the beginning of the text segment.

x64 and x86 shared the same fixed-size entry point opcode, which covered the whole 32-bit offset range. With the changes introduced in 0fd0843 the entry point may change its length once all segments are laid out in the file, so optimised entry point code sequences can be emitted depending on the jump distance.

These changes generate optimised entry point jump opcodes for 8-bit and 32-bit offsets. Depending on the size of the natmod and where the compiler decides to put the mpy_init symbol, this can shorten the output file by three bytes. Clang seems to prefer placing mpy_init at the beginning of the text segment on x86, whilst GCC sort of does it on x64. In both cases, a 2-byte opcode is used instead of a 5-byte one.
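
As a quick sanity check of the three-byte figure, the sketch below measures the two encodings with `struct`. The format strings are assumptions made for illustration (one opcode byte followed by a little-endian signed displacement) and are not copied from `tools/mpy_ld.py`.

```python
import struct

# Assumed layouts: one opcode byte followed by a little-endian signed displacement.
SHORT_JUMP = "<Bb"  # jmp rel8:  opcode 0xEB + 8-bit displacement
NEAR_JUMP = "<Bi"   # jmp rel32: opcode 0xE9 + 32-bit displacement

short_size = struct.calcsize(SHORT_JUMP)
near_size = struct.calcsize(NEAR_JUMP)

# Print both encoded sizes and the number of bytes saved by the short form.
print(short_size, near_size, near_size - short_size)
```

The last number printed is the saving obtained when the short form can be used.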

This is derived from the work being carried out in #19308, as this change won't really fit the parent PR's scope.

Testing

This should be tested by CI, as the x86 and x64 natmod tests should cover it. In addition, the features0 natmod is small enough to trigger the generation of an 8-bit jump. Importing that module on either an x86 or x64 interpreter and running its factorial function works as expected.

Generative AI

I did not use generative AI tools when creating this PR.

@agatti agatti force-pushed the natmod-x86-x86-optimised-trampoline branch from c5e1a3c to 723ffb1 Compare June 4, 2026 16:28
@agatti agatti added the tools Relates to tools/ directory in source, or other tooling label Jun 4, 2026
codecov Bot commented Jun 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.47%. Comparing base (75555f4) to head (18839c3).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19309   +/-   ##
=======================================
  Coverage   98.47%   98.47%           
=======================================
  Files         176      176           
  Lines       22845    22845           
=======================================
  Hits        22497    22497           
  Misses        348      348           

github-actions Bot commented Jun 4, 2026

Code size report:

Reference:  samd/mphalport: Run events at least once in mp_hal_delay_ms. [af38ee1]
Comparison: tools/mpy_ld.py: Optimise entry trampoline for x86/x64. [merge of 18839c3]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
      esp32:    +0 +0.000% ESP32_GENERIC
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

Comment thread tools/mpy_ld.py Outdated

 def asm_jump_x86(entry):
-    return struct.pack("<BI", 0xE9, entry)
+    if entry <= 0xFF:

Member

I think 0xeb expects a signed argument, so this probably needs to be entry <= 0x7F.

Then, probably best to use the existing fit_signed() helper function.
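
To illustrate the signedness point, here is a small sketch. `fits_signed` and `decode_rel8` are hypothetical names written for this example; the signature of the existing `fit_signed()` helper is not shown in this thread, so this is only a stand-in for the idea, not that function.

```python
import struct

def fits_signed(value, bits):
    # Hypothetical stand-in for a signed range check; not mpy_ld.py's helper.
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def decode_rel8(byte):
    # Interpret one byte as a signed 8-bit displacement, as jmp rel8 does.
    return struct.unpack("<b", bytes([byte]))[0]

for offset in (0x7F, 0x80, 0xFF):
    # Show the offset, whether it fits in 8 signed bits, and its signed reading.
    print(hex(offset), fits_signed(offset, 8), decode_rel8(offset))
```

Offsets above 0x7F would be read back as negative displacements, which is why an unsigned `<= 0xFF` comparison is too permissive for the short form.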

Contributor Author

Looks like it. The opcode guide I read wasn't exactly clear (as if x86 opcode encoding ever was :)). For the sake of completeness I've added an extra check so that on x64 you can't have jumps crossing the 2GB barrier, as that's not supported anyway.
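
A minimal sketch of how the two range checks could fit together, under stated assumptions: `encode_jump` is a hypothetical name, `disp` is assumed to be the already-computed signed displacement, and none of this is the code pushed to the branch.

```python
import struct

def encode_jump(disp):
    """Return the shortest relative jmp encoding for a signed displacement."""
    if -0x80 <= disp <= 0x7F:
        # Short form: opcode 0xEB plus a signed 8-bit displacement.
        return struct.pack("<Bb", 0xEB, disp)
    if -0x80000000 <= disp <= 0x7FFFFFFF:
        # Near form: opcode 0xE9 plus a signed 32-bit displacement.
        return struct.pack("<Bi", 0xE9, disp)
    # Anything further away would cross the 2GB barrier and is rejected.
    raise ValueError("jump displacement does not fit in 32 signed bits")

print(encode_jump(0x10).hex(), encode_jump(0x80).hex())
```

Rejecting out-of-range values explicitly keeps the failure visible in the tool rather than letting `struct.pack` raise a less descriptive error.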

@agatti agatti force-pushed the natmod-x86-x86-optimised-trampoline branch from 723ffb1 to 14025bd Compare June 5, 2026 05:22
This commit lets natmod entry trampolines be smaller, depending on the
offset of `mpy_init` from the beginning of the text segment.

x64 and x86 shared the same fixed-size entry point opcode, which covered
the whole 32-bit offset range.  With the changes introduced in 0fd0843
the entry point may change its length once all segments are laid out in
the file, so optimised entry point code sequences can be emitted
depending on the jump distance.

These changes generate optimised entry point jump opcodes for 8-bit and
32-bit offsets.  Depending on the size of the natmod and where the
compiler decides to put the `mpy_init` symbol, this can shorten the
output file by three bytes.

Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
@agatti agatti force-pushed the natmod-x86-x86-optimised-trampoline branch from 14025bd to 18839c3 Compare June 5, 2026 05:25
@octoprobe-bot

Octoprobe PR report

| Test | Tests passed | Tests skipped | Tests xfailed | Tests failed |
|---|---|---|---|---|
| format flash | 10 | 9 | | |
| run-tests.py | 16538 | 2199 | 8 | 1 |
| run-tests.py --via-mpy --emit native | 16777 | 2826 | 29 | 1 |
| run-tests.py --via-mpy | 17485 | 2286 | 9 | |
| run-perfbench.py | 436 | 15 | 2 | |
| run-natmodtests.py | 595 | 145 | 7 | |
| run-tests.py --test-dirs=extmod_hardware | 60 | 78 | 24 | |
| run-tests.py --test-dirs=extmod_hardware --emit-native | 51 | 75 | 24 | 7 |
| Total | 51952 | 7633 | 103 | 9 |

Failures

Group: run-tests.py --test-dirs=extmod_hardware --emit-native

Boards (columns): esp32 0c30-ESP32_C3_DEVKIT, esp32 5d21-ESP32_DEVKIT, esp32 472b-ESP32_S3_DEVKIT, esp32 1830-LOLIN_C3_MINI, esp8266 2d2d-LOLIN_D1_MINI, mimxrt 1133-TEENSY40, nrf 3c2a-ARDUINO_NANO_33, rp2 5334-RPI_PICO2, rp2 5334-RPI_PICO2-RISCV, rp2 552b-RPI_PICO2_W, rp2 5f2c-RPI_PICO_W, rp2 6038-RPI_PICO_W, samd 5f2a-ADA_ITSYBITSY_M0, stm32 2b35-NUCLEO_WB55, stm32 3a21-PYBV11, stm32 3a21-PYBV11-DP, stm32 3a21-PYBV11-DP_THREAD, stm32 3a21-PYBV11-THREAD

extmod_hardware/machine_can_timings.py: skip skip skip skip skip pass skip skip skip skip skip skip skip skip FAIL pass pass pass
extmod_hardware/machine_counter.py: skip pass pass skip skip pass skip skip skip skip skip skip skip skip FAIL skip skip skip
extmod_hardware/machine_encoder.py: skip XFAIL (xfail_master_478.json) XFAIL (xfail_master_478.json) skip skip pass skip skip skip skip skip skip skip skip FAIL skip FAIL skip
extmod_hardware/machine_pwm.py: pass pass pass pass skip pass skip XFAIL (xfail_master_478.json) XFAIL (xfail_master_478.json) pass pass pass XFAIL (xfail_master_478.json) pass FAIL pass FAIL pass
extmod_hardware/machine_uart_irq_break.py: pass pass pass pass skip skip skip XFAIL (xfail_master_478.json) XFAIL (xfail_master_478.json) pass pass pass skip skip skip FAIL skip

Group: run-tests.py

Boards (columns): esp32 0c30-ESP32_C3_DEVKIT, esp32 5d21-ESP32_DEVKIT, esp32 472b-ESP32_S3_DEVKIT, esp32 1830-LOLIN_C3_MINI, esp8266 2d2d-LOLIN_D1_MINI, mimxrt 1133-TEENSY40, nrf 3c2a-ARDUINO_NANO_33, rp2 5334-RPI_PICO2, rp2 5334-RPI_PICO2-RISCV, rp2 552b-RPI_PICO2_W, rp2 5f2c-RPI_PICO_W, rp2 6038-RPI_PICO_W, samd 5f2a-ADA_ITSYBITSY_M0, stm32 2b35-NUCLEO_WB55, stm32 3a21-PYBV11, stm32 3a21-PYBV11-DP, stm32 3a21-PYBV11-DP_THREAD, stm32 3a21-PYBV11-THREAD

extmod/socket_udp_nonblock.py: pass pass pass pass pass skip skip skip skip pass pass FAIL skip skip skip skip skip skip

Group: run-tests.py --via-mpy --emit native

Boards (columns): esp32 0c30-ESP32_C3_DEVKIT, esp32 5c34-ESP32_C3_DEVKIT, esp32 5d21-ESP32_DEVKIT, esp32 472b-ESP32_S3_DEVKIT, esp32 1830-LOLIN_C3_MINI, esp8266 2d2d-LOLIN_D1_MINI, mimxrt 1133-TEENSY40, nrf 3c2a-ARDUINO_NANO_33, rp2 5334-RPI_PICO2, rp2 5334-RPI_PICO2-RISCV, rp2 552b-RPI_PICO2_W, rp2 5f2c-RPI_PICO_W, rp2 6038-RPI_PICO_W, samd 5f2a-ADA_ITSYBITSY_M0, stm32 2b35-NUCLEO_WB55, stm32 3a21-PYBV11, stm32 3a21-PYBV11-DP, stm32 3a21-PYBV11-DP_THREAD, stm32 3a21-PYBV11-THREAD

basics/int_big_error.py: pass pass pass pass pass FAIL pass pass pass pass pass pass pass pass pass pass pass pass pass

dpgeorge commented Jun 9, 2026

Member

(Off topic for this PR: @hmaerki we need to discuss how to handle the above failure of extmod/socket_udp_nonblock.py. It fails because the jobs run in parallel and normal Python crashes due to port 8000 being in use by another test at the same time.)
