
[Fix] Pin numpy!=2.3.5 to dodge OpenBLAS atfork SIGSEGV at Kit fork() #5642

Draft
hujc7 wants to merge 1 commit into isaac-sim:develop from hujc7:jichuanh/pin-numpy-not-2-3-5

Conversation

@hujc7
Collaborator

@hujc7 hujc7 commented May 15, 2026

TL;DR

NumPy 2.3.5 ships a vendored OpenBLAS (libscipy_openblas64_-fdde5778.so) whose pthread_atfork handler crashes inside Kit's libomni.platforminfo fork() during SimulationApp startup. IsaacLab's setup.py declares numpy>=2, and with the pin-pink → pinocchio → cmeel-boost transitive cap of <2.4, pip resolves to 2.3.5, which is exactly the broken release. This PR adds !=2.3.5 to that constraint across the four packages that declare a numpy dep. Pip then resolves to 2.3.4, which ships a different OpenBLAS bundle. The fix targets the dependency-resolution layer of the OpenBLAS-class CI SIGSEGV, which lives in IsaacLab's own setup.py files.

Why this fix lives here, not in Isaac Sim's base image

Verified by docker run on both candidate Isaac Sim images (the current pin sha256:0dd49a11… and the rolling sha256:06197a67…) — both prebundle numpy 2.3.1 with the safe OpenBLAS hash -56d6093b. The broken numpy 2.3.5 enters the running container at IsaacLab's docker/Dockerfile.base:117-118, when isaaclab.sh --install runs and pip resolves IsaacLab's numpy>=2 constraint to 2.3.5 — installing into _isaac_sim/kit/python/lib/python3.12/site-packages/ and shadowing the base image's prebundle.

So the dependency-resolution layer is in IsaacLab's setup.py.

NumPy 2.3.x → bundled OpenBLAS hash bisection (verified)

| numpy | bundled OpenBLAS | Status |
|---|---|---|
| 2.3.0 | libscipy_openblas64_-56d6093b.so | ✅ same hash as Isaac Sim base prebundle; no-crash repro |
| 2.3.1 | libscipy_openblas64_-56d6093b.so | ✅ same hash |
| 2.3.2 | libscipy_openblas64_-8fb3d286.so | ❓ different hash; was resolved version for IsaacLab CI Jul–Nov 2025 without these crashes (note: pre-driver-595.58.03) |
| 2.3.3 | libscipy_openblas64_-8fb3d286.so | ❓ same as 2.3.2 |
| 2.3.4 | libscipy_openblas64_-8fb3d286.so | ❓ same as 2.3.2 |
| 2.3.5 | libscipy_openblas64_-fdde5778.so | broken; exact hash in the crash backtrace |
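
To check which bundle a given environment actually carries, the hash can be read straight off the file name. The following is a minimal sketch, assuming the Linux wheel layout where the vendored library sits in a `numpy.libs/` directory next to the `numpy` package; the helper names and status strings are hypothetical, and the hash table only restates the rows above.

```python
import re
from pathlib import Path

# Hash suffixes reported in the table above; anything else is "unknown".
KNOWN_HASHES = {
    "56d6093b": "safe (numpy 2.3.0 / 2.3.1, Isaac Sim prebundle)",
    "8fb3d286": "untested on the new driver (numpy 2.3.2 to 2.3.4)",
    "fdde5778": "broken (numpy 2.3.5)",
}

_OPENBLAS_RE = re.compile(r"libscipy_openblas64_-([0-9a-f]+)\.so")


def bundled_openblas_hashes(site_packages):
    """Return the hash suffixes of OpenBLAS libraries under numpy.libs/."""
    libs_dir = Path(site_packages) / "numpy.libs"  # assumed wheel layout
    hashes = []
    if libs_dir.is_dir():
        for path in sorted(libs_dir.iterdir()):
            match = _OPENBLAS_RE.fullmatch(path.name)
            if match:
                hashes.append(match.group(1))
    return hashes


def classify(hash_suffix):
    """Map a hash suffix to the status recorded in the bisection table."""
    return KNOWN_HASHES.get(hash_suffix, "unknown")
```

Calling `bundled_openblas_hashes(sysconfig.get_paths()["platlib"])` inspects the current interpreter's site-packages without importing numpy, so the check itself never registers the atfork handler.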

With this PR's numpy>=2,!=2.3.5, pip resolves to 2.3.4 (the highest release not known to be broken). The -8fb3d286 bundle was IsaacLab CI's resolved version for ~4 months before 2.3.5 was released, but it has not been tested against the new CUDA 13.2 driver / runner environment that started showing the SIGSEGV pattern on 2026-05-12. If reviewers prefer the most conservative option, the constraint can be tightened to numpy>=2,<2.3.2, which forces 2.3.1, bit-identical to the base image's prebundle.
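
The three outcomes described above can be reproduced with a toy model of "pick the highest version that satisfies every constraint". This is a simplified sketch, not pip's resolver: it handles plain dotted release numbers only, and `parse` and `highest_allowed` are hypothetical helpers.

```python
def parse(version):
    """Turn '2.3.5' into (2, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def highest_allowed(candidates, lower=None, upper=None, excluded=()):
    """Pick the highest candidate with lower <= v < upper and v not excluded."""
    banned = {parse(v) for v in excluded}
    allowed = [
        v for v in candidates
        if (lower is None or parse(v) >= parse(lower))
        and (upper is None or parse(v) < parse(upper))
        and parse(v) not in banned
    ]
    return max(allowed, key=parse) if allowed else None


NUMPY_2_3 = ["2.3.0", "2.3.1", "2.3.2", "2.3.3", "2.3.4", "2.3.5"]

# numpy>=2 plus the transitive <2.4 cap: the broken release wins.
print(highest_allowed(NUMPY_2_3, lower="2", upper="2.4"))  # 2.3.5
# numpy>=2,!=2.3.5 plus the same cap.
print(highest_allowed(NUMPY_2_3, lower="2", upper="2.4", excluded=["2.3.5"]))  # 2.3.4
# The more conservative numpy>=2,<2.3.2.
print(highest_allowed(NUMPY_2_3, lower="2", upper="2.3.2"))  # 2.3.1
```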

Why not bump to numpy ≥ 2.4.1 (which has the upstream OpenBLAS fix)?

pin-pink (Pink IK library) depends on pin (Pinocchio) → libpinocchio 3.9.0 → cmeel-boost ~=1.89.0. The latest cmeel-boost 1.89.0 declares:

Requires-Dist: numpy >=2.3, <2.4 ; python_version >= '3.11.0'

Forcing numpy>=2.4 + pin>=2.6.3 produces ResolutionImpossible. Until cmeel-boost upstream lifts its cap (or IsaacLab moves Pinocchio to a non-PyPI install path), the highest numpy IsaacLab can resolve to is 2.3.x. Excluding 2.3.5 is therefore the short-term fix at the dependency-resolution layer.
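
To see which installed packages impose a numpy bound in a given environment, the `Requires-Dist` metadata can be scanned with the standard library. This is a rough sketch: the function names are hypothetical, and the output depends entirely on what is installed, so no result is shown.

```python
import re
from importlib import metadata

# A requirement string starts with the project name, e.g. "numpy >=2.3, <2.4 ; ...".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(requirement):
    """Return the normalized project name at the start of a requirement string."""
    match = _NAME_RE.match(requirement)
    if match is None:
        return None
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def numpy_requirements():
    """Map each installed distribution to the numpy requirements it declares."""
    found = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"] or "<unknown>"
            requires = dist.requires or []
        except Exception:
            continue  # skip distributions with unreadable metadata
        specs = [r for r in requires if requirement_name(r) == "numpy"]
        if specs:
            found[name] = specs
    return found
```

Printing `numpy_requirements()` in the failing environment should list cmeel-boost with its `<2.4` cap, provided the installed metadata matches the line quoted above.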

Related in-flight work

This PR operates at the dependency-resolution layer. The four approaches address different layers of the same problem; reviewers can choose which set to land.

Files touched

source/isaaclab/setup.py:21              "numpy>=2"          → "numpy>=2,!=2.3.5"
source/isaaclab_tasks/setup.py:21        "numpy>=2"          → "numpy>=2,!=2.3.5"
source/isaaclab_rl/setup.py:22           "numpy"             → "numpy>=2,!=2.3.5"
source/isaaclab_visualizers/setup.py:13  "numpy"             → "numpy>=2,!=2.3.5"
+ one changelog fragment per package

All four sites must agree per IsaacLab convention (pip first-declaration-wins on transitive resolution).
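
Because the four declarations have to stay in agreement, a small check can flag any setup.py that drifts. The sketch below is an assumption about how such a check might look, not part of this PR; it works on file contents passed in as strings, and `EXPECTED`, `numpy_requirements_in`, and `check_consistent` are hypothetical names.

```python
import re

EXPECTED = "numpy>=2,!=2.3.5"

# Matches a quoted requirement that is exactly numpy plus an optional specifier,
# so "numpy-quaternion" and similar names are not picked up.
_NUMPY_REQ_RE = re.compile(r"""["'](numpy(?![\w.-])[^"']*)["']""")


def numpy_requirements_in(setup_py_text):
    """Return every quoted numpy requirement string found in a setup.py."""
    return [m.group(1).replace(" ", "") for m in _NUMPY_REQ_RE.finditer(setup_py_text)]


def check_consistent(sources):
    """Return {path: specs} for files whose numpy requirement differs from EXPECTED."""
    mismatches = {}
    for path, text in sources.items():
        specs = numpy_requirements_in(text)
        if specs != [EXPECTED]:
            mismatches[path] = specs
    return mismatches
```

Feeding it `{path: Path(path).read_text()}` for the four files listed above would return an empty dict once all of them carry the same constraint.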

References

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist

NumPy 2.3.5 ships a vendored OpenBLAS
(libscipy_openblas64_-fdde5778.so) whose pthread_atfork handler
calls blas_thread_shutdown_ -> pthread_join on workers that don't
exist in the child of fork().  Kit's libomni.platforminfo calls
fork() during SimulationApp startup, which triggers the handler and
SIGSEGVs the test process.

IsaacLab's CI Docker layer (docker/Dockerfile.base:117) runs
'isaaclab.sh --install' which pip-installs the source packages and
resolves numpy>=2 against the IsaacLab dep tree.  pin-pink ->
pinocchio (pin) -> cmeel-boost transitively caps numpy <2.4, so pip
picks the highest 2.3.x: 2.3.5.  That landed the broken OpenBLAS
into site-packages, shadowing Isaac Sim's prebundled numpy 2.3.1.

Tightening to 'numpy>=2,!=2.3.5' across the four packages that
declare a numpy dep (isaaclab, isaaclab_tasks, isaaclab_rl,
isaaclab_visualizers) keeps the loose lower bound but excludes the
single known-broken release.  Pip resolves to 2.3.4, which ships a
different bundled OpenBLAS hash (libscipy_openblas64_-8fb3d286.so)
that was the resolved version for IsaacLab CI prior to numpy 2.3.5
without these crashes.

Refs:
- numpy/numpy#30092
- scipy/scipy#23686
- OpenMathLib/OpenBLAS#5520
- JIRA OMPE-92261

Verified locally:
- numpy 2.3.0 / 2.3.1 ship libscipy_openblas64_-56d6093b.so (safe)
- numpy 2.3.2 / 2.3.3 / 2.3.4 ship libscipy_openblas64_-8fb3d286.so
- numpy 2.3.5 ships libscipy_openblas64_-fdde5778.so (broken)
- Isaac Sim base image (both 5/11 and 5/15 candidates) prebundles
  numpy 2.3.1 with -56d6093b at omni.kit.pip_archive/pip_prebundle

A longer-term fix is bumping cmeel-boost upstream so numpy 2.4.1+
becomes resolvable; coordination with Isaac Sim base image is also
in flight separately.
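
The commit message above hinges on one property of fork(): only the calling thread exists in the child, so a handler that tries to join worker threads there is operating on threads that were never copied. The sketch below illustrates that property in pure Python on a POSIX system; it is an analogy for the mechanism, not a reproduction of the OpenBLAS handler, and `thread_counts_across_fork` is a hypothetical name.

```python
import os
import threading
import warnings


def thread_counts_across_fork():
    """Return (threads in parent, threads in child) around an os.fork() call."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Python 3.12+ warns about fork() in a multi-threaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied; the worker is gone.
        try:
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    # Parent: read the child's report, then clean up the child and the worker.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_count = int(reader.read())
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count
```

The parent reports at least two threads while the child reports one, which is the situation any atfork handler has to tolerate.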
@github-actions (bot) added the isaac-lab label (Related to Isaac Lab team) on May 15, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment


Code Review Summary

This PR correctly addresses a critical CI stability issue by excluding NumPy 2.3.5, which ships a broken OpenBLAS bundle that causes SIGSEGV during SimulationApp startup.

✅ Strengths

  1. Well-researched root cause analysis — The PR description thoroughly documents the OpenBLAS bisection across numpy 2.3.x versions and correctly identifies the problematic -fdde5778 bundle in 2.3.5.

  2. Consistent application — All 4 packages declaring a numpy dependency are updated:

    • isaaclab/setup.py: "numpy>=2" → "numpy>=2,!=2.3.5"
    • isaaclab_tasks/setup.py: "numpy>=2" → "numpy>=2,!=2.3.5"
    • isaaclab_rl/setup.py: "numpy" → "numpy>=2,!=2.3.5"
    • isaaclab_visualizers/setup.py: "numpy" → "numpy>=2,!=2.3.5"
  3. Proper documentation — Changelog fragments are added for all 4 packages with clear explanations and issue references.

  4. Inline comments — The comments in setup.py files reference the upstream issue (numpy/numpy#30092) and JIRA ticket, which helps future maintainers understand why the exclusion exists.

  5. Minimal, targeted fix — Rather than over-constraining (e.g., numpy<2.3.5), the PR surgically excludes only the known-broken version.

📝 Minor Observations

  1. Changelog fragment duplication: isaaclab_rl and isaaclab_visualizers share identical changelog text that says "(was unconstrained)", which is accurate, but the text could be slightly differentiated. This is a very minor nit and not blocking.

  2. Future cleanup — Once cmeel-boost lifts its <2.4 cap and numpy 2.4.1+ becomes resolvable, the !=2.3.5 exclusion will become redundant (since pip would skip 2.3.5 anyway). Consider adding a tracking comment or TODO. However, this is also very minor since the exclusion causes no harm if left in place.

⚙️ Technical Validation

  • The numpy upstream issue numpy#30092 is closed/fixed, confirming this is a known upstream regression
  • The PEP 440 version specifier >=2,!=2.3.5 is syntactically correct
  • The cmeel-boost transitive cap <2.4 means pip will resolve to 2.3.4 (highest non-excluded 2.3.x), which ships the -8fb3d286 OpenBLAS bundle that was working in CI prior to 2.3.5

Verdict

This is a clean, well-documented fix for a CI-breaking regression. The approach is sound — excluding a single broken release is preferable to pinning to a specific "known good" version, as it allows for maximum flexibility while avoiding the crash.

LGTM once CI passes. 👍

@pbarejko
Collaborator

Can we push this fix upstream?

@hujc7
Collaborator Author

hujc7 commented May 15, 2026

> Can we push this fix upstream?

My understanding is evolving in this direction. I originally thought the problem was in the docker image, but the pin/pin-pink dependency is declared by Lab, and the cmeel-boost required by pin-pink/pin caps numpy at <2.4, which always resolved to this bad version.

hujc7 added a commit to hujc7/IsaacLab that referenced this pull request May 16, 2026
Importing numpy before pytest registers the broken OpenBLAS atfork
handler in this shell, then Kit's libomni.platforminfo fork() trips
it and SIGSEGVs - exactly the bug isaac-sim#5642 targets.  Surface the
diagnostic AFTER pytest instead, with pytest's exit code preserved
so the job still passes/fails based on real test outcomes.
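
One way to order the steps as this commit message describes is to run the tests first, run the numpy diagnostic in a separate interpreter afterwards, and return only the tests' exit code. The sketch below is an assumption about the shape of such a wrapper, not the CI script from the commit; `DIAGNOSTIC` and `run_then_diagnose` are hypothetical names.

```python
import subprocess
import sys

# Hypothetical diagnostic: runs in its own interpreter so that importing numpy
# cannot register anything in the process that runs the tests.
DIAGNOSTIC = [sys.executable, "-c", "import numpy; print('numpy', numpy.__version__)"]


def run_then_diagnose(test_cmd, diagnostic_cmd=DIAGNOSTIC):
    """Run the tests first, then the diagnostic, and return the tests' exit code."""
    test_rc = subprocess.run(test_cmd).returncode
    # The diagnostic is best-effort; its own exit code is deliberately ignored.
    subprocess.run(diagnostic_cmd)
    return test_rc
```

A job script could end with `sys.exit(run_then_diagnose([sys.executable, "-m", "pytest"]))`, so a failing diagnostic never masks or overrides the real test outcome.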
hujc7 added a commit to hujc7/IsaacLab that referenced this pull request May 16, 2026
The setup.py constraint "numpy>=2,!=2.3.5" landed in isaac-sim#5642 is silently
overridden during isaaclab.sh --install because pip resolves each
submodule install independently:

  - isaaclab            -> numpy stays at 2.3.1 (already satisfied)
  - isaaclab_mimic[h5py]-> numpy 1.26.4 (h5py wheel ABI)
  - isaaclab_rl         -> numpy 2.4.5
  - isaaclab_teleop[dex-retargeting] -> numpy 2.3.5 (cmeel-boost <2.4 cap)
  - isaaclab_visualizers-> numpy 2.3.4
  - isaaclab_mimic[robomimic] -> numpy 1.26.4
  - _ensure_pink_ik_dependencies_installed force-reinstall -> numpy 2.3.5

The final pin-pink force-reinstall sees only pin-pink's numpy>=1.19 plus
cmeel-boost's numpy<2.4 cap and lands on numpy 2.3.5 - the exact release
whose vendored OpenBLAS (libscipy_openblas64_-fdde5778.so) registers a
buggy pthread_atfork handler that SIGSEGVs Kit's libomni.platforminfo
fork() during SimulationApp startup.

After the pin-pink force-reinstall, append one more pip invocation that
explicitly upgrades numpy to >= 2.4.1. pip prints a resolver warning
about cmeel-boost's cap but installs numpy 2.4.5 anyway; numpy's stable
C ABI (numpy >= 2.0) keeps cmeel's compiled extensions (libpinocchio,
libcoal, ...) working at runtime. The atfork fix landed upstream in
numpy 2.4.1, so the entire 2.3.x risk class is bypassed.

Validated locally on env_isaaclab_test (numpy 2.4.5 + pinocchio 3.9.0 +
pin 3.9.0 + daqp + qpsolvers):
- import numpy, pinocchio, pink, daqp: OK
- Bundled OpenBLAS hash: -32a4b2a6 (not the broken -fdde5778)
- IsaacLab Pink IK unit tests: 54/54 pass
  (test_pink_ik_components.py 21/21, test_local_frame_task.py 24/24,
   test_null_space_posture_task.py 9/9)

Related: numpy/numpy#30092, OpenMathLib/OpenBLAS#5520
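
The override described in this commit message comes from each pip invocation knowing only its own constraints. A toy model makes the effect visible; it is a simplified sketch with hypothetical helpers (`parse`, `satisfies`, `install`), handles plain dotted versions only, and models just two of the listed steps.

```python
def parse(version):
    """Turn '2.3.5' into (2, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower=None, upper=None, excluded=()):
    """Check lower <= version < upper and version not in excluded."""
    v = parse(version)
    return (
        (lower is None or v >= parse(lower))
        and (upper is None or v < parse(upper))
        and v not in {parse(e) for e in excluded}
    )


def install(installed, candidates, force_reinstall=False, **constraint):
    """Mimic one pip invocation that only knows its own constraint."""
    if installed and not force_reinstall and satisfies(installed, **constraint):
        return installed  # requirement already satisfied, nothing changes
    allowed = [c for c in candidates if satisfies(c, **constraint)]
    return max(allowed, key=parse)


CANDIDATES = ["2.3.1", "2.3.4", "2.3.5", "2.4.5"]

installed_numpy = "2.3.1"  # Isaac Sim prebundle
# Step 1: isaaclab's numpy>=2,!=2.3.5 is already satisfied by 2.3.1.
installed_numpy = install(installed_numpy, CANDIDATES, lower="2", excluded=["2.3.5"])
# Step 2: the pin-pink force-reinstall sees only numpy>=1.19 and the <2.4 cap.
installed_numpy = install(
    installed_numpy, CANDIDATES, force_reinstall=True, lower="1.19", upper="2.4"
)
print(installed_numpy)  # 2.3.5
```

The exclusion from step 1 is simply not visible in step 2, which is why the last invocation determines what ends up in site-packages.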
hujc7 added a commit to hujc7/IsaacLab that referenced this pull request May 16, 2026
The setup.py constraint "numpy>=2,!=2.3.5" landed in isaac-sim#5642 is silently
overridden during isaaclab.sh --install: each pip install -e <submodule>
runs an independent resolve, and the final pin-pink force-reinstall in
_ensure_pink_ik_dependencies_installed lands on numpy 2.3.5 because pip
sees only pin-pink's own deps (numpy>=1.19) plus cmeel-boost's numpy<2.4
cap.  numpy 2.3.5 ships a vendored OpenBLAS
(libscipy_openblas64_-fdde5778.so) whose pthread_atfork handler crashes
Kit's libomni.platforminfo fork() during SimulationApp startup.

Two changes, both restating an explicit "pip install --upgrade
numpy>=2.4.1" as the *last* pip invocation in each install path:

1. _ensure_numpy_above_openblas_atfork_bug() in install.py — runs
   unconditionally at the end of --install (not gated by the pink-ik
   probe outcome), so upgrades on an already-functioning env also
   pull numpy forward.
2. Dockerfile.curobo — apply the same upgrade after its post-install
   steps (nvidia-curobo + isaaclab_teleop editable install), which
   otherwise drag numpy back to 2.3.5 via dex-retargeting -> pin ->
   cmeel-boost.

pip prints a resolver warning about cmeel-boost's cap then installs
numpy 2.4.5 anyway.  numpy 2.4.1+ ships the upstream OpenBLAS atfork
fix, so the entire 2.3.x risk class is bypassed.  numpy's stable C
ABI keeps cmeel's compiled extensions (libpinocchio, libcoal, ...)
working at runtime.

Validated:
- env_isaaclab_test smoke test (numpy 2.4.5 + cmeel pinocchio + pink +
  daqp + qpsolvers all import; toy IK solve OK).
- IsaacLab Pink IK unit tests: 54/54 pass against numpy 2.4.5
  (test_pink_ik_components 21/21, test_local_frame_task 24/24,
   test_null_space_posture_task 9/9).
- PR isaac-sim#5655 (validation): every base-image test job reports numpy 2.4.5
  + openblas -32a4b2a6 (clean, not the broken -fdde5778).  Worst-case
  import order (numpy imported before pytest spawns Kit) also passes —
  confirming the upstream atfork fix is real, not just dodge-by-order.

Related: numpy/numpy#30092, OpenMathLib/OpenBLAS#5520
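
The "last pip invocation" idea can be sketched as a command builder plus a version check. This is not the body of `_ensure_numpy_above_openblas_atfork_bug()` from the commit, which is not shown here; the function names below are hypothetical and the version parsing is deliberately simplified (pre-release suffixes are dropped).

```python
import sys
from importlib import metadata

MIN_FIXED = (2, 4, 1)  # first release the commit message reports as carrying the fix


def numpy_upgrade_command(python=sys.executable):
    """Build the final pip invocation; meant to run after every other install step."""
    return [python, "-m", "pip", "install", "--upgrade", "numpy>=2.4.1"]


def release_tuple(version):
    """Keep the leading numeric components of a version string, at most three."""
    return tuple(int(p) for p in version.split(".")[:3] if p.isdigit())


def installed_numpy_is_fixed():
    """Return True/False for the installed numpy, or None if numpy is absent."""
    try:
        version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None
    return release_tuple(version) >= MIN_FIXED
```

Passing the command as an argument list to `subprocess.run(numpy_upgrade_command(), check=True)` avoids a shell, where an unquoted `numpy>=2.4.1` would be parsed as an output redirection.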