Skip to content

core: extend userns gid_map with SupplementaryGroups= IDs#42106

Open
rocker-zhang wants to merge 1 commit into
systemd:mainfrom
rocker-zhang:fix/41994-private-users-supplementary-groups
Open

core: extend userns gid_map with SupplementaryGroups= IDs#42106
rocker-zhang wants to merge 1 commit into
systemd:mainfrom
rocker-zhang:fix/41994-private-users-supplementary-groups

Conversation

@rocker-zhang
Copy link
Copy Markdown

Summary

Fixes #41994. When a unit is run with PrivateUsers=yes and SupplementaryGroups=, the supplementary groups currently appear as the overflow GID (nobody/65534) inside the user namespace instead of as themselves.

Reporter's repro:

$ systemd-run --pipe --wait --quiet -p User=testuser -p DynamicUser=true -p SupplementaryGroups=testgroup id
uid=65135(testuser) gid=65135(testuser) groups=65135(testuser),967(testgroup)

$ systemd-run --pipe --wait --quiet -p PrivateUsers=yes -p User=testuser -p DynamicUser=true -p SupplementaryGroups=testgroup id
uid=65135(testuser) gid=65135(testuser) groups=65135(testuser),65534(nogroup)   ← bug

Root cause

The PRIVATE_USERS_SELF branch in setup_private_users() writes a gid_map that contains only saved_gid → saved_gid and *gid → *gid. The supplementary GIDs were already applied via setgroups() in the host namespace before the userns was created — but since the kernel can't map them through the namespace's gid_map, it substitutes the overflow GID for display in /proc/self/status, getgroups(), etc.

Fix

Extend the gid_map under PRIVATE_USERS_SELF to identity-map each SupplementaryGroups=/PAM-derived GID. Multi-line gid_map writes already require CAP_SETGID in the parent userns (the existing two-line case gates on this); we reuse that gate.

To avoid a startup regression on the older end of systemd's supported kernel range, the number of appended supplementary GIDs is capped (USERNS_SUPPLEMENTARY_GIDS_MAX = 3):

  • Pre-v5.10 kernels limit a single gid_map write to 5 extents (raised to 340 in 6397fac4915a).
  • systemd's documented baseline is v5.4 (NEWS, current).
  • Capping at 3 keeps the total at ≤5 lines on those kernels.
  • Excess GIDs continue to appear as the overflow GID, which is exactly the pre-fix behavior — so capping never regresses startup. A log_debug records the truncation.

Test

Added to test/units/TEST-07-PID1.private-users.sh:

  • Asserts id -G inside a PrivateUsers=yes (and PrivateUsersEx=self) DynamicUser=yes unit lists SupplementaryGroups=2.
  • Asserts /proc/self/gid_map contains the expected identity line for GID 2.

Verified locally: meson test -C build --print-errorlogs test-user-util passes; full meson compile succeeds. The new TEST-07 case runs in the integration VM.

Notes / caveats

  • The fix specifically targets the privileged service-manager path. The unprivileged early setup_private_users call (line ~6116) also passes the supplementary list through but its can_map_more gate is false without CAP_SETGID, so behavior there is unchanged (and enforce_groups later in the deny-setgroups userns is the pre-existing unprivileged-path limitation, not addressed here).
  • PRIVATE_USERS_IDENTITY and PRIVATE_USERS_FULL already cover all 32-bit GIDs, so they need no change.
  • The dedup against saved_gid/*gid prevents writing duplicate gid_map lines, which the kernel would reject.

Closes: #41994

@github-actions github-actions Bot added tests please-review PR is ready for (re-)review by a maintainer labels May 15, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

Claude review of PR #42106 (12b4c39)

Suggestions

  • int-to-size_t implicit conversion (call site 1)src/core/exec-invoke.c:6120ngids_to_enforce is int but setup_private_users expects size_t; may warn under -Wsign-conversion
  • int-to-size_t implicit conversion (call site 2)src/core/exec-invoke.c:6200 — same issue at the second setup_private_users call

Nits

  • Fixed supplementary GID cap vs dynamic budgetsrc/core/exec-invoke.c:2575 — when baseline gid_map is only 1 line (gid invalid or == saved_gid), up to 4 supplementary GIDs could fit within the pre-v5.10 5-extent limit; current fixed cap of 3 is conservative but correct
  • Test assumes GID 2 existstest/units/TEST-07-PID1.private-users.sh:24 — numeric GIDs work without /etc/group entries, but a guard would make the test more robust across environments

Workflow run

Comment thread src/core/exec-invoke.c
&outside_uid,
&outside_gid,
gids_to_enforce,
ngids_to_enforce,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: ngids_to_enforce is declared as int (line 6039) but setup_private_users expects size_t n_supplementary_gids. While this is safe in practice (since the negative/error case is handled earlier), it may produce a -Wsign-conversion warning on strict builds. Consider either casting explicitly — (size_t) ngids_to_enforce — or declaring ngids_to_enforce as size_t (storing the merge_gid_lists result in a temporary int for the error check).

Comment thread src/core/exec-invoke.c
&gid,
&outside_uid,
&outside_gid,
gids_to_enforce,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Same signed-to-unsigned implicit conversion: int ngids_to_enforce is passed where size_t n_supplementary_gids is expected. Same fix as the other call site would apply here.

When PrivateUsers=yes (PRIVATE_USERS_SELF), the user-namespace gid_map
only maps the service manager's GID and the unit's GID. Any GIDs from
SupplementaryGroups= (or added by PAM) fall outside the map, so the
kernel substitutes them with the namespace's overflow GID and the unit
sees its supplementary groups collapsed to "nobody"/65534 — even though
the groups were correctly applied via setgroups() in the host namespace
before the userns was created.

Extend the gid_map under PRIVATE_USERS_SELF to also identity-map each
SupplementaryGroups=/PAM-derived GID so they remain visible inside the
namespace. Multi-line gid_map writes require CAP_SETGID in the parent
userns, which the existing two-line case already gates on; we reuse the
same gate.

The kernel limits a single gid_map write to 5 extents on pre-v5.10
kernels (raised to 340 in commit 6397fac4915a "userns: bump idmap
limits to 340"). systemd's baseline is v5.4, so cap the supplementary
GIDs appended here at 3 (leaving 2 for the baseline) to stay within
the older limit. Excess GIDs continue to appear as the overflow GID,
which is exactly the pre-fix behavior — so the cap never regresses
startup.

Add a TEST-07-PID1.private-users.sh regression check using GID 2
("bin"), which is stable across the mkosi test distros.

Fixes: systemd#41994

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
@rocker-zhang rocker-zhang force-pushed the fix/41994-private-users-supplementary-groups branch from 2747942 to 12b4c39 Compare May 16, 2026 04:53
@rocker-zhang
Copy link
Copy Markdown
Author

Pushed a fixup: changed the new setup_private_users parameter from size_t n_supplementary_gids to int n_supplementary_gids (force-push, single amended commit, no other changes).

Rationale per the bot's two suggestions on src/core/exec-invoke.c:6120 / :6200:

  • merge_gid_lists() returns int (a count or a negative errno), so ngids_to_enforce is naturally int.
  • enforce_groups(), which consumes the same value, already takes int ngids.
  • Matching int here removes the implicit signed→unsigned conversion at both call sites and stays consistent with the existing convention.

FOREACH_ARRAY inside the function is unaffected — it checks m > 0 before stepping.

Comment thread src/core/exec-invoke.c
* append here so that, combined with the up-to-two baseline lines, the total stays
* within the pre-v5.10 limit. Excess groups remain visible as the overflow GID
* inside the namespace — the same as pre-fix behavior. */
if (can_map_more) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: The fixed cap of 3 supplementary GIDs is safe and well-reasoned for the common case (baseline + saved_gid line = 2 lines already used). However, when gid_is_valid(*gid) is false or *gid == saved_gid, the baseline gid_map has only 1 line, meaning up to 4 supplementary GIDs could fit within the pre-v5.10 limit of 5 extents. You could compute the remaining budget dynamically (5 - baseline_lines) to map one more GID in that corner case. Minor optimization — the fixed cap is correct and never regresses.

# user namespace rather than being collapsed to the kernel's overflow GID
# (nogroup). Picks a stable, low-numbered system group ("bin", GID 2) so
# the assertion does not depend on /etc/group contents.
systemd-run -p PrivateUsers=yes -p DynamicUser=yes -p SupplementaryGroups=2 --wait \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: The test assumes GID 2 exists across all mkosi test distros. Since SupplementaryGroups= with numeric GIDs doesn't require a /etc/group entry for setgroups() to succeed, this likely works everywhere. But if a future test environment lacks GID 2 in its group database, id -G output validation could behave unexpectedly. Consider adding a brief guard (getent group 2 >/dev/null 2>&1 || exit 0) or using GID 0 which is universally present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-review please-review PR is ready for (re-)review by a maintainer tests

Development

Successfully merging this pull request may close these issues.

PrivateUsers=yes and SupplementaryGroups= interacts weirdly

2 participants