libc: Make sure we don't depend on glibc > 2.34#42100
Conversation
Claude review of PR #42100 (e8c6460)Must fix
Suggestions
Nits
|
|
not a fan of the dlsym approach for this. can't |
Not for symbols that don't exist in the older glibc. So we need dlsym for those anyway. We could use it for strtol() exp10(), fmod() and such but I prefer sticking to one mechanism for all, hence dlsym(). |
d11010b to
ece7938
Compare
|
BTW, what is the motivation of this?? Do you want to downgrade glibc without rebuilding systemd?? |
I want to be able to upgrade systemd without having to pull in new glibc (or any other library). One of the previous issues with dlopen is that if we build on a build system with newer headers available, it won't run on systems with an older version of the library, and since rpm or other package managers can't detect the newer required versions of the library yet, hence we'd fail to dlopen the library because we try to dlopen some newer symbol that isn't available. That was already fixed by my previous PR. For glibc, when we build on a system with newer glibc and it happens to have a newer symbol like openat2(), wherever we install the new systemd we will be forced to upgrade glibc as well to make sure openat2() is available. This is suboptimal if you ask me, a systemd upgrade shouldn't force a glibc upgrade just to make sure the openat2() symbol is available. Rather we should pick up the openat2() symbol if it is available, but gracefully degrade to the system call if it isn't. And this should be a runtime decision, not a build time one. |
ece7938 to
a53753d
Compare
Renames src/shared/bpf-dlopen.{c,h} to src/shared/bpf-util.{c,h} and
folds the former src/shared/bpf-compat.h (struct forward decl and
compat_bpf_map_create() helper) into the new header.
Aligns dlopen_bpf() with the standard wrapper pattern: drops the
manual dlopen_safe()/dlsym_many_or_warn()/TAKE_PTR(dl) plumbing and
the bespoke 'cached' int in favor of dlopen_many_sym_or_warn() inside
a FOREACH_STRING() soname-fallback loop.
Unifies declaration of the version-specific symbols (bpf_create_map,
bpf_map_create, bpf_object__next_map, bpf_token_create) into a single
DISABLE_WARNING_REDUNDANT_DECLS block in the header, and alphabetically
merges the DLSYM_PROTOTYPE list. DLSYM_OPTIONAL is used to load each
one — call sites already handle NULL (compat_bpf_map_create() and the
sym_bpf_object__next_map guard in userns-restrict.c). bpf_token_create
additionally defaults to a missing_bpf_token_create() stub returning
-ENOSYS, so callers can branch on the errno instead of NULL-checking
the pointer.
Updates test-bpf-token to match: drops the compile-time
LIBBPF_MAJOR_VERSION ≥ 1.5 gate and the direct <bpf/bpf.h> include in
favor of dlopen_bpf() + sym_bpf_token_create(), and treats -ENOSYS as
the test-skip path (covering both 'libbpf too old' and 'kernel lacks
BPF_TOKEN_CREATE support').
| double xexp10i(int n) { | ||
| /* Powers of 10 up to 10^22 are exact in IEEE-754 binary64. */ | ||
| static const double table[] = { | ||
| 1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11, | ||
| 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22, | ||
| }; | ||
| bool negative = n < 0; | ||
|
|
||
| /* Why not just __builtin_powi(10.0, n)? At runtime it resolves to libgcc's __powidf2, which | ||
| * computes 10^n by repeatedly squaring: 10^16 → 10^32 → 10^64 → … . That's fast, but a double | ||
| * can only store 10^k exactly for k ≤ 22 — beyond that, every squaring has to round to fit in | ||
| * 53 mantissa bits, and the errors compound across squarings. By 10^308 (close to the largest | ||
| * finite double) the answer is off by a few of the smallest possible double-steps. That sounds | ||
| * tiny, but at the edge it's decisive: parsing DBL_MAX back from its JSON representation does | ||
| * (mantissa × 10^308), and if 10^308 is even slightly too big the product overflows to +Inf and | ||
| * the round-trip fails. | ||
| * | ||
| * So this does it the slower-but-safer way — a 23-entry table for 10^0..10^22 (all exact) plus | ||
| * a multiply-by-10 loop for larger exponents. Each result beyond 10^22 picks up at most one | ||
| * rounding instead of a whole chain. */ | ||
|
|
||
| /* Cast before negation so n == INT_MIN doesn't invoke signed-overflow UB. Unsigned negation | ||
| * wraps to the magnitude we want. */ | ||
| unsigned k = negative ? -(unsigned) n : (unsigned) n; | ||
| double r; | ||
|
|
||
| if (k < ELEMENTSOF(table)) | ||
| r = table[k]; | ||
| else { | ||
| /* 10^309 already overflows binary64 to +Inf; anything beyond just stays there. */ | ||
| k = MIN(k, 309u); | ||
| r = table[ELEMENTSOF(table) - 1]; | ||
| for (unsigned i = ELEMENTSOF(table) - 1; i < k; i++) | ||
| r *= 10.0; | ||
| } | ||
|
|
||
| return negative ? 1.0 / r : r; | ||
| } |
There was a problem hiding this comment.
I'm not super happy with open-coding this but sure.
There was a problem hiding this comment.
yeah I'd prefer to avoid open coding this stuff, is it really necessary? Can't we just keep the dependency for these symbols? Same with the other vendored math macros
There was a problem hiding this comment.
All the other options were a bigger pain, doing asm symver to pick and older symbol version didn't work because the older symbol version differs depending on the architecture. Using dlsym() didn't work because with one of my local followups we get rid of the rest of the libm dependencies and then dlsym() stops working and we have to do dlopen first as it's not linked in by default anymore.
I guess we could do dlopen() for libm but we hardly need anything from it, just this function and one other one coming in an upcoming PR, so I'm inclined to opencode the two functions we need so we can drop the dependency entirely.
There was a problem hiding this comment.
I'd prefer to just dlopen it, and let the libc people worry about doing math right tbh
There was a problem hiding this comment.
I guess, will change to dlopen
There was a problem hiding this comment.
Actually after trying it I still like open coding more. By dlopening it every function that does math can suddenly fail whereas previously it can't. And it means we also need to take care of dlopening libm sufficently early when we fork something off that runs in different namespaces and such, it's generally just a pain to deal with.
The open coded version works, has a bunch of tests, CI is green, and it's really not that complex, I'd lean towards keeping it like this.
But we should probably let @poettering be the final arbiter, I sense he's gonna have an opinion on this as well.
This avoids having to subpackage the tokens separately. If they link directly against libcryptsetup, package manager will automatically add a dependency on libcryptsetup to the package containing the tokens. With this change, the tokens can ship in the main systemd package without necessarily pulling in libcryptsetup. It also makes things more consistent. Once we also do the same for pam, any direct linking will be limited to just libc, which for example simplifies writing tests for ensuring we don't link unnecessarily as we don't have to add exceptions for the cryptsetup tokens. This actually drops the dependency on cryptsetup-libs for the fedora/centos/opensuse systemd-udev package so install it explicitly in the initrd now to keep the tests working.
Same reasoning as for cryptsetup tokens. It means we can include the pam plugins in the main systemd package without the package manager introducing a dependency on libpam. It also makes things more consistent and makes writing the upcoming linking test script a lot simpler. At the same time, we get rid of the libpam_misc dependency as the one symbol we were using from it is trivial to reimplement ourselves.
The test only uses 9 symbols (5 from glib, 4 from libdbus) for its interop checks; dlopen them at runtime so the binary no longer carries a hard link-time dependency on either library. Headers are still pulled in through the *_cflags partial dependencies for the type declarations. While we're at it, drop the compat glue for glib 2.36 which is long obsolete at this point.
We can just put a timeout on the parent process completing rather than doing it inside the subprocess.
In hsv_to_rgb, restructure the conversion around the sector index k = (int)(h/60) and fractional offset f = h/60 - k. The auxiliary x value becomes c * (k & 1 ? 1.0 - f : f) and the six branches turn into a switch on k. This drops the two xfmod() calls that were doing the modulo work, in exchange for a single assert(h >= 0 && h < 360) — all in-tree callers satisfy this and never relied on the wrap. In rgb_to_hsv, the two xfmod() calls were no-ops (their arguments were always within the divisor's magnitude). The trailing xfmod(*ret_h, 360) appeared to be wrapping negative hues from the r-max branch back into [0, 360), but fmod is sign-preserving so it never did. Drop the no-ops and add an explicit +360 wrap so magenta (1, 0, 1) now yields h ≈ 300 instead of -60. Extend the tests to cover all six primary/secondary colors at sector boundaries, all six sector midpoints (to catch any future inversion of the ramp direction), the h-near-360 edge of the last sector, and the rgb_to_hsv negative-wrap path via magenta. Switch the new and existing integer-channel checks to ASSERT_EQ from tests.h; the double-typed h/s/v range checks stay on ASSERT_TRUE since the ASSERT_* comparison macros only support integer types. Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
550328e to
9a3ab9e
Compare
|
Moved the ABS() changes to the right commit |
To make this work, ABS() is made generic so it also works on floats and doubles. While at it, fold the __ABS_INTEGER indirection and the assert_cc(sizeof(long long) == sizeof(intmax_t)) away. The previous form switched between __builtin_llabs (clang) and __builtin_imaxabs (gcc), with the assert keeping the two paths behaviorally identical on every platform we build for. imaxabs was originally chosen because intmax_t is conceptually the widest signed integer type the platform exposes, but the _Generic ABS already casts to (long long) before the call, so the extra width imaxabs could in theory carry was being narrowed away immediately anyway. With both paths collapsed to __builtin_llabs((long long) (a)), the size relationship between long long and intmax_t is no longer relevant. Also add explicit unsigned long long / unsigned long / unsigned int cases that pass the argument through unchanged. The previous default branch cast unsigned values to (long long); for values above LLONG_MAX this reinterprets them as negative, and __builtin_llabs(LLONG_MIN) is UB. Unsigned values are already non-negative, so passing them through is both correct and avoids the narrowing. Smaller unsigned types (unsigned char, unsigned short) still go through the default branch but promote to int first and fit in long long losslessly. Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
exp10() has a symbol version > 2.34 on latest glibc. To allow dropping our baseline required glibc runtime version to <= 2.34, let's add our own version to prevent pulling in the newer symbol from glibc.
dgettext() lives in libc on glibc and in libintl.so.8 on musl with
gettext. Resolve it via dlsym() so neither configuration produces a
hard link-time dependency on libintl: try libintl.so.8 first and fall
back to RTLD_DEFAULT (which finds dgettext in libc on glibc).
The _() macro now expands to a runtime check that returns the
untranslated string if dlopen_libintl() has not run successfully, so
callers don't have to gate every translatable message on a runtime
check. pam_systemd_home — currently the only consumer of _() — calls
dlopen_libintl() best-effort from each PAM entry point.
The meson find_library('intl') dance is replaced with a has_header()
check; the only thing we need at build time is the prototype.
And drop the libm dependency.
Our baseline glibc is 2.34, which merged libdl, libpthread (the
dependency('threads') target), and librt into libc. Empty .so/.a stubs
remain for backward compatibility with old binaries, but new builds
resolve dl_*, pthread_*, mq_*, timer_*, etc. directly from libc.
On musl the same libraries are likewise empty stubs.
Drop the libdl, threads, and librt entries from every meson.build, and
remove the now-stale 'Libs.private: -lrt -pthread' from libudev.pc.in
since both flags resolve to empty link-time stubs on glibc 2.34+ and
musl.
Verified with readelf -d that libsystemd.so, libudev.so, and systemd no
longer carry DT_NEEDED entries for libdl/libpthread/librt.
Weak symbols still introduce a version requirement on a newer libc. Resolve each libc symbol via dlsym(RTLD_DEFAULT) from a per-shim constructor and cache the result in a file-scope static instead. This avoids the version requirement, keeps the call path free of atomics (constructors run single-threaded before main() and before any signal handler can fire), and keeps dlsym() out of contexts where it is not async-signal-safe.
When _GNU_SOURCE is defined, glibc will always use c23 versions of strtol(), sscanf() and friends if available (introduced after glibc 2.34). Which means that any binaries built with headers from newer glibc won't load on glibc < 2.38. To work around this, redefine the appropriate constants to zero make sure the c99 versions are used instead.
9a3ab9e to
e195dda
Compare
|
Pushed updated commit message for the ABS()/MAX()/MIN() commit |
|
|
||
| int dlopen_libintl(int log_level) { | ||
| #ifdef __GLIBC__ | ||
| return 1; |
There was a problem hiding this comment.
Claude: nit: On glibc, dlopen_libintl unconditionally returns 1 on every call. Other dlopen_* functions in the project return 0 on subsequent calls ("already loaded") vs 1 on the first successful load via dlopen_many_sym_or_warn. While callers currently only check for >= 0, the inconsistent semantics may confuse readers and break if a caller ever distinguishes first-load from already-loaded.
For every built executable, internal shared library, and plugin module, verify two link-time properties via readelf: 1. No imported GLIBC symbol's version is newer than 2.34. 2. The dynamic section's NEEDED entries reference only glibc, the runtime linker, our own libraries.
e195dda to
e8c6460
Compare
yuwata
left a comment
There was a problem hiding this comment.
LGTM.
It is fine to do that later, but could you address this?
#42100 (comment)
No description provided.