Skip to content

Conversation

@krobelus
Copy link
Contributor

Turn on the "embed-data" feature by default, to load our functions/completions directly from the fish binary.
Mainly to remove weird behavior in running shells when upgrading (#11921),
but also as a part of the effort to reduce differences between standalone and installed mode.

I wonder if we should already remove the ability to turn off embed-data.

(Draft state; this PR still includes some unrelated changes;
happy to accomodate review better but I still need to do self-review)

krobelus and others added 30 commits October 27, 2025 15:37
… are in sync

Commit 1fe6b28 (rustfmt.toml: specify edition to allow 2024 syntax,
2025-10-19) mentions that "cargo fmt" has different behavior than
"rustfmt" before that commit.  Probably because when .rustfmt.toml
exists, rustfmt implicitly uses a different edition (2018?)  that
doesn't support c"" yet.  That commit bumped the edition to 2024,
which caused yet another deviation from "cargo fmt":

    Error writing files: failed to resolve mod `tests`: cannot parse /home/johannes/git/fish-shell/src/wutil/tests.rs
    error: expected identifier, found reserved keyword `gen`
      --> /home/johannes/git/fish-shell/src/tests/topic_monitor.rs:48:9
       |
    48 |     for gen in &mut gens_list {
       |         ^^^ expected identifier, found reserved keyword

This has since been fixed by
0078424 (Update to rust 2024 edition, 2025-10-22).

Let's add a test so that such changes won't randomly break "rustfmt"
again.

Fix that by using 2021 edition, like we do in Cargo.toml.

In future, rustfmt should probably default to a current edition (or
maybe read the edition from Cargo.toml?)  Not yet sure which one is the
upstream issue, maybe rust-lang/rustfmt#5650
For historical reasons[^1], most of our Rust tests are in src/tests,
which
1. is unconventional (Rust unit tests are supposed to be either in the
   same module as the implementation, or in a child module).
   This makes them slightly harder to discover, navigate etc.
2. can't test private APIs (motivating some of the "exposed for
   testing" comments).

Fix this by moving tests to the corresponding implementation file.
Reviewed with

        git show $commit \
            --color-moved=dimmed-zebra \
            --color-moved-ws=allow-indentation-change

- Shared test-only code lives in
  src/tests/prelude.rs,
  src/builtins/string/test_helpers.rs
  src/universal_notifier/test_helpers.rs
  We might want to slim down the prelude in future.
- I put our two benchmarks below tests ("mod tests" followed by "mod bench").

Verified that "cargo +nightly bench --features=benchmark" still
compiles and runs. For some weird reason, the moved benchmarks throw
an unused-import warning though.

[^1]: Separate files are idiomatic in most other languages; also
separate files makes it easy to ignore when navigating the call graph.
("rg --vimgrep | rg -v tests/").  Fortunately, rust-analyzer provides
a setting called references.excludeTests for textDocument/references,
the textDocument/prepareCallHierarchy family, and potentially
textDocument/documentHighlight (which can be used to find all
references in the current file).

Closes fish-shell#11992
It failed for me on Linux.  Same as abf3f50 (Relax test
`test_dup2_during_select_ebadf` (fish-shell#11976), 2025-10-19).
Their msgfmt doesn't support --output-file=- yet, so use a temporary
file if "msgfmt" doesn't support "--check-format".  Else we should
have GNU gettext which supports both.

See fish-shell#11982
Functions are always qualified with "libc::".
These are obsolete as of c8001b5 (encoding: use UTF-8 everywhere,
2025-10-18). The only place where we still read LC_CTYPE is wcwidth()
(parent commit).
With this patch, our test driver unsets all LC_* variables
and LANGUAGE, and sets LANG=C (because LANG=en would change
quotes in the output).  The only place where explicit Unicode
support seems to be necessary is GNU sed called in the
tests/checks/sphinx-markdown-changelog.fish test.

In future we might want to avoid these issues by setting LANG=C.UTF-8
(or even LANG=en_US.UTF-8) instead of LANG=C, but for now, this shows
that our fish/python scripts use UTF-8 regardless of environment.
Link to history, printf and "builtin _" which are the only(?) users
of LC_TIME, LC_NUMERIC and LC_MESSAGES respectively (besides the core
equivalent of "builtin _").
These include variables that are ignored by fish, which seems
questionable for __fish_set_locale?  The the effect seems benign
because __fish_set_locale will do nothing if any of those variables
is defined.  Maybe we can get rid of this function eventually.
We may have used LC_COLLATE in the past via libc functions but I
don't think we do today.
* since c8001b5 (encoding: use UTF-8 everywhere, 2025-10-18)
  we always use UTF-8, which simplifies docs.
* emphasize that we (as of an earlier commit) document only the locale
  variables actually used by fish. We could change this in future,
  as long as the docs make it obvious whether it's about fish or
  external programs.
* make things a bit more concise
* delete a stale comment - missing encoding support is no longer a problem
Commit 8dc3982 (Always use LC_NUMERIC=C internally (fish-shell#8204),
2021-10-13) made us use LC_NUMERIC=C internally, to make C library
functions behave in a predictable way.

We no longer use library functions affected by LC_NUMERIC.  (Except for
snprintf in tests).

Since the effective value of LC_NUMERIC no longer matters, let's
simplify things by using the user locale again, like we do for the
other locale variables.
Commit 046db09 (Try to set LC_CTYPE to something UTF-8 capable
(fish-shell#8031), 2021-06-06) forced UTF-8 encoding if available.

Since c8001b5 (encoding: use UTF-8 everywhere, 2025-10-18) we no
longer override LC_CTYPE, since we no longer use it for anything like
mbrtowc or iswalpha.

However there is one remaining use: our fallbacks to system wcwidth().
If we are started as

	LC_ALL=C fish

then wcwidth('😃') will fail to return the correct value, even if
the UTF-8 locale would have been available.

Restore the previous behavior, so locale variables don't affect
emoji width.

This is consistent with the direction in c8001b5 which stops us
from falling back to ASCII characters if our desired multibyte locale
is missing (that was not the ideal criteria).

In future we should maybe stop using wcwidth().
We use the following locale-variables via locale-aware C functions
(you can look them up by grepping for "libc::$function_name"):

- LC_CTYPE:    wcwidth
- LC_MESSAGES: strerror, strerror 
- LC_NUMERIC:  localeconv/localeconv_l, snprintf (only in tests)
- LC_TIME:     strftime

Additionally, we interpret LC_MESSAGES in our own code.

As of today, the PCRE2 library does not seem to use LC_MESSAGES (error
messages are English-only); and I don't think it uses LC_CTYPE unless
we use "pcre2_maketables()".

Let's make it more obvious which locale categories are actually used
by setting those instead of LC_ALL.

This means that instead of logging the return value of «
setlocale(LC_ALL, "") » (which is an opaque binary string on Linux),
we can log the actual per-category locales that are in effect.
This means that there's no longer really a reason to hardcode a log
for the LC_MESSAGES locale. We're not logging at early startup but
we will log as soon as any locale var is detected as set.
Since c8001b5 (encoding: use UTF-8 everywhere, 2025-10-18), a
change in locale variables can no longer cause us to toggle between
"…" or "...".  Have the control flow reflect this.
When "status fish-path" fails, we fall back to argv[0],
hoping that it contains an absolute path to fish, or at
least is equal to "fish" (which can happen on OpenBSD, see
rust-lang/rust#60560 (comment)).

I don't think it makes sense to duplicate logic that's probably
already in std::env::current_exe().
Commit 7b59ae0 (Unbreak hack to strip " (deleted)" suffix from
executable path, 2025-10-02) accidentally droped the "!".
This value is either "fish" or an absolute path, it can't be empty.
This was used in fish_tests.cpp 754b78a (Suppress certain
stderr-printing during tests, 2016-12-03) which is no more.  I haven't
noticed this specific outptu in "cargo test" output.
As mentioned earlier commits, "status fish-path" is either an absolute
path or "fish".  At startup, we try to canonicalize this path. This
is wrong if "status fish-path" prints "fish" -- we'll do subtly wrong
things if that file happens to exist in the current working directory.
If fish is invoked as

	execve("/bin/fish", "fish")

on OpenBSD, "status fish-path" will output "fish".  As a result,
our wrapper for fish_indent tries to execute "./fish" if it exists.

We actually no longer need to call any executable, since "fish_indent"
is available as builtin now.

Stop using the wrapper, fixing the OpenBSD issue.
In future, we should ask "renovatebot" to update these version. I don't
have an opinion on whether to use "uv" or something else, but I think
we do want lockfiles.  No particular reason for this Python version.

Somehow tests/checks/sphinx-markdown-changelog.fish doesn't seem
to have run in CI before this (or possibly another change).  Anway,
it's failing on GitHub Actions because no tags are available (shallow
clone).  Disable that part of the this test on CI for now; the most
important part is the RST->Markdown conversion and syntax check.

Part of fish-shell#11996
The new automation workflow doesn't sign the Git tag or the tarball as
we used to.  Since the tag is still created locally, sign it directly.

For signing the tarball; build it locally and compare the extracted
tarball with the one produced by GitHub.

Closes fish-shell#11996
Google cloud CLI ships >10k man pages for subcommands, but the
completions are not useful because they don't know to replace
underscores by spaces, e.g. in:

	complete -c gcloud_access-approval_requests_approve
On first startup, we run this script, but not for standalone
(embed-data) builds.  Rectify that.
Seems to save only 1KiB in binary size (might be noise).
It's always the CMake output directory, so call it
FISH_CMAKE_BINARY_DIR. It's possible to set it via some other build
system but if such builds exist, they are likely subtly broken, or
at least with the following commit which adds the assumption that
"share/__fish_build_paths.fish.in" exists in this directory.

We could even call it CMAKE_BINARY_DIR but let's namespace it to make
our use more obvious. Also, stop using the $CMAKE environment variable,
it's not in our namespace.
Today, this file is only supported in CMake builds.  This is the last
place where CMake builds use $__fish_data_dir.

Let's embed it so we can treat it like other assets.

This enables users to do "rm -rf $__fish_data_dir" (though that might
break plugins).
According to
https://cmake.org/cmake/help/latest/command/install.html
default paths look like

	Type    variable                    default
	INCLUDE ${CMAKE_INSTALL_INCLUDEDIR} include
	SYSCONF ${CMAKE_INSTALL_SYSCONFDIR} etc
	DATA    ${CMAKE_INSTALL_DATADIR}    <DATAROOT dir>
	MAN     ${CMAKE_INSTALL_MANDIR}     <DATAROOT dir>/man

so "etc" is supposed to be a sibling of "include", not a child of DATA
(aka "share/").

This allows us to remove a hack where we needed to know the data dir
in non-standalone builds.
This matches the convention used by Make/CMake and others.
Some packagers such as Homebrew use the "relocatable-tree" logic,
i.e. "fish -d config" shows paths like:

	/usr/local/etc/fish
	/usr/local/share/fish

Other packagers use

	/etc/fish
	/usr/share/fish

which we don't treat as relocatable tree,
because "/usr/etc/fish" does **not** exists.

To get embed-data in shape for being used as a default for installed
builds, we want to support both cases.

Today, embed-data builds only handle the "in-build-dir" logic.
Teach them also about the relocatable-tree logic.  This will make
embed-data builds installed via Homebrew (i.e. CMake) use the correct
sysconfdir ("/usr/local/etc").
We use absence of "$__fish_data_dir[1]" as criteria to use the
"standalone" code paths that use "status list-files/get-file" instead
of $__fish_data_dir.

However, third party software seems slow to react to such breaking
changes, see ajeetdsouza/zoxide#1045

So keep $__fish_data_dir for now to give plugins more time.
This commit makes us ignore $__fish_data_dir on standalone builds
even if defined; a  following commit will actually define it again.

I guess the approach in b815fff (Set $__fish_data_dir to empty
for embed-data builds, 2025-04-01) made sense back when we didn't
anticipate switching to standalone builds by default yet.
To prepare for defining __fish_data_dir/__fish_help_dir again on
embed-data builds, reverts commit 04a2398. No functional change.
Implied by bee1e12 (Remove unused locale path code, 2025-09-16).
While at it, reorder the assignments to match the order in ConfigPaths.
As mentioned in a previous commit ("Don't special-case __fish_data_dir
in standalone builds"), we want to enable embed-data by default.
To reduce the amount of breakage, we'll keep installing files,
and keep defining those variables at least for some time.

For example, this will allow installed "standalone" builds to use
local docs via our "help" function.
When users update fish by replacing files, existing shells might
throw weird errors because internal functions in share/functions/
might have changed.

This is easy to remedy by restarting the shell,
but I guess it would be nice if old shells kept using old data.
This is somewhat at odds with lazy-loading.

But we can use the standalone mode.  Turn that on by default.

Additionally, this could simplify packaging.  Some packages incorrectly
third party files into /usr/share/fish/completion/ when they ought to
use /usr/share/fish/vendor_completions.d/ for that.  That packaging
mistake can make file conflicts randomly appearing when either fish
or foo ships a completion for foo.  With this patch, it no longer
causes conflicts but will silently not work, for better or worse.

The only advantage of having /usr/share/fish/ on the file system
is discoverability.  But the full source code is better.

Note that we still install (redundant) $__fish_data_dir when using
CMake.  This is to not unnecessarily break both
1. already running (old) shells as users upgrade to 4.2
2. plugins (see parent commit),

We can stop installing $__fish_data_dir once we expect 99% of users
to upgrade from at least 4.2.

Closes fish-shell#11921

To-do:
- maybe make it a feature flag so users can turn it off without
  recompiling with "set -Ua no-embed-data".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants