Skip to content

Conversation

@danielrainer
Copy link

Description

This allows explicit specification of the argument order in sprintf invocations, e.g. when using these arguments "%2$s %1$s" "a" "b", the expected result would be "b a". These specifiers are also available for the width and precision, and should have the same semantics as other sprintf implementations.

The main motivation for this change is giving translators the freedom to change the argument order, which can allow for more idiomatic translations.

This can also be made available to fish's printf command, but that has some additional input validation in
src/builtins/printf.rs:print_formatted which would have to be updated to make it compatible with the expanded syntax.

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

@danielrainer
Copy link
Author

Looking at printf it seems that the its current semantics are incompatible with explicit argument positions due to:

The FORMAT argument is re-used as many times as necessary to convert all of the given arguments. So printf %s\n flounder catfish clownfish shark will print four lines.

So I think we have these options:

  1. Keep the current behavior, meaning repeated application but no explicit argument positions.
  2. Stop supporting repeated application but support explicit argument positions.
  3. Keep printf behavior, add new command (or flag to printf?) which supports the same syntax as our sprintf and replace all printf uses in shell scripts whose format string is localized with the new command.

The first option is what is implemented now. There would be no user-facing changes. Translators could use argument reordering, but only for messages which don't appear in fish scripts. This is problematic, because supporting different syntax for different messages is highly unintuitive. If messages appear both in Rust and a fish script, they will show up in the Rust section at the beginning of the PO file, so there are no messages which can safely use the new syntax. Under these circumstances, I don't think the first option makes sense.

Options 2 and 3 are similar. For option 2, the printf code needs major changes (but I think it could be simplified a bit). For option 3, we essentially need a modified copy of the current printf code, which could maybe share some code with the existing printf code. In either case, the essential task for the new code is, similarly to the old one, to figure out which Rust type each argument should be converted to, and to perform the conversion before passing the results to sprintf. The difference is that instead of going one conversion specifier at a time, we now need to look at all specifiers, then do the conversions, and pass the entire format string and all arguments to sprintf. We should also check for type conflicts.

Option 2 has the downside of potentially breaking existing user code. Its advantage over option 3 is that we would continue to have a single printf command, eliminating the need to come up with a new command name for the new behavior.

If we are ok with breaking the implicit printf repetition, I'd go for option 2. Otherwise, option 3 seems reasonable.

@Berrysoft
Copy link
Contributor

Option 1 and 3 are OK, I think. For option 1, the repeated behavior still exists when an argument doesn't specify position explicitly.

And I hope that no code that needs i18n uses this behavior:)

@danielrainer
Copy link
Author

For option 1, the repeated behavior still exists when an argument doesn't specify position explicitly.

That might be the best approach:

  1. Check the first placeholder to see if it uses explicit position.
  2. If not, use the existing printf code.
  3. If it does, use a new printf implementation which supports explicit position, but not automatic repetition.

This doesn't break existing behavior, while adding support for the new behavior.

@Berrysoft
Copy link
Contributor

That would be great. I hope this feature would complete as soon as possible :)

Daniel Rainer added 7 commits October 2, 2025 21:53
The `have_foo: bool` + `foo: i64` combination is more idiomatically
represented as `foo: Option<i64>`. This change is applied for
`field_width` and `precision`.

In addition, the sketchy explicit cast from `i64` to `c_int`, and the
subsequent implicit cast from `c_int` to `i32` are avoided by using
`i64` consistently.
The abbreviation is ambiguous, which makes the code unnecessarily hard
to read. (possible misleading expansions: direct, direction, director,
...)
As with `msgmerge`, this introduces unwanted empty comment lines above
`#, c-format`
lines. We don't need this strict formatting, so we get rid of the flag
and the associated empty comment lines.
This allows explicit specification of the argument order in sprintf
invocations, e.g. when using these arguments `"%2$s %1$s" "a" "b"`, the
expected result would be `"b a"`. These specifiers are also available
for the width and precision, and should have the same semantics as other
sprintf implementations.

The main motivation for this change is giving translators the freedom to
change the argument order, which can allow for more idiomatic
translations.

This can also be made available to fish's `printf` command, but that has
some additional input validation in
`src/builtins/printf.rs:print_formatted` which would have to be updated
to make it compatible with the expanded syntax.
This allows specifying explicit argument positions in the format string
of the `printf` command, e.g. `'%2$*1$.*2$d' 10 4`

Note that this feature is incompatible with applying arguments to the
format string until all arguments are used up.
Also note that using explicit argument positions is all-or-nothing.
If they appear in the format string, referring to arguments implicitly
is not possible.

Documentation updates and tests are missing for now.
@danielrainer danielrainer force-pushed the printf_explicit_arg_positions branch from becc0ef to 3d0b89c Compare October 3, 2025 02:05
@danielrainer
Copy link
Author

I added code to make the printf command work with explicit argument positions. Tests and documentation updates are missing. The printf code is quite a mess now, with a lot of similar, but still slightly different functionality. Ideally, this would be cleaned up, but I'm not sure how much we can improve the current code.

I'm in favor of getting rid of length modifier support. We have no need for them, they are not mentioned in our docs, they make the code somewhat more complicated, add unnecessary characters to format string, and result in a few duplicate entries in our PO files which only differ in length modifiers. I have a branch with according changes ready at https://github.com/danielrainer/fish-shell/tree/printf_remove_length_modifier_support, but these changes will probably result in a lot of conflicts with the two ongoing translation update PRs, so I think these changes should probably wait until these PRs are finished, then I can apply the necessary changes on top of them.

This PR is now also based on top of #11863, to avoid useless empty comments in PO files.

@krobelus krobelus added this to the fish 4.2 milestone Oct 3, 2025

# Get rid of duplicates and sort.
msguniq --no-wrap --strict --sort-output $rust_extraction_file
msguniq --no-wrap --sort-output $rust_extraction_file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at printf it seems that the its current semantics are
incompatible with explicit argument positions

yeah builtin printf's implicit repetition is used a lot; breaking
that would be unnecessarily surprising. printf also doesn't take any
options so the design space is pretty narrow.
builtin printf.

It's probably a big problem in practice to leave the reordering
feature out of fish script for now, as long as we can add it later.


What I could glean from the implementation so far look good.

It's less permissive than ZSH behavior;
ZSH allows combining repetition with ordering parameters:

zsh% printf '%2$s\n' a b c d
b
d
zsh% printf '%2$s\n' a b c
zsh:printf:1: 2: argument specifier out of range
b

I think we would be able to allow this too in future without breaking change,
but I don't know if we want to.


If we want to make bigger steps, we can give it a different name,
maybe abandon printf format-string syntax altogether (like we're also
trying to move Rust sources to format! syntax):

status localize-messages my-msgid --key1=value ...

which would remove the need to touch the printf builtin, but it seems worse overall.

As of today, user-written scripts cannot bring their own translations anyway;
If we want to allow that in future, we could do that either way

status add-translation my-msgid --keyword-argument=key1 -- zh_CN=... de_DE=...

though I don't expect that this will be a good idea:
shell scripts can already use the gettext command for simple things, and if they become more complicated, users are better off writing them in Python etc,
so I'm not sure how much actual use there is for exposing the reordering feature in our custom gettext engine.
(In general, I'm not sure how many good reasons there are to write nontrival fish scripts)

Copy link
Contributor

@krobelus krobelus Oct 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the only reason why we are still stuck on legacy1 printf syntax is that there were concerns around localization.
I don't think those concerns are relevant anymore (not sure if they ever were);
if we can translate "%s", then we can also translate "{}" the same way.

so we want to

  1. convert all non-lozalized sprintf etc. calls to Rust syntax (simple but tedious)
  2. do the same for localized ones, and make sure to update existing translations accordingly
    • this implies adding the parameter-ordering syntax for Rust
  3. have the option to switch to fluent later, which uses different format string syntax

I think 2 and 3 suggest
that we may not want to add this feature to builtin printf:
this is because we want to change the parameter syntax of translatable
strings (most of which are in Rust sources2) once or twice (first to Rust syntax, and possibly to fluent),
and crucially, neither of the destination syntaxes works for builtin printf.

Unless we are okay with a mix of printf-style and non-printf-style msgids in po/,
I'd rather create a dedicated builtin like status localize-messages sketched above that calls wgettext_fmt!,
that has exactly the same powers available to Rust code.

Footnotes

  1. one reason why printf syntax is bad is because it forces programmers
    to add redundant information to their programs, which causes needless
    extra edit-and-recompile cycles.

  2. our localized scripts are few and not very important, see rg 'printf[^\n]*\(_\b' -l share/ | path basename

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we can translate "%s", then we can also translate "{}" the same way.

We can do that if our arguments are available as the desired types in Rust, which is not the case if they are taken from a shell script. If we want type-less format strings in Rust, we will have to use a different syntax for fish scripts, or treat all inputs as strings.

Having more than one syntax style might make translation more annoying, but generally the formatting syntax can just be copied, so that might be not much of an issue.

Do you have a concrete proposal for how Rust format strings should look, especially considering reordering for localization? We would have to write our own formatter for it, otherwise we lose the ability to print arbitrary bytes.

I'd rather create a dedicated builtin like status localize-messages sketched above that calls wgettext_fmt!,
that has exactly the same powers available to Rust code.

I agree that would be nice, but as mentioned above, we don't have type info in fish scripts, so it's just not possible to do the same stuff we can do in Rust.

So essentially I see these options:

Stick with printf for everything

Advantages:

  • No need to come up with a new formatting syntax (and converting to it)
  • Works for Rust and shell scripts, so we can have a single formatter implementation
  • Well-known format
  • Consistent format is nicer for translators
  • Reasonably flexible for both Rust and shell scripts
  • Argument reordering can be supported using explicit argument index syntax

Disadvantages:

  • Requires redundant type info in Rust
  • Unnecessarily restrictive in Rust (arbitrary types can have formatting implementations). Currently one would have to work around this by first formatting the argument separately, and then using %s in the format string.
  • Fairly ugly syntax for more complicated directives

Use Rust-style strings for everything

Advantages:

  • Eliminates redundant type info in Rust (although it does not shorten the strings, both {} and %s take two chars)
  • Allows more Rust types to be formatted directly
  • Nicer, more intuitive style, hopefully?
  • Compile-time checks for some things. Checking that the arguments implement some formatting trait should be simple. Other checks seem harder, especially when the format string is localized. Maybe checking Rust's format! can help with understanding what can be done there.

Disadvantages:

  • Dumbs down formatting for shell scripts. Every argument would have to be treated as a string, unless there is explicit info telling us otherwise (e.g. if the width can be specified as an argument, we can probably treat that argument as an integer. This depends on the exact format we choose).
  • All usages need to be converted. A large part might be easily automatable (converting %s to {} is not exactly hard). But avoiding false positives, and handling more complicated cases might require significant effort.
  • Unclear how argument reordering would be handled. Argument indices in brackets are one option, similar to how it is done in printf. Argument names provide more info (especially to translators), but are a bit harder to parse. For localization, it would also require having both the original and the translated format string available to figure out how the reordering should be done. I don't have a good idea yet how that could be implemented sensibly.

Use Rust-style strings for Rust, separate syntax for shell scripts

Advantages:

  • Choices can be made independently to best suit the respective use cases (see previous sections for how/why that might be helpful)

Disadvantages:

  • Need to maintain 2 formatting implementations (3 if shell formatting uses something other than printf)
  • Translators don't have one consistent syntax to work with

My conclusion

I think a reasonable case can be made for all three approaches, with none being clearly better or worse than the others. If someone is willing to propose a more detailed plan for the second or third option, and is willing to implement it, I don't think I would object. If that's not the case, I think sticking with the current approach is by far the least effort, since it already works and there is a clear way of implementing the missing feature of argument reordering, so that would be my default choice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a big problem in practice to leave the reordering
feature out of fish script for now, as long as we can add it later.

I'm assuming you meant that its not a big problem. While I would agree with that, the primary reason for introducing the feature is providing more flexibility for translators. If the printf command doesn't support explicit positions, translators need to know whether a string might be used by the printf command, which would already be really annoying if it could be determined from the section headers in the PO files, but that's not the case, since deduplication means that if a message is present in a fish script and the Rust sources, it will show up in the Rust section. So I prefer not introducing the feature at all over introducing it only for sprintf but not the printf command.

Regarding implicit/explicit mixing, see #11874 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with taking this PR assuming we add docs and tests, and assuming
it doesn't add any behavior to printf that we ever want to remove again.
(I haven't read too much of the second commit though, but you can take care
of keeping it correct and reasonably simple. It's an ugly problem because,
as you say, only Rust consumers have type info.)

But I'm deifnitely interested in the next steps:

Do you have a concrete proposal for how Rust format strings should look,

I think I have a clear picture of our new localizable format strings' would look like in Rust
(there are only loose ends on the equivalent for fish script, should we want to use a subset of the syntax there).
Roughly:

  • replace sprintf!() with wformat!().

    • maybe share no code with the printf implementation, not yet sure
    • wformat!() is to support a subset of format!()'s format string syntax
    • The main difference is that wformat!() uses our PUA-trick (str2wcstring ), so the result can represent non-UTF-8 data
    • The format string argument to wformat!() must always be a compile-time constant
      (except perhaps for translations, see below).
      • even if wformat!() doesn't do compile-time checking for format args yet,
        it's a step in the right direction, and we can add that later.
  • For localizable strings where we need parameter reordering,
    implement either a relevant subset of

    • explicit parameter positions
    • or of named parameters
      I probably wouldn't bother with positionals, since we never use them,
      and using named parameters everywhere seems like better practice anyway.
      I have not looked into how complicated it would be to implement this,
      but remember, it's fine if we do everything at runtime for now
      (because that's what sprintf does today), so I don't expect any blockers.
      There are some pre-existing issue:
      just like with sprintf!, we can fail (i.e. panic) if we're given a string that we don't support yet.
      If something unsupported like wformat!("{:02}") sneaks through code review, that'll be tragic but we can probably write a clippy check for that.
      We'd also want to check all translations.
  • Regarding localizable strings in fish script: I checked all existing uses,
    and no placeholder produces anything other that what we'd get with format!("{}", val).
    So I think we get away with implementing only that for now.
    I didn't see anything like octal (printf %d 0123) or floating point numbers, where printf uses a locale-specific decimal separator.
    If we need those, we can still use printf, and feed the result

    fish_localize_message "{percentage}% complete" -- percentage=(printf %0.2f (math $have / $want))

This new DSL template language might be weird for a shell.
but I also don't expect that we want to stick to printf for translations forever,
since it probably can't support future translations features like plurals.
I haven't worked out the concrete details how this would work in fish script
(since I'm not yet sure when we would do this)

Stick with printf for everything

yeah, this seems fine as a first step

Use Rust-style strings for everything
Use Rust-style strings for Rust, separate syntax for shell scripts

I think either of these two options is a good next step.

  • Dumbs down formatting for shell scripts.

This is only about localized-message formatting right.
I'm no i18n expert but I'd think that's fine.
It does take away some (already very limited) power from translators
(e.g. they'd no longer have the choice between %X and %x)
but Fluent probably has the same issue (coming from the Javascript land),
so I'm sure they have a way more flexible solution for letting translators format dynamic values.
To be safe, I'd do some research into Fluent,
we don't want to make it harder for us to switch to that.

All uses of %s need to be converted.

that's something we want to do regardless

Unclear how argument reordering would be handled.

As suggested above, we'd implement a subset of Rust's std::fmt syntax

For localization, it would also require having both the original and the translated format string available to figure out how the reordering should be done. I don't have a good idea yet how that could be implemented sensibly.

this is about the hypothetical fish_localize_message builtin, right?
That builtin takes a msgid and some key-value pairs for placeholders,
looks like the msgid like _ does, and then

  1. either invokes a version of wformat!() that takes a dynamic format string
  2. or have our CATALOGS not expose translations as &wstr but rather
    as a Fn(msgid: &wstr, params: ParameterBag) -> WString.

This should never fail unless there's a bug in our code.

Again, I'd want to take a look at how fluent-rs does it
(our switch to using that.. I know that it might be awkward for completion scripts,
but maybe we can rework our existing po file generation to create *.ftl files.. not sure if that's feasible).

If the printf command doesn't support explicit positions, translators need to know whether a string might be used by the printf command,

Right, if we didn't add this feature to the printf builtin,
that would speak for option 3, to make it more obvious that
one of the syntaxes doesn't allow reordering.
But since it looks like we are gonna add it to printf, it doesn't matter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't read too much of the second commit though, but you can take care
of keeping it correct and reasonably simple.

Yeah, don't bother with reviewing the current state. Before we add features to the printf implementation it should see some major refactoring.

(e.g. they'd no longer have the choice between %X and %x)

It's worse than that, they'd no longer have the choice to format anything as something other than a string if the format specifier does not contain type info. Which may be ok, I haven't actually checked if anything else is used/needed.

this is about the hypothetical fish_localize_message builtin, right?

I was thinking of the Rust side as well. I just realized that default Rust formatting supports named parameters by specifying arguments as key-value pairs. With k-v approach for arguments (both in Rust and shell scripts) the concerns I raised about named parameters are void.

One option, which for some reason I didn't think of earlier, is using existing formatting implementations (like Rust's format!, or the formatx crate for localized format strings), and then add a post-processing step which converts PUA-encoded bytes back into the original bytes before printing. This should also work with fluent, I think. Is there anything preventing this approach from working?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we add features to the printf implementation it should see some major refactoring.

the sprintf change seems like a lot of bang for the buck (if translators are waiting on this),
so we could do only that initially, your call

Which may be ok

it probably is, I'd be very surprised if any existing translation changes any format specifiers relative to the source string

Is there anything preventing this approach from working?

So you're saying we should operate on String, because WString
invariants already guarantee that every char is a valid char,
i.e. doing effectively:

let fish_string: WString;
let rust_string: String = fish_string.to_string()
let formatted_message = format!("some message: {rust_string}");
let formatted_wide_string = WString::from_str(&formatted_message);

yeah that should work actually.
The number of temporary allocations seems odd but probably not a show-stopper.

A wide-char-native implementation would write to WString directly,
but IIRC the standard traits like Display, Debug want String,
so that might be an uphill battle

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sprintf change seems like a lot of bang for the buck (if translators are waiting on this),
so we could do only that initially, your call

The actual buck is quite a bit larger, since it requires more refactoring of printf (the current code path for explicit arg positions doesn't support escape sequences, for example, and there is already way too much code duplication). I might continue to work on it, but I haven't decided yet.

i.e. doing effectively:

let fish_string: WString;
let rust_string: String = fish_string.to_string()
let formatted_message = format!("some message: {rust_string}");
let formatted_wide_string = WString::from_str(&formatted_message);

I think the second step is unnecessary, our wstr type implements Display: https://docs.rs/widestring/latest/widestring/utfstr/struct.Utf32Str.html#impl-Display-for-Utf32Str

While we could convert the formatting result back to WString, it might make sense to directly convert to bytes in the cases where we want to output and not keep the formatted string around (which I assume is most/all cases). With that, the only allocation we need is for formatted_message, which would happen even if we were only dealing with Rust's str and no non-UTF-8 bytes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format strings themselves should of course be Rust strs, but that should not be a problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second step is unnecessary,

Right. So as long as we implement Display for everything, it's at most one extra allocation, which is surely fine. Nice.

it might make sense to directly convert to bytes in the cases where we want to output

yeah, if we're filling an IoBuffer or similar, we can skip WString::from_str+wcs2string in favor of

	fn internal_chars_to_external_bytes(out: &mut Vec<u8>, cs: Iterator<Item = char>) {
		for c in cs {
			if is_pua(c) {
				out.push(decode_byte_from_char(c));
			} else {
				for b in wcrtomb(c) {
					out.push(b);
				}
			}
		}
	}

/// The format string contains a format specifier using explicit argument indices and one using
/// implicit indices.
/// Only all-explicit and all-implicit are allowed.
InconsistentUseOfArgumentIndices,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inconsistent Arg/Argument.
I'd change everything to Argument,
and use use printf::Error::* in callers.


zsh supports mixing explicit and implicit indices:

$ zsh -c 'printf \'%2$s%s\n\' a b c d'
ba
dc

I don't have a super strong opinion here,
as long as we're implementing a subset of POSIX printf(3p) or zsh printf,
(that we can still extend it later if we really want).
Maybe we should document which printf dialect we're following, if any.

InconsistentUseOfArgumentIndices,
/// Not all arguments are used.
/// Vector contains indices of unused arguments (1-based)
UnusedArgs(Vec<usize>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem reasonable to err on the side of giving better errors for now rather than implementing rare edge cases.

buffer.as_str()
}

fn take_arg_position(&mut self, num_args: usize) -> Result<Option<usize>, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oof, the (existing) duplication is ugly.
I guess we could try to extract a shared trait.

s.advance_by(1);
let arg_index = match arg_positions {
ArgPositions::Uninitialized => {
panic!("arg_positions must be initialized before parsing width.")
Copy link
Contributor

@krobelus krobelus Oct 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace ArgPositions::Uninitialized enum variant with
Option<ArgPositions>, to get rid of repeated assertions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants