Skip to content

Conversation

@danielrainer
Copy link

This time, there are some behavioral changes. The I flag change is very minor, and mostly useful for people looking at the code, to avoid suggesting that it's supported when it's not.

The important changes here are to avoid panicking when sprintf_locale fails. The printf implementation parses the format string before calling sprintf_locale which includes some error handling, but it is a bad idea to assume that it catches all errors sprintf_locale might encounter. Crashing fish because of an invalid printf invocation is unreasonable, so stop doing that and instead print an error message.

Daniel Rainer added 2 commits October 3, 2025 23:55
It is not reasonable for fish to panic when `sprintf_locale` encounters
an issue.
This flag is not supported by our sprintf, so remove it here, to avoid
suggesting that it might be supported.
@krobelus krobelus added this to the fish 4.2 milestone Oct 4, 2025
}
use fish_printf::Error::*;
let msg = match err {
BadFormatString => wgettext!("Bad format string").into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important changes here are to avoid panicking when
sprintf_locale fails. The printf implementation parses the
format string before calling sprintf_locale which includes some
error handling, but it is a bad idea to assume that it catches all
errors sprintf_locale might encounter. Crashing fish because of
an invalid printf invocation is unreasonable, so stop doing that
and instead print an error message.

not sure about this; the panic suggests these can never happen,
i.e. all of the argument valid checks are already done earlier.
So a panic would indicate a bug or at least inconsistency.
So unless this can ever happen(?), I prefer panicking
and not adding redundant translatable strings
(especially since we own printf)

Of course we should probably be more explicit about preconditions.
Maybe instead of calling printf::sprintf_locale() which seems to take a raw format string,
we could call a function that takes a real data structure
where BadFormatString, MissingArg, ExtraArg and BadArgType
are impossible by construction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to help future changes like #11857, we could make all the impossible cases explicit:

match err {
    BadFormatString => panic!(),
    MissingArg => panic!(),
    ...
}

i.e., remove the panic on unhandled (future) errors.
Then the compiler will help us, so that's a good change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could call a function that takes a real data structure
where BadFormatString, MissingArg, ExtraArg and BadArgType
are impossible by construction.

Having spent more time with the printf code, I have also come to this conclusion. The format string validation/parsing duplication is ugly an it's only gonna get worse if we add more features.

What I think makes sense:

  • Have a function in the printf crate which parses a format string but ignores the arguments. The result would be a sequence of literal strings (with double %% replaced by %) and format specifiers, which would be a struct containing all relevant information from a % directive.
  • The printf command would then use this function, apply it's escape sequences to the literal strings and parse the argument strings into the appropriate types (and check for errors/conflicts). This part should take care of repeated application.
  • (s)printf calls from other code can skip this step since arguments are provided as typed Rust values.
  • Then, we have a function taking the structure extracted from the format string and the typed arguments which combines them into one data structure, at which point argument types and presence is validated.
  • Finally, the resulting data structure is taken and formatted, at which point most errors will no longer be possible.

Then, I think it makes sense to provide a function for formatting the error messages. When called from code other than the printf command, we continue to panic, but with better error messages (currently we just unwrap the sprintf_locale result. For the printf command, we print the error message but don't panic.

This approach should make it feasible to add support for explicit argument positions later on. You mention in #11857 (comment) that zsh supports mixing implicit and explicit argument positions, but I think the semantics of that would be highly unintuitive, annoying to implement (especially with repetition), and zsh even recommends not doing so. Note that they don't even specify semantics:

Normally, conversion specifications are applied to each argument in order but they can explicitly specify the nth argument is to be used by replacing '%' by '%n$' and '*' by '*n$'. It is recommended that you do not mix references of this explicit style with the normal style and the handling of such mixed styles may be subject to future change.

So I think we should not open that can of worms and just prohibit such mixing.

There is certain behavior I'm not sure about yet.

  • At the moment printf uses default values if arguments are missing, e.g. printf %d prints 0 instead of an error. I think it makes more sense to print errors in such situations, including repetitions like printf '%d %d ' 1 2 3.
  • The \c escape might make the code somewhat ugly. I'm not sure if there is a good reason to support it.
  • We currently have differently-sized signed integer types, whose only purpose is casting negative numbers to unsigned numbers when printing with the x or X conversion char, so -1i8 turns into ff, -1i16 into ffff and so on. This does not use length modifiers, but relies on the Rust type, meaning it's only available for our own Rust code. AFAICT, the only place outside of tests this feature is used is src/input.rs:describe_char (only the second sprintf!), which seems to be dead code, and I'm not sure it would ever get a negative argument. So I think we can safely remove it and use i64 for all signed ints, but if there is any reason to keep it let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into the printf code much;
but this sounds reasonable and if it makes the parameter position changes easier
we should probably do something like this, but it's probably not strictly necessary AFAICT.

(s)printf calls from other code can skip this step since arguments are provided as typed Rust values.

yes but maybe no need to invest too much effort into making this code path less bad,
because we're hopefully gonna stop using printf syntax in Rust code,
at least for non-translated strings,
so in future, the only client of printf might be the builtin
(and potential consumers of the library),
but it's all runtime checks.

Then, I think it makes sense to provide a function for formatting

ok so the above would make error handling flexible enough so we can consolidate it in a single place, which would allow getting rid of redundant unwrap/panic

So I think we should not open that can of worms and just prohibit such mixing.

yeah that makes sense.
Overall, printf is sort of legacy thing; we'll implement what users expect (especially coming from other implementations), but there are limits on what it can reasonably do additionally

  • At the moment printf uses default values if arguments are missing

all of bash/dash/zsh behave the same on printf %d
I don't think we should change this historical behavior without a strong reason.
For any new functionality (like the explicit parameter order),
we can definitely err on the side of printing an error on weird input,
and maybe revisit that later.

  • The \c escape might make the code somewhat ugly. I'm not sure if there is a good reason to support it.

TIL it's "produce no further output".
Not sure why that was added.

I don't expect that it will make the code ugly;
because we can have an isolated preprocessing step that converts `"%s\c..." to "%s".

src/input.rs:describe_char

I'm pretty sure it was only ever passed character values (or ReadlineCmd
enum values cast to that), i.e. char, which is a subset of u32.
So this would be fine.
There is a comment saying // Keep this function for debug purposes
but I'm sure that's obsolete, since ReadlineCmd implements Debug now.

So I think we can safely remove it and use i64 for all signed ints

sprintf!("%x" -1_i16) should probably keep working, for any (future) consumers of the printf library.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can have an isolated preprocessing step that converts `"%s\c..." to "%s"

Ah yes, that's a sensible approach.

sprintf!("%x" -1_i16) should probably keep working, for any (future) consumers of the printf library.

By "keep working", do you mean its behavior should remain identical, or it should not be an error to use it but it would be fine if it resulted in different output (i.e. ffffffffffffffff)?

Do we know if there are any consumers outside of fish itself? And if so, whether they care about this feature? Keeping around unused functionality for hypothetical future consumers seems strange.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah actually I don't know what %x is supposed to do for signed integers.
C/C++ define it only for unsigned integers.
In shells, it seems like printf %x -1 outputs target_pointer_width worth of bits. I guess we are not touching that.

And yeah, I don't think this library has any other serious user.
So for the library part, I think I'd prefer to make %x of signed integers a compile error, unless there is a reasonable explanation for another behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#11889 removes the feature. For differentiating between printf and library users, more refactoring is needed, so that'll have to wait.

MissingArg => wgettext!("Missing argument").into(),
ExtraArg => wgettext!("Too many arguments").into(),
BadArgType => wgettext!("Argument type does not match conversion character").into(),
Overflow => wgettext!("Number out of range").into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(better extract a constant for the error message when reusing it)

while continue_looking_for_flags {
match f.char_at(0) {
'I' | '\'' => {
'\'' => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know this code but yeah this looks like it would only change behavior on %I which is not documented, so looks good

danielrainer pushed a commit to danielrainer/fish-shell that referenced this pull request Oct 5, 2025
See fish-shell#11874 (comment)
> There is a comment saying `// Keep this function for debug purposes`
but I'm sure that's obsolete, since ReadlineCmd implements `Debug` now.
krobelus pushed a commit that referenced this pull request Oct 6, 2025
See #11874 (comment)
> There is a comment saying `// Keep this function for debug purposes`
but I'm sure that's obsolete, since ReadlineCmd implements `Debug` now.

Closes #11887
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants