-
Notifications
You must be signed in to change notification settings - Fork 2.2k
More printf cleanup #11874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
More printf cleanup #11874
Conversation
It is not reasonable for fish to panic when `sprintf_locale` encounters an issue.
This flag is not supported by our sprintf, so remove it here, to avoid suggesting that it might be supported.
| } | ||
| use fish_printf::Error::*; | ||
| let msg = match err { | ||
| BadFormatString => wgettext!("Bad format string").into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The important changes here are to avoid panicking when
sprintf_localefails. Theprintfimplementation parses the
format string before callingsprintf_localewhich includes some
error handling, but it is a bad idea to assume that it catches all
errorssprintf_localemight encounter. Crashing fish because of
an invalidprintfinvocation is unreasonable, so stop doing that
and instead print an error message.
not sure about this; the panic suggests these can never happen,
i.e. all of the argument valid checks are already done earlier.
So a panic would indicate a bug or at least inconsistency.
So unless this can ever happen(?), I prefer panicking
and not adding redundant translatable strings
(especially since we own printf)
Of course we should probably be more explicit about preconditions.
Maybe instead of calling printf::sprintf_locale() which seems to take a raw format string,
we could call a function that takes a real data structure
where BadFormatString, MissingArg, ExtraArg and BadArgType
are impossible by construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to help future changes like #11857, we could make all the impossible cases explicit:
match err {
BadFormatString => panic!(),
MissingArg => panic!(),
...
}
i.e., remove the panic on unhandled (future) errors.
Then the compiler will help us, so that's a good change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could call a function that takes a real data structure
where BadFormatString, MissingArg, ExtraArg and BadArgType
are impossible by construction.
Having spent more time with the printf code, I have also come to this conclusion. The format string validation/parsing duplication is ugly an it's only gonna get worse if we add more features.
What I think makes sense:
- Have a function in the printf crate which parses a format string but ignores the arguments. The result would be a sequence of literal strings (with double
%%replaced by%) and format specifiers, which would be a struct containing all relevant information from a%directive. - The
printfcommand would then use this function, apply it's escape sequences to the literal strings and parse the argument strings into the appropriate types (and check for errors/conflicts). This part should take care of repeated application. (s)printfcalls from other code can skip this step since arguments are provided as typed Rust values.- Then, we have a function taking the structure extracted from the format string and the typed arguments which combines them into one data structure, at which point argument types and presence is validated.
- Finally, the resulting data structure is taken and formatted, at which point most errors will no longer be possible.
Then, I think it makes sense to provide a function for formatting the error messages. When called from code other than the printf command, we continue to panic, but with better error messages (currently we just unwrap the sprintf_locale result. For the printf command, we print the error message but don't panic.
This approach should make it feasible to add support for explicit argument positions later on. You mention in #11857 (comment) that zsh supports mixing implicit and explicit argument positions, but I think the semantics of that would be highly unintuitive, annoying to implement (especially with repetition), and zsh even recommends not doing so. Note that they don't even specify semantics:
Normally, conversion specifications are applied to each argument in order but they can explicitly specify the nth argument is to be used by replacing '%' by '%n$' and '*' by '*n$'. It is recommended that you do not mix references of this explicit style with the normal style and the handling of such mixed styles may be subject to future change.
So I think we should not open that can of worms and just prohibit such mixing.
There is certain behavior I'm not sure about yet.
- At the moment
printfuses default values if arguments are missing, e.g.printf %dprints0instead of an error. I think it makes more sense to print errors in such situations, including repetitions likeprintf '%d %d ' 1 2 3. - The
\cescape might make the code somewhat ugly. I'm not sure if there is a good reason to support it. - We currently have differently-sized signed integer types, whose only purpose is casting negative numbers to unsigned numbers when printing with the
xorXconversion char, so-1i8turns intoff,-1i16intoffffand so on. This does not use length modifiers, but relies on the Rust type, meaning it's only available for our own Rust code. AFAICT, the only place outside of tests this feature is used issrc/input.rs:describe_char(only the secondsprintf!), which seems to be dead code, and I'm not sure it would ever get a negative argument. So I think we can safely remove it and usei64for all signed ints, but if there is any reason to keep it let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked into the printf code much;
but this sounds reasonable and if it makes the parameter position changes easier
we should probably do something like this, but it's probably not strictly necessary AFAICT.
(s)printfcalls from other code can skip this step since arguments are provided as typed Rust values.
yes but maybe no need to invest too much effort into making this code path less bad,
because we're hopefully gonna stop using printf syntax in Rust code,
at least for non-translated strings,
so in future, the only client of printf might be the builtin
(and potential consumers of the library),
but it's all runtime checks.
Then, I think it makes sense to provide a function for formatting
ok so the above would make error handling flexible enough so we can consolidate it in a single place, which would allow getting rid of redundant unwrap/panic
So I think we should not open that can of worms and just prohibit such mixing.
yeah that makes sense.
Overall, printf is sort of legacy thing; we'll implement what users expect (especially coming from other implementations), but there are limits on what it can reasonably do additionally
- At the moment
printfuses default values if arguments are missing
all of bash/dash/zsh behave the same on printf %d
I don't think we should change this historical behavior without a strong reason.
For any new functionality (like the explicit parameter order),
we can definitely err on the side of printing an error on weird input,
and maybe revisit that later.
- The
\cescape might make the code somewhat ugly. I'm not sure if there is a good reason to support it.
TIL it's "produce no further output".
Not sure why that was added.
I don't expect that it will make the code ugly;
because we can have an isolated preprocessing step that converts `"%s\c..." to "%s".
src/input.rs:describe_char
I'm pretty sure it was only ever passed character values (or ReadlineCmd
enum values cast to that), i.e. char, which is a subset of u32.
So this would be fine.
There is a comment saying // Keep this function for debug purposes
but I'm sure that's obsolete, since ReadlineCmd implements Debug now.
So I think we can safely remove it and use
i64for all signed ints
sprintf!("%x" -1_i16) should probably keep working, for any (future) consumers of the printf library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can have an isolated preprocessing step that converts `"%s\c..." to "%s"
Ah yes, that's a sensible approach.
sprintf!("%x" -1_i16)should probably keep working, for any (future) consumers of the printf library.
By "keep working", do you mean its behavior should remain identical, or it should not be an error to use it but it would be fine if it resulted in different output (i.e. ffffffffffffffff)?
Do we know if there are any consumers outside of fish itself? And if so, whether they care about this feature? Keeping around unused functionality for hypothetical future consumers seems strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah actually I don't know what %x is supposed to do for signed integers.
C/C++ define it only for unsigned integers.
In shells, it seems like printf %x -1 outputs target_pointer_width worth of bits. I guess we are not touching that.
And yeah, I don't think this library has any other serious user.
So for the library part, I think I'd prefer to make %x of signed integers a compile error, unless there is a reasonable explanation for another behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#11889 removes the feature. For differentiating between printf and library users, more refactoring is needed, so that'll have to wait.
| MissingArg => wgettext!("Missing argument").into(), | ||
| ExtraArg => wgettext!("Too many arguments").into(), | ||
| BadArgType => wgettext!("Argument type does not match conversion character").into(), | ||
| Overflow => wgettext!("Number out of range").into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(better extract a constant for the error message when reusing it)
| while continue_looking_for_flags { | ||
| match f.char_at(0) { | ||
| 'I' | '\'' => { | ||
| '\'' => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know this code but yeah this looks like it would only change behavior on %I which is not documented, so looks good
See fish-shell#11874 (comment) > There is a comment saying `// Keep this function for debug purposes` but I'm sure that's obsolete, since ReadlineCmd implements `Debug` now.
See #11874 (comment) > There is a comment saying `// Keep this function for debug purposes` but I'm sure that's obsolete, since ReadlineCmd implements `Debug` now. Closes #11887
This time, there are some behavioral changes. The
Iflag change is very minor, and mostly useful for people looking at the code, to avoid suggesting that it's supported when it's not.The important changes here are to avoid panicking when
sprintf_localefails. Theprintfimplementation parses the format string before callingsprintf_localewhich includes some error handling, but it is a bad idea to assume that it catches all errorssprintf_localemight encounter. Crashing fish because of an invalidprintfinvocation is unreasonable, so stop doing that and instead print an error message.