py: Add support for nested f-strings within f-strings #18588
+63
−49
Summary
During the discussion about t-string support #17557, I thought that it might not take much effort to support nested f-strings with the current f-string parser. It turned out to be relatively simple.
The way the MicroPython f-string parser works is:
The temporary buffer can easily hold f-strings itself (i.e. nested f-strings), and they can be re-parsed by the lexer using the same algorithm. The only thing stopping this from working is that the temporary buffer can't be reused for the nested f-string, because it is still being parsed.
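The point that the temporary buffer can itself hold an f-string can be illustrated with a small Python sketch. It assumes that the lexer splits an f-string body into a `str.format`-style template plus a comma-separated argument string (a stand-in for what `fstring_args` holds); `split_fstring_body` is a hypothetical helper, not MicroPython code, and it ignores format specs and `{{`/`}}` escapes.

```python
def split_fstring_body(body: str) -> tuple[str, str]:
    """Hypothetical helper: split an f-string body into a format template
    and a comma-separated argument string (the role of fstring_args).

    Only brace depth is tracked, not quotes, so a nested f-string inside
    a replacement field ends up in the argument text unchanged.
    """
    template, args = "", ""
    i = 0
    while i < len(body):
        c = body[i]
        if c != "{":
            template += c
            i += 1
            continue
        # Find the matching '}' for this replacement field.
        depth, j = 1, i + 1
        while depth:
            if j >= len(body):
                raise SyntaxError("unbalanced braces in f-string")
            if body[j] == "{":
                depth += 1
            elif body[j] == "}":
                depth -= 1
            j += 1
        args += body[i + 1 : j - 1] + ", "
        template += "{}"
        i = j
    return template, args


print(split_fstring_body("x={f'{y}'}"))  # ('x={}', "f'{y}', ")
```

The extracted argument `f'{y}'` is a complete f-string, so feeding its body `{y}` back through the same helper yields `('{}', 'y, ')`. That is the re-parsing the paragraph above describes.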
This PR fixes that by adding a second temporary buffer, the "injection" buffer. This allows an arbitrary number of nesting levels with a simple modification to the original algorithm:
1. after an f-string is lexed, its arguments have been extracted into `fstring_args`
2. `fstring_args` is inserted into the current position in `inject_chrs` (which is the start of that buffer if no injection is ongoing)
3. `fstring_args` is now cleared and ready for any further f-strings (nested or not)
4. the lexer starts reading from `inject_chrs` if it's not already reading from it
5. any nested f-strings now appear in `inject_chrs` and can be processed as before, extracting their arguments into `fstring_args`, which can then be inserted again into `inject_chrs`
6. once `inject_chrs` is exhausted (meaning that all levels of f-strings have been fully processed), the lexer switches back to tokenizing the stream

Amazingly, this scheme supports arbitrarily deep nesting of f-strings, even when they use the same quote style.
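The modified algorithm can be modelled with a toy Python lexer. This is a sketch under stated assumptions, not MicroPython's C implementation: it assumes f-strings are rewritten into `'...'.format(...)` calls, models `inject_chrs` and `fstring_args` as plain Python strings, and handles only f-strings without escapes, format specs, or `{{`/`}}`. The class `ToyLexer` and its methods are hypothetical names.

```python
class ToyLexer:
    """Hypothetical model of the two-buffer scheme (not MicroPython's C code).

    Rewrites f'...' literals into '...'.format(...) calls.
    """

    def __init__(self, src: str) -> None:
        self.src = src
        self.pos = 0  # read position in the source stream
        self.inject_chrs = ""  # injection buffer: characters to re-lex
        self.inject_pos = 0  # read position within inject_chrs
        self.fstring_args = ""  # arguments extracted from the current f-string

    def peek_chr(self) -> str:
        if self.inject_pos < len(self.inject_chrs):
            return self.inject_chrs[self.inject_pos]
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def next_chr(self) -> str:
        # Injected characters take priority over the source stream.
        if self.inject_pos < len(self.inject_chrs):
            c = self.inject_chrs[self.inject_pos]
            self.inject_pos += 1
            return c
        # Buffer exhausted: every nesting level is done, go back to the stream.
        self.inject_chrs, self.inject_pos = "", 0
        if self.pos < len(self.src):
            self.pos += 1
            return self.src[self.pos - 1]
        return ""  # end of input

    def lex_fstring(self, quote: str) -> str:
        text = ""
        while True:
            c = self.next_chr()
            if c == "":
                raise SyntaxError("unterminated f-string")
            if c == quote:
                break
            if c != "{":
                text += c
                continue
            # Copy the expression up to the matching '}' into fstring_args.
            # Only brace depth is tracked, so a nested f-string that reuses
            # the outer quote character does not end the outer literal.
            depth, expr = 1, ""
            while True:
                c = self.next_chr()
                if c == "":
                    raise SyntaxError("unterminated f-string expression")
                if c == "{":
                    depth += 1
                elif c == "}":
                    depth -= 1
                    if depth == 0:
                        break
                expr += c
            self.fstring_args += expr + ", "
            text += "{}"
        # Insert the args at the current position in inject_chrs, so any
        # nested f-strings among them are lexed next; then clear fstring_args.
        args = self.fstring_args + ")"
        i = self.inject_pos
        self.inject_chrs = self.inject_chrs[:i] + args + self.inject_chrs[i:]
        self.fstring_args = ""
        return quote + text + quote + ".format("

    def desugar(self) -> str:
        out, prev = [], ""
        while (c := self.next_chr()) != "":
            is_prefix = c == "f" and not (prev.isalnum() or prev == "_")
            if is_prefix and self.peek_chr() in ("'", '"'):
                piece = self.lex_fstring(self.next_chr())
            else:
                piece = c
            out.append(piece)
            prev = piece[-1]
        return "".join(out)
```

For example, `ToyLexer("f'{f'{x}'}'").desugar()` returns `'{}'.format('{}'.format(x, ), )`. The inner f-string is never scanned from the original stream. It reaches the lexer only through `inject_chrs`, and its own arguments are spliced in at the current read position ahead of the outer call's remaining `, )`.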
Testing
A new test is added which will run under CI.
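The test file itself isn't reproduced here. As a hypothetical illustration of the behaviour being tested, the checks below use nested f-strings with different quote styles, so they also run on CPython 3.6+. Same-quote nesting such as `f'{f'{x}'}'` is left out because CPython only accepts it from 3.12 (PEP 701).

```python
# Hypothetical checks in the spirit of the new test; not the PR's test file.
x = 42
name = "mp"

# One level of nesting, different quote styles.
assert f"{f'{x}'}" == "42"

# A nested f-string used as part of a larger expression.
assert f"[{f'{name}' + '!'}]" == "[mp!]"

# Literal text around the nested f-string is preserved.
assert f"a{f'b{x}c'}d" == "ab42cd"
```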
Trade-offs and Alternatives
This adds some code size and slightly more memory usage to the lexer. In particular, for a single (non-nested) f-string it now makes an extra copy of the `fstring_args` data when copying it across to `inject_chrs`. That could possibly be optimized to reuse the same buffer (`inject_chrs` would steal the memory from `fstring_args`). Otherwise, memory use only goes up with the complexity of nested f-strings.