Fix string literals in other encodings which should be utf-8 encoding #6837
Merged
enebo merged 3 commits into jruby:master, Sep 15, 2021
For the longest time we have just created a new string using the lexer's encoding. This worked until I fixed the last known issues in the parser's error reporting for mixed strings (like binary data + UTF-8 escapes). Such a string is not valid and the parser should error out. Unfortunately, by setting the lex encoding right away we cannot tell whether the string should really have that encoding or whether we are in a new string whose encoding has not yet been determined.

Case in point: jruby#6832. It is evaled as a Windows-31J string, but the literal contains only UTF-8 escapes. It should resolve to UTF-8, yet it errors because the parser sees the escapes and assumes the string is already Windows-31J.

The solution is to emulate how MRI does this: mark the string's encoding as not yet determined, and if nothing special happens during processing, set it to the lex encoding.
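
To make the idea concrete, here is a minimal sketch of deferred encoding resolution. It is not the code from this PR: `LiteralEncodingSketch`, `LiteralBuffer`, and every method name are hypothetical, encodings are plain strings, and the rules are only loosely modeled on the behavior described above. `null` stands for "encoding not yet determined".

```java
// Hypothetical sketch; none of these names come from the JRuby source.
final class LiteralEncodingSketch {

    /** Tracks the encoding of a single string literal while it is lexed. */
    static final class LiteralBuffer {
        private final String lexEncoding;   // encoding of the source being lexed
        private String encoding;            // null means "not determined yet"
        private final StringBuilder text = new StringBuilder();

        LiteralBuffer(String lexEncoding) {
            this.lexEncoding = lexEncoding;
        }

        /** Plain ASCII never decides the encoding. */
        void appendAscii(char c) {
            text.append(c);
        }

        /** A non-ASCII \\u escape pins the literal to UTF-8. */
        void appendUnicodeEscape(int codePoint) {
            if (codePoint >= 0x80) {
                pin("UTF-8");
            }
            text.appendCodePoint(codePoint);
        }

        /** A raw non-ASCII character from the source pins it to the lex encoding. */
        void appendSourceCharacter(int codePoint) {
            if (codePoint >= 0x80) {
                pin(lexEncoding);
            }
            text.appendCodePoint(codePoint);
        }

        /** If nothing special happened, fall back to the lex encoding. */
        String finish() {
            if (encoding == null) {
                encoding = lexEncoding;
            }
            return encoding;
        }

        private void pin(String wanted) {
            if (encoding == null) {
                encoding = wanted;           // first decision wins
            } else if (!encoding.equals(wanted)) {
                // Mixed content: the literal is invalid and must be rejected.
                throw new IllegalStateException(wanted + " mixed within " + encoding + " string");
            }
        }
    }

    public static void main(String[] args) {
        // Only \\u escapes in a Windows-31J source: resolves to UTF-8.
        LiteralBuffer escapesOnly = new LiteralBuffer("Windows-31J");
        escapesOnly.appendUnicodeEscape(0x3042);
        System.out.println(escapesOnly.finish());

        // Only ASCII: falls back to the lex encoding.
        LiteralBuffer asciiOnly = new LiteralBuffer("Windows-31J");
        asciiOnly.appendAscii('a');
        System.out.println(asciiOnly.finish());
    }
}
```

With the encoding set eagerly to the lex encoding, the first case above would instead hit the mixed-content error, which is the failure mode reported in jruby#6832.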
All failures seem to be related to picking up failures on master from openstruct.