
Fix string literals in other encodings which should be utf-8 encoding #6837

Merged
enebo merged 3 commits into jruby:master from enebo:fix_6832
Sep 15, 2021


@enebo enebo commented Sep 15, 2021

For the longest time we have just created each new string using the
lexer's encoding. This worked until I fixed the last known issues in the
parser's error reporting for mixed strings (like binary data + UTF-8
escapes). Such a mixed string is not valid and the parser should error out.

Unfortunately, by setting the lex encoding right away, we are not in a
position to know whether the string should just have that encoding
or whether we are in a new string where we have not yet figured out
what encoding it should be.

Case in point: #6832. The source is eval'd as a Windows-31J string, but
the string literal contains only UTF-8 escapes. It should resolve to
UTF-8, yet the parser sees the escapes and errors out, thinking the
literal is already a Windows-31J string.
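
A minimal sketch of the kind of input involved. This is a hypothetical reproduction, not the script from #6832: the source text is pure ASCII but tagged as Windows-31J before being passed to `eval`. On MRI I would expect the escaped literal to report UTF-8 and the plain literal to keep the source encoding.

```ruby
# Hypothetical reproduction, not taken from the issue: each source string is
# pure ASCII but tagged Windows-31J, so eval lexes it with that encoding.
escaped_source = '"\u3042"'.dup.force_encoding("Windows-31J")
plain_source   = '"abc"'.dup.force_encoding("Windows-31J")

# The first literal contains only a \u escape; the second has no escapes.
escaped_literal = eval(escaped_source)
plain_literal   = eval(plain_source)

puts escaped_literal.encoding
puts plain_literal.encoding
```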

The solution is to emulate how MRI does this: mark the string's encoding
as not yet determined, and if nothing special happens during processing,
set it to the lex encoding.
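
The idea can be sketched as a toy model. This is not the lexer code from this PR or from MRI; the class and method names are hypothetical, and it simplifies by treating every `\u` escape as non-ASCII. The encoding starts out undetermined (`nil`), the first encoding-relevant piece of the literal pins it, and the lex encoding is only used as a fallback at the end.

```ruby
# Toy model only; all names here are hypothetical, not JRuby or MRI internals.
class LiteralEncodingTracker
  class MixedEncodingError < StandardError; end

  def initialize(lex_encoding)
    @lex_encoding = lex_encoding
    @encoding = nil # nil means "not determined yet"
  end

  # Raw non-ASCII bytes from the source pin the literal to the lex encoding.
  def raw_non_ascii!
    if @encoding && @encoding != @lex_encoding
      raise MixedEncodingError, "#{@lex_encoding} bytes mixed into #{@encoding} literal"
    end
    @encoding = @lex_encoding
  end

  # A \u escape pins the literal to UTF-8 (simplified: assumed non-ASCII).
  def unicode_escape!
    if @encoding && @encoding != Encoding::UTF_8
      raise MixedEncodingError, "UTF-8 escape mixed into #{@encoding} literal"
    end
    @encoding = Encoding::UTF_8
  end

  # If nothing special happened, fall back to the lex encoding.
  def finish
    @encoding || @lex_encoding
  end
end

windows_31j = Encoding.find("Windows-31J")

only_escapes = LiteralEncodingTracker.new(windows_31j)
only_escapes.unicode_escape!
puts only_escapes.finish

nothing_special = LiteralEncodingTracker.new(windows_31j)
puts nothing_special.finish
```

Setting the encoding to the lex encoding up front corresponds to initializing `@encoding` with `lex_encoding` in this model, which would make the escape-only case raise even though nothing was actually mixed.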

@enebo enebo added this to the JRuby 9.3.0.0 milestone Sep 15, 2021

enebo commented Sep 15, 2021

All CI failures seem to be existing openstruct failures picked up from master.

@enebo enebo merged commit 806f6cb into jruby:master Sep 15, 2021
@enebo enebo deleted the fix_6832 branch September 15, 2021 23:17