Fix string literals in other encodings which should be utf-8 encoding by enebo · Pull Request #6837 · jruby/jruby

enebo · 2021-09-15T19:37:02Z

For a long time we have simply created a new string using the lexer's
encoding. This worked until I fixed the last known issues with the
parser not reporting errors for mixed strings (like binary data + UTF-8
escapes). Such a string is not valid, and the parser should error out.

Unfortunately, by setting the lexer encoding right away, we cannot tell
whether the string should simply have that encoding or whether it is a
new string whose encoding we have not yet figured out.

Case in point: #6832. The code is eval'd as a Windows-31J string, BUT
the literal string contains only UTF-8 escapes. It should resolve to
UTF-8, yet it errors out because the parser sees the escapes and
concludes the string is already a Windows-31J string.

The solution is to emulate how MRI does this: mark the string's encoding
as not yet determined, and if nothing special happens while processing
the literal, set it to the lexer encoding at the end.
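To make the difference concrete, here is a minimal Java sketch contrasting the old eager strategy with the deferred one. Everything in it (`DeferredLiteralEncoding`, the `Piece` kinds, encoding names as plain strings, the error message) is hypothetical and heavily simplified; it is not JRuby's actual lexer code, and it only approximates the MRI behavior described above.

```java
import java.util.List;

/**
 * Hypothetical sketch (not JRuby's lexer API): decide a string literal's
 * encoding only after its contents have been scanned.
 */
public class DeferredLiteralEncoding {

    /** Simplified pieces a Ruby string literal body can be made of. */
    enum Piece { ASCII, UTF8_ESCAPE, NON_ASCII_SOURCE_BYTE }

    /** Deferred strategy: start undetermined (null), fall back to the lexer encoding at the end. */
    static String resolve(List<Piece> pieces, String lexEncoding) {
        String encoding = null; // null = "not yet determined"
        for (Piece piece : pieces) {
            switch (piece) {
                case ASCII:
                    break; // plain ASCII is compatible with either choice
                case UTF8_ESCAPE:
                    encoding = claim(encoding, "UTF-8"); // Unicode escapes demand UTF-8
                    break;
                case NON_ASCII_SOURCE_BYTE:
                    encoding = claim(encoding, lexEncoding); // raw bytes demand the source encoding
                    break;
            }
        }
        // Nothing special happened: use the lexer's encoding.
        return encoding != null ? encoding : lexEncoding;
    }

    /** Old eager strategy: commit to the lexer encoding before seeing any content. */
    static String resolveEager(List<Piece> pieces, String lexEncoding) {
        String encoding = lexEncoding;
        for (Piece piece : pieces) {
            if (piece == Piece.UTF8_ESCAPE) {
                encoding = claim(encoding, "UTF-8");
            } else if (piece == Piece.NON_ASCII_SOURCE_BYTE) {
                encoding = claim(encoding, lexEncoding);
            }
        }
        return encoding;
    }

    /** Accept the wanted encoding unless a different one was already chosen. */
    private static String claim(String current, String wanted) {
        if (current == null || current.equals(wanted)) {
            return wanted;
        }
        throw new IllegalArgumentException(
            "mixed encodings in string literal: " + wanted + " within " + current);
    }

    public static void main(String[] args) {
        // Like jruby#6832: Windows-31J source, literal containing only Unicode escapes.
        List<Piece> onlyEscapes = List.of(Piece.ASCII, Piece.UTF8_ESCAPE);
        System.out.println(resolve(onlyEscapes, "Windows-31J")); // prints UTF-8
        System.out.println(resolve(List.of(Piece.ASCII), "Windows-31J")); // prints Windows-31J
        try {
            resolveEager(onlyEscapes, "Windows-31J");
        } catch (IllegalArgumentException e) {
            System.out.println("eager: " + e.getMessage());
        }
    }
}
```

With the eager strategy the escapes-only literal is rejected because Windows-31J was chosen up front; with the deferred strategy it resolves to UTF-8, while a literal that really mixes raw Windows-31J bytes with `\u` escapes is still reported as an error.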


enebo · 2021-09-15T23:16:51Z

All failures appear to be pre-existing failures on master coming from openstruct.

enebo added this to the JRuby 9.3.0.0 milestone Sep 15, 2021

enebo added 2 commits September 15, 2021 16:14

heredocs need same fix · 7906c79

This is messing up Java 9+ somehow · 9bd9fdb

enebo merged commit 806f6cb into jruby:master Sep 15, 2021

enebo deleted the fix_6832 branch September 15, 2021 23:17

enebo merged 3 commits into jruby:master from enebo:fix_6832