Skip to content

Less eager coderange when converting Java String#8711

Merged
headius merged 1 commit intojruby:masterfrom
headius:java_strings_unknown_cr
Mar 24, 2025
Merged

Less eager coderange when converting Java String#8711
headius merged 1 commit intojruby:masterfrom
headius:java_strings_unknown_cr

Conversation

@headius
Copy link
Member

@headius headius commented Mar 24, 2025

By going to VALID, we force all Java strings produced by this function to do full Unicode validation when they are manipulated or written to another string. This was the root cause of #8710, when an already-broken StringIO saw an incoming VALID string and followed full MBC verification logic.

This patch assumes that if the resulting byte[] string has the same number of bytes as the original string had chars, and the destination encoding is 7-bit ASCII compatible, then the code range can be 7-BIT, avoiding extra encoding logic for such strings.

This feels like a bit of a hack; CRuby fails the cases in #8710 exactly like JRuby if an MBC element is in the backtrace, meaning that it was just "lucky" that the test case from rake produced only 7-bit backtrace elements. There's more discussion needed here, because I'm not sure this test is actually testing anything reliable.

By going to VALID, we force all Java strings produced by this
function to do full Unicode validation when they are manipulated
or written to another string. This was the root cause of
jruby#8710, when an already-broken StringIO saw an incoming
VALID string and followed full MBC verification logic.

This patch assumes that if the resulting byte[] string has the
same number of bytes as the original string had chars, and the
destination encoding is 7-bit ASCII compatible, then the code range
can be 7-BIT, avoiding extra encoding logic for such strings.

This feels like a bit of a hack; CRuby fails the cases in
jruby#8710 exactly like JRuby if an MBC element is in the
backtrace, meaning that it was just "lucky" that the test case
from rake produced only 7-bit backtrace elements. There's more
discussion needed here, because I'm not sure this test is actually
testing anything reliable.
@headius
Copy link
Member Author

headius commented Mar 24, 2025

@enebo I'm going to merge this to get a new 10 build out there, but please review and discuss with me when you get a chance.

@headius headius merged commit a304916 into jruby:master Mar 24, 2025
55 of 72 checks passed
@headius headius deleted the java_strings_unknown_cr branch March 24, 2025 19:45
@headius headius linked an issue Mar 24, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Encoding error with bad raise message in StringIO

1 participant