Default to universal newline conversion on Windows #9067
Open
headius wants to merge 1 commit into jruby:master from
This was referenced Nov 8, 2025
Universal newline conversion is a transcoding mode that normalizes
newlines as part of the IO process, converting both Windows
newlines ("\r\n") and bare carriage returns ("\r") to "\n" on read.
It was originally the mechanism by which CRuby handled newline
normalization on Windows.
See ruby/ruby@8761467#diff-686754e19b3c08fbc0880fade77986fed2c09fdd27dcc163fc68e0a7e22b7913R319
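As a rough illustration of what universal newline conversion does to the data, the same decorator is reachable from Ruby code through `String#encode`. The helper name `normalize_newlines` is made up for this sketch, and it only demonstrates the conversion itself, not how the IO layer wires it in.

```ruby
# Illustration only: String#encode exposes the universal newline
# decorator, which rewrites "\r\n" and bare "\r" as "\n".
# normalize_newlines is a hypothetical helper name.
def normalize_newlines(text)
  text.encode(universal_newline: true)
end

mixed = "unix\nwindows\r\nold mac\r"
puts normalize_newlines(mixed).inspect
```

Every line ending in `mixed` should come out as a plain "\n".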
Later, CRuby was modified to leverage the O_TEXT mode on Windows,
which does OS-level normalization of newlines. To indicate this
new watered-down default, the default flag was switched to the
non-universal CRLF_NEWLINE mode. Logic later in the transcoder
creation and read conversion processes would check NEED_READCONV
and then use O_TEXT mode instead of a transcoder.
See ruby/ruby@f9a6a1d#diff-686754e19b3c08fbc0880fade77986fed2c09fdd27dcc163fc68e0a7e22b7913R318-R321
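To make the difference between the two flags concrete, here is a small sketch using the string-level options that correspond to them. It assumes that `crlf_newline:` and `universal_newline:` on `String#encode` are a fair stand-in for the CRLF_NEWLINE and UNIVERSAL_NEWLINE decorators. The helper names are hypothetical, and none of the O_TEXT or NEED_READCONV handling is modeled here.

```ruby
# Illustration only, with hypothetical helper names.
# crlf_newline rewrites "\n" as "\r\n" (the write direction), while
# universal_newline rewrites "\r\n" and "\r" as "\n" (the read direction).
def with_crlf_decorator(text)
  text.encode(crlf_newline: true)
end

def with_universal_decorator(text)
  text.encode(universal_newline: true)
end

puts with_crlf_decorator("one\ntwo\n").inspect
puts with_universal_decorator("one\r\ntwo\r\n").inspect
```

The point of the contrast is that a CRLF-only flag, used alone, says nothing about turning "\r\n" back into "\n" on read. That job was left to O_TEXT in CRuby.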
JRuby's ported logic largely matches this, except we have no way
to specify O_TEXT when opening files with the JDK, so that mode
ends up getting ignored and we don't actually do the newline
conversions.
A short-term fix, which is really what we should have done years
ago, is to switch the default Windows transcoding flag from the
watered-down CRLF_NEWLINE mode back to the UNIVERSAL_NEWLINE mode,
forcing the use of the transcoder.
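For a sense of the behavior this is aiming for, explicitly requesting text mode ("t") on read forces universal newline conversion through the transcoder on any platform. The sketch below assumes that an explicit "rt" read is a reasonable proxy for the new Windows default; the helper name is hypothetical and this is not the patch itself.

```ruby
require "tempfile"

# Illustration only: "rt" asks for text mode, which applies universal
# newline conversion on read regardless of platform.
# read_with_universal_newlines is a hypothetical helper name.
def read_with_universal_newlines(path)
  File.read(path, mode: "rt")
end

Tempfile.create("crlf-demo") do |f|
  f.binmode                      # write the bytes exactly as given
  f.write("first\r\nsecond\r\n")
  f.flush
  puts read_with_universal_newlines(f.path).inspect
end
```

Until the default changes, passing the mode explicitly like this may also serve as a workaround for code that needs normalized newlines.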
Future improvements to this code could restore the optimized logic
if O_TEXT becomes available to us.
This should fix a number of newline conversion issues, but may have
other unexpected consequences.
1c611f9 to 2058b6d
Member, Author:
This is obviously still not right. Punting to 10.0.4.0 for more investigation.