-
-
Notifications
You must be signed in to change notification settings - Fork 942
Description
While investigating fixes for #8682 I discovered that rbByteEncode does not properly handle the case of source and destination encoding being the same.
As described in #8682 (comment):
IOOutputStream calls RubyIO.write and passes only the incoming bytes, offsets, and encoding:
jruby/core/src/main/java/org/jruby/util/IOOutputStream.java
Lines 152 to 163 in 14ec399
| @Override | |
| public void write(final byte[] b,final int off, final int len) throws IOException { | |
| ThreadContext context = runtime.getCurrentContext(); | |
| RubyIO realIO = this.realIO; | |
| if (realIO != null) { | |
| realIO.write(context, b, off, len, encoding); | |
| } else { | |
| IRubyObject io = this.io; | |
| writeAdapter.call(context, io, io, RubyString.newStringLight(runtime, new ByteList(b, off, len, encoding, false))); | |
| } | |
| } |
This path was added as an optimization in 236f7ba to avoid constructing string objects just to immediately unwrap them for IO writes.
Unfortunately it hits a bug eventually in EncodingUtils.rbByteEncode where two identical encodings will not no-op, but instead will trigger an encoding error because it rejects encoding from and to the same encoding.
The short-circuit checks here:
jruby/core/src/main/java/org/jruby/util/io/EncodingUtils.java
Lines 983 to 989 in 844c151
| if (encoding.isAsciiCompatible() && to.isAsciiCompatible()) { | |
| if (cr == StringSupport.CR_7BIT) { | |
| return null; | |
| } | |
| } else if (encodingEqual(sname, dname)) { | |
| return null; | |
| } |
...were never updated when the sister logic for RubyString was updated in 621369c. The correct logic, with additional refactoring, looks like this:
jruby/core/src/main/java/org/jruby/util/io/EncodingUtils.java
Lines 1094 to 1101 in 844c151
| if (senc != null && senc == denc) { | |
| return strTranscodeScrub(context, forceEncoding, str, ecflags, ecopts, result, explicitlyInvalidReplace, denc, senc); | |
| } else if (is7BitCompat(str, denc, senc)) { | |
| return result.apply(context, str, denc, str); | |
| } else if (encodingEqual(sname, dname)) { | |
| if (forceEncoding.isNil()) denc = null; | |
| return result.apply(context, str, denc, str); | |
| } |
Ignoring the problems with IOOutputStream only being able to specify a single encoding for all String writes, we need to fix this issue in rbByteEncode and ensure same-encoding calls no-op the same way as rbStrEncode.