Description
Environment
$ bin/jruby -v
jruby 9.2.8.0-SNAPSHOT (2.5.3) 2019-04-23 1679826 Java HotSpot(TM) 64-Bit Server VM 25.131-b11 on 1.8.0_131-b11 +jit [darwin-x86_64]
Expected Behavior
Splitting an encoded string with a null byte delimiter should return the expected array of strings.
Example script (test.rb):
str1 = "AA\0BB\0CC".encode('utf-16le')
str2 = "\0".encode('utf-16le')
array = str1.split(str2)
puts array.inspect
Expected result (CRuby):
$ ruby test.rb
["AA", "BB", "CC"]
Actual Behavior
JRuby does not properly split the string:
$ jruby test.rb
["", "", "CC"]
The issue is in the indexOf() method from RubyString.java (https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/RubyString.java#L4258). This method looks for the index of a specified substring (or character) in a byte array without taking the byte width of the encoded characters into account, so it can report a match that starts in the middle of a character.
In this example, the byte array related to str1 is:
byte_array => [65, 0, 65, 0, 0, 0, 66, 0, 66, 0, 0, 0, 67, 0, 67, 0]
and the delimiter character (str2) is:
delim => [0, 0]
The first time it is called, indexOf() matches byte_array[3] and byte_array[4], which straddle the boundary between the second "A" and the null character, instead of matching byte_array[4] and byte_array[5] and returning 4.
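The mismatch can be reproduced in plain Ruby by searching the raw bytes. This is only a minimal sketch: `naive_byte_index` and `aligned_byte_index` are hypothetical helpers written for illustration, not JRuby's actual code, and the fixed character width of 2 bytes is an assumption that holds for this UTF-16LE example but not for surrogate pairs or variable-width encodings.

```ruby
# Hypothetical helper: scans every byte offset, ignoring character boundaries.
def naive_byte_index(haystack, needle)
  (0..haystack.size - needle.size).find do |i|
    haystack[i, needle.size] == needle
  end
end

# Hypothetical helper: only tests offsets that are multiples of char_width,
# assuming a fixed-width encoding (2 bytes per character here).
def aligned_byte_index(haystack, needle, char_width)
  (0..haystack.size - needle.size).step(char_width).find do |i|
    haystack[i, needle.size] == needle
  end
end

byte_array = "AA\0BB\0CC".encode('utf-16le').bytes
delim = "\0".encode('utf-16le').bytes

puts naive_byte_index(byte_array, delim)      # 3 (starts inside the second "A")
puts aligned_byte_index(byte_array, delim, 2) # 4 (starts on a character boundary)
```

The unaligned search stops at offset 3 because the high byte of "A" (0) followed by the low byte of the null character (0) looks like the delimiter. Restricting candidate offsets to character boundaries yields the expected offset 4.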