Closed
Description
Converting a UTF-32 encoded string back to UTF-8 appears to be broken. The input doesn't seem to matter, as long as it is not an empty string.
I'm running today's jruby-head:
tobi@tobi-desktop ~/github/lotus_components/utils $ ruby -v
jruby 9.0.0.0-SNAPSHOT (2.2.0p0) 2015-02-08 cc00fd4 OpenJDK 64-Bit Server VM 24.75-b04 on 1.7.0_75-b13 +jit [linux-amd6
The problem seems to be that once a string contains at least one character and is encoded to UTF-32, converting it back to UTF-8 raises an error:
jruby-head:
jruby-head :013 > "a".encode("UTF-32")
=> "\uFEFFa"
jruby-head :014 > "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
from org/jruby/RubyString.java:5671:in `encode'
from (irb):14:in `evaluate'
from org/jruby/RubyKernel.java:1000:in `eval'
from org/jruby/RubyKernel.java:1310:in `loop'
from org/jruby/RubyKernel.java:1120:in `catch'
from org/jruby/RubyKernel.java:1120:in `catch'
from /home/tobi/.rvm/rubies/jruby-head/bin/irb:13:in `__script__'
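The four bytes named in the error, "\x00\x00\xFE\xFF", match the UTF-32 byte-order mark (U+FEFF, big-endian), which the BOM-carrying "UTF-32" encoding prepends (visible as "\uFEFF" in the output above). A minimal sketch to confirm this by dumping the raw bytes follows; the expected values in the comments are what MRI produces, and assuming the same layout on other implementations is exactly that, an assumption.

```ruby
# Dump the raw bytes produced by the BOM-prefixed "UTF-32" encoding.
utf32 = "a".encode("UTF-32")

# The first four bytes should be the byte-order mark U+FEFF in big-endian
# order, i.e. the same "\x00\x00\xFE\xFF" sequence JRuby rejects.
bom  = utf32.bytes.first(4)
# The remaining four bytes are the character "a" (U+0061), also big-endian.
char = utf32.bytes.drop(4)

puts bom.map { |b| format("%02X", b) }.join(" ")   # on MRI: 00 00 FE FF
puts char.map { |b| format("%02X", b) }.join(" ")  # on MRI: 00 00 00 61
```

If JRuby prints the same bytes, the encoding side is fine and the failure lies in decoding the BOM on the way back to UTF-8.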
MRI 2.2.0:
2.2.0 :016 > "a".encode("UTF-32")
=> "\uFEFFa"
2.2.0 :017 > "a".encode("UTF-32").encode(Encoding::UTF_8)
=> "a"
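A small script along these lines could serve as a regression check for this report. `utf32_round_trip_ok?` is a hypothetical helper, not part of Ruby or JRuby. The explicit-endian "UTF-32BE" variant writes no BOM at all, so it may sidestep the failing BOM handling on affected JRuby builds; that is an untested assumption, not a confirmed workaround.

```ruby
# Hypothetical helper: encode to the target UTF-32 variant, decode back to
# UTF-8, and report whether the original string survived the round trip.
def utf32_round_trip_ok?(str, target = "UTF-32")
  str.encode(target).encode(Encoding::UTF_8) == str
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError
  # The error reported for jruby-head lands here instead of propagating.
  false
end

["a", "hello", "ü"].each do |s|
  # BOM-prefixed "UTF-32": the case that fails on jruby-head.
  puts "#{s.inspect} via UTF-32:   #{utf32_round_trip_ok?(s)}"
  # Explicit big-endian, no BOM: assumed (untested) to avoid the bug.
  puts "#{s.inspect} via UTF-32BE: #{utf32_round_trip_ok?(s, 'UTF-32BE')}"
end
```

On MRI every line should report `true`; on an affected JRuby build the "via UTF-32" lines would report `false`.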
Discovered while working on lotus utils.
Tobi