Error handling converting UTF-32 to UTF-8 is broken [9k] [lotus] #2581

@PragTob

Converting a UTF-32 string to UTF-8 seems to be broken. The input doesn't seem to matter as long as it is not an empty string.

I have jruby-head from today:

tobi@tobi-desktop ~/github/lotus_components/utils $ ruby -v
jruby 9.0.0.0-SNAPSHOT (2.2.0p0) 2015-02-08 cc00fd4 OpenJDK 64-Bit Server VM 24.75-b04 on 1.7.0_75-b13 +jit [linux-amd6

The problem seems to be that as soon as a string contains at least one character and has been converted to UTF-32, converting it to UTF-8 raises an error.

jruby-head:

jruby-head :013 > "a".encode("UTF-32")
 => "\uFEFFa" 
jruby-head :014 > "a".encode("UTF-32").encode(Encoding::UTF_8)
Encoding::InvalidByteSequenceError: "\x00\x00\xFE\xFF" on UTF-32
    from org/jruby/RubyString.java:5671:in `encode'
    from (irb):14:in `evaluate'
    from org/jruby/RubyKernel.java:1000:in `eval'
    from org/jruby/RubyKernel.java:1310:in `loop'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from org/jruby/RubyKernel.java:1120:in `catch'
    from /home/tobi/.rvm/rubies/jruby-head/bin/irb:13:in `__script__'

2.2:

2.2.0 :016 > "a".encode("UTF-32")
 => "\uFEFFa" 
2.2.0 :017 > "a".encode("UTF-32").encode(Encoding::UTF_8)
 => "a"
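
For anyone who wants to reproduce this outside of irb, here is a minimal sketch of the same round trip as a script. The helper name `utf32_round_trip` is made up for illustration and is not part of JRuby or lotus utils. It also prints the bytes of the UTF-32 string, since the bytes in the error message (`00 00 FE FF`) are the big-endian UTF-32 byte order mark, which suggests the BOM is where the conversion trips.

```ruby
# Hypothetical helper for illustration only: encodes a string to UTF-32,
# then tries to convert it back to UTF-8 and reports what happened.
def utf32_round_trip(str)
  utf32 = str.encode("UTF-32")
  begin
    { bytes: utf32.bytes, result: utf32.encode(Encoding::UTF_8), error: nil }
  rescue Encoding::InvalidByteSequenceError => e
    # This is the branch the jruby-head session above ends up in.
    { bytes: utf32.bytes, result: nil, error: e.message }
  end
end

report = utf32_round_trip("a")

# Show the raw bytes so the leading byte order mark is visible.
puts report[:bytes].map { |b| format("%02X", b) }.join(" ")

if report[:error]
  puts "failed: #{report[:error]}"
else
  puts "ok: #{report[:result].inspect}"
end
```

Running the same file under both interpreters should show whether the two agree on the UTF-32 bytes and differ only in the conversion back to UTF-8.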

Discovered on lotus utils

Tobi
