"invalid byte sequence in UTF-8" after String#encode!; behavior differs from MRI (JRuby-1.7.13) #1900

@lenny

1.9.3-p385 :007 > "\xE6".encoding
 => #<Encoding:UTF-8> 
1.9.3-p385 :008 > "\xE6".valid_encoding?
 => false 
1.9.3-p385 :009 > "\xE6".encode!('UTF-8', 'BINARY', :invalid => :replace, :undef => :replace).valid_encoding?
 => true 
1.9.3-p385 :013 > "\xE6".encode!('UTF-8', 'BINARY', :invalid => :replace, :undef => :replace).match(/\S/)
 => #<MatchData "�"> 
jruby-1.7.13 :016 >  "\xE6".encoding
 => #<Encoding:UTF-8> 
jruby-1.7.13 :017 > "\xE6".valid_encoding?
 => false 
jruby-1.7.13 :018 > "\xE6".encode!('UTF-8', 'BINARY', :invalid => :replace, :undef => :replace).valid_encoding?
 => false 
jruby-1.7.13 :042 > "\xE6".encode!('UTF-8', 'BINARY', :invalid => :replace, :undef => :replace).match(/\S/)
ArgumentError: invalid byte sequence in UTF-8
    from org/jruby/RubyRegexp.java:1697:in `match'
    from org/jruby/RubyString.java:1734:in `match'
    from (irb):42:in `evaluate'
    from org/jruby/RubyKernel.java:1101:in `eval'
    from org/jruby/RubyKernel.java:1501:in `loop'
    from org/jruby/RubyKernel.java:1264:in `catch'
    from org/jruby/RubyKernel.java:1264:in `catch'
    from /Users/Shared/lenny/.rvm/rubies/jruby-1.7.13/bin/jirb:13:in `(root)'
jruby-1.7.13 :043 > 

#encode, as opposed to #encode!, works as expected:

jruby-1.7.13 :044 > "\xE6".encode('UTF-8', 'BINARY', :invalid => :replace, :undef => :replace).valid_encoding?
 => true 
jruby-1.7.13 :045 > "\xE6".encode('UTF-8', 'BINARY', :invalid => :replace, :undef => :replace).match(/\S/)
 => #<MatchData "�"> 
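Since the non-mutating form returns a valid string in the sessions above, one possible stopgap is to route the conversion through `String#encode` rather than `String#encode!`. The sketch below is only an illustration: `scrub_to_utf8` is a hypothetical helper name, not part of Ruby or JRuby, and it assumes the input bytes should be treated as BINARY, as in the report.

```ruby
# Hypothetical helper (not a Ruby or JRuby API): transcode bytes to UTF-8 using
# the non-mutating String#encode, which behaved correctly in the report above.
def scrub_to_utf8(str)
  str.encode('UTF-8', 'BINARY', :invalid => :replace, :undef => :replace)
end

# Same input as in the report: a lone 0xE6 byte tagged as UTF-8.
raw   = "\xE6".dup.force_encoding(Encoding::UTF_8)
clean = scrub_to_utf8(raw)

puts raw.valid_encoding?    # the lone byte is not valid UTF-8
puts clean.valid_encoding?  # the returned copy should be valid
puts clean.match(/\S/) ? 'matched' : 'no match'
```

If in-place semantics are needed, `str.replace(scrub_to_utf8(str))` might serve, though that has not been verified against JRuby 1.7.13 here.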
