Different Encoding behavior from all other Rubies #2580

@bf4

See rspec/rspec-support#172 (comment) for origin of this issue.

In brief, on all other Rubies the expression "\x80".force_encoding("US-ASCII").chars.map{|char| char.valid_encoding? ? char : "?" }.join.encode("UTF-8") returns "?", but JRuby returns "\x80". The reproduction commands and their output follow.

LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
   rvm ruby-1.9.2-p330,ruby-1.9.3-p551,ruby-2.0.0-p598,ruby-2.1.5,ruby-2.2.0,jruby-1.7.18,rbx-2.2.2 do \
   ruby -e 'p [RUBY_VERSION, RUBY_ENGINE, Encoding.default_external, __ENCODING__] << "\x80".force_encoding("US-ASCII").chars.map{|char| char.valid_encoding? ? char : "?" }.join.encode("UTF-8")'

for version in 1.9 2.0; do \
   export JRUBY_OPTS="-Xcompat.version=${version}" ; \
   LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 \
   rvm jruby-1.7.18 do \
   ruby -e 'p [RUBY_VERSION, RUBY_ENGINE, Encoding.default_external, __ENCODING__] << "\x80".force_encoding("US-ASCII").chars.map{|char| char.valid_encoding? ? char : "?" }.join.encode("UTF-8")';
done

["1.9.2", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["1.9.3", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["2.0.0", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["2.1.5", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["2.2.0", "ruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["1.9.3", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "\x80"]
["2.1.0", "rbx", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "?"]
["1.9.3", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "\x80"]
["2.0.0", "jruby", #<Encoding:UTF-8>, #<Encoding:UTF-8>, "\x80"]
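
For readers who want to see where the results diverge, the one-liner can be broken into steps. This is only a sketch of the same logic: the helper name replace_invalid_chars is hypothetical and not part of rspec-support or JRuby. According to the output above, MRI and Rubinius end with "?", while the reported JRuby versions end with "\x80".

```ruby
# Hypothetical helper (name is an assumption); same logic as the one-liner
# in the report: swap every character that is invalid in the string's
# current encoding for a replacement.
def replace_invalid_chars(str, replacement = "?")
  str.chars.map { |char| char.valid_encoding? ? char : replacement }.join
end

# Byte 0x80 is outside the 7-bit range, so it is not valid US-ASCII.
raw = "\x80".force_encoding("US-ASCII")
raw_valid = raw.valid_encoding?

cleaned = replace_invalid_chars(raw)
result = cleaned.encode("UTF-8")

# Print each intermediate value so the step that differs is visible.
p [RUBY_ENGINE, raw_valid, cleaned, result]
```

Comparing raw_valid and cleaned across engines should show whether the difference comes from valid_encoding? on the individual characters or from a later step.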
