Psych fails with MBC strings in ASCII-8BIT #2901

@nirvdrum

Generally, the JRuby version of psych can handle strings containing multibyte characters (MBC). However, if the string's encoding is ASCII-8BIT, as it is by default when reading from a socket, JRuby's psych can no longer parse the YAML.

Simple example:

MRI:

> ruby -v -e 'require "yaml"; p YAML.load("nokogiri: 鋸".force_encoding("ASCII-8BIT"))'
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
{"nokogiri"=>"鋸"}

JRuby:

> bin/jruby -v -e 'require "yaml"; p YAML.load("nokogiri: 鋸".force_encoding("ASCII-8BIT"))'
jruby 9.0.0.0-SNAPSHOT (2.2.2) 2015-04-30 d34f7e9 Java HotSpot(TM) 64-Bit Server VM 25.45-b02 on 1.8.0_45-b14 +jit [linux-amd64]
Psych::SyntaxError: (<unknown>): 'reader' unacceptable character '�' (0x8B) special characters are not allowed
in "'reader'", position 11 at line 0 column 0
         parse at org/jruby/ext/psych/PsychParser.java:219
  parse_stream at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:376
         parse at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:324
          load at /home/nirvdrum/dev/workspaces/jruby/lib/ruby/stdlib/psych.rb:251
         <top> at -e:1
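
Until this is fixed, one possible workaround is to re-tag the bytes as UTF-8 before handing them to psych. The sketch below assumes the incoming bytes really are UTF-8; `load_yaml_from_binary` is a hypothetical helper name, not part of psych, and I have not verified it against the JRuby snapshot above.

```ruby
require "yaml"

# Hypothetical helper (not part of psych): treats binary-tagged input as UTF-8.
# Assumption: the sender actually wrote UTF-8 bytes.
def load_yaml_from_binary(raw)
  # dup so the caller's string keeps its original encoding tag
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # Refuse to guess if the bytes are not valid UTF-8
  raise ArgumentError, "input is not valid UTF-8" unless text.valid_encoding?
  YAML.load(text)
end

# Same input as in the report: UTF-8 bytes tagged as ASCII-8BIT
raw = "nokogiri: 鋸".dup.force_encoding(Encoding::ASCII_8BIT)
result = load_yaml_from_binary(raw)
p result
```

This only changes the encoding tag, not the bytes, so it should be cheap, but it sidesteps the bug rather than fixing it: MRI parses the ASCII-8BIT string directly.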
