Encoding of symbol literals does not respect the encoding of the source file #1328

@DavidEGrayson

Hello. JRuby does not seem to respect the encoding comment at the top of a source file when deciding the encoding of symbol literals. In a file marked as UTF-8, all symbol literals get marked as US-ASCII, even if they contain non-ASCII characters. Here is code that reproduces the issue:

# coding: UTF-8
puts :µ.encoding  # jruby says US-ASCII, MRI says UTF-8
puts :a.encoding  # jruby and MRI both say US-ASCII

It is dangerous to have a symbol like :µ with its encoding set to US-ASCII, because many common operations, such as calling inspect, then fail with "ArgumentError: invalid byte sequence in US-ASCII".

One workaround is to use a string literal followed by .to_sym.
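
A minimal sketch of that workaround, for illustration only. The variable name `micro` is mine, and I am assuming the file is saved as UTF-8 so that the string literal picks up the source encoding before it is converted to a symbol:

```ruby
# coding: UTF-8

# Build the symbol from a string literal instead of writing a symbol literal.
# The string literal takes the encoding of the source file, and to_sym
# carries that encoding over to the resulting symbol.
micro = "µ".to_sym

puts micro.encoding  # should match the source encoding declared above
puts micro.inspect   # should print the symbol instead of raising ArgumentError
```

The cost is that every non-ASCII symbol literal in the file has to be rewritten this way, so it is only a stopgap until symbol literals themselves get the right encoding.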

Here is the output of my jruby -v:

jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM
1.7.0_07-b10 [Windows 8-amd64]

The MRI I compared this to is: "ruby 2.0.0p0 (2013-02-24) [x64-mingw32]".

Sorry if this is a duplicate; I checked every open and closed GitHub issue tagged with "encoding" to try to avoid that.
