Encoding of symbol literals does not respect the encoding of the source file

Hello. JRuby does not seem to respect the encoding magic comment at the top of a source file when deciding the encoding of symbol literals. In a file marked as UTF-8, all symbols are marked as US-ASCII, even if they contain non-ASCII characters. The following code reproduces the issue:

``` ruby
# coding: UTF-8
puts :µ.encoding  # jruby says US-ASCII, MRI says UTF-8
puts :a.encoding  # jruby and MRI both say US-ASCII
```

It is dangerous to have a symbol like µ with its encoding set to US-ASCII, because many common operations, such as calling `inspect`, then fail with "ArgumentError: invalid byte sequence in US-ASCII".
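To illustrate why the US-ASCII label is a problem, here is a minimal sketch that uses a string instead of a symbol (an assumption made purely for illustration; the variable names are hypothetical). The bytes of µ are multi-byte UTF-8, so once they are labeled US-ASCII they no longer form a valid byte sequence:

```ruby
# coding: UTF-8
# Illustration only: a string stands in for the mislabeled symbol.
utf8 = "µ"
# Same bytes, but relabeled as US-ASCII (dup so the literal is not mutated).
mislabeled = utf8.dup.force_encoding("US-ASCII")

puts utf8.encoding               # encoding of the original literal
puts utf8.valid_encoding?        # the bytes are valid under their real encoding
puts mislabeled.encoding         # the relabeled copy
puts mislabeled.valid_encoding?  # every byte here is above 0x7F, so this is invalid
```

Any operation that needs to interpret those bytes as US-ASCII characters is then liable to fail.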

One workaround is to use a string literal followed by `.to_sym`.
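A minimal sketch of that workaround, assuming the file is marked UTF-8 as in the reproduction above (the variable name `sym` is only for illustration):

```ruby
# coding: UTF-8
# Build the symbol from a string literal instead of a symbol literal,
# so the symbol takes its encoding from the string.
sym = "µ".to_sym

puts sym.encoding  # should report UTF-8 rather than US-ASCII
puts sym.inspect   # should no longer raise ArgumentError
```

I have only described this as a workaround on the JRuby version listed below; I have not checked other versions.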

Here is the output of my `jruby -v`:

```
jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM
1.7.0_07-b10 [Windows 8-amd64]
```

The MRI I compared this to is: "ruby 2.0.0p0 (2013-02-24) [x64-mingw32]".

Sorry if this is a duplicate; to avoid that, I checked every open and closed GitHub issue tagged with "encoding".


Encoding of symbol literals does not respect the encoding of the source file #1328
