Description
Hello. JRuby does not seem to respect the encoding comment at the top of a source file when deciding the encoding of symbol literals. In a file marked as UTF-8, every symbol literal is marked as US-ASCII, even if it contains non-ASCII characters. Here is the code that reproduces the issue:
# coding: UTF-8
puts :µ.encoding # jruby says US-ASCII, MRI says UTF-8
puts :a.encoding # jruby and MRI both say US-ASCII
It is dangerous to have a symbol like µ with its encoding set to US-ASCII, because many common operations, such as calling inspect, then fail with "ArgumentError: invalid byte sequence in US-ASCII".
One workaround is to use a string literal followed by .to_sym.
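A minimal sketch of that workaround, assuming the file is saved as UTF-8 and carries the same coding comment as the reproduction above. The variable name `sym` is only illustrative:

```ruby
# coding: UTF-8

# Build the symbol from a string literal instead of writing the
# symbol literal :µ directly. The string literal takes its encoding
# from the coding comment, and to_sym carries that encoding over.
sym = "µ".to_sym

# On MRI this prints UTF-8; the workaround is meant to give the same
# result on the affected JRuby version.
puts sym.encoding

# inspect is one of the calls reported to raise ArgumentError when the
# symbol is wrongly tagged as US-ASCII, so it makes a quick sanity check.
puts sym.inspect
```

Symbols that contain only ASCII characters, such as :a, are US-ASCII on both implementations and need no workaround.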
Here is the output of my jruby -v:
jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM
1.7.0_07-b10 [Windows 8-amd64]
The MRI I compared this to is: "ruby 2.0.0p0 (2013-02-24) [x64-mingw32]".
Sorry if this is a duplicate; I checked every open and closed GitHub issue tagged with "encoding" to try to avoid that.