Unmarshaling regexp loses UTF-8 encoding #4008

@jczyzewski

Environment

  • JRuby version: 9.1.2.0
  • Observed on Mac OS X Yosemite, Ubuntu

Expected Behavior

When marshalling and unmarshalling a UTF-8 regexp, I expect it to preserve its UTF-8 encoding.

Actual Behavior

It looks like Marshal.load loses the encoding and turns the regexp into ASCII-8BIT:

r = /\brapidísimo\b/i # => /\brapidísimo\b/i
p r.encoding # => #<Encoding:UTF-8>
p m = Marshal.dump(r) # => "\x04\bI/\x14\\brapid\xC3\xADsimo\\b\x11\x06:\x06ET"
p Marshal.load(m).encoding # => #<Encoding:ASCII-8BIT>
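
The round trip above can be wrapped in a small check so it is easy to compare across Ruby implementations. This is only a sketch: `marshal_roundtrip_encodings` and `with_encoding` are hypothetical helper names, not part of any library, and the pattern is built with a `\u` escape so it is UTF-8 regardless of the source file's encoding. The `with_encoding` helper is an untested-on-JRuby workaround idea (rebuilding the regexp from its source bytes with an explicit encoding), not a confirmed fix.

```ruby
# Hypothetical helper (not from the report): round-trip a Regexp through
# Marshal and return its encoding before and after.
def marshal_roundtrip_encodings(regexp)
  restored = Marshal.load(Marshal.dump(regexp))
  { before: regexp.encoding, after: restored.encoding }
end

# Hypothetical workaround sketch: rebuild a regexp from its source with an
# explicit encoding, keeping only the user-visible flags (i, x, m).
def with_encoding(regexp, encoding = Encoding::UTF_8)
  flags = regexp.options &
          (Regexp::IGNORECASE | Regexp::EXTENDED | Regexp::MULTILINE)
  Regexp.new(regexp.source.dup.force_encoding(encoding), flags)
end

# "\u00ED" is the accented i from the report; the \u escape forces a UTF-8 string.
pattern = Regexp.new("\\brapid\u00EDsimo\\b", Regexp::IGNORECASE)

encodings = marshal_roundtrip_encodings(pattern)
puts "before: #{encodings[:before]}, after: #{encodings[:after]}"

# Re-tag whatever Marshal.load returned as UTF-8.
repaired = with_encoding(Marshal.load(Marshal.dump(pattern)))
puts "repaired: #{repaired.encoding}"
```

On an affected JRuby the "after" value would be expected to match the ASCII-8BIT result reported above; where the encoding is preserved, both values should be UTF-8.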
