
Strange encoding differences with UTF-8 strings #1474

@jcoyne

irb(main):010:0* test = "/fixtures/公共"
=> "/fixtures/公共"
irb(main):011:0> Regexp.escape(test).encoding
=> #<Encoding:UTF-8>
irb(main):013:0> test = "/fixtures/"
=> "/fixtures/"
irb(main):014:0> Regexp.escape("#{test}/公共").encoding
=> #<Encoding:US-ASCII>
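The irb session can be rerun as a standalone script to compare Ruby implementations. This is a sketch that assumes MRI-style semantics: `Regexp.escape` tags its result US-ASCII only when the input is ASCII-only, and otherwise keeps the input's encoding. Under that assumption, the literal and the interpolated string should both come back as UTF-8, and only the ASCII-only prefix should come back as US-ASCII. The variable names are illustrative.

```ruby
# encoding: utf-8
# Standalone version of the irb session above.
# Assumption: MRI-style semantics, where Regexp.escape returns US-ASCII
# only for ASCII-only input and otherwise keeps the input's encoding.

prefix       = "/fixtures/"
literal      = "/fixtures/公共"
interpolated = "#{prefix}/公共" # yields "/fixtures//公共", as in the session

{ "literal" => literal, "interpolated" => interpolated, "prefix" => prefix }.each do |label, str|
  escaped = Regexp.escape(str)
  # Print the input's encoding, the escaped result's encoding, and
  # whether the input contains only ASCII characters.
  puts format("%-12s source=%-8s escaped=%-8s ascii_only=%s",
              label, str.encoding, escaped.encoding, str.ascii_only?)
end
```

If an implementation reports US-ASCII for the interpolated string even though `ascii_only?` is false, it reproduces the discrepancy from this report.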

I'm not sure why one result is encoded as UTF-8 and the other as US-ASCII. Both strings contain the non-ASCII characters 公共, so I would expect both to be UTF-8. I suspect this is causing the following failure in this Rails test: https://github.com/rails/rails/blob/35e56f6fa535288abf1de7fa70c2faed5e2d88ff/actionpack/test/dispatch/static_test.rb#L164

ArgumentError: regexp preprocess failed: invalid multibyte character
    from /Users/justin/workspace/rails/actionpack/lib/action_dispatch/middleware/static.rb:9:in `initialize'
    from /Users/justin/workspace/rails/actionpack/lib/action_dispatch/middleware/static.rb:52:in `initialize'
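One plausible mechanism, offered as an assumption rather than a confirmed diagnosis: if an escaped string still contains the UTF-8 bytes of 公共 but is tagged US-ASCII, those bytes are invalid in that encoding, and compiling a regexp from it fails. The sketch below simulates this by forcing the tag by hand. It also shows a hypothetical workaround, `escape_with_source_encoding`, which is not a Ruby or Rails API. The workaround re-tags the escaped string with the input's encoding. That is reasonable for ASCII-compatible encodings such as UTF-8, because `Regexp.escape` only inserts ASCII characters. On MRI the compile error is a `RegexpError`, while the trace above shows an `ArgumentError`, so the sketch rescues both.

```ruby
# encoding: utf-8

path = "/fixtures//公共"

# Simulate the mis-tagged result from the report: same bytes, US-ASCII tag.
mis_tagged = Regexp.escape(path).dup.force_encoding(Encoding::US_ASCII)

begin
  Regexp.new(mis_tagged)
  compiled = true
rescue RegexpError, ArgumentError => e
  # The non-ASCII bytes are invalid under a US-ASCII tag.
  compiled = false
  puts "compile failed: #{e.class}: #{e.message}"
end

# Hypothetical workaround (not part of Ruby or Rails): keep the escaped
# bytes but restore the input's encoding. This is valid for ASCII-compatible
# encodings like UTF-8, since Regexp.escape only adds ASCII characters.
def escape_with_source_encoding(str)
  Regexp.escape(str).dup.force_encoding(str.encoding)
end

fixed  = escape_with_source_encoding(path)
regexp = Regexp.new("\\A#{fixed}")
puts regexp.match?("/fixtures//公共/index.html") # prints "true"
```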
