
Hash sets the wrong encoding on a String key when a string with the same content was previously used as a key with a different encoding #3405

@tagomoris

Description


On JRuby 9.0.1.0, Hash sets the wrong encoding on a String key, but only when a string with the same content has already been used as a Hash key with a different encoding.

```ruby
hash1 = {}
hash1['str'.force_encoding('ASCII-8BIT')] = 1
p hash1.keys.first.encoding # ASCII-8BIT

hash2 = {}
hash2['str'.force_encoding('UTF-8')] = 1
p hash2.keys.first.encoding # ASCII-8BIT on JRuby 9.0.1.0 (expected: UTF-8)
```
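A self-checking variant of the two-hash example, as a sketch. It assumes only what the expected result states: Ruby documents that a non-frozen String key is duplicated and frozen when stored, and the stored copy should keep the caller's encoding. `key_encoding_preserved?` is a hypothetical helper written for this check, not a Ruby or JRuby API.

```ruby
# Hypothetical helper, not part of Ruby or JRuby: stores a copy of `text`
# re-tagged with `encoding` as the only key of a new Hash, then reports whether
# the key the Hash actually stored still carries that encoding.
def key_encoding_preserved?(text, encoding)
  key = text.dup.force_encoding(encoding)
  hash = {}
  hash[key] = 1
  hash.keys.first.encoding == Encoding.find(encoding)
end

# Same order as the report: the bytes "str" are first used as an
# ASCII-8BIT key, then as a UTF-8 key in a separate Hash.
results = {
  'ASCII-8BIT' => key_encoding_preserved?('str', 'ASCII-8BIT'),
  'UTF-8'      => key_encoding_preserved?('str', 'UTF-8')
}
p results

# Exit with a failure status if any key lost its encoding.
abort 'Hash changed the encoding of a String key' unless results.values.all?
```

On an interpreter without this bug both values are `true`; the behavior described in this report suggests JRuby 9.0.1.0 would report `false` for the UTF-8 entry.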

Script that reproduces the bug in several situations:

```ruby
# encoding: ascii-8bit

str = 'hello'
obj1 = {'hello' => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'hello'.force_encoding('UTF-8')
obj1 = {'hello'.force_encoding('UTF-8') => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'hello'.force_encoding('UTF-8')
obj1 = {}
obj1[str] = 1
obj2 = {}
obj2[str] = 2
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'world'.force_encoding('UTF-8')
obj1 = {'world'.force_encoding('UTF-8') => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'unused'
obj1 = {'unused'.force_encoding('UTF-8') => 1}
obj2 = {str.force_encoding('UTF-8') => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})
```

Expected output:

  • First line: all keys are encoded in ASCII-8BIT
  • Remaining lines: all keys are encoded in UTF-8

Actual output on JRuby 9.0.1.0:

  • First line: all keys are encoded in ASCII-8BIT
  • Every 'hello' key is encoded in ASCII-8BIT, including those in the second and third lines, where UTF-8 is expected
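The expected result above can also be checked mechanically. Below is a sketch that condenses the five cases of the reproduction script into a table and exits with status 1 when a key's encoding differs from the expected-result list. The `SCENARIOS` table and its labels are hypothetical, and `String#b` (Ruby 2.0+) replaces the `# encoding: ascii-8bit` magic comment so the file needs no magic comment; only Ruby 2.x-compatible methods are used, so it should also run on JRuby 9.0.x.

```ruby
# Each entry: [label, expected key encoding, lambda that builds the Hash].
# Labels are illustrative; `'hello'.b` stands in for the magic comment.
SCENARIOS = [
  ['binary key', Encoding::ASCII_8BIT, -> { { 'hello'.b => 1 } }],
  ['utf-8 key in literal', Encoding::UTF_8,
   -> { { 'hello'.dup.force_encoding('UTF-8') => 1 } }],
  ['utf-8 key via []=, same String in two hashes', Encoding::UTF_8, -> {
    str = 'hello'.dup.force_encoding('UTF-8')
    first = {}
    first[str] = 1
    second = {}
    second[str] = 2
    second
  }],
  ['utf-8 key, content not used before', Encoding::UTF_8,
   -> { { 'world'.dup.force_encoding('UTF-8') => 1 } }],
  ['utf-8 key re-tagged from binary', Encoding::UTF_8,
   -> { { 'unused'.b.force_encoding('UTF-8') => 1 } }]
]

# Run the scenarios in order (the report ties the bug to the encoding a key's
# content was first used with) and collect a message for each mismatch.
mismatches = SCENARIOS.map { |label, expected, build|
  actual = build.call.keys.first.encoding
  "#{label}: expected #{expected}, got #{actual}" unless actual == expected
}.compact

if mismatches.empty?
  puts 'All Hash keys kept their expected encoding'
else
  puts mismatches
  exit 1
end
```

Keeping the cases in one ordered table makes the dependency on earlier keys explicit, which is what distinguishes this bug from a plain encoding mistake.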
