Description
JRuby should support matching "grapheme clusters" (glyphs) with `\X`. A grapheme cluster is a single user-perceived character that may be built from multiple Unicode codepoints.
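A minimal illustration of why codepoint-level matching is not enough, assuming Ruby 2.2+ for `String#unicode_normalize`. The letter "ä" can be written as "a" followed by U+0308 COMBINING DIAERESIS (the same sequence used in the report below) or as the single precomposed codepoint U+00E4. Both render as one glyph, but `.` matches per codepoint.

```ruby
# "ä" written as "a" + U+0308 COMBINING DIAERESIS: two codepoints, one glyph
decomposed = "\u{61 308}"

# NFC normalization composes the pair into the single codepoint U+00E4
precomposed = decomposed.unicode_normalize(:nfc)

p decomposed.length                              # => 2
p precomposed.length                             # => 1
p decomposed.scan(/./).length                    # `.` matches per codepoint => 2
p precomposed.codepoints.map { |c| c.to_s(16) }  # => ["e4"]
```

Normalization only helps when a precomposed form exists. Many clusters, such as emoji sequences, have none, which is why a regex-level `\X` is needed.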
Expected Behavior (MRI)
```ruby
glyphs = "\u{61 308 62}".scan(/\X/) # => ["ä", "b"]
glyphs.map { |e| e.codepoints.map { |f| f.to_s(16) } } # => [["61", "308"], ["62"]]
```
Actual Behavior (JRuby)
```ruby
glyphs = "\u{61 308 62}".scan(/\X/) # => ["a", "b"]
glyphs.map { |e| e.codepoints.map { |f| f.to_s(16) } } # => [["61"], ["62"]]
```
Related Links
- Unicode® Standard Annex #29: Unicode Text Segmentation: http://unicode.org/reports/tr29/
- Introduction of `\X` to Onigmo: k-takata/Onigmo@dde0c43
  - Non-Unicode should match: `(?>\x0D\x0A|(?m:.))`
- Seems to work properly only in MRI 2.4: Remove AS::Multibyte's unicode table (rails/rails#26743)
- Related Ruby issue: https://bugs.ruby-lang.org/issues/12831
- Example graphemes: https://github.com/janlelis/uniscribe/blob/master/spec/uniscribe_spec.rb
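The Onigmo note above gives `(?>\x0D\x0A|(?m:.))` as what `\X` should match for non-Unicode encodings: CR LF stays together, and any other single character (newline included) is its own cluster. The sketch below exercises that pattern on a binary (ASCII-8BIT) string. It also tries a crude Unicode approximation, `/\P{M}\p{M}*/` (one non-mark followed by any combining marks). That second pattern is an assumption of this sketch, not something from the issue or from UAX #29. It handles the example in this report but ignores the CR LF rule, Hangul syllable sequences, emoji ZWJ sequences and regional-indicator pairs, so it is at best a stopgap. The values in the comments are for MRI; JRuby's results with these patterns are not verified here.

```ruby
# Non-Unicode \X equivalent from the Onigmo notes:
# keep CR LF together, otherwise take exactly one character (newline included).
NON_UNICODE_X = /(?>\x0D\x0A|(?m:.))/

# Hypothetical stopgap for Unicode strings, NOT a UAX #29 implementation:
# one non-mark codepoint followed by any number of combining marks.
MARK_CLUSTER = /\P{M}\p{M}*/

bytes = "a\r\nb".b                      # ASCII-8BIT, i.e. a non-Unicode encoding
p bytes.scan(NON_UNICODE_X)             # => ["a", "\r\n", "b"]

p "\u{61 308 62}".scan(MARK_CLUSTER).map { |g| g.codepoints.map { |c| c.to_s(16) } }
# => [["61", "308"], ["62"]]

# Limitation: CR LF is split, unlike a real \X in MRI 2.4+
p "a\r\nb".scan(MARK_CLUSTER)           # => ["a", "\r", "\n", "b"]
```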