String#scan raises java.lang.ArrayIndexOutOfBoundsException with multi-byte characters #5513

@pocke

Problem

ruby -e '"aaaaaaaaaa".scan("あああ")' raises an error in JRuby.

Environment

JRuby version

I can reproduce this error with JRuby 9.0.1.0, 9.2.5.0, and the HEAD of the master branch (d03c357).

$ ruby -v
jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 OpenJDK 64-Bit Server VM 25.192-b26 on 1.8.0_192-b26 +jit [linux-amd64]

$ ruby -v
jruby 9.2.5.0 (2.5.0) 2018-12-06 6d5a228 OpenJDK 64-Bit Server VM 25.192-b26 on 1.8.0_192-b26 +jit [linux-x86_64]

$ bin/jruby -v
jruby 9.2.6.0-SNAPSHOT (2.5.3) 2018-12-13 d03c357 OpenJDK 64-Bit Server VM 25.192-b26 on 1.8.0_192-b26 +jit [linux-x86_64]

I cannot reproduce with JRuby 1.7.27.
I also tried JRuby 9.0.0.0, but the installation failed, so I'm not sure whether 9.0.0.0 has this error.

Operating system

Arch Linux

$ uname -a
Linux jigglypuff 4.19.8-arch1-1-ARCH #1 SMP PREEMPT Sat Dec 8 13:49:11 UTC 2018 x86_64 GNU/Linux

Expected Behavior

The call should not raise any error.

Actual Behavior

It raises java.lang.ArrayIndexOutOfBoundsException.

$ ruby -e 'p "aaaaaaaaaa".scan("あああ")'
Unhandled Java exception: java.lang.ArrayIndexOutOfBoundsException: -1342547898
java.lang.ArrayIndexOutOfBoundsException: -1342547898
  rb_memsearch_qs_utf8 at org/jruby/util/StringSupport.java:2503
             memsearch at org/jruby/util/StringSupport.java:2048
           strseqIndex at org/jruby/RubyString.java:3275
         patternSearch at org/jruby/RubyString.java:4402
              scanOnce at org/jruby/RubyString.java:4362
                  scan at org/jruby/RubyString.java:4330
                  call at org/jruby/RubyString$INVOKER$i$1$0$scan.gen:-1
                  call at org/jruby/internal/runtime/methods/JavaMethod.java:399
          cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:346
                  call at org/jruby/runtime/callsite/CachingCallSite.java:172
     invokeOther2:scan at -e:1
                <main> at -e:1
   invokeWithArguments at java/lang/invoke/MethodHandle.java:627
                  load at org/jruby/ir/Compiler.java:94
             runScript at org/jruby/Ruby.java:849
           runNormally at org/jruby/Ruby.java:772
           runNormally at org/jruby/Ruby.java:790
           runFromMain at org/jruby/Ruby.java:602
         doRunFromMain at org/jruby/Main.java:415
           internalRun at org/jruby/Main.java:307
                   run at org/jruby/Main.java:234
                  main at org/jruby/Main.java:206

Note

  • If the receiver is shorter than in the example, it works.
    • e.g. ruby -e '"aaaa".scan("あああ")' does not raise an error.
  • If the argument string is not 3 characters long, it does not raise an error.
    • e.g. ruby -e '"aaaaaaaaaa".scan("ああ")' and ruby -e '"aaaaaaaaaa".scan("ああああ")' do not raise an error.
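
The cases above can be collected into one script to check a given JRuby build. This is only a sketch and not part of the original reproduction: CASES and run_case are names made up for illustration. Since "a" never matches "あ", every call should return an empty array on an unaffected interpreter. On an affected JRuby the first case is expected to abort the script with the Java exception shown above.

```ruby
# Hypothetical helper names; the inputs are the ones listed in the note above.
CASES = [
  ["a" * 10, "あ" * 3], # reported to raise on affected JRuby versions
  ["a" * 4,  "あ" * 3], # shorter receiver: reported to work
  ["a" * 10, "あ" * 2], # 2-character pattern: reported to work
  ["a" * 10, "あ" * 4], # 4-character pattern: reported to work
].freeze

# Runs String#scan with a plain String pattern, as in the report.
def run_case(receiver, pattern)
  receiver.scan(pattern)
end

CASES.each do |receiver, pattern|
  result = run_case(receiver, pattern)
  puts "#{receiver.inspect}.scan(#{pattern.inspect}) => #{result.inspect}"
end
```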

An example in the real world

I ran into this error in the natto gem's tests. https://github.com/buruzaemon/natto
The test cases fail on JRuby 9.2.5.0.

# In natto gem directory
$ bundle exec rake
/home/pocke/.rbenv/versions/jruby-9.2.5.0/bin/jruby  test/test_natto.rb 
Run options: --seed 45048

# Running:

[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.[INFO] setup: could not delete test.dic, you might want to remove manually.
reading /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/test_userdic.csv ... 1
emitting double-array: 100% |###########################################| 

done!
.......:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
........E...E.......:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
:lattice-level is DEPRECATED, please use :marginal or :nbest
..............

Finished in 1.511805s, 29.1043 runs/s, 398.8610 assertions/s.

  1) Error:
TestMeCab#test_parse_tostr_feature_constraints:
Natto::MeCabError: 
    /home/pocke/ghq/github.com/buruzaemon/natto/lib/natto/natto.rb:339:in `block in initialize'
    /home/pocke/ghq/github.com/buruzaemon/natto/lib/natto/natto.rb:479:in `parse'
    /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/tc_mecab.rb:476:in `test_parse_tostr_feature_constraints'

  2) Error:
TestMeCab#test_parse_tonodes_feature_constraints:
Natto::MeCabError: 
    /home/pocke/ghq/github.com/buruzaemon/natto/lib/natto/natto.rb:420:in `block in initialize'
    org/jruby/RubyGenerator.java:102:in `each'
    org/jruby/RubyEnumerator.java:326:in `each'
    org/jruby/RubyEnumerator.java:332:in `each'
    /home/pocke/ghq/github.com/buruzaemon/natto/test/natto/tc_mecab.rb:645:in `test_parse_tonodes_feature_constraints'

44 runs, 603 assertions, 0 failures, 2 errors, 0 skips
rake aborted!
Command failed with status (1): [/home/pocke/.rbenv/versions/jruby-9.2.5.0/...]
/home/pocke/ghq/github.com/buruzaemon/natto/Rakefile:11:in `block in <main>'
/home/pocke/.rbenv/versions/jruby-9.2.5.0/bin/bundle:23:in `<main>'
Tasks: TOP => default => test
(See full trace by running task with --trace)

The test effectively runs ruby -e '"心の中で3回唱え、 ヒーロー見参!ヒーロー見参!ヒーロー見参!".scan("ヒーロー見参")', and that call fails.
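
For reference, here is a minimal sketch of what that call should produce on an interpreter without the bug. The variable names are illustrative and do not come from the natto test suite. The pattern occurs three times in the text, so scan should return three matches.

```ruby
# The same receiver and pattern as in the failing natto test.
text = "心の中で3回唱え、 ヒーロー見参!ヒーロー見参!ヒーロー見参!"
pattern = "ヒーロー見参"

matches = text.scan(pattern)
puts "found #{matches.size} matches: #{matches.inspect}"
```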
