Error when matching regex in multiple threads #3670

@marshalium

JRuby 1.7.24 and 9.0.5.0 both have a pretty serious regression when matching against UTF-8 strings in multiple threads at the same time. This bug does not appear to be present in 1.7.23 or 9.0.4.0.

If multiple threads are running code like this, with completely unshared string and regex objects:

str = "foobar"
str.force_encoding("UTF-8")
str.gsub(/foo/i, '')

eventually one of the threads will throw an error like this:

Exception in thread "Ruby-0-Thread-6: ./recreate_utf8_bug.rb:3" java.lang.ArrayIndexOutOfBoundsException: 6
  at org.jcodings.specific.BaseUTF8Encoding.mbcCaseFold(BaseUTF8Encoding.java:167)
  at org.jcodings.specific.UTF8Encoding.mbcCaseFold(UTF8Encoding.java:24)
  at org.joni.SearchAlgorithm$SLOW_IC.lowerCaseMatch(SearchAlgorithm.java:238)
  at org.joni.SearchAlgorithm$SLOW_IC.search(SearchAlgorithm.java:206)
  at org.joni.Matcher.forwardSearchRange(Matcher.java:140)
  at org.joni.Matcher.searchInterruptible(Matcher.java:451)
  at org.jruby.RubyRegexp$SearchMatchTask.run(RubyRegexp.java:273)
  at org.jruby.RubyThread.executeBlockingTask(RubyThread.java:1066)
  at org.jruby.RubyRegexp.matcherSearch(RubyRegexp.java:235)
  at org.jruby.RubyString.gsubCommon19(RubyString.java:3123)
  at org.jruby.RubyString.gsubCommon19(RubyString.java:3106)
  at org.jruby.RubyString.gsub19(RubyString.java:3101)
  at org.jruby.RubyString.gsub19(RubyString.java:3069)
  at org.jruby.RubyString$INVOKER$i$gsub19.call(RubyString$INVOKER$i$gsub19.gen)
  at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrTwoOrNBlock.call(JavaMethod.java:367)
  at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
  at $_dot_.recreate_utf8_bug.block_2$RUBY$__file__(./recreate_utf8_bug.rb:13)
  at $_dot_$recreate_utf8_bug$block_2$RUBY$__file__.call($_dot_$recreate_utf8_bug$block_2$RUBY$__file__)
  at org.jruby.runtime.CompiledBlock19.yieldSpecificInternal(CompiledBlock19.java:117)
  at org.jruby.runtime.CompiledBlock19.yieldSpecific(CompiledBlock19.java:92)
  at org.jruby.runtime.Block.yieldSpecific(Block.java:111)
  at org.jruby.RubyFixnum.times(RubyFixnum.java:275)
  at org.jruby.RubyFixnum$INVOKER$i$0$0$times.call(RubyFixnum$INVOKER$i$0$0$times.gen)
  at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
  at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
  at $_dot_.recreate_utf8_bug.chained_0_rescue_1$RUBY$SYNTHETIC__file__(./recreate_utf8_bug.rb:10)
  at $_dot_.recreate_utf8_bug.block_1$RUBY$__file__(./recreate_utf8_bug.rb:9)
  at $_dot_$recreate_utf8_bug$block_1$RUBY$__file__.call($_dot_$recreate_utf8_bug$block_1$RUBY$__file__)
  at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
  at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
  at org.jruby.runtime.Block.call(Block.java:101)
  at org.jruby.RubyProc.call(RubyProc.java:300)
  at org.jruby.RubyProc.call(RubyProc.java:230)
  at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
  at java.lang.Thread.run(Thread.java:745)

The error doesn't occur if I remove the call to #force_encoding or if I remove the "i" flag from the regex.

See this gist for a full reproducible test case and examples of running it on different JRuby versions:
https://gist.github.com/marshalium/3e62c2affbd2ce95757f
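
For readers who can't open the gist, here is a rough sketch of the kind of script that exercises this scenario. It is not the gist's contents: the method name `run_trial`, the thread count and the iteration count are made up for illustration. It also builds the regex with `Regexp.new` inside each thread (an assumption on my part) so that every thread really has its own regex object, and calls `dup` on the literal so each iteration gets a fresh, mutable string.

```ruby
# Hypothetical reproduction sketch, not the script from the linked gist.
# Each thread builds its own regex and its own strings, so no Ruby object
# is shared between threads.
def run_trial(thread_count, iterations)
  threads = Array.new(thread_count) do
    Thread.new do
      # Case-insensitive regex, equivalent to /foo/i, created per thread.
      regex = Regexp.new("foo", Regexp::IGNORECASE)

      # Count how many iterations produced the expected result.
      iterations.times.count do
        str = "foobar".dup            # fresh, unshared string each time
        str.force_encoding("UTF-8")
        str.gsub(regex, '') == "bar"
      end
    end
  end

  # Thread#value joins each thread and re-raises anything that killed it.
  threads.map(&:value).inject(0, :+)
end
```

Calling something like `run_trial(8, 100_000)` should return `800_000` on a runtime without the bug. On an affected JRuby I would expect one of the threads to die with the ArrayIndexOutOfBoundsException shown above instead, though how many iterations that takes will probably vary from run to run.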
