Skip to content

Simultaneous client reads / writes to the same socket from different threads can deadlock #4854

@camlow325

Description

@camlow325

Environment

Provide at least:

  • JRuby version (jruby -v) and command line (flags, JRUBY_OPTS, etc)
    9.1.13.0, 9.1.14.0, maybe others?

  • Operating system and platform (e.g. uname -a)
    Windows 10, x86_64

Expected Behavior

  • Describe your expectation of how JRuby should behave, perhaps by showing how CRuby/MRI behaves.

When a client interleaves blocking calls to read data and write data to the same TCP socket from different threads, the client is able to eventually complete the read and write calls with no errors. This is what I've seen with the script below with MRI Ruby 2.4.2, running the script indefinitely. On JRuby 9k, however, the process deadlocks after a few seconds, never seeming to recover.

  • Provide an executable Ruby script or a link to an example repository.

Here's the script I've used to reproduce this problem. I just ran jruby.exe .\socket_sample.rb to launch the script. The script opens a server socket and accepts a single connection from a client. The script performs interleaved read/write calls on the client side of the socket (with no synchronization). On the server side, reads and writes are performed as well, but all from a single thread.

require 'socket'

server = TCPServer.open(50000)

Thread.start(server) do |server|
  Thread.current.name = "ServerSocket"
  client = server.accept
  server_loops = 1
  while true
    begin
      client.read_nonblock(20000)
      server_write = "Server write #{server_loops}"
      puts(server_write)
      client.puts(server_write)
      server_loops += 1
      sleep(0.1)
    rescue IO::WaitReadable
      puts("Server: waiting for data to read...")
      IO.select([client])
    end
  end
end

client = TCPSocket.open('localhost', 50000)

Thread.start(client) do |client|
  Thread.current.name = "ClientReader"
  client_read_loops = 1
  while true
    data_read = client.gets
    puts("Client received #{client_read_loops}: #{data_read}")
    client_read_loops += 1
  end
end

Thread.current.name = "ClientWriter"
client_write_loops = 1
while true
  client_write = "Client write #{client_write_loops}"
  if client_write_loops % 10000 == 0
    puts(client_write)
  end
  client.puts(client_write)
  client_write_loops += 1
end

Actual Behavior

  • Describe or show the actual behavior.

Running with MRI Ruby 2.4.2, the script appears to run fine indefinitely, with messages like the following in the output:

Client received 901: Server write 901
Client write 950000
Server write 902
Client received 902: Server write 902
Server write 903
Client received 903: Server write 903

Running with JRuby 9.1.14.0, however, the client read/write threads appear to deadlock one another after only a few seconds of running. The console output looks like the following:

Client write 50000
Server write 9
Client received 9: Server write 9
Server write 10
Server write 11
Server write 12
...
Server write 54
Server: waiting for data to read...

Running jstack against the Java process for a JRuby 9.1.14.0 run after the 'waiting for data to read...' message above, I see the following:

Found one Java-level deadlock:
=============================
"Ruby-0-Thread-2@ClientReader: socket_sample.rb:1":
  waiting for ownable synchronizer 0x00000000fa52ca88, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "main"
"main":
  waiting to lock monitor 0x00000000034e9bf8 (object 0x00000000fa52c920, a java.lang.Object),
  which is held by "Ruby-0-Thread-2@ClientReader: socket_sample.rb:1"

Java stack information for the threads listed above:
===================================================
"Ruby-0-Thread-2@ClientReader: socket_sample.rb:1":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000fa52ca88> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.jruby.util.io.OpenFile.lock(OpenFile.java:2672)
        at org.jruby.util.io.OpenFile.removeBlockingThread(OpenFile.java:2606)
        - locked <0x00000000fa528df8> (a org.jruby.util.io.OpenFile)
        at org.jruby.RubyThread.select(RubyThread.java:1811)
        - locked <0x00000000fa52c920> (a java.lang.Object)
        at org.jruby.RubyThread.select(RubyThread.java:1701)
        at org.jruby.util.io.OpenFile.readInternal(OpenFile.java:1368)
        at org.jruby.util.io.OpenFile.fillbuf(OpenFile.java:1264)
        at org.jruby.util.io.OpenFile.getlineFast(OpenFile.java:1594)
        at org.jruby.RubyIO.getlineInner(RubyIO.java:725)
        at org.jruby.RubyIO.getline(RubyIO.java:695)
        at org.jruby.RubyIO.getline(RubyIO.java:683)
        at org.jruby.RubyIO.gets(RubyIO.java:2233)
        at org.jruby.RubyIO$INVOKER$i$gets.call(RubyIO$INVOKER$i$gets.gen)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:125)
        at C_3a_.socket_sample.invokeOther21:gets(C:\socket_sample.rb:30)
        at C_3a_.socket_sample.RUBY$block$\=C\-socket_sample\,rb$1(C:\socket_sample.rb:30)
        at java.lang.invoke.LambdaForm$DMH/468121027.invokeStatic_L8_L(LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$MH/1537458459.invokeExact_MT(LambdaForm$MH)
        at org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:145)
        at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:71)
        at org.jruby.runtime.Block.call(Block.java:124)
        at org.jruby.RubyProc.call(RubyProc.java:289)
        at org.jruby.RubyProc.call(RubyProc.java:246)
        at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:104)
        at java.lang.Thread.run(Thread.java:748)
"main":
        at org.jruby.RubyThread.select(RubyThread.java:1757)
        - waiting to lock <0x00000000fa52c920> (a java.lang.Object)
        at org.jruby.util.io.OpenFile.ready(OpenFile.java:511)
        at org.jruby.util.io.OpenFile.waitWritable(OpenFile.java:452)
        at org.jruby.util.io.OpenFile.waitWritable(OpenFile.java:464)
        at org.jruby.util.io.OpenFile.binwrite(OpenFile.java:2200)
        at org.jruby.util.io.OpenFile.fwrite(OpenFile.java:2056)
        at org.jruby.RubyIO.write(RubyIO.java:1456)
        at org.jruby.RubyIO.write(RubyIO.java:1428)
        at org.jruby.RubyIO$INVOKER$i$1$0$write.call(RubyIO$INVOKER$i$1$0$write.gen)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:153)
        at org.jruby.RubyIO.write(RubyIO.java:2584)
        at org.jruby.RubyIO.putsSingle(RubyIO.java:2562)
        at org.jruby.RubyIO.puts1(RubyIO.java:2478)
        at org.jruby.RubyIO.puts(RubyIO.java:2451)
        at org.jruby.RubyIO$INVOKER$i$puts.call(RubyIO$INVOKER$i$puts.gen)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:153)
        at C_3a_.Users.socket_sample.invokeOther47:puts(C:\socket_sample.rb:43)
        at C_3a_.Users.socket_sample.RUBY$script(C:\socket_sample.rb:43)
        at java.lang.invoke.LambdaForm$DMH/1450495309.invokeStatic_L7_L(LambdaForm$DMH)
        at java.lang.invoke.LambdaForm$BMH/242355057.reinvoke(LambdaForm$BMH)
        at java.lang.invoke.LambdaForm$MH/171497379.invoker(LambdaForm$MH)
        at java.lang.invoke.LambdaForm$MH/434176574.invokeExact_MT(LambdaForm$MH)
        at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:627)
        at org.jruby.ir.Compiler$1.load(Compiler.java:95)
        at org.jruby.Ruby.runScript(Ruby.java:828)
        at org.jruby.Ruby.runNormally(Ruby.java:747)
        at org.jruby.Ruby.runNormally(Ruby.java:765)
        at org.jruby.Ruby.runFromMain(Ruby.java:578)
        at org.jruby.Main.doRunFromMain(Main.java:417)
        at org.jruby.Main.internalRun(Main.java:305)
        at org.jruby.Main.run(Main.java:232)
        at org.jruby.Main.main(Main.java:204)

Found 1 deadlock.

The reader thread has taken a lock on the channel within the RubyThread.select method but is waiting to take a lock (which the writer holds) within the OpenFile.lock() method. The writer thread has taken the lock held by the OpenFile object but is waiting to take the lock on the channel (which the reader holds) within the RubyThread.select method.

I've seen this occur not just in cases where the writing thread encounters a waitWritable condition but also in cases where the writer thread is blocked on a call to the synchronized OpenFile.unread method, which cannot proceed because the reader thread is in the middle of executing the synchronized OpenFile.removeBlockingThread method. In this case, the reader thread is unable to obtain the reentrant lock within OpenFile, though, because the writer thread already holds it - leading to another form of deadlock. I haven't yet been able to get a more narrow code sample to reproduce this second issue as frequently, though.

If a mutex is synchronized around the client read and write calls, the deadlocks no longer occur (even for JRuby):

require 'thread'
...
socket_mutex = Mutex.new
...
  socket_mutex.synchronize do
    data_read = client.gets
  end
...
  socket_mutex.synchronize do
    client.puts(client_write)
  end

This seems like a reasonable workaround to the problem, although maybe with some performance implications. It may be that MRI Ruby is more immune to this kind of problem because of the GIL, not sure?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions