Temporarily revert open-addressing Hash for concurrency reasons.#5412
Temporarily revert open-addressing Hash for concurrency reasons.#5412headius merged 1 commit intojruby:masterfrom
Conversation
|
@ChrisBr We are going to delay the release of your open-addressing Hash until 9.2.2, so we can investigate how to make it a bit more robust under concurrent mutation. This is not an indictment of your work...the fact that the 9.2 Hash impl seems to survive more unsafe scenarios may be accidental, or simply due to the fact that it starts using a heavy bucket system immediately. See discussions under #5406 about @enebo's discovery, but ignore my original assertions because they were based on a flawed understanding of the new Hash impl. Here are a few examples of scripts that appear to work in 9.2 and fail in 9.2.1: Leads to NPE in h = {}
Thread.abort_on_exception = true
t1 = Thread.new { i = 0; loop { h[i+=1] = 1 } }
t2 = Thread.new { loop { h.shift } }
sleepLeads to ArrayIndexOutOfBounds quickly: h = {}
Thread.abort_on_exception = true
loop {
x = false
t1 = Thread.new {|t| Thread.pass until x; i = 0; 10_000_000.times { h[i+=1] = Thread.current } }
t2 = Thread.new { Thread.pass until x; i = 0.1; 10_000_000.times { h[i+=1] = Thread.current } }
x = true
t1.join
t2.join
}I have not included a third script that used |
While verifying JRuby 9.2.1, @enebo discovered that certain concurrent uses of a simple Rails app were triggering intermittent errors, and forcing the JIT to run very quickly produced NPEs under concurrent load. In 6b2fa45 @enebo was able to fix this by adding a null check, but we could not explain how a null could appear at that point in the code unless the hash had been corrupted, perhaps across threads. Additional testing with a number of contrived scripts seems to show that the new open-addressing hash (plus its linear mode) is more fragile under concurrency than the old linked buckets implementation. This is in a way, by design: the new implementation reduces indirection and allocation by packing keys and values into a contiguous area of memory, opening up potential for more races against the same data. In order to have a stable 9.2.1 release, we are temporarily rolling back this change until we can get a better grasp of the changes needed to make it as least as concurrency-friendly as the old hash implementation, if not properly concurrency-safe (which may or may not be easier in this new implementation). Original open addressing work by @ChrisBr.
de0283e to
5ad17f7
Compare
|
@headius hm ok! Any idea how we can make it (more) thread safe? |
While verifying JRuby 9.2.1, @enebo discovered that certain
concurrent uses of a simple Rails app were triggering intermittent
errors, and forcing the JIT to run very quickly produced NPEs
under concurrent load. In 6b2fa45 @enebo was able to fix this
by adding a null check, but we could not explain how a null could
appear at that point in the code unless the hash had been
corrupted, perhaps across threads.
Additional testing with a number of contrived scripts seems to
show that the new open-addressing hash (plus its linear mode) is
more fragile under concurrency than the old linked buckets
implementation. This is in a way, by design: the new
implementation reduces indirection and allocation by packing
keys and values into a contiguous area of memory, opening up
potential for more races against the same data.
In order to have a stable 9.2.1 release, we are temporarily
rolling back this change until we can get a better grasp of the
changes needed to make it as least as concurrency-friendly as the
old hash implementation, if not properly concurrency-safe (which
may or may not be easier in this new implementation).