Fix issues in global variable thread-safety#8003
Merged
headius merged 8 commits intojruby:masterfrom Nov 8, 2023
Merged
Conversation
headius
added a commit
to headius/jruby
that referenced
this pull request
Nov 3, 2023
The SwitchPoint used to invalidate a given value should always be acquired prior to retrieving the value. This ensures that value will always be guarded by its corresponding SwitchPoint, or if the value changes between the two accesses, the value will be associated with a prior invalid SwitchPoint and very shortly end up back in the fallback to get the new value. With several of these changes, the SwitchPoint was acquired after the guarded value, meaning we might accidentally associate an old value with a new SwitchPoint. If that new SwitchPoint is never invalidated, we will cache the old value forever. This relates to global variable fixes in prior commits as part of jruby#8003.
headius
added a commit
to headius/jruby
that referenced
this pull request
Nov 3, 2023
The SwitchPoint used to invalidate a given value should always be acquired prior to retrieving the value. This ensures that value will always be guarded by its corresponding SwitchPoint, or if the value changes between the two accesses, the value will be associated with a prior invalid SwitchPoint and very shortly end up back in the fallback to get the new value. With several of these changes, the SwitchPoint was acquired after the guarded value, meaning we might accidentally associate an old value with a new SwitchPoint. If that new SwitchPoint is never invalidated, we will cache the old value forever. This relates to global variable fixes in prior commits as part of jruby#8003.
740cfc5 to
1f6fe4a
Compare
headius
added a commit
to headius/jruby
that referenced
this pull request
Nov 3, 2023
The SwitchPoint used to invalidate a given value should always be acquired prior to retrieving the value. This ensures that value will always be guarded by its corresponding SwitchPoint, or if the value changes between the two accesses, the value will be associated with a prior invalid SwitchPoint and very shortly end up back in the fallback to get the new value. With several of these changes, the SwitchPoint was acquired after the guarded value, meaning we might accidentally associate an old value with a new SwitchPoint. If that new SwitchPoint is never invalidated, we will cache the old value forever. This relates to global variable fixes in prior commits as part of jruby#8003.
989bfe1 to
00131e9
Compare
When a variable does not exist but we wish to acquire the GlobalVariable object for it, this logic will attempt to create a new GlobalVariable and add it to the map. Unfortunately it does this without any locking or atomicity, potentially overwriting a value written to the same name on a different thread. When the GlobalVariable returned by this method is cached, as it is in the JIT logic for global variable reads, it may never see updates because those writes are going to the winner of the race. This caused at least one set of specs, for Process.kill, to hang when the global variable used for indicating a signal had been damaged in this way. Instead we computeIfAbsent to ensure only one GlobalVariable will ever be inserted for this name.
The logic here acquires the Invalidator from the variable and uses that to guard the cached value. However, if this is a SwitchPoint invalidator (assumed by the code) then it replaces the SwitchPoint with a new one after invalidation. If we do not grab the actual SwitchPoint before the value, we may end up associating the current value with a newer SwitchPoint and never see the subsequent update. This is likely the main cause of intermittent global variable failures with indy global caching turned on.
The SwitchPoint used to invalidate a given value should always be acquired prior to retrieving the value. This ensures that value will always be guarded by its corresponding SwitchPoint, or if the value changes between the two accesses, the value will be associated with a prior invalid SwitchPoint and very shortly end up back in the fallback to get the new value. With several of these changes, the SwitchPoint was acquired after the guarded value, meaning we might accidentally associate an old value with a new SwitchPoint. If that new SwitchPoint is never invalidated, we will cache the old value forever. This relates to global variable fixes in prior commits as part of jruby#8003.
If this variable object has been cached, it must be invalidated so such code goes back to the table for the new variable.
df949da to
cdc95ce
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A few places where we access and cache global variables were found to have races while investigating some
Process.killspecs hanging only in JIT mode. I found the following issues:With these fixes made the
Process#killspecs in question always run to completion, and I have re-enabled indy-based global variable caching (up to 100 invalidations, and then it falls back on a slow path).Based on these changes I also audited other uses of
SwitchPoint.guardWithTestand found a few more cases where the SwitchPoint was acquired after the value. All the cases I found are fixed in this PR.