Skip to content

Fix issues in global variable thread-safety#8003

Merged
headius merged 8 commits intojruby:masterfrom
headius:global_var_race
Nov 8, 2023
Merged

Fix issues in global variable thread-safety#8003
headius merged 8 commits intojruby:masterfrom
headius:global_var_race

Conversation

@headius
Copy link
Member

@headius headius commented Nov 3, 2023

A few places where we access and cache global variables were found to have races while investigating some Process.kill specs hanging only in JIT mode. I found the following issues:

  • GlobalVariables.getVariable can race with a parallel global variable set, ending up with two different GlobalVariable accessor objects getting created. Caching the wrong one will result in a stale value cached permanently.
  • The indy logic for caching global variable values had a race in acquiring the value and the SwitchPoint that guards it. Because they were acquired in the wrong order (value first) there's a chance that an old value might be guarded by a newer SwitchPoint and never get invalidated.

With these fixes made the Process#kill specs in question always run to completion, and I have re-enabled indy-based global variable caching (up to 100 invalidations, and then it falls back on a slow path).

Based on these changes I also audited other uses of SwitchPoint.guardWithTest and found a few more cases where the SwitchPoint was acquired after the value. All the cases I found are fixed in this PR.

@headius headius added this to the JRuby 9.4.6.0 milestone Nov 3, 2023
headius added a commit to headius/jruby that referenced this pull request Nov 3, 2023
The SwitchPoint used to invalidate a given value should always be
acquired prior to retrieving the value. This ensures that value
will always be guarded by its corresponding SwitchPoint, or if the
value changes between the two accesses, the value will be
associated with a prior invalid SwitchPoint and very shortly end
up back in the fallback to get the new value.

With several of these changes, the SwitchPoint was acquired after
the guarded value, meaning we might accidentally associate an old
value with a new SwitchPoint. If that new SwitchPoint is never
invalidated, we will cache the old value forever.

This relates to global variable fixes in prior commits as part of
jruby#8003.
headius added a commit to headius/jruby that referenced this pull request Nov 3, 2023
The SwitchPoint used to invalidate a given value should always be
acquired prior to retrieving the value. This ensures that value
will always be guarded by its corresponding SwitchPoint, or if the
value changes between the two accesses, the value will be
associated with a prior invalid SwitchPoint and very shortly end
up back in the fallback to get the new value.

With several of these changes, the SwitchPoint was acquired after
the guarded value, meaning we might accidentally associate an old
value with a new SwitchPoint. If that new SwitchPoint is never
invalidated, we will cache the old value forever.

This relates to global variable fixes in prior commits as part of
jruby#8003.
headius added a commit to headius/jruby that referenced this pull request Nov 3, 2023
The SwitchPoint used to invalidate a given value should always be
acquired prior to retrieving the value. This ensures that value
will always be guarded by its corresponding SwitchPoint, or if the
value changes between the two accesses, the value will be
associated with a prior invalid SwitchPoint and very shortly end
up back in the fallback to get the new value.

With several of these changes, the SwitchPoint was acquired after
the guarded value, meaning we might accidentally associate an old
value with a new SwitchPoint. If that new SwitchPoint is never
invalidated, we will cache the old value forever.

This relates to global variable fixes in prior commits as part of
jruby#8003.
@headius headius force-pushed the global_var_race branch 3 times, most recently from 989bfe1 to 00131e9 Compare November 3, 2023 21:45
@headius headius linked an issue Nov 3, 2023 that may be closed by this pull request
When a variable does not exist but we wish to acquire the
GlobalVariable object for it, this logic will attempt to create
a new GlobalVariable and add it to the map. Unfortunately it does
this without any locking or atomicity, potentially overwriting a
value written to the same name on a different thread.

When the GlobalVariable returned by this method is cached, as it
is in the JIT logic for global variable reads, it may never see
updates because those writes are going to the winner of the race.
This caused at least one set of specs, for Process.kill, to hang
when the global variable used for indicating a signal had been
damaged in this way.

Instead we computeIfAbsent to ensure only one GlobalVariable will
ever be inserted for this name.
The logic here acquires the Invalidator from the variable and uses
that to guard the cached value. However, if this is a SwitchPoint
invalidator (assumed by the code) then it replaces the SwitchPoint
with a new one after invalidation. If we do not grab the actual
SwitchPoint before the value, we may end up associating the
current value with a newer SwitchPoint and never see the
subsequent update.

This is likely the main cause of intermittent global variable
failures with indy global caching turned on.
The SwitchPoint used to invalidate a given value should always be
acquired prior to retrieving the value. This ensures that value
will always be guarded by its corresponding SwitchPoint, or if the
value changes between the two accesses, the value will be
associated with a prior invalid SwitchPoint and very shortly end
up back in the fallback to get the new value.

With several of these changes, the SwitchPoint was acquired after
the guarded value, meaning we might accidentally associate an old
value with a new SwitchPoint. If that new SwitchPoint is never
invalidated, we will cache the old value forever.

This relates to global variable fixes in prior commits as part of
jruby#8003.
If this variable object has been cached, it must be invalidated
so such code goes back to the table for the new variable.
@headius headius merged commit 8705da4 into jruby:master Nov 8, 2023
@headius headius deleted the global_var_race branch November 8, 2023 03:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rework global variables, caching to eliminate race conditions

1 participant