ROX-33328: Avoid cache re-poisoning during coalesced fetches #19600
ROX-33328: Avoid cache re-poisoning during coalesced fetches #19600
Conversation
|
This change is part of the following stack: Change managed by git-spice. |
|
Skipping CI for Draft Pull Request. |
|
Images are ready for the commit at ce07e41. To use with deploy scripts, first |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #19600 +/- ##
========================================
Coverage 49.59% 49.60%
========================================
Files 2766 2766
Lines 208583 208692 +109
========================================
+ Hits 103457 103522 +65
- Misses 97449 97489 +40
- Partials 7677 7681 +4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
6836bf6 to
4d98322
Compare
4031ed6 to
348220e
Compare
4d98322 to
e994f13
Compare
348220e to
ec0f7f7
Compare
e994f13 to
97acf6d
Compare
1ad1e5f to
782897c
Compare
97acf6d to
ce07e41
Compare
📝 WalkthroughWalkthroughThis PR adds targeted image cache invalidation to the admission control system. New protocol buffer messages enable selective cache entry removal instead of full purges. A generation-tracking mechanism prevents stale in-flight image fetches from re-populating the cache after invalidation. Changes
Sequence DiagramsequenceDiagram
participant Central as Central Service
participant Manager as Admission Control Manager
participant Cache as Image Cache
participant GenTracker as Generation Tracker
participant InFlightFetch as In-Flight Image Fetch
Central->>Manager: Send AdmCtrlImageCacheInvalidation
activate Manager
Manager->>GenTracker: Increment generation for cache key
Manager->>Cache: Remove targeted cache entry
Manager->>Cache: Remove name→key mapping
deactivate Manager
Note over InFlightFetch: Stale fetch completes<br/>after invalidation
InFlightFetch->>GenTracker: Check current generation
GenTracker-->>InFlightFetch: Return new generation value
alt Generation unchanged
InFlightFetch->>Cache: Cache image result
else Generation changed
InFlightFetch->>InFlightFetch: Discard stale result
Note over Cache: Cache not re-poisoned
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
ce07e41 to
d93efd5
Compare
🚀 Build Images ReadyImages are ready for commit dbcacfd. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-605-gdbcacfd9be |
633958e to
0ea06a1
Compare
3603619 to
3c5b3b2
Compare
3c5b3b2 to
d97556c
Compare
eb958d5 to
348a8b6
Compare
d97556c to
71e898a
Compare
348a8b6 to
4953faf
Compare
71e898a to
e8acd3c
Compare
915f7d7 to
7361076
Compare
e8acd3c to
936bdfa
Compare
7361076 to
872624e
Compare
936bdfa to
0b23161
Compare
872624e to
2a2b7e9
Compare
d685b40 to
0c477c1
Compare
0c477c1 to
dbcacfd
Compare
|
/test all |
|
/test gke-nongroovy-e2e-tests |
|
/test gke-qa-e2e-tests |
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Hey - I've found 1 issue
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location path="sensor/admission-control/manager/manager_impl.go" line_range="97" />
<code_context>
imageNameCacheSize = 8192
)
+// imageGenTracker tracks per-key generation counters to prevent stale in-flight
+// fetches from re-caching data after an invalidation or full purge.
+type imageGenTracker struct {
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the per-key imageGenTracker with a single atomic global generation counter to simplify stale-fetch protection logic while preserving behavior.
You can keep the stale‑fetch protection while substantially simplifying the tracking logic.
### 1. Collapse `imageGenTracker` into a single global generation
Given the current use (invalidate on targeted keys + purge on cacheVersion change), a single monotonic cache generation is enough to prevent stale in‑flight writes, without per‑key tracking and without duplicating `cacheVersion`.
You can remove `imageGenTracker` entirely and add an atomic generation field to `manager`:
```go
type manager struct {
// ...
imageCacheGen atomic.Uint64 // global generation for all image cache state
// ...
}
```
Increment it on any event that invalidates cache correctness (cacheVersion change or targeted invalidation):
```go
func (m *manager) processImageCacheInvalidation(inv *sensor.AdmCtrlImageCacheInvalidation) {
s := m.currentState()
flatten := s != nil && s.GetFlattenImageData()
// bump global generation once per invalidation batch
m.imageCacheGen.Add(1)
invalidated := 0
for _, key := range inv.GetImageKeys() {
cacheKey := key.GetImageId()
if flatten && key.GetImageIdV2() != "" {
cacheKey = key.GetImageIdV2()
}
fullName := key.GetImageFullName()
if cacheKey != "" {
m.imageCache.Remove(cacheKey)
m.imageFetchGroup.Forget(cacheKey)
invalidated++
}
if fullName != "" {
m.imageNameToImageCacheKey.Remove(fullName)
m.imageFetchGroup.Forget(fullName)
}
}
log.Infof("Targeted image cache invalidation: invalidated %d entries", invalidated)
}
```
On settings changes, use the same generation instead of separate `cacheVersion` in the tracker:
```go
if newSettings.GetCacheVersion() != m.cacheVersion {
log.Infof("CacheVersion changed (%s -> %s): purging image cache",
m.cacheVersion, newSettings.GetCacheVersion())
m.imageCache.Purge()
m.imageNameToImageCacheKey.Purge()
m.imageCacheGen.Add(1) // invalidate all in-flight fetches
m.cacheVersion = newSettings.GetCacheVersion()
}
```
### 2. Simplify the fetch path API
Instead of `Snapshot` + `Changed` with `gen` and `cacheVersion`, expose a minimal “snapshot + stale check” around the global generation:
At fetch start:
```go
startGen := m.imageCacheGen.Load()
// ... perform remote fetch ...
if startGen != m.imageCacheGen.Load() {
// stale: cache was invalidated during fetch; do not write result
return result // or just skip storing
}
// safe to store in caches
```
You can wrap this in a helper if you want to keep call sites clean:
```go
type cacheGenSnapshot uint64
func (m *manager) snapshotCacheGen() cacheGenSnapshot {
return cacheGenSnapshot(m.imageCacheGen.Load())
}
func (m *manager) isCacheSnapshotStale(s cacheGenSnapshot) bool {
return uint64(s) != m.imageCacheGen.Load()
}
```
Usage:
```go
snap := m.snapshotCacheGen()
// ... fetch ...
if m.isCacheSnapshotStale(snap) {
return img // but do not update cache
}
// update cache
```
This preserves the “don’t re‑cache after invalidation/purge” behavior while:
- Removing the separate `imageGenTracker` type, mutex, and per‑key map.
- Avoiding the dual `cacheVersion` + per‑key generation tracking.
- Eliminating `Get`/`Clear` methods used only for tests; tests instead assert behavior via the public fetch/invalidation API and `imageCacheGen` if needed.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Description
Generation counter for stale-write prevention: A targeted invalidation or full cache flush can race with an in-flight fetch: the fetch starts before the invalidation, the invalidation clears the cache, then the fetch completes and writes stale data back. To prevent this, a per-key generation counter (
imageGenTracker) is incremented on invalidation. In-flight fetches capture a generation snapshot before the fetch call and compare it after — if the generation changed, the result is not cached. This ensures invalidation takes effect immediately rather than being masked for the duration of the cache entry's lifetime.The generation counters get wiped out every reprocessing interval, because the periodic reprocessing triggers a full flush of the admission controller cache.
User-facing documentation
Testing and quality
Automated testing
How I validated my change
Tested this as a part of the stack, along with PR #19837 and #19840.