Skip to content

ROX-33328: Avoid cache re-poisoning during coalesced fetches #19600

Open
clickboo wants to merge 1 commit intomasterfrom
boo-adm-cntr-image-gen-counters
Open

ROX-33328: Avoid cache re-poisoning during coalesced fetches #19600
clickboo wants to merge 1 commit intomasterfrom
boo-adm-cntr-image-gen-counters

Conversation

@clickboo
Copy link
Copy Markdown
Contributor

@clickboo clickboo commented Mar 25, 2026

Description

Generation counter for stale-write prevention: A targeted invalidation or full cache flush can race with an in-flight fetch: the fetch starts before the invalidation, the invalidation clears the cache, then the fetch completes and writes stale data back. To prevent this, a per-key generation counter (imageGenTracker) is incremented on invalidation. In-flight fetches capture a generation snapshot before the fetch call and compare it after — if the generation changed, the result is not cached. This ensures invalidation takes effect immediately rather than being masked for the duration of the cache entry's lifetime.

The generation counters get wiped out every reprocessing interval, because the periodic reprocessing triggers a full flush of the admission controller cache.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Tested this as a part of the stack, along with PR #19837 and #19840.

@clickboo
Copy link
Copy Markdown
Contributor Author

clickboo commented Mar 25, 2026

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 25, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

sourcery-ai[bot]

This comment was marked as outdated.

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Mar 25, 2026

Images are ready for the commit at ce07e41.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-527-gce07e419e1.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 75.55556% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.60%. Comparing base (8c9e377) to head (dbcacfd).
⚠️ Report is 23 commits behind head on master.

Files with missing lines Patch % Lines
sensor/admission-control/manager/images.go 0.00% 6 Missing ⚠️
sensor/admission-control/manager/manager_impl.go 87.17% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master   #19600    +/-   ##
========================================
  Coverage   49.59%   49.60%            
========================================
  Files        2766     2766            
  Lines      208583   208692   +109     
========================================
+ Hits       103457   103522    +65     
- Misses      97449    97489    +40     
- Partials     7677     7681     +4     
Flag Coverage Δ
go-unit-tests 49.60% <75.55%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 6836bf6 to 4d98322 Compare March 25, 2026 12:19
@clickboo clickboo changed the title ROX-33328: Adding support for image generation counters for targeted invalidation ROX-33328: Image generation counters for targeted invalidation Mar 25, 2026
@clickboo clickboo changed the title ROX-33328: Image generation counters for targeted invalidation ROX-33328: Avoid cache re-poisoning when image cache is invalidated during coalesced fetches Mar 25, 2026
@clickboo clickboo changed the title ROX-33328: Avoid cache re-poisoning when image cache is invalidated during coalesced fetches ROX-33328: Avoid cache re-poisoning during coalesced fetches Mar 25, 2026
@stackrox stackrox deleted a comment from openshift-ci bot Mar 25, 2026
@stackrox stackrox deleted a comment from openshift-ci bot Mar 25, 2026
@clickboo clickboo force-pushed the boo-adm-cntrl-targeted-invalidation-foundation branch from 4031ed6 to 348220e Compare March 25, 2026 15:00
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 4d98322 to e994f13 Compare March 25, 2026 15:00
@clickboo clickboo force-pushed the boo-adm-cntrl-targeted-invalidation-foundation branch from 348220e to ec0f7f7 Compare March 27, 2026 12:29
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from e994f13 to 97acf6d Compare March 27, 2026 12:33
@clickboo clickboo force-pushed the boo-adm-cntrl-targeted-invalidation-foundation branch 2 times, most recently from 1ad1e5f to 782897c Compare April 1, 2026 18:35
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 97acf6d to ce07e41 Compare April 1, 2026 18:35
Base automatically changed from boo-adm-cntrl-targeted-invalidation-foundation to master April 1, 2026 20:49
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 4, 2026

📝 Walkthrough

Walkthrough

This PR adds targeted image cache invalidation to the admission control system. New protocol buffer messages enable selective cache entry removal instead of full purges. A generation-tracking mechanism prevents stale in-flight image fetches from re-populating the cache after invalidation.

Changes

Cohort / File(s) Summary
Protocol Buffer Definitions
proto/internalapi/sensor/admission_control.proto, proto/internalapi/sensor/admission_control_iservice.proto
Added AdmCtrlImageCacheInvalidation message type and integrated it as a new oneof variant in MsgToAdmissionControl for targeted cache invalidation messaging.
Manager Interface & Core Implementation
sensor/admission-control/manager/manager.go, sensor/admission-control/manager/manager_impl.go
Exposed ImageCacheInvalidationC() channel on the Manager interface. Implemented generation-tracking (imageGenTracker) for cache keys, integrated invalidation message handling in the event loop, and added processImageCacheInvalidation to increment generations, remove cache entries, and forget pending fetches.
Image Cache Logic
sensor/admission-control/manager/images.go
Updated fetchImage to capture cache generation before fetching and conditionally cache results only if generation remains unchanged, preventing stale writes after invalidation.
Message Dispatch
sensor/admission-control/settingswatch/sensor_push.go
Wired the new imageCacheInvalidationOutC channel and added dispatch case for AdmCtrlImageCacheInvalidation messages in the streaming message switch.
Test Coverage
sensor/admission-control/manager/images_test.go
Added test suite setup for generation tracking and comprehensive test cases validating targeted cache invalidation behavior, generation counter incrementing, and stale-write prevention.

Sequence Diagram

sequenceDiagram
    participant Central as Central Service
    participant Manager as Admission Control Manager
    participant Cache as Image Cache
    participant GenTracker as Generation Tracker
    participant InFlightFetch as In-Flight Image Fetch

    Central->>Manager: Send AdmCtrlImageCacheInvalidation
    activate Manager
    
    Manager->>GenTracker: Increment generation for cache key
    Manager->>Cache: Remove targeted cache entry
    Manager->>Cache: Remove name→key mapping
    deactivate Manager

    Note over InFlightFetch: Stale fetch completes<br/>after invalidation
    InFlightFetch->>GenTracker: Check current generation
    GenTracker-->>InFlightFetch: Return new generation value
    
    alt Generation unchanged
        InFlightFetch->>Cache: Cache image result
    else Generation changed
        InFlightFetch->>InFlightFetch: Discard stale result
        Note over Cache: Cache not re-poisoned
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: preventing cache re-poisoning (stale data) during coalesced image fetches in the admission controller.
Description check ✅ Passed The PR description is well-structured and covers all major required sections including a clear description of changes, user-facing documentation status, testing and quality checklist items, and validation approach.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch boo-adm-cntr-image-gen-counters

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from ce07e41 to d93efd5 Compare April 5, 2026 14:30
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 5, 2026

🚀 Build Images Ready

Images are ready for commit dbcacfd. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-605-gdbcacfd9be

@clickboo clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from 633958e to 0ea06a1 Compare April 6, 2026 06:54
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 3603619 to 3c5b3b2 Compare April 6, 2026 06:54
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 3c5b3b2 to d97556c Compare April 6, 2026 08:32
@clickboo clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from eb958d5 to 348a8b6 Compare April 7, 2026 09:29
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from d97556c to 71e898a Compare April 7, 2026 09:29
@clickboo clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from 348a8b6 to 4953faf Compare April 8, 2026 18:50
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 71e898a to e8acd3c Compare April 8, 2026 18:50
@clickboo clickboo force-pushed the boo-central-refresh-ttl-invalidate branch 2 times, most recently from 915f7d7 to 7361076 Compare April 8, 2026 19:50
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from e8acd3c to 936bdfa Compare April 8, 2026 19:51
@clickboo clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from 7361076 to 872624e Compare April 8, 2026 20:17
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 936bdfa to 0b23161 Compare April 8, 2026 20:17
@clickboo clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from 872624e to 2a2b7e9 Compare April 9, 2026 04:57
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch 3 times, most recently from d685b40 to 0c477c1 Compare April 9, 2026 05:52
@clickboo clickboo force-pushed the boo-adm-cntr-image-gen-counters branch from 0c477c1 to dbcacfd Compare April 9, 2026 05:59
@stackrox stackrox deleted a comment from openshift-ci bot Apr 9, 2026
@clickboo clickboo changed the base branch from boo-central-refresh-ttl-invalidate to master April 9, 2026 06:06
@clickboo
Copy link
Copy Markdown
Contributor Author

clickboo commented Apr 9, 2026

/test all

@clickboo
Copy link
Copy Markdown
Contributor Author

clickboo commented Apr 9, 2026

/test gke-nongroovy-e2e-tests

@clickboo
Copy link
Copy Markdown
Contributor Author

clickboo commented Apr 9, 2026

/test gke-qa-e2e-tests

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 10, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{}

@clickboo clickboo marked this pull request as ready for review April 12, 2026 18:27
@clickboo clickboo requested a review from a team as a code owner April 12, 2026 18:27
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="sensor/admission-control/manager/manager_impl.go" line_range="97" />
<code_context>
 	imageNameCacheSize = 8192
 )

+// imageGenTracker tracks per-key generation counters to prevent stale in-flight
+// fetches from re-caching data after an invalidation or full purge.
+type imageGenTracker struct {
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the per-key imageGenTracker with a single atomic global generation counter to simplify stale-fetch protection logic while preserving behavior.

You can keep the stale‑fetch protection while substantially simplifying the tracking logic.

### 1. Collapse `imageGenTracker` into a single global generation

Given the current use (invalidate on targeted keys + purge on cacheVersion change), a single monotonic cache generation is enough to prevent stale in‑flight writes, without per‑key tracking and without duplicating `cacheVersion`.

You can remove `imageGenTracker` entirely and add an atomic generation field to `manager`:

```go
type manager struct {
    // ...
    imageCacheGen atomic.Uint64 // global generation for all image cache state
    // ...
}
```

Increment it on any event that invalidates cache correctness (cacheVersion change or targeted invalidation):

```go
func (m *manager) processImageCacheInvalidation(inv *sensor.AdmCtrlImageCacheInvalidation) {
    s := m.currentState()
    flatten := s != nil && s.GetFlattenImageData()

    // bump global generation once per invalidation batch
    m.imageCacheGen.Add(1)

    invalidated := 0
    for _, key := range inv.GetImageKeys() {
        cacheKey := key.GetImageId()
        if flatten && key.GetImageIdV2() != "" {
            cacheKey = key.GetImageIdV2()
        }
        fullName := key.GetImageFullName()

        if cacheKey != "" {
            m.imageCache.Remove(cacheKey)
            m.imageFetchGroup.Forget(cacheKey)
            invalidated++
        }
        if fullName != "" {
            m.imageNameToImageCacheKey.Remove(fullName)
            m.imageFetchGroup.Forget(fullName)
        }
    }

    log.Infof("Targeted image cache invalidation: invalidated %d entries", invalidated)
}
```

On settings changes, use the same generation instead of separate `cacheVersion` in the tracker:

```go
if newSettings.GetCacheVersion() != m.cacheVersion {
    log.Infof("CacheVersion changed (%s -> %s): purging image cache",
        m.cacheVersion, newSettings.GetCacheVersion())

    m.imageCache.Purge()
    m.imageNameToImageCacheKey.Purge()
    m.imageCacheGen.Add(1) // invalidate all in-flight fetches

    m.cacheVersion = newSettings.GetCacheVersion()
}
```

### 2. Simplify the fetch path API

Instead of `Snapshot` + `Changed` with `gen` and `cacheVersion`, expose a minimal “snapshot + stale check” around the global generation:

At fetch start:

```go
startGen := m.imageCacheGen.Load()
// ... perform remote fetch ...
if startGen != m.imageCacheGen.Load() {
    // stale: cache was invalidated during fetch; do not write result
    return result // or just skip storing
}
// safe to store in caches
```

You can wrap this in a helper if you want to keep call sites clean:

```go
type cacheGenSnapshot uint64

func (m *manager) snapshotCacheGen() cacheGenSnapshot {
    return cacheGenSnapshot(m.imageCacheGen.Load())
}

func (m *manager) isCacheSnapshotStale(s cacheGenSnapshot) bool {
    return uint64(s) != m.imageCacheGen.Load()
}
```

Usage:

```go
snap := m.snapshotCacheGen()
// ... fetch ...
if m.isCacheSnapshotStale(snap) {
    return img // but do not update cache
}
// update cache
```

This preserves the “don’t re‑cache after invalidation/purge” behavior while:

- Removing the separate `imageGenTracker` type, mutex, and per‑key map.
- Avoiding the dual `cacheVersion` + per‑key generation tracking.
- Eliminating `Get`/`Clear` methods used only for tests; tests instead assert behavior via the public fetch/invalidation API and `imageCacheGen` if needed.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants