ROX-33339: Optimize Sensor reprocessor gRPC and AC image cache churn by clickboo · Pull Request #19840 · stackrox/stackrox

clickboo · 2026-04-06T06:28:55Z

Description

Optimizes Central's short-circuit reprocessor path to reduce unnecessary UpdatedImage message overhead and prevent full Admission Controller cache purges.

Previously, Central sent UpdatedImage for every image during short circuit reprocessing regardless of whether the scan data actually changed. This caused unnecessary proto serialization/deserialization overhead and, combined with the subsequent FlushCache, purged the entire admission controller cache even when most images were unchanged.

What changed (short-circuit path only):

Central's reprocessor now distinguishes between changed and unchanged images during the short-circuit reprocessing cycle:

Changed images (imageUpdated=true): Central sends UpdatedImage as before. Sensor's detector accumulates the image key.
Unchanged images (imageUpdated=false): Central skips UpdatedImage entirely and batches the image key for a single RefreshImageCacheTTL message per cluster, keeping Sensor's cache warm without the deserialization cost of full image data.

At the end of the cycle, Central sends ReprocessDeployments with skip_cache_flush=true. Sensor's detector then sends a single batched InvalidateImageCache to the Admission Controller for only the changed images, instead of flushing the entire admission controller cache. This keeps the Admission controller cache warm for unchanged images, avoiding unnecessary re-fetches on subsequent review requests.

Central -> Sensor gRPC message and cache operation comparison (short-circuit path, per cluster)

N = total images reprocessed, K = images with changed scan data, D = deployments with cached images in admission controller

Metric	Before PR	After PR
`UpdatedImage` messages (Central→Sensor)	N	K
`RefreshImageCacheTTL` (Central→Sensor)	—	1 (batched, N−K keys)
`ReprocessDeployments` (Central→Sensor)	1	1 (`SkipCacheFlush=true`)
Total Central→Sensor messages	N + 1	K + 2
Proto serializations (full image data)	N	K
Sensor `imageCache.Add` (replace data)	N	K
Sensor `imageCache.Touch` (TTL only)	—	N − K
Sensor deployment re-detections	N	K
AC image cache	Full purge	Targeted invalidation (K entries)
AC cache entries preserved	0	N − K
AC re-fetch load (post-purge)	D (all cached images)	K (only changed images)
Messages saved	—	N − K − 1

Example: 500 images, 3 changed, 200 of those images in a review request burst to admission controller → Before: 501 messages, 500 deployment re-detections, full admission controller cache purge, 200 re-fetches. After: 5 messages, 3 deployment re-detections, 3 admission controller entries invalidated, 3 re-fetches.

Sensor detection during reassess (no image changes)

Master: each deployment is re-evaluated N+1 times — once per ResolveDeploymentsByImages for each of its images (triggered by N UpdatedImage messages), plus once from ResolveAllDeployments (triggered by ReprocessDeployments). All but the final pass are redundant.
This PR: each deployment is re-evaluated exactly once — from ResolveAllDeployments only. No UpdatedImage messages are sent for unchanged images, eliminating N redundant per-image re-detection passes. Policy evaluation outcomes are identical.

Key wins:

Message reduction: N + 1 → K + 2 (typically K ≪ N, saving nearly N messages per cycle)
Serialization savings: only K full image protos serialized/deserialized instead of N
Sensor cache churn eliminated: N − K unnecessary cache replacements become lightweight TTL touches
Deployment re-detection reduced: N → K; avoids N − K redundant ResolveDeploymentsByImages calls.
Admission controller cache kept warm: N − K entries preserved instead of full purge
Admission controller re-fetch load reduced: D → K; avoids a burst of image re-fetches from Sensor/Central for unchanged images after every reprocessing cycle

Periodic reprocessor path is unchanged in behavior: The periodic path (default 4h interval) continues to send UpdatedImage for every image and ReprocessDeployments with skip_cache_flush=false (the proto default). Sensor receives skip_cache_flush=false and performs a full FlushCache() on the admission controller cache. The admissionCacheNeedsFlush guard was removed since Central never sends ReprocessDeployments without preceding images (early return on zero results), making the flag always true when reached.

Version compatibility:

Central	Sensor	Behavior
New	New (with `TargetedImageCacheInvalidation` capability)	Short-circuit path uses `RefreshImageCacheTTL` for unchanged images, batched targeted invalidation for changed images
New	Old (without capability)	Central always sends `UpdatedImage` (capability gate in `sendReprocessingMessages`). `skip_cache_flush` defaults to `false` in proto, so old Sensor calls `FlushCache()`. Identical to pre-change behavior
Old	New	Old Central never sends `RefreshImageCacheTTL` or sets `skip_cache_flush`. New Sensor sees `skip_cache_flush=false` and calls `FlushCache()`. Identical to pre-change behavior

No breaking changes in any version combination.

User-facing documentation

CHANGELOG.md is updated OR update is not needed
documentation PR is created and is linked above OR is not needed

Testing and quality

the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
CI results are inspected

Automated testing

How I validated my change

Testing

Validated end-to-end using bench-reassess.sh (available in #19427) against a live cluster, comparing a master build to this PR's build. The script automates:

Burst 1 — sends N deployment creates (--dry-run=server) to warm AC/Sensor caches
Snapshot pre-reassess — scrapes Prometheus metrics from AC, Sensor, and Central
Trigger reassess — calls POST /v1/policies/reassess, waits for Central's reprocessor to complete by tailing logs for the completion marker
Snapshot post-reassess — scrapes metrics again to measure reassess-cycle impact
Burst 2 — repeats the burst to measure cache survival after reassess
Snapshot post-burst2 — final metric scrape
Report — prints deltas for key metrics across all three components

Run: BURST_SIZE=100 UNIQUE_PCT=25 PARALLEL=50 ./tools/admission-control/bench-reassess.sh

Results (100 deployments, 20 unique images):

Metric	Master	PR	Change
Sensor `component_process_message` (reassess)	97	2	-98%
Sensor `deployment_processed` (reassess)	343	142	-59%
AC `image_fetch_total` (burst 2)	51	2	-96%
AC `cache_hit_rate` (burst 2)	~46%	98%	cache survives reassess
AC `review_duration` avg (burst 2)	1.034s	0.034s	30x faster

A detailed description and analysis of the test results can be found here.

clickboo · 2026-04-06T06:28:57Z

This change is part of the following stack:

ROX-33339: Return ScanReused when enrichment uses existing scan #19827
- ROX-33339: Sensor support for refresh image cache TTL message #19837
  - ROX-33339: Optimize Sensor reprocessor gRPC and AC image cache churn #19840 ◀
    - ROX-33328: Avoid cache re-poisoning during coalesced fetches #19600

_{Change managed by git-spice.}

github-actions · 2026-04-06T06:43:32Z

🚀 Build Images Ready

Images are ready for commit 5b5eb15. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-609-g5b5eb15237

codecov · 2026-04-06T07:29:16Z

Codecov Report

❌ Patch coverage is 76.62338% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.56%. Comparing base (9d35506) to head (5b5eb15).
⚠️ Report is 14 commits behind head on master.

Files with missing lines	Patch %	Lines
central/reprocessor/reprocessor.go	85.92%	15 Missing and 4 partials ⚠️
sensor/common/detector/detector.go	0.00%	16 Missing ⚠️
sensor/kubernetes/eventpipeline/pipeline_impl.go	66.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #19840      +/-   ##
==========================================
- Coverage   49.61%   49.56%   -0.05%     
==========================================
  Files        2765     2764       -1     
  Lines      208541   208411     -130     
==========================================
- Hits       103464   103306     -158     
- Misses      97401    97450      +49     
+ Partials     7676     7655      -21

Flag	Coverage Δ
go-unit-tests	`49.56% <76.62%> (-0.05%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

sensor/kubernetes/eventpipeline/pipeline_test.go (1)

178-180: Assert the forwarded ReprocessDeployments payload.

This only proves the call happened. A nil or zero-valued message would still satisfy gomock.Any(), so the new skip_cache_flush contract is not actually covered here.

Suggested test tightening

-			s.detector.EXPECT().ProcessReprocessDeployments(gomock.Any()).Times(1).Do(func(_ *central.ReprocessDeployments) {
+			s.detector.EXPECT().ProcessReprocessDeployments(gomock.Any()).Times(1).Do(func(msg *central.ReprocessDeployments) {
+				protoassert.Equal(s.T(), msgFromCentral.GetReprocessDeployments(), msg)
 				defer messageReceived.Done()
 			})

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@sensor/kubernetes/eventpipeline/pipeline_test.go` around lines 178 - 180, The
test currently uses gomock.Any() in
s.detector.EXPECT().ProcessReprocessDeployments(...) so it only asserts the call
occurred; change the expectation to assert the actual ReprocessDeployments
payload (for example use
gomock.AssignableToTypeOf(&central.ReprocessDeployments{}) or gomock.Eq with a
constructed central.ReprocessDeployments) and in the Do(func(arg
*central.ReprocessDeployments) { ... }) verify the skip_cache_flush field (e.g.,
arg.SkipCacheFlush == true/false as appropriate) before calling
messageReceived.Done(); this ensures the new skip_cache_flush contract is
covered while keeping the same expectation target (ProcessReprocessDeployments)
and teardown.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@central/reprocessor/reprocessor.go`:
- Around line 688-724: The sendRefreshImageCacheTTL currently sends all
ImageKeys for a cluster in one message which can exceed per-message size limits;
update loopImpl.sendRefreshImageCacheTTL to split the keys slice from
state.refreshTTLKeys into bounded batches (e.g. maxKeysPerMsg constant) and send
one central.MsgToSensor_RefreshImageCacheTtl per batch using l.injectMessage;
iterate over batches for each clusterID, handle errors the same way (store
clusterID in state.skipClusterIDs and stop sending further batches for that
cluster), and log per-batch sends (use state.refreshTTLKeys, l.injectMessage,
central.RefreshImageCacheTTL, and state.skipClusterIDs to locate the code to
change).

In `@sensor/common/detector/detector.go`:
- Around line 363-366: The code currently rebuilds central.ImageKey from
storage.Image and appends it to d.updatedImageKeys, losing ImageIdV2; instead,
preserve and append the original central.ImageKey that came with the
updated-image event so ImageIdV2 is retained for InvalidateImageCache. Locate
where d.updatedImageKeys is modified (the append of central.ImageKey in the
updated-image path) and change it to use the incoming/original central.ImageKey
object from the update payload (do not reconstruct from storage.Image), ensuring
ImageIdV2 is carried through to InvalidateImageCache.

In `@ui/apps/platform/src/Containers/Policies/Wizard/Step4/PolicyScopeForm.tsx`:
- Around line 266-278: The exclusion deployment scope block always renders
PolicyScopeCardLegacy and passes props (cardTitle, showTooltips) it doesn't
accept; update it to mirror the inclusion pattern: import ExclusionScopeCard and
conditionally render ExclusionScopeCard when isNewScopingEnabled, falling back
to PolicyScopeCardLegacy otherwise, wiring the same props/handlers (e.g., name
`excludedDeploymentScopes[${index}]`, clusters, onDelete ->
deleteExclusionDeploymentScope, hasAuditLogEventSource) so the new UI receives
the correct component and legacy keeps the old behavior.

---

Nitpick comments:
In `@sensor/kubernetes/eventpipeline/pipeline_test.go`:
- Around line 178-180: The test currently uses gomock.Any() in
s.detector.EXPECT().ProcessReprocessDeployments(...) so it only asserts the call
occurred; change the expectation to assert the actual ReprocessDeployments
payload (for example use
gomock.AssignableToTypeOf(&central.ReprocessDeployments{}) or gomock.Eq with a
constructed central.ReprocessDeployments) and in the Do(func(arg
*central.ReprocessDeployments) { ... }) verify the skip_cache_flush field (e.g.,
arg.SkipCacheFlush == true/false as appropriate) before calling
messageReceived.Done(); this ensures the new skip_cache_flush contract is
covered while keeping the same expectation target (ProcessReprocessDeployments)
and teardown.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 76e57661-0383-4b2e-bf57-d416989b78ac

📥 Commits

Reviewing files that changed from the base of the PR and between 39d15cc and 915f7d7.

⛔ Files ignored due to path filters (2)

generated/internalapi/central/sensor_iservice.pb.go is excluded by !**/*.pb.go, !**/generated/**
generated/internalapi/central/sensor_iservice_vtproto.pb.go is excluded by !**/*.pb.go, !**/generated/**

📒 Files selected for processing (10)

central/reprocessor/reprocessor.go
central/reprocessor/reprocessor_unit_test.go
central/sensor/service/service_impl.go
pkg/images/enricher/enricher.go
proto/internalapi/central/sensor_iservice.proto
sensor/common/detector/detector.go
sensor/common/detector/mocks/detector.go
sensor/kubernetes/eventpipeline/pipeline_impl.go
sensor/kubernetes/eventpipeline/pipeline_test.go
ui/apps/platform/src/Containers/Policies/Wizard/Step4/PolicyScopeForm.tsx

💤 Files with no reviewable changes (1)

central/sensor/service/service_impl.go

central/reprocessor/reprocessor.go

sensor/common/detector/detector.go

ui/apps/platform/src/Containers/Policies/Wizard/Step4/PolicyScopeForm.tsx

central/reprocessor/reprocessor.go

sensor/common/detector/detector.go

sachaudh · 2026-04-10T15:15:20Z

FYI, removed the area/ui label. The PR was on our radar, but I didn't see any UI changes here.

openshift-ci bot added the do-not-merge/work-in-progress label Apr 6, 2026

github-actions bot added area/central area/sensor ai-review coderabbit-review labels Apr 6, 2026

clickboo mentioned this pull request Apr 6, 2026

ROX-33328: Avoid cache re-poisoning during coalesced fetches #19600

Open

9 tasks

This comment was marked as off-topic.

Sign in to view

clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from 633958e to 0ea06a1 Compare April 6, 2026 06:54

This was referenced Apr 6, 2026

ROX-33339: Sensor support for refresh image cache TTL message #19837

Merged

ROX-33339: Return ScanReused when enrichment uses existing scan #19827

Merged

clickboo force-pushed the boo-sensor-refresh-invalidate-reprocessor branch from a5bd2d1 to de678cb Compare April 7, 2026 09:29

clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from eb958d5 to 348a8b6 Compare April 7, 2026 09:29

clickboo force-pushed the boo-sensor-refresh-invalidate-reprocessor branch 2 times, most recently from b1142f8 to f71c251 Compare April 7, 2026 12:00

clickboo changed the title ~~ROX-33339: Optimized sensor and admission controller image cache operations~~ ROX-33339: Image cache optimizations in sensor and admission controller Apr 7, 2026

clickboo changed the title ~~ROX-33339: Image cache optimizations in sensor and admission controller~~ ROX-33339: Sensor & Admission Controller image cache optimizations Apr 7, 2026

clickboo changed the title ~~ROX-33339: Sensor & Admission Controller image cache optimizations~~ ROX-33339: Sensor & Admission Cntrl image cache optimizations Apr 7, 2026

clickboo changed the title ~~ROX-33339: Sensor & Admission Cntrl image cache optimizations~~ ROX-33339: Sensor & Admission Controller image cache optimizations Apr 7, 2026

clickboo changed the title ~~ROX-33339: Sensor & Admission Controller image cache optimizations~~ ROX-33339: Optimize Sensor reprocessor gRPC and AC cache churn Apr 7, 2026

clickboo changed the title ~~ROX-33339: Optimize Sensor reprocessor gRPC and AC cache churn~~ ROX-33339: Optimize Sensor reprocessor gRPC and AC image cache churn Apr 7, 2026

Base automatically changed from boo-sensor-refresh-invalidate-reprocessor to master April 8, 2026 18:48

clickboo force-pushed the boo-central-refresh-ttl-invalidate branch 2 times, most recently from 4953faf to 915f7d7 Compare April 8, 2026 19:44

github-actions bot added the area/ui label Apr 8, 2026

clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from 915f7d7 to 7361076 Compare April 8, 2026 19:50

This comment was marked as low quality.

Sign in to view

coderabbitai bot reviewed Apr 8, 2026

View reviewed changes

central/reprocessor/reprocessor.go Show resolved Hide resolved

sensor/common/detector/detector.go Outdated Show resolved Hide resolved

ui/apps/platform/src/Containers/Policies/Wizard/Step4/PolicyScopeForm.tsx Outdated Show resolved Hide resolved

clickboo force-pushed the boo-central-refresh-ttl-invalidate branch 3 times, most recently from 2a2b7e9 to 27260bd Compare April 9, 2026 08:57

ROX-33339: Optimize Sensor reprocessor gRPC and AC image cache churn

31a924a

clickboo force-pushed the boo-central-refresh-ttl-invalidate branch from 27260bd to 31a924a Compare April 9, 2026 12:10

clickboo marked this pull request as ready for review April 9, 2026 12:10

clickboo requested review from a team as code owners April 9, 2026 12:10

openshift-ci bot removed the do-not-merge/work-in-progress label Apr 9, 2026

This comment was marked as low quality.

Sign in to view

clickboo requested review from charmik-redhat and dcaravel April 9, 2026 12:20

stackrox deleted a comment from openshift-ci bot Apr 9, 2026

Reverting unnecessary comment deletion

5b5eb15

charmik-redhat reviewed Apr 10, 2026

View reviewed changes

central/reprocessor/reprocessor.go Show resolved Hide resolved

sensor/common/detector/detector.go Show resolved Hide resolved

sachaudh removed the area/ui label Apr 10, 2026

charmik-redhat approved these changes Apr 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ROX-33339: Optimize Sensor reprocessor gRPC and AC image cache churn#19840

ROX-33339: Optimize Sensor reprocessor gRPC and AC image cache churn#19840
clickboo wants to merge 2 commits intomasterfrom
boo-central-refresh-ttl-invalidate

clickboo commented Apr 6, 2026 •

edited

Loading

Uh oh!

clickboo commented Apr 6, 2026 •

edited

Loading

Uh oh!

This comment was marked as off-topic.

Uh oh!

github-actions bot commented Apr 6, 2026 •

edited

Loading

Uh oh!

codecov bot commented Apr 6, 2026 •

edited

Loading

Uh oh!

This comment was marked as low quality.

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as low quality.

Uh oh!

Uh oh!

Uh oh!

sachaudh commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

clickboo commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Central -> Sensor gRPC message and cache operation comparison (short-circuit path, per cluster)

Sensor detection during reassess (no image changes)

User-facing documentation

Testing and quality

Automated testing

How I validated my change

Testing

Uh oh!

clickboo commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as off-topic.

Uh oh!

github-actions bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Build Images Ready

Uh oh!

codecov bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

This comment was marked as low quality.

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as low quality.

Uh oh!

Uh oh!

Uh oh!

sachaudh commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clickboo commented Apr 6, 2026 •

edited

Loading

clickboo commented Apr 6, 2026 •

edited

Loading

github-actions bot commented Apr 6, 2026 •

edited

Loading

codecov bot commented Apr 6, 2026 •

edited

Loading