ROX-33555: Wire VM relay ACK flow with rate limiting and UMH #19321

Draft
vikin91 wants to merge 1 commit into piotr/ROX-32316-umh-node-ack from piotr/ROX-32316-vm-relay-ack-flow

@vikin91 vikin91 commented Mar 6, 2026

Description

Integrates the VM relay with the per-resource UMH (from the parent PR) and replaces the sender's
inline retry loop with a single-attempt send. Retry responsibility now lives in the UMH, which
tracks ACK state per VSOCK ID.

What changed:

  • Relay (compliance/virtualmachines/relay/relay.go): Added per-VSOCK rate limiting (leaky
    bucket via golang.org/x/time/rate), UMH integration (ObserveSending/OnACK), and a metadata
    cache that tracks updatedAt / lastAckedAt per VM for stale-ACK detection. Reports that exceed
    the rate limit are dropped with a metric — the agent will resubmit on its own schedule.
  • Sender (index_report_sender.go): Removed the 10-retry retry.WithRetry loop and
    isRetryableGRPCError helper. The sender now makes a single gRPC call; failures are reported
    back so the UMH can schedule a retry at the appropriate backoff interval. Added per-attempt
    latency and result metrics.
  • Compliance (compliance.go): Added umhVMIndex field and handleVMIndexACK to forward
    ComplianceACK messages for VM_INDEX_REPORT to the relay's UMH. The VM relay startup now reads
    ROX_VM_RELAY_MAX_REPORTS_PER_MINUTE and ROX_VM_RELAY_STALE_ACK_THRESHOLD from env.
  • Metrics (relay/metrics/metrics.go): New counters/histograms for send attempts, rate limiting,
    and ACKs received.
  • Env vars (pkg/env/virtualmachine.go): ROX_VM_RELAY_MAX_REPORTS_PER_MINUTE (default 1.0)
    and ROX_VM_RELAY_STALE_ACK_THRESHOLD (default 4h).

Bug fix during split: The feature branch passed *v1.IndexReport to sender.Send(), which
expects *v1.VMReport. Fixed handleIncomingReport to carry the full VMReport through.

Depends on: piotr/ROX-32316-umh-node-ack (UMH per-resource refactor).

AI-assisted: code was extracted and adapted from a larger feature branch by AI,
with a type mismatch bug fix applied during the split. Reviewed and verified by the author.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Unit tests
  • On a cluster

vikin91 commented Mar 6, 2026

openshift-ci bot commented Mar 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

rhacs-bot commented Mar 6, 2026

Images are ready for the commit at 283b771.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-387-g283b771fec.

codecov bot commented Mar 6, 2026

Codecov Report

❌ Patch coverage is 74.39024% with 84 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.62%. Comparing base (35bfdb7) to head (cbee5a2).
⚠️ Report is 85 commits behind head on piotr/ROX-32316-umh-node-ack.

Files with missing lines                            Patch %   Lines
compliance/compliance.go                            48.88%    46 Missing ⚠️
compliance/virtualmachines/relay/relay.go           80.20%    17 Missing and 2 partials ⚠️
pkg/retry/handler/unconfirmed_message_handler.go    84.80%    15 Missing and 4 partials ⚠️

Additional details and impacted files
@@                       Coverage Diff                        @@
##           piotr/ROX-32316-umh-node-ack   #19321      +/-   ##
================================================================
- Coverage                         49.68%   49.62%   -0.07%     
================================================================
  Files                              2695     2696       +1     
  Lines                            202798   203349     +551     
================================================================
+ Hits                             100757   100908     +151     
- Misses                            94527    94921     +394     
- Partials                           7514     7520       +6     

Flag             Coverage Δ
go-unit-tests    49.62% <74.39%> (-0.07%) ⬇️


@vikin91 vikin91 force-pushed the piotr/ROX-32316-vm-relay-ack-flow branch from 63a45ce to 0ad5d24 Compare March 6, 2026 13:29
@vikin91 vikin91 changed the title from "ROX-32316: Wire VM relay ACK flow with rate limiting and UMH" to "ROX-32848: Wire VM relay ACK flow with rate limiting and UMH" Mar 6, 2026
@vikin91 vikin91 force-pushed the piotr/ROX-32316-umh-node-ack branch from 7faf2a3 to d136021 Compare March 9, 2026 10:05
@vikin91 vikin91 force-pushed the piotr/ROX-32316-vm-relay-ack-flow branch from 0ad5d24 to c795476 Compare March 9, 2026 10:05
@vikin91 vikin91 changed the title from "ROX-32848: Wire VM relay ACK flow with rate limiting and UMH" to "ROX-33555: Wire VM relay ACK flow with rate limiting and UMH" Mar 11, 2026
@vikin91 vikin91 force-pushed the piotr/ROX-32316-umh-node-ack branch from d136021 to 0ea378c Compare March 11, 2026 09:47
@vikin91 vikin91 force-pushed the piotr/ROX-32316-vm-relay-ack-flow branch from c795476 to cbee5a2 Compare March 11, 2026 09:48
@vikin91 vikin91 force-pushed the piotr/ROX-32316-umh-node-ack branch from 0ea378c to d3322c9 Compare March 18, 2026 12:16
@vikin91 vikin91 force-pushed the piotr/ROX-32316-umh-node-ack branch from d3322c9 to 0085a80 Compare March 19, 2026 11:37
@vikin91 vikin91 force-pushed the piotr/ROX-32316-vm-relay-ack-flow branch from cbee5a2 to 283b771 Compare March 19, 2026 14:53
@vikin91 vikin91 force-pushed the piotr/ROX-32316-umh-node-ack branch from 6492540 to 950ed7c Compare April 9, 2026 08:50
Integrates the VM relay with the per-resource UMH from the previous commit.
The relay now rate-limits reports per VSOCK ID (leaky bucket), tracks ACK
metadata for stale-ACK detection, and delegates retry responsibility to UMH
instead of retrying inline in the sender. The sender is simplified to a
single-attempt send. Adds handleVMIndexACK in compliance to forward
ComplianceACK messages to the VM relay's UMH.

Also fixes a type mismatch in the relay, where handleIncomingReport passed
*IndexReport to sender.Send(), which expects *VMReport.

AI-assisted: code was extracted from the feature branch by AI, with bug
fixes applied during the split. Reviewed and verified by the author.
@vikin91 vikin91 force-pushed the piotr/ROX-32316-vm-relay-ack-flow branch from 283b771 to a9270b7 Compare April 9, 2026 08:55

github-actions bot commented Apr 9, 2026

🚀 Build Images Ready

Images are ready for commit a9270b7. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-612-ga9270b7b6e


openshift-ci bot commented Apr 10, 2026

PR needs rebase.
