Skip to content

perf(ci): enable full Go test caching (fixed ldflags + stable mtimes)#19618

Closed
davdhacs wants to merge 14 commits intomasterfrom
davdhacs/test-cache-combined-minimal
Closed

perf(ci): enable full Go test caching (fixed ldflags + stable mtimes)#19618
davdhacs wants to merge 14 commits intomasterfrom
davdhacs/test-cache-combined-minimal

Conversation

@davdhacs
Copy link
Copy Markdown
Contributor

@davdhacs davdhacs commented Mar 25, 2026

Description

Combined demo of all CI optimization PRs. Do not merge — merge the individual PRs instead.

Overall impact

Workflow Master Optimized Speedup
Unit Tests (warm) 33-36m 3.2m 10x
Unit Tests (cold) 33-36m 11m 3x
Style 20m 13m 1.5x
Total compute (unit-tests) 188m 54m 71% less

Individual PRs and what they do

Foundation (merge first, any order):

PR Change Impact
#19617 Stable test ldflags (0.0.0-test) Enables test cache Phase 1
#19395 Stabilize file mtimes after checkout Enables test cache Phase 2
#19580 Fix GHA cache keys (GOTAGS suffix, scanner GOARCH) Prevents cache thrashing between matrix entries
#19688 Remove GOMODCACHE caching Cache step 60-100s faster, all jobs unchanged or faster

Workflow overhaul (merge after foundation):

PR Change Impact
#19678 Unit-tests: remove containers, 5-way Go sharding, postgres sharding, integration split, roxctl split, cypress action, skip bench on PRs 33m → 3.2m warm, 188m → 54m compute
#19312 Style: split into 5 parallel jobs, shard roxvet 3-way 20m → 13m

Key findings

  • Go doesn't cache linked binaries for go build (source). Stable ldflags only help test caching, not build caching.
  • GOMODCACHE (2.9GB) was the cache step bottleneck, not GOCACHE. Go proxy is as fast as GHA cache restore.
  • -count=1 on integration tests was added in 2019 for external service tests, but 38/50 packages use mocks.
  • bash 5.2+ treats & in replacements as matched pattern (like sed). CI container had bash 5.1, ubuntu-latest has 5.2+.
  • Smaller PRs = faster CI — test cache hits are per-package.

Unit Tests warm run breakdown (run 23728217606)

Job Time
go (pkg-helm) 1.0m
go-postgres (migrator) 1.0m
shell-unit-tests 1.0m
openshift-ci 1.5m
go-integration 1.6m
go (central-2) 2.0m
go (pkg-other) 2.0m
go-postgres (main) 2.1m
go (rest) 2.6m
go (central-1) 2.7m
go-operator-integration 2.8m
local-roxctl (release) 2.9m
sensor-integration 3.0m
ui-component 3.1m
local-roxctl (dev) 3.2m

Partially generated by AI.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 25, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The git ls-files | xargs touch mtime stabilization currently applies to every tracked file; consider narrowing this to files relevant to Go builds/tests (or excluding known problematic paths like large assets or build outputs) to avoid surprising other tools that rely on mtimes during the same workflow.
  • The hard-coded VERSION_PKG="github.com/stackrox/rox/pkg/version/internal" in go-tool.sh will silently break if the version package is moved/renamed; consider deriving this import path from go list or centralizing it in a single shared variable/source of truth.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `git ls-files | xargs touch` mtime stabilization currently applies to every tracked file; consider narrowing this to files relevant to Go builds/tests (or excluding known problematic paths like large assets or build outputs) to avoid surprising other tools that rely on mtimes during the same workflow.
- The hard-coded `VERSION_PKG="github.com/stackrox/rox/pkg/version/internal"` in `go-tool.sh` will silently break if the version package is moved/renamed; consider deriving this import path from `go list` or centralizing it in a single shared variable/source of truth.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Mar 25, 2026

Images are ready for the commit at ca76592.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-524-gca76592f2e.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.66%. Comparing base (0ca1671) to head (ca76592).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19618   +/-   ##
=======================================
  Coverage   49.66%   49.66%           
=======================================
  Files        2748     2748           
  Lines      207354   207354           
=======================================
+ Hits       102987   102990    +3     
+ Misses      96711    96708    -3     
  Partials     7656     7656           
Flag Coverage Δ
go-unit-tests 49.66% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@davdhacs davdhacs force-pushed the davdhacs/test-cache-combined-minimal branch from fb71343 to b3de047 Compare March 26, 2026 04:36
davdhacs and others added 8 commits March 27, 2026 22:59
Tests don't need real version info. Use fixed version strings
(0.0.0-test) instead of git-describe values for test ldflags.
This makes link ActionIDs stable across commits, enabling Go's
test result cache to hit.

Also adds -buildvcs=false to stop Go 1.18+'s VCS stamping which
independently changes link ActionIDs on every commit.

Builds keep the current XDef/status.sh mechanism unchanged.

Verified locally: tests cache across commits with fixed ldflags.

Combined with the mtime fix (#19395), this enables full warm
test caching in CI (2.5-8.2x faster test jobs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go's test cache validates inputs by (path, size, mtime) — not content.
git checkout sets all file mtimes to the current time, which differs
between CI runs, causing test results to never be cached.

Fix: set all tracked file mtimes to a fixed date (2001-01-01) after
checkout and cache restore. This makes the test cache hit across runs
when file content hasn't changed.

Measured impact (from PR #19201):
- go unit tests: 35m → 8m (4.5x)
- go-postgres: 31m → 36s (52x)
- sensor-integration: 26m → 2.5m (11x)
- Test cache hit rate: 97.6% (681/698 packages)

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
git ls-files only outputs files, not directories. Tests that read
directories at runtime (os.ReadDir, filepath.Walk) check directory
mtimes for cache validation. Without stable dir mtimes, these tests
always miss the cache.

The helm chart tests (pkg/helm/charts/tests/*) use testdata directories
and were the largest uncached tests at 712s combined. This fix caches
them, dropping the Go Unit Tests step from 486s to 85s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… entries

pre-build-go-binaries: default and prerelease (GOTAGS=release) share 93%
of cache entries (only 6 packages use release build tags). Separating them
doubles cache storage and causes the trim to delete shared entries. Keep
key-suffix only for race-condition-debug which uses -race/CGO_ENABLED=1
and has 39% unique entries.

build-and-push-operator: branding (RHACS vs STACKROX) is a runtime env
var, not a build tag. Compiled output is identical. Remove key-suffix
entirely.

Verified locally:
  default:    8911 entries (baseline)
  prerelease: +691 new (7% unique, 93% shared)
  race:       +6300 new (39% unique, 61% shared)
  branding:   +0 new (0% unique, 100% shared)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
unit-tests.yaml: go and go-postgres have GOTAGS matrix (""/ release)
sharing one cache key. With the GOCACHE trim, each variant deletes
the other's entries on every run, causing 3GB of churn. Add key-suffix
to give each variant its own cache.

scanner-build.yaml: pre-build-scanner-go-binary cross-compiles for
multiple architectures but doesn't set GOARCH on the cache step,
so all arches share one amd64-labeled cache key. Add env GOARCH.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address sourcery-ai review: make GHA expression grouping explicit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The operator branding key-suffix is unnecessary for GOCACHE (0% unique
entries), but PR #19417 will unify operator builds entirely. Removing
the key-suffix here would create a merge conflict with that PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address sourcery-ai feedback: clarify why only race-condition-debug
gets a separate cache key (39% unique entries from -race/CGO_ENABLED=1)
while default and prerelease share (93% overlap).

GOARCH is intentionally set only on the cache step, not job-level,
to avoid changing the default arch for other steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs force-pushed the davdhacs/test-cache-combined-minimal branch from c224e54 to 029b0e6 Compare March 28, 2026 05:00
…-combined-minimal

# Conflicts:
#	.github/actions/cache-go-dependencies/action.yaml
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Chores
    • Optimized Go build cache efficiency to improve cache hit rates across CI runs by standardizing file modification times.
    • Enhanced build caching strategy to enable better dependency sharing across configurations while maintaining architecture-specific cache isolation where needed.
    • Refined build script logic for improved dependency management in test and production environments.

Walkthrough

The PR updates GitHub Actions workflows and a shell script. The cache-go-dependencies composite action simplifies control flow and adds file mtime stabilization to improve cache hits. Three workflows are updated with conditional cache key-suffix inputs. The go-tool.sh script refactors version symbol handling with conditional logic based on tool type.

Changes

Cohort / File(s) Summary
Cache Go Dependencies Action
.github/actions/cache-go-dependencies/action.yaml
Simplified cache step control flow by removing github.event_name and branch conditions; added bash step to stabilize Git-tracked file and directory mtimes by touching all tracked files to a fixed timestamp to improve Go test cache consistency across CI runs.
Workflow Cache Configuration
.github/workflows/build.yaml, .github/workflows/scanner-build.yaml, .github/workflows/unit-tests.yaml
Updated cache action invocations: build.yaml conditionally selects cache key suffix (race vs. empty for shared cache); scanner-build.yaml passes GOARCH as environment variable for architecture-specific caching; unit-tests.yaml adds explicit key-suffix input keyed to matrix.gotags for per-tag cache segregation.
Version Symbol Script
scripts/go-tool.sh
Refactored version symbol generation with conditional logic: when TOOL == "test", uses hardcoded test version ldflags; otherwise, retains prior two-pass pipeline refactored to use REPO_ROOT variable and git grep for scanning //XDef: markers. Error handling for missing status vars now only evaluated in non-test path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: enabling full Go test caching through fixed ldflags and stable mtimes, which directly aligns with the core technical objectives.
Description check ✅ Passed The description is clearly related to the changeset, explaining this is a combined demo of CI optimization PRs with detailed impact metrics, individual PR references, and technical findings supporting the changes made.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch davdhacs/test-cache-combined-minimal

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/actions/cache-go-dependencies/action.yaml:
- Around line 75-89: The directory stabilization only touches each file's
immediate parent (the pipeline using git ls-files | sed 's|/[^/]*$||' | sort -u
| xargs touch -t 200101010000), so higher-level ancestors and the repo root
remain with checkout-time mtimes; replace that pipeline with one that enumerates
all ancestor directories for each tracked path (including '.' for top-level
files), deduplicates them and then touch -t 200101010000; keep the original git
ls-files and final touch -t timestamp but change the sed step to a command that
emits every ancestor directory for each file so os.ReadDir/filepath.Walk see
stabilized mtimes.

In @.github/workflows/build.yaml:
- Around line 243-246: Update the cache key-suffix logic so it also appends
"race" when the later race-setup step enables RACE for non-race matrix legs:
modify the expression used for key-suffix (currently referencing matrix.name) to
also check the RACE environment flag set by the race-setup step (env.RACE ==
'true') and return 'race' when true; this targets the key-suffix setting so
default/prerelease amd64 runs with RACE=true will use the race-specific cache
lineage instead of sharing the empty suffix.

In @.github/workflows/scanner-build.yaml:
- Around line 104-107: The cache step "Cache Go dependencies" currently only
varies by GOARCH, causing the amd64 "race-condition-debug" leg to share caches;
update this step to include the scanner matrix dimension (e.g., matrix.scanner,
matrix.variant or whatever key defines default/prerelease/race-condition-debug)
in the cache lineage by supplying that matrix value to the cache action (either
via env like SCANNER_VARIANT: ${{ matrix.scanner }} or a with: input) and ensure
the cache key generation inside ./.github/actions/cache-go-dependencies
incorporates that variable so race-only entries use a distinct cache key.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 081a1eee-f01d-481e-92ab-cf6644e53179

📥 Commits

Reviewing files that changed from the base of the PR and between c4abfa7 and 79cc3bb.

📒 Files selected for processing (5)
  • .github/actions/cache-go-dependencies/action.yaml
  • .github/workflows/build.yaml
  • .github/workflows/scanner-build.yaml
  • .github/workflows/unit-tests.yaml
  • scripts/go-tool.sh

Comment on lines +75 to +89
- name: Stabilize file mtimes for Go test cache
run: |
# Go's test cache validates inputs by (path, size, mtime) — NOT content.
# git checkout sets all mtimes to "now", which differs between CI runs,
# causing every test to miss the cache. Setting a fixed mtime makes the
# test cache hit across runs when file content hasn't changed.
#
# Risk: if a non-.go testdata file changes content but not size between
# commits, the test cache could return a stale result. This is extremely
# rare and Go's own test cache has the same inherent limitation.
git ls-files | xargs touch -t 200101010000
# Also stabilize directory mtimes — tests using os.ReadDir or
# filepath.Walk check directory mtimes too. git ls-files only
# lists files, not directories.
git ls-files | sed 's|/[^/]*$||' | sort -u | xargs touch -t 200101010000
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Stabilize the full directory tree, not only direct parents.

The directory pass only touches the immediate parent of each tracked file. foo/bar/baz.txt stabilizes foo/bar, but not foo or .; top-level files also resolve back to the filename instead of the repo root. Any tests that walk or stat a higher-level directory will still see checkout-time mtimes and miss the Go test cache.

Suggested fix
-        git ls-files | xargs touch -t 200101010000
+        git ls-files -z | xargs -0 touch -t 200101010000
         # Also stabilize directory mtimes — tests using os.ReadDir or
         # filepath.Walk check directory mtimes too. git ls-files only
         # lists files, not directories.
-        git ls-files | sed 's|/[^/]*$||' | sort -u | xargs touch -t 200101010000
+        {
+          printf '.\0'
+          git ls-files -z | while IFS= read -r -d '' path; do
+            dir=$(dirname "$path")
+            while [[ "$dir" != "." && "$dir" != "/" ]]; do
+              printf '%s\0' "$dir"
+              dir=$(dirname "$dir")
+            done
+          done
+        } | sort -zu | xargs -0 touch -t 200101010000
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/cache-go-dependencies/action.yaml around lines 75 - 89, The
directory stabilization only touches each file's immediate parent (the pipeline
using git ls-files | sed 's|/[^/]*$||' | sort -u | xargs touch -t 200101010000),
so higher-level ancestors and the repo root remain with checkout-time mtimes;
replace that pipeline with one that enumerates all ancestor directories for each
tracked path (including '.' for top-level files), deduplicates them and then
touch -t 200101010000; keep the original git ls-files and final touch -t
timestamp but change the sed step to a command that emits every ancestor
directory for each file so os.ReadDir/filepath.Walk see stabilized mtimes.

Comment on lines 243 to +246
with:
key-suffix: ${{ matrix.name }}
# Only race needs a separate cache (-race/CGO_ENABLED=1 produces 39% unique entries).
# Default and prerelease share 93% of entries and should use the same cache.
key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Account for label-enabled race builds in the cache key.

The later race-setup step can also turn on RACE=true for the default and prerelease amd64 legs when the PR has ci-race-tests. Those runs still use the empty suffix here, so race and non-race entries end up sharing the same cache lineage.

Suggested fix
-          key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }}
+          key-suffix: ${{ ((matrix.arch == 'amd64' && (matrix.name == 'race-condition-debug' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'ci-race-tests')))) && 'race') || '' }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with:
key-suffix: ${{ matrix.name }}
# Only race needs a separate cache (-race/CGO_ENABLED=1 produces 39% unique entries).
# Default and prerelease share 93% of entries and should use the same cache.
key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }}
with:
# Only race needs a separate cache (-race/CGO_ENABLED=1 produces 39% unique entries).
# Default and prerelease share 93% of entries and should use the same cache.
key-suffix: ${{ ((matrix.arch == 'amd64' && (matrix.name == 'race-condition-debug' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'ci-race-tests')))) && 'race') || '' }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/build.yaml around lines 243 - 246, Update the cache
key-suffix logic so it also appends "race" when the later race-setup step
enables RACE for non-race matrix legs: modify the expression used for key-suffix
(currently referencing matrix.name) to also check the RACE environment flag set
by the race-setup step (env.RACE == 'true') and return 'race' when true; this
targets the key-suffix setting so default/prerelease amd64 runs with RACE=true
will use the race-specific cache lineage instead of sharing the empty suffix.

Comment on lines 104 to 107
- name: Cache Go dependencies
env:
GOARCH: ${{ matrix.goarch }}
uses: ./.github/actions/cache-go-dependencies
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Split the scanner cache for race-condition-debug too.

This matrix still has default, prerelease, and race-condition-debug, but the cache step only varies by GOARCH. The amd64 race leg therefore shares the same cache lineage as the non-race legs, so the warm cache for that arch becomes scheduler-dependent and the race-only entries are often lost.

Suggested fix
     - name: Cache Go dependencies
       env:
         GOARCH: ${{ matrix.goarch }}
       uses: ./.github/actions/cache-go-dependencies
+      with:
+        key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Cache Go dependencies
env:
GOARCH: ${{ matrix.goarch }}
uses: ./.github/actions/cache-go-dependencies
- name: Cache Go dependencies
env:
GOARCH: ${{ matrix.goarch }}
uses: ./.github/actions/cache-go-dependencies
with:
key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/scanner-build.yaml around lines 104 - 107, The cache step
"Cache Go dependencies" currently only varies by GOARCH, causing the amd64
"race-condition-debug" leg to share caches; update this step to include the
scanner matrix dimension (e.g., matrix.scanner, matrix.variant or whatever key
defines default/prerelease/race-condition-debug) in the cache lineage by
supplying that matrix value to the cache action (either via env like
SCANNER_VARIANT: ${{ matrix.scanner }} or a with: input) and ensure the cache
key generation inside ./.github/actions/cache-go-dependencies incorporates that
variable so race-only entries use a distinct cache key.

@davdhacs
Copy link
Copy Markdown
Contributor Author

Closing this demo PR — its individual component PRs have been merged or are in review:

Merged:

Remaining:

Once #19617 lands, the full test cache pipeline is complete.

@davdhacs davdhacs closed this Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants