perf(ci): enable full Go test caching (fixed ldflags + stable mtimes)#19618
perf(ci): enable full Go test caching (fixed ldflags + stable mtimes)#19618
Conversation
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
git ls-files | xargs touchmtime stabilization currently applies to every tracked file; consider narrowing this to files relevant to Go builds/tests (or excluding known problematic paths like large assets or build outputs) to avoid surprising other tools that rely on mtimes during the same workflow. - The hard-coded
VERSION_PKG="github.com/stackrox/rox/pkg/version/internal"ingo-tool.shwill silently break if the version package is moved/renamed; consider deriving this import path fromgo listor centralizing it in a single shared variable/source of truth.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `git ls-files | xargs touch` mtime stabilization currently applies to every tracked file; consider narrowing this to files relevant to Go builds/tests (or excluding known problematic paths like large assets or build outputs) to avoid surprising other tools that rely on mtimes during the same workflow.
- The hard-coded `VERSION_PKG="github.com/stackrox/rox/pkg/version/internal"` in `go-tool.sh` will silently break if the version package is moved/renamed; consider deriving this import path from `go list` or centralizing it in a single shared variable/source of truth.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Images are ready for the commit at ca76592. To use with deploy scripts, first |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #19618 +/- ##
=======================================
Coverage 49.66% 49.66%
=======================================
Files 2748 2748
Lines 207354 207354
=======================================
+ Hits 102987 102990 +3
+ Misses 96711 96708 -3
Partials 7656 7656
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fb71343 to
b3de047
Compare
Tests don't need real version info. Use fixed version strings (0.0.0-test) instead of git-describe values for test ldflags. This makes link ActionIDs stable across commits, enabling Go's test result cache to hit. Also adds -buildvcs=false to stop Go 1.18+'s VCS stamping which independently changes link ActionIDs on every commit. Builds keep the current XDef/status.sh mechanism unchanged. Verified locally: tests cache across commits with fixed ldflags. Combined with the mtime fix (#19395), this enables full warm test caching in CI (2.5-8.2x faster test jobs). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go's test cache validates inputs by (path, size, mtime) — not content. git checkout sets all file mtimes to the current time, which differs between CI runs, causing test results to never be cached. Fix: set all tracked file mtimes to a fixed date (2001-01-01) after checkout and cache restore. This makes the test cache hit across runs when file content hasn't changed. Measured impact (from PR #19201): - go unit tests: 35m → 8m (4.5x) - go-postgres: 31m → 36s (52x) - sensor-integration: 26m → 2.5m (11x) - Test cache hit rate: 97.6% (681/698 packages) Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
git ls-files only outputs files, not directories. Tests that read directories at runtime (os.ReadDir, filepath.Walk) check directory mtimes for cache validation. Without stable dir mtimes, these tests always miss the cache. The helm chart tests (pkg/helm/charts/tests/*) use testdata directories and were the largest uncached tests at 712s combined. This fix caches them, dropping the Go Unit Tests step from 486s to 85s. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… entries pre-build-go-binaries: default and prerelease (GOTAGS=release) share 93% of cache entries (only 6 packages use release build tags). Separating them doubles cache storage and causes the trim to delete shared entries. Keep key-suffix only for race-condition-debug which uses -race/CGO_ENABLED=1 and has 39% unique entries. build-and-push-operator: branding (RHACS vs STACKROX) is a runtime env var, not a build tag. Compiled output is identical. Remove key-suffix entirely. Verified locally: default: 8911 entries (baseline) prerelease: +691 new (7% unique, 93% shared) race: +6300 new (39% unique, 61% shared) branding: +0 new (0% unique, 100% shared) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
unit-tests.yaml: go and go-postgres have GOTAGS matrix (""/ release)
sharing one cache key. With the GOCACHE trim, each variant deletes
the other's entries on every run, causing 3GB of churn. Add key-suffix
to give each variant its own cache.
scanner-build.yaml: pre-build-scanner-go-binary cross-compiles for
multiple architectures but doesn't set GOARCH on the cache step,
so all arches share one amd64-labeled cache key. Add env GOARCH.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address sourcery-ai review: make GHA expression grouping explicit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The operator branding key-suffix is unnecessary for GOCACHE (0% unique entries), but PR #19417 will unify operator builds entirely. Removing the key-suffix here would create a merge conflict with that PR. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address sourcery-ai feedback: clarify why only race-condition-debug gets a separate cache key (39% unique entries from -race/CGO_ENABLED=1) while default and prerelease share (93% overlap). GOARCH is intentionally set only on the cache step, not job-level, to avoid changing the default arch for other steps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
c224e54 to
029b0e6
Compare
…-combined-minimal
…-combined-minimal # Conflicts: # .github/actions/cache-go-dependencies/action.yaml
📝 WalkthroughSummary by CodeRabbit
WalkthroughThe PR updates GitHub Actions workflows and a shell script. The cache-go-dependencies composite action simplifies control flow and adds file mtime stabilization to improve cache hits. Three workflows are updated with conditional cache key-suffix inputs. The go-tool.sh script refactors version symbol handling with conditional logic based on tool type. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsTimed out fetching pipeline failures after 30000ms Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/actions/cache-go-dependencies/action.yaml:
- Around line 75-89: The directory stabilization only touches each file's
immediate parent (the pipeline using git ls-files | sed 's|/[^/]*$||' | sort -u
| xargs touch -t 200101010000), so higher-level ancestors and the repo root
remain with checkout-time mtimes; replace that pipeline with one that enumerates
all ancestor directories for each tracked path (including '.' for top-level
files), deduplicates them and then touch -t 200101010000; keep the original git
ls-files and final touch -t timestamp but change the sed step to a command that
emits every ancestor directory for each file so os.ReadDir/filepath.Walk see
stabilized mtimes.
In @.github/workflows/build.yaml:
- Around line 243-246: Update the cache key-suffix logic so it also appends
"race" when the later race-setup step enables RACE for non-race matrix legs:
modify the expression used for key-suffix (currently referencing matrix.name) to
also check the RACE environment flag set by the race-setup step (env.RACE ==
'true') and return 'race' when true; this targets the key-suffix setting so
default/prerelease amd64 runs with RACE=true will use the race-specific cache
lineage instead of sharing the empty suffix.
In @.github/workflows/scanner-build.yaml:
- Around line 104-107: The cache step "Cache Go dependencies" currently only
varies by GOARCH, causing the amd64 "race-condition-debug" leg to share caches;
update this step to include the scanner matrix dimension (e.g., matrix.scanner,
matrix.variant or whatever key defines default/prerelease/race-condition-debug)
in the cache lineage by supplying that matrix value to the cache action (either
via env like SCANNER_VARIANT: ${{ matrix.scanner }} or a with: input) and ensure
the cache key generation inside ./.github/actions/cache-go-dependencies
incorporates that variable so race-only entries use a distinct cache key.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 081a1eee-f01d-481e-92ab-cf6644e53179
📒 Files selected for processing (5)
.github/actions/cache-go-dependencies/action.yaml.github/workflows/build.yaml.github/workflows/scanner-build.yaml.github/workflows/unit-tests.yamlscripts/go-tool.sh
| - name: Stabilize file mtimes for Go test cache | ||
| run: | | ||
| # Go's test cache validates inputs by (path, size, mtime) — NOT content. | ||
| # git checkout sets all mtimes to "now", which differs between CI runs, | ||
| # causing every test to miss the cache. Setting a fixed mtime makes the | ||
| # test cache hit across runs when file content hasn't changed. | ||
| # | ||
| # Risk: if a non-.go testdata file changes content but not size between | ||
| # commits, the test cache could return a stale result. This is extremely | ||
| # rare and Go's own test cache has the same inherent limitation. | ||
| git ls-files | xargs touch -t 200101010000 | ||
| # Also stabilize directory mtimes — tests using os.ReadDir or | ||
| # filepath.Walk check directory mtimes too. git ls-files only | ||
| # lists files, not directories. | ||
| git ls-files | sed 's|/[^/]*$||' | sort -u | xargs touch -t 200101010000 |
There was a problem hiding this comment.
Stabilize the full directory tree, not only direct parents.
The directory pass only touches the immediate parent of each tracked file. foo/bar/baz.txt stabilizes foo/bar, but not foo or .; top-level files also resolve back to the filename instead of the repo root. Any tests that walk or stat a higher-level directory will still see checkout-time mtimes and miss the Go test cache.
Suggested fix
- git ls-files | xargs touch -t 200101010000
+ git ls-files -z | xargs -0 touch -t 200101010000
# Also stabilize directory mtimes — tests using os.ReadDir or
# filepath.Walk check directory mtimes too. git ls-files only
# lists files, not directories.
- git ls-files | sed 's|/[^/]*$||' | sort -u | xargs touch -t 200101010000
+ {
+ printf '.\0'
+ git ls-files -z | while IFS= read -r -d '' path; do
+ dir=$(dirname "$path")
+ while [[ "$dir" != "." && "$dir" != "/" ]]; do
+ printf '%s\0' "$dir"
+ dir=$(dirname "$dir")
+ done
+ done
+ } | sort -zu | xargs -0 touch -t 200101010000🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/actions/cache-go-dependencies/action.yaml around lines 75 - 89, The
directory stabilization only touches each file's immediate parent (the pipeline
using git ls-files | sed 's|/[^/]*$||' | sort -u | xargs touch -t 200101010000),
so higher-level ancestors and the repo root remain with checkout-time mtimes;
replace that pipeline with one that enumerates all ancestor directories for each
tracked path (including '.' for top-level files), deduplicates them and then
touch -t 200101010000; keep the original git ls-files and final touch -t
timestamp but change the sed step to a command that emits every ancestor
directory for each file so os.ReadDir/filepath.Walk see stabilized mtimes.
| with: | ||
| key-suffix: ${{ matrix.name }} | ||
| # Only race needs a separate cache (-race/CGO_ENABLED=1 produces 39% unique entries). | ||
| # Default and prerelease share 93% of entries and should use the same cache. | ||
| key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }} |
There was a problem hiding this comment.
Account for label-enabled race builds in the cache key.
The later race-setup step can also turn on RACE=true for the default and prerelease amd64 legs when the PR has ci-race-tests. Those runs still use the empty suffix here, so race and non-race entries end up sharing the same cache lineage.
Suggested fix
- key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }}
+ key-suffix: ${{ ((matrix.arch == 'amd64' && (matrix.name == 'race-condition-debug' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'ci-race-tests')))) && 'race') || '' }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| with: | |
| key-suffix: ${{ matrix.name }} | |
| # Only race needs a separate cache (-race/CGO_ENABLED=1 produces 39% unique entries). | |
| # Default and prerelease share 93% of entries and should use the same cache. | |
| key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }} | |
| with: | |
| # Only race needs a separate cache (-race/CGO_ENABLED=1 produces 39% unique entries). | |
| # Default and prerelease share 93% of entries and should use the same cache. | |
| key-suffix: ${{ ((matrix.arch == 'amd64' && (matrix.name == 'race-condition-debug' || (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'ci-race-tests')))) && 'race') || '' }} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/build.yaml around lines 243 - 246, Update the cache
key-suffix logic so it also appends "race" when the later race-setup step
enables RACE for non-race matrix legs: modify the expression used for key-suffix
(currently referencing matrix.name) to also check the RACE environment flag set
by the race-setup step (env.RACE == 'true') and return 'race' when true; this
targets the key-suffix setting so default/prerelease amd64 runs with RACE=true
will use the race-specific cache lineage instead of sharing the empty suffix.
| - name: Cache Go dependencies | ||
| env: | ||
| GOARCH: ${{ matrix.goarch }} | ||
| uses: ./.github/actions/cache-go-dependencies |
There was a problem hiding this comment.
Split the scanner cache for race-condition-debug too.
This matrix still has default, prerelease, and race-condition-debug, but the cache step only varies by GOARCH. The amd64 race leg therefore shares the same cache lineage as the non-race legs, so the warm cache for that arch becomes scheduler-dependent and the race-only entries are often lost.
Suggested fix
- name: Cache Go dependencies
env:
GOARCH: ${{ matrix.goarch }}
uses: ./.github/actions/cache-go-dependencies
+ with:
+ key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Cache Go dependencies | |
| env: | |
| GOARCH: ${{ matrix.goarch }} | |
| uses: ./.github/actions/cache-go-dependencies | |
| - name: Cache Go dependencies | |
| env: | |
| GOARCH: ${{ matrix.goarch }} | |
| uses: ./.github/actions/cache-go-dependencies | |
| with: | |
| key-suffix: ${{ (matrix.name == 'race-condition-debug' && 'race') || '' }} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/scanner-build.yaml around lines 104 - 107, The cache step
"Cache Go dependencies" currently only varies by GOARCH, causing the amd64
"race-condition-debug" leg to share caches; update this step to include the
scanner matrix dimension (e.g., matrix.scanner, matrix.variant or whatever key
defines default/prerelease/race-condition-debug) in the cache lineage by
supplying that matrix value to the cache action (either via env like
SCANNER_VARIANT: ${{ matrix.scanner }} or a with: input) and ensure the cache
key generation inside ./.github/actions/cache-go-dependencies incorporates that
variable so race-only entries use a distinct cache key.
…-combined-minimal
|
Closing this demo PR — its individual component PRs have been merged or are in review: Merged:
Remaining:
Once #19617 lands, the full test cache pipeline is complete. |
Description
Combined demo of all CI optimization PRs. Do not merge — merge the individual PRs instead.
Overall impact
Individual PRs and what they do
Foundation (merge first, any order):
Workflow overhaul (merge after foundation):
Key findings
go build(source). Stable ldflags only help test caching, not build caching.-count=1on integration tests was added in 2019 for external service tests, but 38/50 packages use mocks.&in replacements as matched pattern (like sed). CI container had bash 5.1, ubuntu-latest has 5.2+.Unit Tests warm run breakdown (run 23728217606)
Partially generated by AI.