perf(ci): test cache — combined stable ldflags + mtime fix#19404
perf(ci): test cache — combined stable ldflags + mtime fix#19404
Conversation
Go includes -X ldflags in the link ActionID (exec.go:1404), so per-commit values (MainVersion, GitShortSha) change the test binary ID, causing test cache misses on every commit. Fix: for go test, replace per-commit MainVersion with the stable base version tag (from git describe --tags --abbrev=0) and skip GitShortSha. Builds continue using full per-commit ldflags unchanged. Also adds -buildvcs=false which removes unused VCS stamping that Go 1.18+ silently enabled (no code reads debug.ReadBuildInfo). This is the minimal change to enable test caching — just go-tool.sh. The existing XDef/status.sh mechanism is preserved for builds. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go's test cache validates inputs by (path, size, mtime) — not content. git checkout sets all file mtimes to the current time, which differs between CI runs, causing test results to never be cached. Fix: set all tracked file mtimes to a fixed date (2001-01-01) after checkout and cache restore. This makes the test cache hit across runs when file content hasn't changed. Measured impact (from PR #19201): - go unit tests: 35m → 8m (4.5x) - go-postgres: 31m → 36s (52x) - sensor-integration: 26m → 2.5m (11x) - Test cache hit rate: 97.6% (681/698 packages) Note: the full test cache benefit requires stable Go build cache ActionIDs across commits (see companion PR for zversion.go change). Without that, tests still recompile on every commit, negating the mtime fix. This PR provides the mtime stabilization infrastructure; full speedup requires both changes. Risk: if a non-.go testdata file changes content but not size between commits, the test cache could return a stale result. This is Go's own inherent test cache limitation — it explicitly avoids hashing file content (test.go:hashOpen()). Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The logic that derives
test_ldflagsby matching on*"MainVersion="*and*"GitShortSha="*is a bit brittle; consider matching on the fully-qualified symbol (including package) or deriving these keys from a single source (e.g.,status.sh) to avoid future breakage if variable names change or newXDefvars are added. - Running
git describe --tagsingo-tool.shon every invocation may add non-trivial overhead in CI; consider memoizing the base version (e.g., viastatus.sh, an env var, or a small cached file) so repeated calls to the wrapper don’t have to invoke git each time.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The logic that derives `test_ldflags` by matching on `*"MainVersion="*` and `*"GitShortSha="*` is a bit brittle; consider matching on the fully-qualified symbol (including package) or deriving these keys from a single source (e.g., `status.sh`) to avoid future breakage if variable names change or new `XDef` vars are added.
- Running `git describe --tags` in `go-tool.sh` on every invocation may add non-trivial overhead in CI; consider memoizing the base version (e.g., via `status.sh`, an env var, or a small cached file) so repeated calls to the wrapper don’t have to invoke git each time.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Images are ready for the commit at d9e0941. To use with deploy scripts, first |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #19404 +/- ##
=======================================
Coverage 49.68% 49.68%
=======================================
Files 2700 2700
Lines 203278 203317 +39
=======================================
+ Hits 100999 101025 +26
- Misses 94753 94768 +15
+ Partials 7526 7524 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
status.sh called 5 make targets (tag, collector-tag, fact-tag, scanner-tag, shortcommit), each spawning a make subprocess that parsed the entire Makefile before executing a trivial git/cat command. This added ~15s locally and ~50-60s on CI to every go build/test. Replace with direct git/cat commands. Same output, same env var overrides (BUILD_TAG, SHORTCOMMIT), 32x faster (0.5s vs 15.6s). Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github.job is the same for all matrix entries (e.g. "go" for both GOTAGS="" and GOTAGS=release). Without a suffix, the matrix entries overwrite each other's GOCACHE — whichever saves last wins, and the other gets a stale cache with mismatched ActionIDs. Add key-suffix input to cache-go-dependencies and pass matrix.gotags from unit-tests.yaml so each GOTAGS variant gets its own cache key. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simulates a typical PR — one small Go file changed. Expected: - pkg/mathutil recompiles + retests (~1 package) - ~700 other packages hit test cache - Total time should be close to warm run (~14m vs 45m cold)
Go test cache has two phases: 1. Test binary ID (includes link ActionID with ldflags) — fixed by commit 2 2. File mtime validation — fixed here git checkout sets all file mtimes to "now", causing Phase 2 to miss. Fix: set all tracked file mtimes to a fixed date after checkout. Also adds key-suffix input to cache-go-dependencies to disambiguate GOTAGS matrix entries. github.job is the same for all matrix entries (e.g. "go" for both GOTAGS="" and GOTAGS=release). Without key-suffix, matrix entries overwrite each other's GOCACHE. Combined measured results (PR #19404): - go unit tests: 45m → 14m (3.3x) - go-postgres: 35m → 4m (8.7x) - sensor-integration: 21m → 5m (4x) - Test cache hit rate: 92% (1942/2108) Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Superseded by #19424. |
Combined test: stable ldflags + mtime fix + key-suffix + fast status.sh
Only 3 files changed from master — no tmpfs, no shallow clone, no generated files.
Changes
scripts/go-tool.sh— stable base tag for test ldflags +-buildvcs=false.github/actions/cache-go-dependencies/action.yaml— mtime stabilization +key-suffixinputstatus.sh— direct git/cat commands (0.5s vs 15.6s with make)Results (seed run 23037501495, warm run 23038712568)
Test cache hit rate: 92% (1942/2108 packages cached)
Key discoveries
GOTAGS matrix cache collision:
github.jobis the same for all matrix entries(e.g. "go" for both GOTAGS="" and GOTAGS=release). Without
key-suffix, theyoverwrite each other's GOCACHE — whichever saves last wins, the other misses.
status.sh overhead: 5
makecalls × ~3s each = 15s pergo-tool.shinvocation.Replaced with direct git/cat commands (0.5s).
Build step timing: 26s warm (vs 21s in PR perf(ci): disable buildvcs for builds and tests #19201) — nearly identical. The
remaining gap is checkout time (18s full vs 4s shallow) and cache restore overhead.
Partially generated by AI.