Skip to content

perf(ci): version refactor (remove Xref/status.sh)#19424

Closed
davdhacs wants to merge 1 commit intomasterfrom
davdhacs/go-cache-optimization
Closed

perf(ci): version refactor (remove Xref/status.sh)#19424
davdhacs wants to merge 1 commit intomasterfrom
davdhacs/go-cache-optimization

Conversation

@davdhacs
Copy link
Copy Markdown
Contributor

@davdhacs davdhacs commented Mar 13, 2026

Problem: -X ldflags break Go's build and test cache

Go's build cache keys every action (compile, link, test) on an ActionID — a hash of all inputs that affect the output. The link ActionID includes the full -ldflags string (exec.go:linkActionID() — see fmt.Fprintf(h, "link\n") ... fmt.Fprintf(h, "ldflags %q\n", ldflags)).

The old status.sh + //XDef: mechanism injected version data via -X ldflags:

-X "github.com/stackrox/rox/pkg/version/internal.MainVersion=4.11.x-431-gbbcc7d3be2"
-X "github.com/stackrox/rox/pkg/version/internal.GitShortSha=bbcc7d3be24"

Since MainVersion and GitShortSha change on every commit, the ldflags string changes, which changes the link ActionID. This has two effects:

  1. Build cache miss at link step: every binary re-links on every commit, even if no source code changed. Go's compile cache uses dependency contentID (output hash) rather than actionID, so compile steps can still cache-hit. But the link step always re-runs because its ActionID includes the changed ldflags. On CI, this means ~100s spent re-linking binaries that haven't actually changed.

  2. Test cache miss at Phase 1: Go's test result cache computes a test binary identity from the test binary's build ActionID, which includes the link ActionID (test.go:builderTest() — the test binary hash flows through packageActionID()linkActionID()). When the link ActionID changes, the test binary identity changes, and Go discards all cached test results — every test re-runs from scratch.

Fix: generated zversion.go instead of -X ldflags

Replace -X ldflags with a generated pkg/version/internal/zversion.go file that sets version data via init(). This eliminates -X from the link step entirely, making ldflags stable across commits (-s -w only).

For tests, go-tool.sh writes only the base tag (e.g. "4.11.x") — stable across commits on the same release branch. For builds, it writes the full git describe output (e.g. "4.11.x-431-gbbcc7d3be2").

The generated zversion.go is a source file, so it flows through the compile ActionID (content-hashed). When its content doesn't change (test mode), the compile output is byte-identical, giving the same contentID, so all 4400+ downstream packages get cache hits at both compile and link. Only 2 packages recompile (the version packages themselves, whose output is byte-identical anyway). Result: build compilation step drops from 100s → 27s (3.7x), and this improvement works independently of the test cache fix.

Companion PRs

Test caching requires both Phase 1 (this PR) and Phase 2 (#19395). The build cache improvement from this PR works on its own.

Prior art: PR #15609

PR #15609 was closed due to two blockers (comment). Both are now resolved:

  1. -coverprofile broke test caching (Go <1.25): fixed in Go 1.25 (golang/go#23565). We are on Go 1.25.5.
  2. release vs non-release GOTAGS sharing one cache: our research (PR perf(ci): add key-suffix to remaining matrix jobs sharing GOCACHE #19576) showed they share 93% of entries — only 6/1952 packages have release build tags. Go correctly recompiles those when GOTAGS changes. Separating caches would duplicate 93% of entries.

Measured results (both PRs combined)

Job Master warm Combined warm Speedup
go (GOTAGS="") 35m ~8m 4.5x
go-postgres 31m ~36s 52x
sensor-integration 26m ~2.5m 11x

Build compilation step (this PR alone): 100s → 27s (3.7x), with 99.95% cache hit rate (2/4403 packages recompile).

Testing and quality

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Verified locally: go-build.sh generates full version, go-test.sh generates stable base tag
  • Build produces correct version strings in binaries
  • Prior CI runs confirmed 4.5-52x test speedup and 3.7x build speedup when combined with mtime fix
  • Operator builds pass CI with the new BUILD_TAG mechanism

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The build cache statistics added to go_build rely on go build -v line counts and a fixed /tmp/go-build-compiled-count file, which can be both inaccurate (non-1:1 with compiled units) and racy if multiple builds run in parallel; consider using a unique temp file per invocation and/or gating this logging behind a flag or env var.
  • The git ls-files | xargs touch step in the composite action mutates mtimes for every tracked file, which could interfere with incremental build tooling or other languages in the same job; consider narrowing this to Go source/test files or to the directories actually used by Go.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The build cache statistics added to `go_build` rely on `go build -v` line counts and a fixed `/tmp/go-build-compiled-count` file, which can be both inaccurate (non-1:1 with compiled units) and racy if multiple builds run in parallel; consider using a unique temp file per invocation and/or gating this logging behind a flag or env var.
- The `git ls-files | xargs touch` step in the composite action mutates mtimes for every tracked file, which could interfere with incremental build tooling or other languages in the same job; consider narrowing this to Go source/test files or to the directories actually used by Go.

## Individual Comments

### Comment 1
<location path="scripts/go-tool.sh" line_range="108" />
<code_context>
-  invoke_go_build -o "$output" "${dirs[@]}"
+  local total
+  total=$(go list -deps "${dirs[@]}" 2>/dev/null | wc -l | tr -d ' ')
+  local compiled_count=/tmp/go-build-compiled-count
+  invoke_go_build -o "$output" "${dirs[@]}" 2> >(tee >(wc -l | tr -d ' ' > "$compiled_count") >&2)
+  wait
</code_context>
<issue_to_address>
**issue (bug_risk):** Use a per-process or per-invocation temp file instead of a fixed path for compiled_count

A fixed `/tmp` path means concurrent runs of this script (or multiple `go_build` calls in one job) will race on the same `compiled_count` file, causing corrupted or incorrect counts. Please use a unique temp file per invocation (e.g., via `mktemp` or including `$$`/`$BASHPID` and job name in the filename) and clean it up afterward.
</issue_to_address>

### Comment 2
<location path="scripts/go-tool.sh" line_range="112-115" />
<code_context>
+  invoke_go_build -o "$output" "${dirs[@]}" 2> >(tee >(wc -l | tr -d ' ' > "$compiled_count") >&2)
+  wait
+  local compiled
+  compiled=$(cat "$compiled_count" 2>/dev/null | tr -d ' ')
+  local cached=$((total - compiled))
+  echo >&2 "Build cache: ${cached}/${total} deps cached ($compiled compiled)"
+  if [[ "$total" -gt 0 ]] && [[ $((cached * 100 / total)) -lt 50 ]]; then
+    echo >&2 "WARNING: Build cache hit rate below 50% (${cached}/${total}). Significant code changes invalidate the cached build — expect slower compilation."
+  fi
</code_context>
<issue_to_address>
**issue (bug_risk):** Harden arithmetic against empty or non-numeric totals/compiled counts

If `go list` or the build step fails (or their output format changes), `total`/`compiled` can be empty or non-numeric. Then `cached=$((total - compiled))` and the percentage calculation will throw a bash arithmetic error and abort. Please default missing values to 0 and ensure both vars are integers (e.g. `${total:-0}`, `${compiled:-0}` with a simple regex check), or skip the cache-reporting logic when the counts aren’t valid.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Mar 13, 2026

Images are ready for the commit at a564561.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-432-ga564561fb2.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.28%. Comparing base (bbcc7d3) to head (a564561).
⚠️ Report is 42 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19424      +/-   ##
==========================================
+ Coverage   49.26%   49.28%   +0.02%     
==========================================
  Files        2735     2735              
  Lines      206138   206170      +32     
==========================================
+ Hits       101545   101604      +59     
+ Misses      97046    97022      -24     
+ Partials     7547     7544       -3     
Flag Coverage Δ
go-unit-tests 49.28% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@davdhacs davdhacs added the ci-save-cache Enable Go cache saves on this PR branch for testing label Mar 13, 2026
@davdhacs davdhacs changed the title perf(ci): enable Go build and test caching across commits perf(ci): add VERSION file and caching fixes Mar 14, 2026
@davdhacs davdhacs changed the title perf(ci): add VERSION file and caching fixes perf(ci): add VERSION file and build/test caching fixes Mar 14, 2026
@davdhacs davdhacs changed the title perf(ci): add VERSION file and build/test caching fixes perf(ci): build/test caching fixes Mar 15, 2026
@davdhacs davdhacs removed the ci-save-cache Enable Go cache saves on this PR branch for testing label Mar 15, 2026
@gitguardian
Copy link
Copy Markdown

gitguardian bot commented Mar 17, 2026

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
While these secrets were previously flagged, we no longer have a reference to the
specific commits where they were detected. Once a secret has been leaked into a git
repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@davdhacs davdhacs force-pushed the davdhacs/go-cache-optimization branch 4 times, most recently from a6fff2b to 63a3bd8 Compare March 24, 2026 22:54
@davdhacs davdhacs changed the title perf(ci): build/test caching fixes perf(ci): go test caching fixes Mar 24, 2026
@davdhacs davdhacs changed the title perf(ci): go test caching fixes perf(ci): version refactor (remove Xref/status.sh) Mar 24, 2026
@davdhacs davdhacs force-pushed the davdhacs/go-cache-optimization branch 2 times, most recently from a6cc9da to 21b91d0 Compare March 25, 2026 00:01
@davdhacs
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • go-tool.sh unconditionally generates pkg/version/internal/zversion.go and reads the *VERSION files for every invocation; consider short‑circuiting for tools that don’t need version data (or only generating when TOOL is build/test) to avoid unexpected failures when those files are absent and to minimize unnecessary writes.
  • Because zversion.go is now generated into the repo tree, please ensure it’s consistently excluded from version control and tooling (e.g., .gitignore, linters, IDEs) so local go test / go build runs don’t leave a perpetually dirty working tree or confuse static analysis.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- go-tool.sh unconditionally generates pkg/version/internal/zversion.go and reads the *VERSION files for every invocation; consider short‑circuiting for tools that don’t need version data (or only generating when TOOL is build/test) to avoid unexpected failures when those files are absent and to minimize unnecessary writes.
- Because zversion.go is now generated into the repo tree, please ensure it’s consistently excluded from version control and tooling (e.g., .gitignore, linters, IDEs) so local `go test` / `go build` runs don’t leave a perpetually dirty working tree or confuse static analysis.

## Individual Comments

### Comment 1
<location path="scripts/go-tool.sh" line_range="37" />
<code_context>
+		# Konflux/release builds set BUILD_TAG to the full version string.
+		# Use it directly (the Docker build context has no .git directory).
+		main_version="${BUILD_TAG}"
+		git_short_sha="$(echo "$BUILD_TAG" | grep -oP 'g\K[0-9a-f]+$' || echo "")"
 	else
-		die "Malformed status.sh output line ${line}"
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `grep -P` may break in minimal environments that lack PCRE-enabled grep

Because this script runs in build tooling and containers, relying on `grep -P` can cause failures where PCRE-enabled grep isn’t available (e.g., busybox images). Please switch to a POSIX-compatible pattern (e.g., via `sed`, `awk`, or `grep` + `cut`) for extracting the short SHA so it works reliably across environments.
</issue_to_address>

### Comment 2
<location path="operator/Makefile" line_range="351-354" />
<code_context>
 	BUILDKIT_PROGRESS=plain ../scripts/docker-build.sh \
 		-t ${IMG} \
 		$(if $(GOARCH),--build-arg TARGET_ARCH=$(GOARCH)) \
+		--build-arg BUILD_TAG=$(shell cd .. && make --quiet --no-print-directory tag) \
 		-f $< \
 		..
</code_context>
<issue_to_address>
**suggestion:** The `BUILD_TAG` make invocation is duplicated and could be centralized

This `$(shell cd .. && make --quiet --no-print-directory tag)` call is now used in multiple targets (e.g., `docker-build` and `docker-buildx`). To keep tag computation consistent and easier to change later, please extract it into a Make variable (e.g., `BUILD_TAG_CMD` or `IMAGE_TAG`) and reference that in each target instead.

Suggested implementation:

```
# Force re-building the file to make sure the current environment is correctly reflected.
.PHONY: build/Dockerfile.gen
	sed -e 's,$${ROX_IMAGE_FLAVOR},$(ROX_IMAGE_FLAVOR),g; s,$${BUILD_IMAGE_VERSION},$(shell sed 's/\s*\#.*//' ../BUILD_IMAGE_VERSION),' < $< > $@

# Centralized build tag computation so all targets use the same tag logic.
BUILD_TAG ?= $(shell cd .. && make --quiet --no-print-directory tag)

.PHONY: docker-build
docker-build: build/Dockerfile.gen ## Build docker image with the operator.
	BUILDKIT_PROGRESS=plain ../scripts/docker-build.sh \
		-t ${IMG} \
		$(if $(GOARCH),--build-arg TARGET_ARCH=$(GOARCH)) \
		--build-arg BUILD_TAG=$(BUILD_TAG) \
		-f $< \
		..

```

Any other targets in `operator/Makefile` that currently invoke:
`$(shell cd .. && make --quiet --no-print-directory tag)`
(e.g., in `docker-buildx` or elsewhere) should be updated to use `$(BUILD_TAG)` instead, for example:
`--build-arg BUILD_TAG=$(BUILD_TAG)` or `-t ${IMG}:$(BUILD_TAG)` depending on how the tag is used.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Go's test cache computes a test binary ID that includes -X ldflags.
Since -X MainVersion and -X GitShortSha change on every commit, the
test binary ID changes too, causing 100% test cache misses (Phase 1).

Fix: generate pkg/version/internal/zversion.go with an init() function
instead of passing version data via -X ldflags. For tests, use only the
base tag (e.g. "4.11.x") which is stable across commits. For builds,
use the full git-describe version.

This eliminates -X ldflags entirely from the link step, stabilizing
the link ActionID. Result: test binary IDs are constant across commits
(fixing Phase 1), and build cache hits improve from ~0% to ~99.9%
(only 2/4403 packages recompile — version packages whose output is
byte-identical anyway).

This is the companion to PR #19395 (mtime fix for Phase 2). Together
they enable 4.5-52x test speedup.

Addresses concerns from PR #15609: -coverprofile caching is fixed in
Go 1.25 (we're on 1.25.5), and GOTAGS=release shares 93% of cache
entries with default — separation is counterproductive.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs force-pushed the davdhacs/go-cache-optimization branch from 21b91d0 to a564561 Compare March 25, 2026 03:08
@davdhacs
Copy link
Copy Markdown
Contributor Author

https://github.com/sourcery-ai review

@davdhacs davdhacs requested a review from tommartensen March 25, 2026 04:11
@davdhacs
Copy link
Copy Markdown
Contributor Author

@sourcery-ai review

Copy link
Copy Markdown

@SourceryAI SourceryAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The generate_version_file test path (git describe --tags --abbrev=0 --exclude '*-nightly-*') hard-fails when no suitable tag exists; consider a fallback (e.g., a default version string or branch+SHA) so local or ephemeral environments without tags don't break go test.
  • The git_short_sha extraction from BUILD_TAG via sed -n 's/.*g\([0-9a-f]\{1,\}\)$/\1/p' assumes a very specific version suffix; if Konflux/release tags change format, this will silently yield an empty SHA, so it may be safer to validate and log/die on unexpected formats or to pass the SHA explicitly as a separate build arg.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `generate_version_file` test path (`git describe --tags --abbrev=0 --exclude '*-nightly-*'`) hard-fails when no suitable tag exists; consider a fallback (e.g., a default version string or branch+SHA) so local or ephemeral environments without tags don't break `go test`.
- The `git_short_sha` extraction from `BUILD_TAG` via `sed -n 's/.*g\([0-9a-f]\{1,\}\)$/\1/p'` assumes a very specific version suffix; if Konflux/release tags change format, this will silently yield an empty SHA, so it may be safer to validate and log/die on unexpected formats or to pass the SHA explicitly as a separate build arg.

## Individual Comments

### Comment 1
<location path="scripts/go-tool.sh" line_range="25-12" />
<code_context>
+generate_version_file() {
</code_context>
<issue_to_address>
**question:** Consider guarding git invocations when TOOL=test to avoid failures in build contexts without a .git directory

In the `TOOL == "test"` path, `generate_version_file` always runs `git describe` from `${REPO_ROOT}`. In Docker or other environments without a `.git` directory, this will hit `die "git describe failed"`, causing tests to hard-fail even when version info isn’t essential.

To avoid that, you could either (a) skip version generation in test mode when `.git` is absent, or (b) let `BUILD_TAG` override the `TOOL == "test"` behavior so tests still run in non‑git environments.
</issue_to_address>

### Comment 2
<location path="operator/Dockerfile" line_range="53-54" />
<code_context>
+
+# go-tool.sh needs BUILD_TAG to generate version info when there's no .git
+# directory. The Makefile sets this via --build-arg from `make tag`.
+ARG BUILD_TAG
+ENV BUILD_TAG=${BUILD_TAG}

 # We've been historically building operator without CGO both upstream and downstream.
</code_context>
<issue_to_address>
**question (bug_risk):** Clarify behavior when BUILD_TAG is not provided to the Docker build

With the current setup, a plain `docker build` without `--build-arg BUILD_TAG=...` will leave `BUILD_TAG` empty. In images without `.git`, `go-tool.sh` will then fail when it tries `git describe`. Consider either enforcing that `BUILD_TAG` is set (e.g., `RUN test -n "$BUILD_TAG"`) or adding a non-`.git`-based fallback for version generation.
</issue_to_address>

Hi @davdhacs! 👋

Thanks for trying out Sourcery by commenting with @sourcery-ai review! 🚀

Install the sourcery-ai bot to get automatic code reviews on every pull request ✨

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@@ -12,46 +12,55 @@ die() {
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Consider guarding git invocations when TOOL=test to avoid failures in build contexts without a .git directory

In the TOOL == "test" path, generate_version_file always runs git describe from ${REPO_ROOT}. In Docker or other environments without a .git directory, this will hit die "git describe failed", causing tests to hard-fail even when version info isn’t essential.

To avoid that, you could either (a) skip version generation in test mode when .git is absent, or (b) let BUILD_TAG override the TOOL == "test" behavior so tests still run in non‑git environments.

Comment on lines +53 to +54
ARG BUILD_TAG
ENV BUILD_TAG=${BUILD_TAG}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (bug_risk): Clarify behavior when BUILD_TAG is not provided to the Docker build

With the current setup, a plain docker build without --build-arg BUILD_TAG=... will leave BUILD_TAG empty. In images without .git, go-tool.sh will then fail when it tries git describe. Consider either enforcing that BUILD_TAG is set (e.g., RUN test -n "$BUILD_TAG") or adding a non-.git-based fallback for version generation.

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • In go-tool.sh, the non-test path without BUILD_TAG assumes a functional git repo and tags; you might want a more explicit error (e.g., detecting missing .git or no tags) to avoid opaque git describe failed messages when the script is used in reduced build contexts.
  • The BUILD_TAG parsing for GitShortSha in go-tool.sh assumes a ...g<sha> suffix and silently falls back to an empty SHA if the format changes; consider either validating the expected format with a clearer failure or logging that GitShortSha will be empty to make debugging version issues easier.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In go-tool.sh, the non-test path without BUILD_TAG assumes a functional git repo and tags; you might want a more explicit error (e.g., detecting missing .git or no tags) to avoid opaque `git describe failed` messages when the script is used in reduced build contexts.
- The BUILD_TAG parsing for GitShortSha in go-tool.sh assumes a `...g<sha>` suffix and silently falls back to an empty SHA if the format changes; consider either validating the expected format with a clearer failure or logging that GitShortSha will be empty to make debugging version issues easier.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@davdhacs
Copy link
Copy Markdown
Contributor Author

suggestion from @janisz : try smaller PRs: 1. static version for tests. 2. goembed with the existing files

@davdhacs
Copy link
Copy Markdown
Contributor Author

Closing — this was an earlier approach (zversion.go) to stabilize build ldflags. Superseded by the discovery that Go doesn't cache linked binaries in GOCACHE for go build (buildid.go:520-522), so stable build ldflags provide no build cache benefit.

The only cache benefit from stable ldflags is for test caching (Phase 1 = test binary identity includes link ActionID). This is handled by #19617 with hardcoded test-only ldflags.

See #19643 closing comment for full details.

@davdhacs davdhacs closed this Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants