Skip to content

ROX-33792: optimize operator builds#19417

Closed
janisz wants to merge 34 commits intomasterfrom
clenup_operator_build
Closed

ROX-33792: optimize operator builds#19417
janisz wants to merge 34 commits intomasterfrom
clenup_operator_build

Conversation

@janisz
Copy link
Copy Markdown
Contributor

@janisz janisz commented Mar 13, 2026

Issues:

  1. Build took 4 times 10 minutes and then 4 times 1 to upload manifests
  2. Duplicate compilation and repo and go modules fetch
  3. Ineffectively used Go cache - container prevented GitHub Actions cache integration
  4. Push images to just pull them in the next step to generate manifests
  5. Disk space exhaustion - frequent failures (fixed in fix(ci): increase operator build free-disk-space to 40GB #19396)

Solutions:

  • build operator bundle once
  • combine compilation of operator with compilation of other binaries
  • combine build and push images to a single job ( only copy a binary to the image so it's much faster and job setup overhead was much bigger than the build step itself)
  • combine push manifest with build and push images as we build them all in a single job we don't need to pull them so everything is in place for manifests push
  • do not fetch full repo and rely on operator tag from bundle job
  • use ubuntu-latest instead of custom container

Before:
before

Afeter:
after

Note: Changes include AI-assisted code generation and optimization.

@janisz janisz requested review from a team as code owners March 13, 2026 14:57
@janisz janisz requested review from vladbologa and removed request for a team March 13, 2026 14:57
@janisz janisz marked this pull request as draft March 13, 2026 14:57
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In the operator Dockerfile you hard-code default cache paths (/workspace/pkg/mod and /root/.cache/go-build); consider wiring these to go env GOMODCACHE/go env GOCACHE (or at least documenting/aligning with the base image’s defaults) to avoid subtle breakage if the Go toolchain layout changes.
  • The Docker layer cache key in the workflow uses hashFiles('**/go.sum'), so any go module change anywhere in the repo will invalidate the operator cache; if you want tighter caching, consider scoping this to the operator-related go.sum files (for example go.sum and operator/go.sum).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the operator Dockerfile you hard-code default cache paths (`/workspace/pkg/mod` and `/root/.cache/go-build`); consider wiring these to `go env GOMODCACHE`/`go env GOCACHE` (or at least documenting/aligning with the base image’s defaults) to avoid subtle breakage if the Go toolchain layout changes.
- The Docker layer cache key in the workflow uses `hashFiles('**/go.sum')`, so any go module change anywhere in the repo will invalidate the operator cache; if you want tighter caching, consider scoping this to the operator-related `go.sum` files (for example `go.sum` and `operator/go.sum`).

## Individual Comments

### Comment 1
<location path=".github/workflows/build.yaml" line_range="627-628" />
<code_context>
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3
+        with:
+          driver-opts: |
+            image=moby/buildkit:latest
+
+      - name: Cache Docker layers
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using `moby/buildkit:latest` can introduce non-determinism and unexpected breakage in CI builds.

Pulling `moby/buildkit:latest` makes CI behavior change as the image updates, which can cause flaky, hard-to-reproduce failures. Please pin to a specific BuildKit tag or digest so builds are deterministic and upgrades are controlled.

Suggested implementation:

```
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          driver-opts: |
            image=moby/buildkit:v0.15.1

```

If your project has a documented set of pinned tool versions (e.g., in a README or tooling matrix), you may want to:
1. Align the `moby/buildkit` version here (`v0.15.1`) with whatever version is documented there.
2. Optionally further harden determinism by using a digest (`image=moby/buildkit@sha256:...`) instead of a tag, and updating it intentionally when upgrading BuildKit.
</issue_to_address>

### Comment 2
<location path="operator/Makefile" line_range="363-364" />
<code_context>
+	DOCKER_BUILDKIT=1 BUILDKIT_PROGRESS=plain ../scripts/docker-build.sh \
 		-t ${IMG} \
 		$(if $(GOARCH),--build-arg TARGET_ARCH=$(GOARCH)) \
+		--build-arg GOMODCACHE_PATH=$(shell go env GOMODCACHE) \
+		--build-arg GOCACHE_PATH=$(shell go env GOCACHE) \
+		$(if $(BUILDKIT_CACHE_FROM),--cache-from $(BUILDKIT_CACHE_FROM)) \
+		$(if $(BUILDKIT_CACHE_TO),--cache-to $(BUILDKIT_CACHE_TO)) \
</code_context>
<issue_to_address>
**issue (performance):** Passing host `go env` paths into the container may misalign cache paths with the container’s Go configuration.

Inside the Docker build, GOPATH is `/workspace`, so the container’s natural `GOMODCACHE`/`GOCACHE` differ from the host’s (e.g. `/home/runner/...`). Overriding Dockerfile defaults with host-derived paths means cache mounts may not match where `go mod download` writes, reducing cache effectiveness or sending writes to unexpected locations. Prefer using the Dockerfile defaults or deriving these paths inside the container from its own `go env` instead of the host’s.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown
Contributor

@davdhacs davdhacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, and fast!

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Mar 13, 2026

Images are ready for the commit at 30c697d.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-469-g7588692824.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.28%. Comparing base (4e2a327) to head (30c697d).
⚠️ Report is 77 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19417   +/-   ##
=======================================
  Coverage   49.28%   49.28%           
=======================================
  Files        2735     2735           
  Lines      206215   206215           
=======================================
+ Hits       101633   101635    +2     
+ Misses      97041    97038    -3     
- Partials     7541     7542    +1     
Flag Coverage Δ
go-unit-tests 49.28% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@janisz
Copy link
Copy Markdown
Contributor Author

janisz commented Mar 17, 2026

I think it will be easier if operator follow the same build style as main or roxctl instead of using docker alternatively we can use docker with cache but then the whole job should be dockerized.

@janisz janisz closed this Mar 17, 2026
@janisz janisz reopened this Mar 18, 2026
@janisz janisz force-pushed the clenup_operator_build branch 4 times, most recently from e533da3 to 9f7755b Compare March 18, 2026 16:50
@janisz janisz changed the title fix(ci): resolve disk space exhaustion in operator build perf(ci): build operator binary once outside Docker for faster builds Mar 18, 2026
@janisz janisz changed the title perf(ci): build operator binary once outside Docker for faster builds perf(ci): build operator binary outside of Docker for faster builds Mar 18, 2026
@janisz janisz force-pushed the clenup_operator_build branch from 9f7755b to b6d2be9 Compare March 18, 2026 17:04
@github-actions github-actions bot added the konflux-build Run Konflux in PR. Push commit to trigger it. label Mar 19, 2026
@janisz janisz force-pushed the clenup_operator_build branch from f2ea7fc to 581ba43 Compare March 19, 2026 09:52
@janisz janisz changed the title perf(ci): build operator binary outside of Docker for faster builds perf(ci): unify operator builds with other binaries Mar 19, 2026
@janisz janisz requested a review from davdhacs March 19, 2026 11:17
@janisz janisz marked this pull request as ready for review March 19, 2026 11:17
@janisz janisz force-pushed the clenup_operator_build branch from dee2c85 to 28d5326 Compare March 25, 2026 16:30
janisz and others added 2 commits March 25, 2026 17:32
Remove explicit fetch-depth: 1 as it's the default value for
actions/checkout@v6.

Only jobs that need full history (define-job-matrix and a few others)
explicitly set fetch-depth: 0.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix build failures with shallow clones by using Makefile environment
variables instead of git commands.

Changes:
- define-job-matrix: Compute short commit and export as output
- All jobs: Use BUILD_TAG instead of TAG (TAG is cleared for security)
- All jobs: Add SHORTCOMMIT env var to avoid git rev-parse failures

How it works:
- make/env.mk checks BUILD_TAG env var (line 58-59) before git describe
- Makefile checks SHORTCOMMIT env var before git rev-parse
- This allows builds to work with shallow clones (no git history needed)

Benefits:
- pre-build-go-binaries can now use shallow clones
- Saves ~12s × 2-4 matrix jobs = 24-48s
- Total workflow savings: ~72-96s per run

Fixes: ROX-33792

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@janisz janisz force-pushed the clenup_operator_build branch from b29a666 to 0e29168 Compare March 25, 2026 16:46
janisz and others added 10 commits March 25, 2026 18:40
Co-authored-by: Marcin Owsiany <porridge@redhat.com>
Changed step output key from 'sha' to 'shortcommit' for consistency
with the SHORTCOMMIT environment variable name used throughout the
workflow.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed output name from 'main-tag' to 'build-tag' to match the
BUILD_TAG environment variable used throughout the workflow.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
Replace manual if/else logic with existing registry_from_branding
function from lib.sh. This improves code maintainability by reusing
the centralized registry mapping logic.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
Fixes critical issues found in code review:

1. Use docker-build-prebuilt target to leverage prebuilt binaries
   - Changed from docker-build which rebuilds inside Docker
   - This enables the intended optimization of building once per arch

2. Add missing 'fi' to close if statement in scan-images workflow
   - Fixes bash syntax error that would cause CI failures

3. Remove duplicate sourcing of lib.sh
   - Already sourced at top of script

4. Remove unused 'image' variable
   - Dead code cleanup

AI-assisted code review and fixes.
Fixes test failures where development builds were incorrectly compiled
as release builds.

Root cause:
- make/env.mk checks if BUILD_TAG is set in CI to determine release builds
- Our shallow clone optimization now sets BUILD_TAG for ALL builds
- This caused ALL builds to be compiled with GOTAGS=release

Solution:
- Check if BUILD_TAG contains '.x' (development tag marker)
- Only set GOTAGS=release for clean version tags without '.x'

Examples:
- 4.7.x-36-gc2253ef650 → development build (GOTAGS empty)
- 4.6.0 → release build (GOTAGS=release)
- 4.11.x-nightly-20260325 → development build (GOTAGS empty)

This fixes E2E test failures:
- TestMetadataIsSetCorrectly expecting buildFlavor=development
- TestFeatureFlagSettings expecting development defaults

AI-assisted debugging and fix.
@github-actions
Copy link
Copy Markdown
Contributor

/konflux-retest scanner-v4-db-on-push

@github-actions
Copy link
Copy Markdown
Contributor

/konflux-retest central-db-on-push

@github-actions
Copy link
Copy Markdown
Contributor

/konflux-retest operator-bundle-on-push

davdhacs added a commit that referenced this pull request Mar 25, 2026
The operator branding key-suffix is unnecessary for GOCACHE (0% unique
entries), but PR #19417 will unify operator builds entirely. Removing
the key-suffix here would create a merge conflict with that PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davdhacs added a commit that referenced this pull request Mar 26, 2026
The operator branding key-suffix is unnecessary for GOCACHE (0% unique
entries), but PR #19417 will unify operator builds entirely. Removing
the key-suffix here would create a merge conflict with that PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@janisz
Copy link
Copy Markdown
Contributor Author

janisz commented Mar 27, 2026

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 27, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

davdhacs added a commit that referenced this pull request Mar 28, 2026
The operator branding key-suffix is unnecessary for GOCACHE (0% unique
entries), but PR #19417 will unify operator builds entirely. Removing
the key-suffix here would create a merge conflict with that PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davdhacs added a commit that referenced this pull request Mar 28, 2026
The operator branding key-suffix is unnecessary for GOCACHE (0% unique
entries), but PR #19417 will unify operator builds entirely. Removing
the key-suffix here would create a merge conflict with that PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davdhacs added a commit that referenced this pull request Mar 31, 2026
The operator branding key-suffix is unnecessary for GOCACHE (0% unique
entries), but PR #19417 will unify operator builds entirely. Removing
the key-suffix here would create a merge conflict with that PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@janisz
Copy link
Copy Markdown
Contributor Author

janisz commented Apr 1, 2026

Closing as it was splitted and merged

@janisz janisz closed this Apr 1, 2026
janisz added a commit that referenced this pull request Apr 2, 2026
…ecture

Build roxctl and roxagent together in a single go build invocation for
each Linux architecture, following the pattern from main-build-nodeps.

Problem:
- CLI builds invoked go build separately for roxctl and roxagent on each
  architecture
- 8 separate build invocations for Linux (4 roxctl + 4 roxagent)
- Go toolchain restarted for each binary
- Common dependencies recompiled multiple times
- No build cache sharing within same architecture

Solution:
- Created cli_linux-% pattern rule that builds both binaries together
- Single go build with multiple package paths: ./roxctl ./compliance/virtualmachines/roxagent
- roxctl-linux and roxagent-linux now depend on combined targets
- Preserves existing pattern rules for backward compatibility and darwin/windows builds

Performance (warm cache, 4 Linux architectures):
- Old approach (separate builds): 10m 6s
- New approach (sequential): 2m 18s → 4.4x faster
- New approach (make -j 2): 1m 6s → 9.2x faster
- New approach (make -j 4): 37s → 16.4x faster

Similar pattern used in:
- main-build-nodeps: builds 9 binaries in one invocation
- PR #19417: consolidated operator build

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants