Skip to content

perf(ci): shard go and go-postgres unit test jobs#19744

Draft
davdhacs wants to merge 18 commits intomasterfrom
davdhacs/shard-go-tests
Draft

perf(ci): shard go and go-postgres unit test jobs#19744
davdhacs wants to merge 18 commits intomasterfrom
davdhacs/shard-go-tests

Conversation

@davdhacs
Copy link
Copy Markdown
Contributor

@davdhacs davdhacs commented Apr 1, 2026

Description

Split Go test jobs into parallel shards and add supporting Makefile changes:

Makefile changes:

  • SKIP_DEPS: wrap deps target in ifdef SKIP_DEPS guard to skip go mod tidy in test-only CI jobs
  • Split go-postgres-unit-tests into go-postgres-unit-tests-main / go-postgres-unit-tests-migrator subtargets for independent sharding
  • Remove -count=1 from integration-unit-tests to allow Go test cache hits

Workflow changes (unit-tests.yaml):

go job → 5 shards:

  • pkg-helm, pkg-other, central-1, central-2, rest

go-postgres → 2 shards:

  • main (central/pkg/tools), migrator (with -p 1)

Split out from go job:

  • go-integration: registries/scanners/notifiers integration tests
  • go-operator-integration: operator test-integration + test-helm

All Makefile changes are backward-compatible.

Includes #19712 (host-runner test fixes).
Split from #19678, replaces #19742.

User-facing documentation

Testing and quality

  • the change is production ready
  • CI results are inspected

Automated testing

  • modified existing tests

How I validated my change

Previously validated on #19678.

🤖 Generated with Claude Code

davdhacs and others added 7 commits March 31, 2026 10:06
Fix tests and scripts to work on ubuntu-latest host runners (no
container) in addition to CI container environments.

- rate_limited_logger.go: add /home/runner/work/ path prefix
  (container used /__w/)
- download_zip_test.go: accept ErrPermission for /root/ paths
  (non-root runner returns permission denied, not not-found)
- detection_test.go: delete circular test
- lib.sh: escape & in bash 5.2+ replacements (verified 4.4-5.3)
- helpers.bash: add yq_multidoc helper (yq 4.x adds --- separators)
- roxctl-netpol-generate-*.bats: use yq_multidoc

All changes are backward-compatible with container environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SKIP_DEPS guard to deps target: when set, skip go mod tidy
  for jobs that don't need it (e.g. test-only jobs).
- Split go-postgres-unit-tests into main/migrator subtargets to
  enable independent sharding in CI.
- Remove -count=1 from integration-unit-tests to allow Go test
  cache hits.

All changes are backward-compatible: existing callers work unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
grep -v returns exit code 1 when no lines match (e.g. yq output is
only --- separators), which would look like a yq failure. sed always
returns 0 regardless of whether lines were deleted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run Go test jobs directly on ubuntu-latest instead of in the CI
container image. This saves ~55s container init overhead per job
and enables faster cache I/O (no overlay filesystem).

Changes per job:
- Remove container: block
- Add actions/setup-go with GOTOOLCHAIN=auto override
- Replace su postgres with docker run postgres
- Add pg_isready retry loop (docker postgres needs warmup time)
- go-bench: skip on PRs (if: github.event_name == 'push')

Depends on #19712 (host-runner test fixes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split the monolithic go job into 5 shards (pkg-helm, pkg-other,
central-1, central-2, rest) and go-postgres into 2 shards
(main, migrator) for parallel execution.

Also split integration and operator tests into separate jobs
(go-integration, go-operator-integration) so they run in parallel
with unit tests instead of sequentially after them.

Depends on #19743 (container removal) and #19742 (SKIP_DEPS +
postgres subtargets).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The EXCLUDE/exclude-pattern regexes used to build PACKAGES (e.g. /central/[o-z], /helm/) rely on substring matches in go list output and could easily over-/under-match if paths change; consider anchoring these patterns to path segments or using more explicit regexes to make the sharding more robust.
  • In the go-integration job, go list ./... | grep -E 'registries|scanners|notifiers' will select any package whose import path contains those substrings (including potential helpers or unrelated dirs); if you intend to restrict to specific packages/directories, tighten this filter to explicit path patterns.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `EXCLUDE`/`exclude-pattern` regexes used to build `PACKAGES` (e.g. `/central/[o-z]`, `/helm/`) rely on substring matches in `go list` output and could easily over-/under-match if paths change; consider anchoring these patterns to path segments or using more explicit regexes to make the sharding more robust.
- In the `go-integration` job, `go list ./... | grep -E 'registries|scanners|notifiers'` will select any package whose import path contains those substrings (including potential helpers or unrelated dirs); if you intend to restrict to specific packages/directories, tighten this filter to explicit path patterns.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Apr 1, 2026

Images are ready for the commit at 9c2e078.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-544-g9c2e07883f.

@davdhacs davdhacs changed the base branch from davdhacs/remove-container-go-tests to master April 1, 2026 19:58
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

This PR refactors the CI/CD workflow to use job sharding and setup-go actions, adds new integration test jobs, updates database setup patterns, fixes XML escaping in JUnit output, removes an obsolete test, adds GitHub Actions workspace path trimming, enhances error handling in path traversal tests, and introduces a YAML multi-document helper function for test assertions.

Changes

Cohort / File(s) Summary
CI Workflow Restructuring
.github/workflows/unit-tests.yaml
Introduced job sharding for the go test job with matrix dimensions; replaced container-based execution with actions/setup-go@v6 and GOTOOLCHAIN=auto; updated go-postgres and go-bench jobs to use Docker Postgres with polling readiness checks instead of local services; added new go-integration and go-operator-integration jobs; modified slack notification dependency graph to include new integration jobs.
Test Infrastructure & Utilities
tests/roxctl/bats-tests/helpers.bash
Added new yq_multidoc() helper function that wraps yq execution and removes YAML document separator lines to stabilize multi-document output assertions.
YAML Assertion Updates
tests/roxctl/bats-tests/local/roxctl-netpol-generate-development.bats, tests/roxctl/bats-tests/local/roxctl-netpol-generate-release.bats
Updated test assertions to use yq_multidoc instead of yq for validating multi-document YAML output (kind, labels, and port specifications); removed commented-out alternate assertion blocks.
CI Script Fixes
scripts/ci/lib.sh
Fixed XML escaping of special characters (&, ", <, >) in JUnit description strings by properly prefixing replacement delimiters with backslash in bash parameter expansions.
Test Cleanup & Updates
pkg/containers/detection_test.go, roxctl/common/zipdownload/download_zip_test.go
Removed obsolete TestContainerDetection test; updated TestExtractZipToFolder_PreventPathTraversal to accept both fs.ErrNotExist and fs.ErrPermission errors via errors.Is to handle environment-specific path behavior.
Logging & Environment Handling
pkg/logging/rate_limited_logger.go
Extended path trimming in getTrimmedFilePath to include the GitHub Actions workspace prefix (/home/runner/work/stackrox/stackrox/), affecting rate-limited log key generation across CI environments.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed The description is comprehensive and well-structured, covering Makefile changes, workflow modifications with specific job sharding details, and dependency tracking. All required template sections are addressed.
Title check ✅ Passed The PR title clearly and specifically describes the main change: sharding Go and go-postgres unit test jobs for CI performance improvement.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch davdhacs/shard-go-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/unit-tests.yaml:
- Around line 185-186: The "Cache Go dependencies" step using
./.github/actions/cache-go-dependencies is missing a key-suffix to separate
caches for different matrix runs; update that step to include a key-suffix like
${{ matrix.gotags }} (or ${{ matrix.gotags }}-<other-identifiers>) so the cache
key differs per gotags matrix entry and prevents collisions between matrix
variations.
- Around line 83-87: The unit-tests job is always setting GOTAGS to "test" and
ignores the matrix dimension; update the GOTAGS environment passed to
scripts/go-test.sh so it includes the matrix value (`${{ matrix.gotags }}`) as a
prefix before the existing test tag (so the matrix variations actually run with
different GOTAGS), i.e. change the GOTAGS assignment used when invoking
go-test.sh (the GOTAGS=... before scripts/go-test.sh) to incorporate `${{
matrix.gotags }}` while preserving the existing ",test" behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: dffb98ea-2637-41a2-8e05-52bc5f3ec7e3

📥 Commits

Reviewing files that changed from the base of the PR and between f8de23e and aaaff23.

📒 Files selected for processing (8)
  • .github/workflows/unit-tests.yaml
  • pkg/containers/detection_test.go
  • pkg/logging/rate_limited_logger.go
  • roxctl/common/zipdownload/download_zip_test.go
  • scripts/ci/lib.sh
  • tests/roxctl/bats-tests/helpers.bash
  • tests/roxctl/bats-tests/local/roxctl-netpol-generate-development.bats
  • tests/roxctl/bats-tests/local/roxctl-netpol-generate-release.bats
💤 Files with no reviewable changes (1)
  • pkg/containers/detection_test.go

Comment on lines +83 to +87
GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
scripts/go-test.sh -timeout 25m -race -cover \
-coverprofile test-output/coverage.out -v \
$PACKAGES \
| tee test-output/test.log
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing ${{ matrix.gotags }} causes both matrix variations to run identical tests.

The gotags matrix dimension is defined (line 24) but not used in this command. Both GOTAGS="" and GOTAGS=release matrix jobs will execute with identical GOTAGS=test, defeating the purpose of testing with release tags.

Compare with go-integration job (line 147) which correctly uses ${{ matrix.gotags }} prefix.

🐛 Proposed fix
     # shellcheck disable=SC2086
-    GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
+    ${{ matrix.gotags }} GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
       scripts/go-test.sh -timeout 25m -race -cover \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/unit-tests.yaml around lines 83 - 87, The unit-tests job
is always setting GOTAGS to "test" and ignores the matrix dimension; update the
GOTAGS environment passed to scripts/go-test.sh so it includes the matrix value
(`${{ matrix.gotags }}`) as a prefix before the existing test tag (so the matrix
variations actually run with different GOTAGS), i.e. change the GOTAGS
assignment used when invoking go-test.sh (the GOTAGS=... before
scripts/go-test.sh) to incorporate `${{ matrix.gotags }}` while preserving the
existing ",test" behavior.

Comment on lines +185 to +186
- name: Cache Go dependencies
uses: ./.github/actions/cache-go-dependencies
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing key-suffix for cache differentiation between gotags matrix runs.

Other jobs include key-suffix: ${{ matrix.gotags }}-... to prevent cache collisions between matrix variations. This job omits it, which could cause cache conflicts.

🔧 Proposed fix
     - name: Cache Go dependencies
       uses: ./.github/actions/cache-go-dependencies
+      with:
+        key-suffix: ${{ matrix.gotags }}-operator-integration
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Cache Go dependencies
uses: ./.github/actions/cache-go-dependencies
- name: Cache Go dependencies
uses: ./.github/actions/cache-go-dependencies
with:
key-suffix: ${{ matrix.gotags }}-operator-integration
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/unit-tests.yaml around lines 185 - 186, The "Cache Go
dependencies" step using ./.github/actions/cache-go-dependencies is missing a
key-suffix to separate caches for different matrix runs; update that step to
include a key-suffix like ${{ matrix.gotags }} (or ${{ matrix.gotags
}}-<other-identifiers>) so the cache key differs per gotags matrix entry and
prevents collisions between matrix variations.

@davdhacs davdhacs changed the title perf(ci): shard go and go-postgres unit test jobs perf(ci): shard go/go-postgres jobs, add SKIP_DEPS and postgres subtargets Apr 1, 2026
davdhacs and others added 2 commits April 1, 2026 14:37
…est shard

Use ./... with exclude-pattern /pkg/|/central/ instead of listing
every top-level directory. Automatically picks up new directories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match the other test jobs' stable ldflag env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs changed the title perf(ci): shard go/go-postgres jobs, add SKIP_DEPS and postgres subtargets perf(ci): shard go and go-postgres unit test jobs Apr 1, 2026
davdhacs and others added 4 commits April 1, 2026 15:05
The operator Makefile needs the scanner module in GOMODCACHE for
proto generation. Previously inherited from the go job's make deps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Operator tests use their own test runner, not scripts/go-test.sh,
so test-output/test.log is never created. The generate-junit-reports
make target fails when the log doesn't exist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain why each job overrides GOTOOLCHAIN from local to auto:
setup-go sets GOTOOLCHAIN=local, but sub-module tools with newer
go directives need auto to download the required toolchain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain that setup-go reads go.mod to install Go, then exports
GOTOOLCHAIN=local to force that version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify that setup-go exports GOTOOLCHAIN matching the go.mod
version, not hardcoded to "local".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs force-pushed the davdhacs/shard-go-tests branch from 07f23c0 to b12da02 Compare April 1, 2026 21:43
davdhacs and others added 2 commits April 1, 2026 15:47
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The go-bench PR skip is handled separately in #19758.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.02%. Comparing base (f8de23e) to head (9c2e078).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #19744       +/-   ##
===========================================
- Coverage   49.59%   49.02%    -0.58%     
===========================================
  Files        2760      768     -1992     
  Lines      208144    74767   -133377     
===========================================
- Hits       103239    36652    -66587     
+ Misses      97239    34762    -62477     
+ Partials     7666     3353     -4313     
Flag Coverage Δ
go-unit-tests 49.02% <ø> (-0.58%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants