Skip to content

Ci fast fail on job failure#5759

Closed
jyotshnayaparla-00 wants to merge 2 commits intofeldera:mainfrom
jyotshnayaparla-00:ci-fast-fail-on-job-failure
Closed

Ci fast fail on job failure#5759
jyotshnayaparla-00 wants to merge 2 commits intofeldera:mainfrom
jyotshnayaparla-00:ci-fast-fail-on-job-failure

Conversation

@jyotshnayaparla-00
Copy link
Copy Markdown
Contributor

@jyotshnayaparla-00 jyotshnayaparla-00 commented Mar 5, 2026

Add a 'check-prior-build' job that queries the GitHub API for a prior run on the same commit SHA. If all required binary and Docker digest artifacts are found unexpired, the Rust, Java, and Docker build jobs are skipped and test workflows download artifacts from that prior run instead of rebuilding.

@jyotshnayaparla-00 jyotshnayaparla-00 force-pushed the ci-fast-fail-on-job-failure branch from 67e0a3b to 12a497a Compare March 5, 2026 02:23
Copy link
Copy Markdown

@mythical-fred mythical-fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid design. The two-step fallback (consolidate-outputs always runs, defaults to empty run_id) means the job truly never fails and never blocks. The test-integration-runtime.yml gap is intentional — that job runs against deployed infra, not build artifacts.

One soft note: the artifact list fetch uses per_page=100. If a future run ever exceeds 100 artifacts, the check silently falls back to rebuilding (not a correctness bug, just a missed optimization). Worth a comment or bumping to 200.

needs: [invoke-build-rust, invoke-build-java]
needs: [check-prior-build, invoke-build-rust, invoke-build-java]
if: |
always() &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there some special semantic associated with always() in gha? otherwise this is just if true && a && b and can be rewritten as if a && b

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always() is needed here, without it, GHA silently skips dependent jobs when a needs: job is skipped, so the result == 'skipped' checks would never be reached and test jobs would be silently skipped whenever the build phase is skipped.

always() is not true, it prevents GHA from skipping the job before evaluating the condition.

needs: [invoke-build-rust]
needs: [check-prior-build, invoke-build-rust]
if: |
always() &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same q as above

needs: [invoke-build-rust, invoke-build-java]
needs: [check-prior-build, invoke-build-rust, invoke-build-java]
if: |
always() &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

invoke-generate-sbom:
name: Generate SBOMs
needs: [invoke-build-docker]
if: |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

name: Integration Tests
needs: [invoke-build-docker]
needs: [check-prior-build, invoke-build-docker]
if: |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

name: Integration Tests
needs: [invoke-build-java]
needs: [check-prior-build, invoke-build-java]
if: |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

name: Java Tests
needs: [invoke-build-java]
needs: [check-prior-build, invoke-build-java]
if: |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

name: Publish Crates (Dry Run)
needs: [invoke-build-rust]
needs: [check-prior-build, invoke-build-rust]
if: |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same q

type: string
required: false
default: ""
workflow_dispatch:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this also need a run id similar to test-integration-platform.yaml?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, added in the latest commit.

@gz
Copy link
Copy Markdown
Contributor

gz commented Mar 9, 2026

note that github won't allow this to merge when you have merge commits in the branch since we require a linear history on main

Add a check-prior-build job to ci.yml that queries the GitHub API for a
prior run on the same commit SHA. If all required binary and Docker digest
artifacts are found unexpired, the Rust, Java, and Docker build jobs are
skipped and test workflows download artifacts from that prior run instead
of rebuilding from scratch.

Each test workflow now accepts an artifacts_run_id input passed through
from ci.yml, and uses it as the run-id in actions/download-artifact so
cross-run artifact fetches work with the built-in action and no third-party
dependencies. Build jobs that are skipped leave result=skipped (not
failure), so sentinel cancel jobs and the final main job are unaffected.
- Replace digests-linux-{amd64,arm64} check with docker-image-ready
  artifact, which is only uploaded after the sha-{sha} tag is verified
  in GHCR; this prevents skip_docker=true when the tag doesn't exist yet
- Bump artifact list fetch from per_page=100 to per_page=200
- Add actions: read + contents: read permissions to test-unit and
  test-adapters workflows for cross-run artifact downloads
- Add actions: read + contents: read + packages: read permissions to
  oss-platform-tests job for cross-run artifact downloads and private
  GHCR image pulls
- Remove --cap-add=PERFMON from oss-platform-tests service options
  (not available in the k8s runner environment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants