
[grafana-otel-advisor] OTel improvement: gh-aw.run.status silently reports 'success' on real agent failures #32958

@github-actions


OTel Instrumentation Improvement: derive gh-aw.run.status from observable failure signals

Analysis Date: 2026-05-18
Priority: High
Effort: Small (< 2h)

Problem

The conclusion span attribute gh-aw.run.status and the OTLP span status.code are computed exclusively from process.env.GH_AW_AGENT_CONCLUSION (and the rarer workflow_run.conclusion) in actions/setup/js/send_otlp_span.cjs (status.code at line ~1651, gh-aw.run.status at lines 1670-1683). GitHub Actions only exposes needs.<job>.result to downstream jobs, so that env var is empty for the agent job's own post-step. The live data suggests it is empty for downstream jobs too. As a result, every conclusion span in Tempo over the last 7 days carries gh-aw.run.status="success" and status.code=STATUS_CODE_OK, even on runs where the agent emitted errors.

A DevOps engineer cannot answer the most basic operational question — "which gh-aw runs failed in the last hour?" — by filtering on either span status or gh-aw.run.status in Grafana. The data exists (gh-aw.error_count, exception events) but is not surfaced through the conventional channels dashboards and alerting rules rely on.

Why This Matters (DevOps Perspective)
  • Alerting is blocked: Any rule of the form count_over_time({status=error}[5m]) > N returns 0 today. The only working failure signal is an in-payload exception event, which most TraceQL/PromQL dashboards do not aggregate cleanly.
  • MTTR increases: On-call engineers cannot triage by failure status; they must drill into individual traces or fall back to GitHub Actions UI.
  • Failure-rate SLOs are unmeasurable: success_rate = ok / total is stuck at 100% in every backend, masking real regressions.
  • The fix is local: the data needed to derive the correct status (outputErrors.length, hasNoReadableAgentOutput) is already read in the same function — it is just not consulted for runStatus.

Current Behavior

actions/setup/js/send_otlp_span.cjs:1670-1683:

```javascript
let runStatus = "success";
const rawRunStatus = agentConclusion || workflowRunConclusion;
if (rawRunStatus === "cancelled") {
  runStatus = "cancelled";
} else if (rawRunStatus === "failure" || rawRunStatus === "timed_out") {
  runStatus = "failure";
}

if (isAgentFailure && errorMessages.length > 0) {
  statusMessage = `agent ${agentConclusion}: ${errorMessages[0]}`.slice(0, 256);
}

const attributes = [..., buildAttr("gh-aw.run.status", runStatus), ...];
```

And the span status code at line 1651:

```javascript
const isAgentNonOK = isAgentFailure || isAgentCancelled;
const statusCode = isAgentNonOK ? 2 : 1;
```

Both depend entirely on agentConclusion (from GH_AW_AGENT_CONCLUSION). When that env var is empty — which is the observed reality for all spans in Tempo — runStatus stays "success" and statusCode stays 1 (OK), regardless of what the agent actually did.
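The failure mode can be reproduced with a minimal, simplified version of the current derivation written as a standalone function. It assumes that isAgentFailure covers "failure" and "timed_out" and that isAgentCancelled covers "cancelled". Their real definitions are not quoted in this issue. currentRunStatus is a hypothetical name, not an export of send_otlp_span.cjs.

```javascript
"use strict";

// Hypothetical, simplified reproduction of the current logic in send_otlp_span.cjs.
// Assumption: isAgentFailure / isAgentCancelled are derived from agentConclusion
// as shown here; the real definitions are not quoted in the issue.
function currentRunStatus({ agentConclusion = "", workflowRunConclusion = "", outputErrors = [] } = {}) {
  const isAgentFailure = agentConclusion === "failure" || agentConclusion === "timed_out";
  const isAgentCancelled = agentConclusion === "cancelled";
  // OTLP status codes: 1 = STATUS_CODE_OK, 2 = STATUS_CODE_ERROR.
  const statusCode = isAgentFailure || isAgentCancelled ? 2 : 1;

  let runStatus = "success";
  const rawRunStatus = agentConclusion || workflowRunConclusion;
  if (rawRunStatus === "cancelled") {
    runStatus = "cancelled";
  } else if (rawRunStatus === "failure" || rawRunStatus === "timed_out") {
    runStatus = "failure";
  }

  // outputErrors is available here but never consulted: this is the bug.
  return { runStatus, statusCode, errorCount: outputErrors.length };
}

// GH_AW_AGENT_CONCLUSION empty (as observed in Tempo), one agent error recorded.
const observed = currentRunStatus({
  agentConclusion: "",
  outputErrors: ["Line 2: Too many items of type 'create_discussion'. Maximum allowed: 1."],
});
console.log(observed); // runStatus "success", statusCode 1, despite errorCount 1
```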

Proposed Change

Fall back to observable failure signals (errors written to agent_output.json, or a missing/unreadable agent_output.json) when the env var path did not yield a non-success status. The same outputErrors and hasNoReadableAgentOutput values are already computed a few lines above.

```javascript
// actions/setup/js/send_otlp_span.cjs: after the existing rawRunStatus block (~line 1676)

// Fallback: GH_AW_AGENT_CONCLUSION is empty on the agent job's own post step
// (GitHub Actions does not expose needs.<self>.result), and is often empty on
// downstream jobs as well. Derive the failure status from observable signals
// that this function already has in hand so dashboards and alerts can use
// gh-aw.run.status and span status_code as authoritative failure indicators.
if (runStatus === "success" && (outputErrors.length > 0 || hasNoReadableAgentOutput)) {
  runStatus = "failure";
}

// Re-derive the OTLP status from the (possibly upgraded) runStatus so the two stay in sync.
const statusCode = runStatus === "success" ? 1 : 2;
let statusMessage;
if (runStatus === "failure") {
  statusMessage = errorMessages[0]
    ? `agent failure: ${errorMessages[0]}`.slice(0, 256)
    : (agentConclusion ? `agent ${agentConclusion}` : "agent failure");
} else if (runStatus === "cancelled") {
  statusMessage = "agent cancelled";
}
```

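The new derivation subsumes two earlier pieces of code, so both are removed: the const statusCode = isAgentNonOK ? 2 : 1; assignment and the if (isAgentFailure && errorMessages.length > 0) statusMessage block. If statusMessage is already declared earlier in the same scope, drop that declaration as well (or drop the new let). Redeclaring it with let in the same scope is a SyntaxError.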
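The proposed derivation can also be pulled out into a pure function, sketched below. This makes the three test cases listed under Implementation Steps easy to exercise in isolation. deriveConclusionStatus and the OTLP_STATUS_* constants are hypothetical names. The values 1 and 2 are the OTLP STATUS_CODE_OK and STATUS_CODE_ERROR values the existing code already uses.

```javascript
"use strict";

// OTLP span status codes (STATUS_CODE_OK / STATUS_CODE_ERROR).
const OTLP_STATUS_OK = 1;
const OTLP_STATUS_ERROR = 2;

// Hypothetical pure helper mirroring the proposed change in send_otlp_span.cjs.
function deriveConclusionStatus({
  agentConclusion = "",
  workflowRunConclusion = "",
  outputErrors = [],
  hasNoReadableAgentOutput = false,
  errorMessages = [],
} = {}) {
  // 1. Existing env-var path.
  let runStatus = "success";
  const rawRunStatus = agentConclusion || workflowRunConclusion;
  if (rawRunStatus === "cancelled") {
    runStatus = "cancelled";
  } else if (rawRunStatus === "failure" || rawRunStatus === "timed_out") {
    runStatus = "failure";
  }

  // 2. Fallback to observable failure signals when the env-var path said "success".
  if (runStatus === "success" && (outputErrors.length > 0 || hasNoReadableAgentOutput)) {
    runStatus = "failure";
  }

  // 3. Span status derived from runStatus, so the two can never disagree.
  const statusCode = runStatus === "success" ? OTLP_STATUS_OK : OTLP_STATUS_ERROR;
  let statusMessage;
  if (runStatus === "failure") {
    statusMessage = errorMessages[0]
      ? `agent failure: ${errorMessages[0]}`.slice(0, 256)
      : agentConclusion ? `agent ${agentConclusion}` : "agent failure";
  } else if (runStatus === "cancelled") {
    statusMessage = "agent cancelled";
  }
  return { runStatus, statusCode, statusMessage };
}

// (a) errors in agent_output.json, GH_AW_AGENT_CONCLUSION empty
const caseA = deriveConclusionStatus({ outputErrors: ["e1"], errorMessages: ["e1"] });
// (b) agent_output.json missing on the agent job
const caseB = deriveConclusionStatus({ hasNoReadableAgentOutput: true });
// (c) explicit success with no output errors stays OK
const caseC = deriveConclusionStatus({ agentConclusion: "success" });
console.log(caseA, caseB, caseC);
```

A helper like this is one option for structuring send_otlp_span.test.cjs. The cases can then assert on plain objects instead of inspecting a serialized OTLP payload.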

Expected Outcome
  • In Grafana / Tempo: {resource.service.name="gh-aw" && status=error} returns the real failing traces. {span.gh-aw.run.status="failure"} becomes a usable filter. The attribute-values index for gh-aw.run.status gains failure (and cancelled when relevant) instead of being stuck on a single success value.
  • In the JSONL mirror: failed runs are visibly different from successful ones at the top-level span status field, not just inside nested exception events.
  • For on-call engineers: a single TraceQL filter or alert rule is enough to find and page on agent failures. The existing gh-aw.error.messages attribute (already emitted) becomes immediately useful as a tooltip on those filtered spans.

Implementation Steps
  • Edit actions/setup/js/send_otlp_span.cjs around line 1676 as shown above.
  • Remove the now-superseded statusCode = isAgentNonOK ? 2 : 1 assignment (line ~1651) and the standalone if (isAgentFailure && errorMessages.length > 0) statusMessage block (lines ~1678-1680).
  • Update actions/setup/js/send_otlp_span.test.cjs to assert: (a) gh-aw.run.status="failure" and status.code=2 when agent_output.json contains errors but GH_AW_AGENT_CONCLUSION is empty; (b) the same when agent_output.json is missing on the agent job; (c) the existing agentConclusion=success path with no output errors still emits gh-aw.run.status="success" and status.code=1.
  • Run cd actions/setup/js && npx vitest run send_otlp_span.test.cjs to confirm tests pass.
  • Run make fmt and make test-unit.
  • Open a PR referencing this issue.

Evidence from Live Grafana Data

Queried Tempo datasource grafanacloud-traces over 2026-05-11T00:00:00Z to 2026-05-18T07:00:00Z:

  • tempo_get-attribute-values name="span.gh-aw.run.status" returns exactly one value: "success" across the entire 7-day window. No failure, no cancelled.
  • {resource.service.name="gh-aw" && status=error} returns 0 traces, despite the same data showing exception events on multiple traces.
  • {span.gh-aw.error_count=1} returns traces with real agent errors. Inspecting trace 5b3a7917f205e61028bd3d6b0f921c72 (gh-aw.copilot-cli-deep-research):
```
job=agent       span=gh-aw.agent.conclusion       status_code=STATUS_CODE_OK  run_status=success  error_count=1  agent_conclusion=None
  EVENT: exception  type=gh-aw.AgentError  message="Line 2: Too many items of type 'create_discussion'. Maximum allowed: 1."
job=detection   span=gh-aw.detection.conclusion   status_code=STATUS_CODE_OK  run_status=success  error_count=1  agent_conclusion=None
job=safe_outputs span=gh-aw.safe_outputs.conclusion status_code=STATUS_CODE_OK run_status=success  error_count=1  agent_conclusion=None
job=conclusion  span=gh-aw.conclusion.conclusion  status_code=STATUS_CODE_OK  run_status=success  error_count=1  agent_conclusion=None
```

Every single conclusion span in that failing run reports status_code=OK and gh-aw.run.status=success. The gh-aw.agent.conclusion attribute is absent on every span (it is also missing from the Tempo attribute-name index, confirming the env var path never set it in production).
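The inconsistency can be checked mechanically. The sketch below parses the four conclusion-span lines from trace 5b3a7917f205e61028bd3d6b0f921c72 exactly as printed in this issue. That key=value layout is the issue's own rendering, not a Tempo export format. The sketch flags spans that report STATUS_CODE_OK while carrying error_count > 0. It then compares the ok / total success rate mentioned under Why This Matters with a rate that also treats error_count > 0 as a failure. parseSpanLine is a hypothetical helper, and the rates are per span for this single trace only.

```javascript
"use strict";

// The four conclusion-span lines quoted in the evidence above.
const lines = [
  "job=agent       span=gh-aw.agent.conclusion       status_code=STATUS_CODE_OK  run_status=success  error_count=1  agent_conclusion=None",
  "job=detection   span=gh-aw.detection.conclusion   status_code=STATUS_CODE_OK  run_status=success  error_count=1  agent_conclusion=None",
  "job=safe_outputs span=gh-aw.safe_outputs.conclusion status_code=STATUS_CODE_OK run_status=success  error_count=1  agent_conclusion=None",
  "job=conclusion  span=gh-aw.conclusion.conclusion  status_code=STATUS_CODE_OK  run_status=success  error_count=1  agent_conclusion=None",
];

// Hypothetical helper: split on whitespace and turn each key=value token into a field.
function parseSpanLine(line) {
  const record = {};
  for (const token of line.trim().split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) record[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return record;
}

const spans = lines.map(parseSpanLine);

// Spans whose status says OK even though errors were recorded.
const inconsistent = spans.filter(
  (s) => s.status_code === "STATUS_CODE_OK" && Number(s.error_count) > 0,
);

// ok / total as status-based dashboards compute it today, versus a rate that
// also counts error_count > 0 as a failure.
const okByStatus = spans.filter((s) => s.status_code === "STATUS_CODE_OK").length;
const okBySignals = spans.filter(
  (s) => s.status_code === "STATUS_CODE_OK" && Number(s.error_count) === 0,
).length;

console.log(inconsistent.map((s) => s.job)); // [ 'agent', 'detection', 'safe_outputs', 'conclusion' ]
console.log(okByStatus / spans.length, okBySignals / spans.length); // 1 0
```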

Related Files
  • actions/setup/js/send_otlp_span.cjs (primary change)
  • actions/setup/js/send_otlp_span.test.cjs (test additions)
  • actions/setup/js/action_conclusion_otlp.cjs (caller — no change expected)
  • actions/setup/js/generate_observability_summary.cjs (consumer — verify summary reflects new status)

Generated by the Daily Grafana OTel Instrumentation Advisor workflow

  • expires on May 25, 2026, 5:54 AM UTC
