Many CI workflows are failing with KeyError: 'jobs' #91332

@kit1980

This is probably a GitHub issue, but https://www.githubstatus.com/ currently says "All Systems Operational".

Current Status

Mitigated: the problem seems to have resolved itself, at least temporarily.

The error looks like this:

++ python3 .github/scripts/get_workflow_job_id.py 3761062705 i-04939a5bd44132575
Traceback (most recent call last):
  File ".github/scripts/get_workflow_job_id.py", line 48, in <module>
    jobs = response.json()["jobs"]
KeyError: 'jobs'

After #91145, the error message is now:

RuntimeError: ('Is github alright?', "Recieved status code '502' when attempting to retrieve runs:\n", '{\n "message": "Server Error"\n}\n')
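The two error messages are consistent with each other: the 502 body quoted above is a JSON object with only a "message" key, so indexing it with ["jobs"] raises KeyError. Below is a minimal sketch of that failure and of a status check placed in front of it. `parse_jobs` is a hypothetical helper for illustration; it is not the code in get_workflow_job_id.py or in #91145.

```python
import json

# Body copied from the 502 response quoted in the error message above.
ERROR_BODY = '{\n "message": "Server Error"\n}\n'


def parse_jobs(status_code: int, body: str) -> list:
    """Hypothetical helper: return the "jobs" list or fail with a readable error."""
    if status_code != 200:
        # Surface the HTTP status and body instead of a bare KeyError.
        raise RuntimeError(
            f"Received status code {status_code} when retrieving jobs: {body!r}"
        )
    return json.loads(body)["jobs"]


# Without a status check, the error payload fails like the first traceback.
try:
    json.loads(ERROR_BODY)["jobs"]
except KeyError as err:
    print(f"KeyError: {err}")

# With the check, the failure names the status code and the response body.
try:
    parse_jobs(502, ERROR_BODY)
except RuntimeError as err:
    print(f"RuntimeError: {err}")
```

Checking the status before parsing does not fix the outage, but it makes the log point at the GitHub API response rather than at the script.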

Found a simple repro: this URL currently returns a 502 Server Error:
https://api.github.com/repos/pytorch/pytorch/actions/runs/3761801620/jobs?per_page=100

For some other workflow runs, the same endpoint responds fine.
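The repro can also be scripted, which makes it easier to compare several runs. This is a minimal sketch using only the standard library. `fetch_status` and its `opener` parameter are hypothetical names introduced here, and whether the URL still returns 502 depends on GitHub.

```python
import urllib.error
import urllib.request
from typing import Tuple

# URL from the repro above.
REPRO_URL = (
    "https://api.github.com/repos/pytorch/pytorch/actions/runs/3761801620/jobs"
    "?per_page=100"
)


def fetch_status(url: str, opener=urllib.request.urlopen) -> Tuple[int, str]:
    """Hypothetical helper: return (HTTP status, response body) for a GET request."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    try:
        with opener(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status and body are on the exception.
        return err.code, err.read().decode("utf-8")
```

Calling `fetch_status(REPRO_URL)` returns the status code and body for the failing run; passing the jobs URL of another run ID allows a side-by-side comparison. Unauthenticated requests to the GitHub API are rate limited, so this is only suitable for occasional manual checks.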

Incident timeline (all times Pacific)

Include when the incident began, when it was detected, mitigated, root caused, and finally closed.

  • 2022-12-22 ~14:00 PT incident began

User impact

Almost every workflow is failing with the above error.

Root cause

Probably an issue on GitHub's side: the GitHub API returned a 502 Server Error when listing jobs for some workflow runs.

Mitigation

How did we mitigate the issue?

Prevention/followups

How do we prevent issues like this in the future?

cc @seemethere @malfet @pytorch/pytorch-dev-infra

Labels

module: ci (Related to continuous integration)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)