@Iamrodos Iamrodos commented Dec 16, 2025

Refactors error handling and retry logic to be more robust and consistent.

Several related issues are now addressed. The control flow is simplified, and the main data retrieval path is easier to read.

Key changes:

  • Retry all 5xx errors, not just 502 (fixes #140, "API request returned HTTP 500: Internal Server Error")
  • Retry network errors including URLError, socket.error, and IncompleteRead (fixes #110, "Retry API requests which failed due to whatever reasons")
  • Retry JSON parse errors on HTTP 200 responses (transient corruption)
  • Add exponential backoff with jitter (1s base, 120s max, 10% jitter)
  • Respect retry-after and rate limit headers per GitHub API requirements
  • Consolidate retry logic into make_request_with_retry() wrapper
  • Improve error visibility with clear logging before failures
  • Remove dead code from 2016 that was intentionally disabled
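
As a rough illustration of the backoff and header handling listed above, here is a minimal sketch. It is not the code from this PR: the function names, the symmetric (plus or minus) jitter, and the assumption that header names arrive lowercased in a plain dict are all hypothetical. Only the 1s base, 120s cap, and 10% jitter figures come from the description, and the header names are the ones GitHub documents for rate limiting.

```python
import random


def backoff_delay(attempt, base=1.0, cap=120.0, jitter=0.10, rng=random):
    """Return the delay in seconds before retry number `attempt` (0-based).

    Hypothetical helper: the PR states the parameters, not the exact formula.
    """
    # Exponential growth (base, 2*base, 4*base, ...) limited to `cap`.
    delay = min(cap, base * (2 ** attempt))
    # Assumed symmetric jitter: scale by a random factor in [1 - jitter, 1 + jitter].
    # The cap is applied before jitter, so the result can exceed `cap` by up to 10%.
    return delay * (1 + rng.uniform(-jitter, jitter))


def header_wait(headers, now):
    """Return the wait in seconds requested by the server, or None if none is given.

    `headers` is assumed to be a dict with lowercase keys; `now` is a UTC epoch time.
    """
    retry_after = headers.get("retry-after")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)
    # x-ratelimit-reset is a UTC epoch timestamp; it only matters once the quota is used up.
    if headers.get("x-ratelimit-remaining") == "0" and "x-ratelimit-reset" in headers:
        return max(0.0, float(headers["x-ratelimit-reset"]) - now)
    return None
```

A caller would typically prefer `header_wait(...)` when it returns a value and fall back to `backoff_delay(attempt)` otherwise.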

Fixes #140, #110, #138

#138 suggests adding a --continue-on-error behavior. That is better handled as part of a per-repo solution, which relates to #244.

Retry behavior

| Error type | Before | After |
|---|---|---|
| HTTP 500 | Fails immediately | Retries with backoff |
| HTTP 502 | Retries 3x | Retries 5x with backoff |
| HTTP 503, 504 | Fails immediately | Retries with backoff |
| HTTP 403 (rate limit) | Waits for reset | Waits for reset |
| HTTP 403 (permission) | Fails | Fails immediately |
| URLError | Retries 3x | Retries 5x with backoff |
| socket.error | Retries 3x | Retries 5x with backoff |
| IncompleteRead | Retries 3x | Retries 5x with backoff |
| JSON decode error | Retries 3x | Retries 5x with backoff |
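
The classification in the table can be sketched as a small retry loop. This is an illustrative stand-in, not the PR's make_request_with_retry(): the names `fetch_with_retry`, `is_retryable_status`, and `MAX_ATTEMPTS` are hypothetical, the opener and delay function are injected so the sketch stays self-contained, and the 403 rate-limit wait and JSON decode retries are left out for brevity.

```python
import http.client
import socket
import time
import urllib.error

MAX_ATTEMPTS = 5  # mirrors the "5x" in the table; the constant name is an assumption


def is_retryable_status(code):
    # Every 5xx status is treated as transient; 4xx (e.g. a 403 permission error) is not.
    return 500 <= code <= 599


def fetch_with_retry(open_fn, request, delay_fn, sleep=time.sleep):
    """Call open_fn(request), retrying transient failures up to MAX_ATTEMPTS times."""
    for attempt in range(MAX_ATTEMPTS):
        last = attempt == MAX_ATTEMPTS - 1
        try:
            return open_fn(request)
        except urllib.error.HTTPError as exc:
            # HTTPError subclasses URLError, so it has to be handled first.
            if not is_retryable_status(exc.code) or last:
                raise
        except (urllib.error.URLError, socket.error, http.client.IncompleteRead):
            # socket.error is an alias of OSError in Python 3, so this clause is broad.
            if last:
                raise
        # Only reached after a retryable failure that was not the final attempt.
        sleep(delay_fn(attempt))
```

In real use, `open_fn` could be `urllib.request.urlopen` and `delay_fn` a backoff function; passing them in keeps the loop easy to test without network access.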

Logging improvements

Clear visibility into retry behavior and failures:

| Event | Level | Example message |
|---|---|---|
| Retry attempt | warning | HTTP 502, retrying in 2.1s (attempt 1/5) |
| Connection retry | warning | Connection error: <urlopen error ...>, retrying in 1.0s (attempt 1/5) |
| JSON parse retry | warning | JSONDecodeError reading response / Retrying in 1.0s (attempt 1/5) |
| Throttling | info | Throttling: 50 requests left, pausing 30s |
| HTTP failure after retries | error | HTTP 502 failed after 5 attempts |
| Connection failure after retries | error | Connection error failed after 5 attempts: <error> |
| JSON parse failure after retries | error | Failed to read response after 5 attempts for <url> |
| Rate limit hint | info | Hint: Authenticate to raise your GitHub rate limit |

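Dead code removed

The errors list and the _request_http_error and _request_url_error helpers, which were intentionally disabled in commit 1e5a904 (the fix for #29), have been removed.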

Parameter rename

The retrieve_data() function parameter single_request has been renamed to paginated with inverted logic:

  • Old: single_request=True meant "don't paginate"
  • New: paginated=False means "don't paginate"

This is more intuitive (positive framing). All internal call sites have been updated.
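
For any external caller that still passes the old keyword, the mapping is a simple negation. The two functions below are simplified stand-ins written for this illustration, not the real retrieve_data() signature, and the default values shown are assumptions.

```python
def retrieve_data_old(url, single_request=False):
    # Stand-in for the old signature: True meant "fetch a single page only".
    return "single page" if single_request else "all pages"


def retrieve_data_new(url, paginated=True):
    # Stand-in for the new signature: False means "fetch a single page only".
    return "all pages" if paginated else "single page"


def migrate_flag(single_request):
    # The new flag is the logical inverse of the old one.
    return not single_request


url = "https://api.github.com/user"  # example URL, used only as a placeholder argument
for old_flag in (True, False):
    # Both spellings select the same behavior once the flag is inverted.
    assert retrieve_data_old(url, single_request=old_flag) == retrieve_data_new(
        url, paginated=migrate_flag(old_flag)
    )
```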

Tests

New test added to exercise retries. Live tested against a number of repos.

Refactors error handling to retry all 5xx errors (not just 502), network errors (URLError, socket.error, IncompleteRead), and JSON parse errors with exponential backoff and jitter. Respects retry-after and rate limit headers per GitHub API requirements. Consolidates retry logic into make_request_with_retry() wrapper and adds clear logging for retry attempts and failures. Removes dead code from 2016 (errors list, _request_http_error, _request_url_error) that was intentionally disabled in commit 1e5a904 to fix josegonzalez#29.

Fixes josegonzalez#140, josegonzalez#110, josegonzalez#138
@josegonzalez josegonzalez merged commit 27d3fcd into josegonzalez:master Dec 16, 2025
10 checks passed
@Iamrodos Iamrodos deleted the fix/retry-logic branch December 17, 2025 01:56