Fix autoscaled pool scaling behavior on 429 Too Many Requests #1437

@vdusek

Description

  • Crawlee does not currently handle 429 Too Many Requests responses correctly.
  • When a target server starts returning 429s, Crawlee does not slow down.
  • Instead, the current autoscaled pool logic may scale concurrency up as responses get slower, because waiting on slow responses leaves the CPU with less work.
  • This creates a "death spiral": the slower the server responds, the faster Crawlee increases concurrency, which can quickly overwhelm small websites.
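
The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Crawlee's actual autoscaling code: `cpu_utilization`, `next_concurrency`, `settle`, the 0.8 threshold, and the 10 ms of CPU time per request are all hypothetical values chosen for illustration.

```python
def cpu_utilization(concurrency: int, cpu_secs_per_request: float, latency_secs: float) -> float:
    """Rough share of one core kept busy when each in-flight request needs
    `cpu_secs_per_request` of CPU once per `latency_secs` round trip."""
    return min(1.0, concurrency * cpu_secs_per_request / latency_secs)


def next_concurrency(concurrency: int, utilization: float,
                     threshold: float = 0.8, max_concurrency: int = 200) -> int:
    """Hypothetical CPU-only scaling rule: scale up while the CPU looks idle."""
    if utilization < threshold:
        return min(max_concurrency, concurrency + 1)
    return max(1, concurrency - 1)


def settle(latency_secs: float, cpu_secs_per_request: float = 0.01, steps: int = 500) -> int:
    """Run the scaling rule for a number of steps and return the final concurrency."""
    concurrency = 1
    for _ in range(steps):
        utilization = cpu_utilization(concurrency, cpu_secs_per_request, latency_secs)
        concurrency = next_concurrency(concurrency, utilization)
    return concurrency


fast = settle(latency_secs=0.1)  # server answers quickly
slow = settle(latency_secs=1.0)  # server is struggling and answers slowly
print(f"fast server: ~{fast} concurrent requests, slow server: ~{slow}")
```

In this model the only signal is local CPU load, so a tenfold increase in latency lets concurrency settle roughly ten times higher, which is the opposite of what a struggling server needs.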

Proposed solution

  • Detect 429 responses and implement proper backoff logic (reducing the concurrency of the autoscaled pool, a cooldown period, ...).
  • Ensure the autoscaled pool does not interpret slow responses or 429s as a signal to increase concurrency.
  • Consider respecting Retry-After headers if present.
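
One possible shape for the proposal above, as a minimal standalone sketch. It does not use any Crawlee API: `parse_retry_after` and `ConcurrencyBackoff` are hypothetical names, and halving the concurrency with a 30-second default cooldown is an assumption rather than a decided design. Only the standard library is used.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds requested by a Retry-After header, or None.

    The header may carry either a number of seconds or an HTTP date.
    """
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: let the caller fall back to its default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class ConcurrencyBackoff:
    """Hypothetical helper: halve desired concurrency on 429 and block scale-up for a while."""

    def __init__(self, max_concurrency: int, min_concurrency: int = 1,
                 cooldown_secs: float = 30.0) -> None:
        self.max_concurrency = max_concurrency
        self.min_concurrency = min_concurrency
        self.cooldown_secs = cooldown_secs
        self.desired = max_concurrency
        self.blocked_until = 0.0  # timestamp before which scale-up is not allowed

    def on_response(self, status_code: int, retry_after: Optional[str], now: float) -> None:
        """Record a response; `now` is a clock reading in seconds (e.g. time.monotonic())."""
        if status_code != 429:
            return
        self.desired = max(self.min_concurrency, self.desired // 2)
        delay = parse_retry_after(retry_after)
        wait = self.cooldown_secs if delay is None else delay
        # Never shorten a cooldown that is already in effect.
        self.blocked_until = max(self.blocked_until, now + wait)

    def can_scale_up(self, now: float) -> bool:
        return now >= self.blocked_until

    def try_scale_up(self, now: float) -> int:
        """Raise desired concurrency by one step, but only outside the cooldown."""
        if self.can_scale_up(now) and self.desired < self.max_concurrency:
            self.desired += 1
        return self.desired
```

In this sketch a 429 is treated as an explicit signal that overrides any CPU-based decision: the pool may only grow again once the cooldown (or the server-provided Retry-After delay) has elapsed.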

Labels

  • bug: Something isn't working.
  • t-tooling: Issues with this label are in the ownership of the tooling team.
