fix: Handle the case when error_handler returns Request #1595

Merged
Pijukatel merged 3 commits into apify:master from Mantisus:new-request-error-handler
Dec 5, 2025

@Mantisus Mantisus commented Dec 4, 2025

Description

  • This PR fixes the crawler's behavior when error_handler returns a Request. Previously, the request queue never reached the empty state in that case.
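
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the crawler's actual code: `InMemoryQueue`, `retry_with_replacement`, and the stand-in `Request` class are hypothetical, and the example assumes the queue got stuck because the original request was never marked as handled once the error handler supplied a replacement.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for a crawler request (not the library class)."""

    url: str


@dataclass
class InMemoryQueue:
    """Toy queue: a fetched request stays 'in progress' until marked as handled."""

    pending: list = field(default_factory=list)
    in_progress: set = field(default_factory=set)

    async def add_request(self, request: Request) -> None:
        self.pending.append(request)

    async def fetch_next_request(self):
        if not self.pending:
            return None
        request = self.pending.pop(0)
        self.in_progress.add(request)
        return request

    async def mark_request_as_handled(self, request: Request) -> None:
        self.in_progress.discard(request)

    async def is_finished(self) -> bool:
        # The queue is empty only when nothing is pending or in progress.
        return not self.pending and not self.in_progress


async def retry_with_replacement(queue, original, replacement, *, mark_original):
    """Enqueue the Request returned by the error handler.

    With mark_original=False the original is left in progress, which is the
    assumed cause of the queue never reaching the empty state.
    """
    if mark_original:
        await queue.mark_request_as_handled(original)
    await queue.add_request(replacement)


async def run(mark_original: bool) -> bool:
    queue = InMemoryQueue()
    await queue.add_request(Request('https://example.com/a'))
    original = await queue.fetch_next_request()
    # Pretend the request handler failed and error_handler returned a new Request.
    replacement = Request('https://example.com/b')
    await retry_with_replacement(queue, original, replacement, mark_original=mark_original)
    # The replacement is then processed successfully.
    fetched = await queue.fetch_next_request()
    await queue.mark_request_as_handled(fetched)
    return await queue.is_finished()
```

Under these assumptions, `asyncio.run(run(mark_original=False))` leaves the original request in progress forever, while `asyncio.run(run(mark_original=True))` lets the queue drain.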

Testing

  • Added a new test.

@Mantisus Mantisus self-assigned this Dec 4, 2025
@Mantisus Mantisus requested a review from janbuchar December 4, 2025 03:14

Mantisus commented Dec 4, 2025

I made the fix in accordance with the current signature of ErrorHandler.

However, that signature does not match the TS version: in TS, the handler only returns nothing (the equivalent of 'None').

Comment on lines 1140 to 1147
await wait_for(
    lambda: request_manager.mark_request_as_handled(request),
    timeout=self._internal_timeout,
    timeout_message='Marking request as handled timed out after '
    f'{self._internal_timeout.total_seconds()} seconds',
    logger=self._logger,
    max_retries=3,
)

@Pijukatel Pijukatel Dec 4, 2025

This is now repeated six times in BasicCrawler, in several variants. I would consider creating a utility function for it.

async def some_very_good_name(self, request: Request) -> None:
    request_manager = await self.get_request_manager()
    await wait_for(
        lambda: request_manager.mark_request_as_handled(request),
        timeout=self._internal_timeout,
        timeout_message='Marking request as handled timed out after '
        f'{self._internal_timeout.total_seconds()} seconds',
        logger=self._logger,
        max_retries=3,
    )

(No need to worry about request_manager = await self.get_request_manager(); it will resolve to self._request_manager.)

Collaborator

IMO we can just use a private method: async def _mark_request_as_handled(self, request: Request) -> None.
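
A self-contained sketch of what such a private helper might look like follows. It is an illustration, not the merged implementation: `wait_for_stub` is a hypothetical stand-in for the `wait_for` helper quoted in the review (its real retry semantics are not shown in this thread and are assumed here), and `FakeRequestManager` and `CrawlerSketch` exist only to make the example runnable.

```python
import asyncio
import logging
from datetime import timedelta


async def wait_for_stub(operation, *, timeout, timeout_message, logger, max_retries):
    """Hypothetical stand-in for wait_for: retry the operation on timeout."""
    for attempt in range(1, max_retries + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=timeout.total_seconds())
        except asyncio.TimeoutError:
            logger.warning('%s (attempt %d/%d)', timeout_message, attempt, max_retries)
    raise asyncio.TimeoutError(timeout_message)


class FakeRequestManager:
    """Minimal request manager that records which requests were marked as handled."""

    def __init__(self):
        self.handled = []

    async def mark_request_as_handled(self, request):
        self.handled.append(request)


class CrawlerSketch:
    """Illustrative crawler fragment; attribute names mirror the quoted snippet."""

    def __init__(self, request_manager):
        self._request_manager = request_manager
        self._internal_timeout = timedelta(seconds=5)
        self._logger = logging.getLogger(__name__)

    async def get_request_manager(self):
        return self._request_manager

    async def _mark_request_as_handled(self, request):
        # Single place for the timeout/retry boilerplate that was repeated inline.
        request_manager = await self.get_request_manager()
        await wait_for_stub(
            lambda: request_manager.mark_request_as_handled(request),
            timeout=self._internal_timeout,
            timeout_message='Marking request as handled timed out after '
            f'{self._internal_timeout.total_seconds()} seconds',
            logger=self._logger,
            max_retries=3,
        )
```

With a helper like this, each call site would shrink to `await self._mark_request_as_handled(request)`, so the timeout message and retry count live in one place.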


@vdusek vdusek left a comment

LGTM

@Pijukatel Pijukatel merged commit 8a961a2 into apify:master Dec 5, 2025
23 checks passed