fix: Improve indexing of the request_queue_records table for SqlRequestQueueClient#1527
fix: Improve indexing of the request_queue_records table for SqlRequestQueueClient#1527vdusek merged 2 commits intoapify:masterfrom
request_queue_records table for SqlRequestQueueClient#1527Conversation
|
@ericvg97, thank you for your analysis and for finding this. Is it possible for you to check the new index in your environment and confirm that the queue is working correctly with it? |
It doesn't work in my environment. The query I shared was incorrect, the actual query crawlee is doing is Notice the condition "is_handled IS false" vs "is_handled = false" that I shared before. When doing the first, the planner doesn't use the index you are proposing. When doing the second, the planner does use the index. I didn't know this could happen in postgres TBH. Can you check this also happens in your environmnet |
|
Changing "postgresql_where=text('is_handled = false')", to "postgresql_where=text('is_handled is false') fixes it in my environment. |
@ericvg97 Thank you, I double-checked the query and updated the index. It works the same way in my environment.
Yes, I think it's because it's a partition index. Therefore, it is important that the expression fully matches the index. Thanks again for double-checking this and helping to better identify the error. I really appreciate it. |
Description
request_queue_recordstable, as the previous version of the index was not efficient for retrieving requests.EXPLAIN result for over a million requests
Old index
New index
Issues