fix: Fix KeyValueStore.auto_saved_value failing in some scenarios#1438
fix: Fix KeyValueStore.auto_saved_value failing in some scenarios#1438
KeyValueStore.auto_saved_value failing in some scenarios#1438Conversation
Explicit kvs to RecoverableState
|
This PR replaces #1368. That PR included many changes that were already released in Crawlee v1.0, and continuing in the previous PR would not be good for review. |
97b11d4 to
537ed1a
Compare
vdusek
left a comment
There was a problem hiding this comment.
I wouldn't be afraid to call it a fix if it's (according to the description) fixing something, and then it should be included in the changelog.
But also, I'm not 100% I understand why we're doing this:
Reduce the amount of global side effects in service_locator by using an explicit KVS factory in RecoverableState.
What side effects?
Fix KeyValueStore.auto_saved_value not working properly if the global storage_client was different from the current kvs storage client.
👍
| metadata=metadata, | ||
| path_to_rq=path_to_rq, | ||
| lock=asyncio.Lock(), | ||
| recoverable_state=await cls._create_recoverable_state(id=metadata.id, configuration=configuration), |
There was a problem hiding this comment.
Instead of creating it three times, we can create it once, store it in a variable, and just pass it where needed.
There was a problem hiding this comment.
That creation requires metadata to get the RQ.id, so we have to repeat this call, as in all three branches, we get metadata in a different way.
| from crawlee.storage_clients import FileSystemStorageClient # noqa: PLC0415 avoid circular import | ||
| from crawlee.storages import KeyValueStore # noqa: PLC0415 avoid circular import | ||
|
|
||
| return await KeyValueStore.open(storage_client=FileSystemStorageClient(), configuration=configuration) |
There was a problem hiding this comment.
Creating a fresh filesystem storage client to open a request queue feels wrong - at this point, we can be pretty sure that another one already exists. Is there a specific reason to do this or is it just because you don't have access to the existing one?
There was a problem hiding this comment.
At that point, we are not sure it exists. It could have been created through a class method without the client and even when created with the help of client, it is out of scope:
await FileSystemRequestQueueClient.open(...)
And why not open the KVS through such a class method as well? Because that way, you bypass the storage instance manager - and that is generally something we do not want.
FileSystemStorageClient is just a helper factory class, which is mainly for convenience and for registering the storage instance manager.
| assert request_data['url'].startswith('https://example.com/') | ||
|
|
||
|
|
||
| async def test_opening_rq_does_not_have_side_effect_on_service_locator( |
There was a problem hiding this comment.
Can you add some explanation here? Also, inlining the rq_client fixture could lead to better readable code
|
|
||
| # Reset global class variables to ensure test isolation. | ||
| KeyValueStore._autosaved_values = {} | ||
| Statistics._Statistics__next_id = 0 # type:ignore[attr-defined] # Mangled attribute |
There was a problem hiding this comment.
How does this contribute towards test isolation? Is there anything that depends on the persist state key that is derived from the ID?
There was a problem hiding this comment.
The answer is I am not sure. But we have so many tests, so I think it is best if we restore all we can to the same state at the beginning of the test. This reduces the chance of some weird behavior based on the order of the test execution.
Each time you use There are different cases, where I am not sure, for example, |
KeyValueStore.auto_saved_value failing in some scenarios
Description
service_locatorby using an explicit KVS factory inRecoverableState.KeyValueStore.auto_saved_valuenot working properly if the global storage_client was different from the current kvs storage client.Issues
Testing