Revert cache FIFO replacement policy to reset/clear #134
Amongst other changes, #113 switched the cache to a FIFO inspired by the standard library's `re` module. However, it didn't really take concurrency into account, so it didn't consider that:

1. double-pops are possible (probably why the stdlib ignores a bunch of errors), which can cause a `KeyError` during lookup (as two workers try to clear the first key, one succeeds, and the other doesn't find the key and fails)
It also has a few other, less major issues:

2. double-inserts are possible, which can leave the cache over its nominal capacity permanently, by up to the number of concurrent workers (both races are sketched below)
3. the FIFO behaviour relies on dict insertion order, which is only guaranteed from Python 3.7 onwards; fixing that would need an ordered dict, but I'd rather not drop 2.7 compatibility from 0.x unless there are very good causes to, as, despite 2.7 having been EOL'd in 2020, it still accounts for more downloads than 3.10 (according to pypistats)
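
To make the two races concrete, here is a minimal sketch of an `re`-module-style FIFO lookup; `_cache`, `MAX_SIZE`, and `lookup` are illustrative names, not this repo's actual code:

```python
# Sketch of an re-module-style FIFO cache with the races called out.
_cache = {}
MAX_SIZE = 200

def lookup(key, compute):
    try:
        return _cache[key]
    except KeyError:
        pass
    value = compute(key)
    if len(_cache) >= MAX_SIZE:
        # Race 1 (double-pop): two workers can both pick the same
        # "first" key here; the loser's `del` raises KeyError, which
        # is presumably why the stdlib swallows that error.
        del _cache[next(iter(_cache))]
    # Race 2 (double-insert): two workers can both pass the size
    # check at MAX_SIZE - 1 entries and both insert, leaving the
    # cache permanently over capacity.
    _cache[key] = value
    return value
```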
Using an ordered dict would solve (3), and would allow using an LRU rather than a FIFO, but it would not actually prevent double-pops or double-inserts; that would require a proper lock around lookups. The lock might not be that expensive, but given the lack of a good dataset to bench with, it seems like a lot of additional complexity for something we've got no visibility on. But that can be reconsidered if someone reports a serious performance regression from this.
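
For the record, the locked variant would look something like this sketch (a hypothetical `LRUCache`, not something this PR adds):

```python
import threading
from collections import OrderedDict

class LRUCache(object):
    def __init__(self, maxsize=200):
        self.maxsize = maxsize
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key, compute):
        with self._lock:
            if key in self._data:
                # On a hit, move the entry to the MRU end.
                value = self._data.pop(key)
                self._data[key] = value
                return value
        # Compute outside the lock: a value may occasionally be
        # computed twice, but the dict itself can't be corrupted.
        value = compute(key)
        with self._lock:
            self._data[key] = value
            while len(self._data) > self.maxsize:
                # Evict from the LRU end; holding the lock means no
                # double-pop KeyError is possible here.
                self._data.popitem(last=False)
        return value
```

Note the deliberate trade-off: keeping `compute` outside the lock avoids serializing all parses behind it, at the cost of occasional duplicate work.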
So for now, just revert to a "reset" cache replacement policy. If/when we drop older versions, we can switch to `functools.lru_cache` and let the stdlib take care of this (and possibly have cache stats). Alternatively, if we get a good testing dataset one day, we can bench cache replacement policies or even provide pluggable policies.
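
In sketch form, the "reset" policy being reverted to is roughly (illustrative names again):

```python
# When the cache fills up, drop everything and start over. Crude,
# but concurrency-safe without a lock: racing clear() calls are
# idempotent, unlike racing deletes of the "first" key.
_cache = {}
MAX_SIZE = 200

def lookup(key, compute):
    try:
        return _cache[key]
    except KeyError:
        pass
    if len(_cache) >= MAX_SIZE:
        _cache.clear()
    value = _cache[key] = compute(key)
    return value
```

And once the old versions can go, the whole thing could be delegated to the stdlib, e.g. (hypothetical `parse` function):

```python
import functools

@functools.lru_cache(maxsize=200)
def parse(ua_string):
    ...  # actual parsing; lru_cache also exposes parse.cache_info()
```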
Anyway, fixes #132, closes #133.