diff roundup/backends/sessions_dbm.py @ 6565:2c2dbfc332ba
Try to handle multiple connections better.
The session database is a hot spot. When multiple requests (e.g. 20)
come in at the same time, contention for the session database can be severe.
The original code didn't retry session database access when the open
failed. This resulted in errors at the client.
The second pass delayed 0.01 seconds and retried. It was better, but we
still had multi-second stalls. I think the first request got in,
everybody else backed up and then retried at the same time. Again they
stepped on each other. With logging I would see many retry counters go all
the way down to low single digits, or to -1, indicating failure.
This pass uses random.randint to generate delays from 0 to 0.125 seconds
in 5ms increments. This performs better in testing. I rarely saw a counter
less than 13 (2 failed retries). Logging currently starts after 6
failures (once fewer than 10 retries are left) and reports the remaining
count until the open succeeds or the retries run out.
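As a rough illustration of the delay range described above, the sketch below enumerates the values that `random.randint(0, 25) * 0.005` can produce. The names `STEP_SECONDS`, `MAX_STEPS`, `jitter_delay` and `possible_delays` are hypothetical and are not part of the Roundup code.

```python
import random

STEP_SECONDS = 0.005  # 5 ms increment, as stated in the commit message
MAX_STEPS = 25        # 25 * 0.005 s = 0.125 s upper bound


def jitter_delay(rng=random):
    """Return a random delay between 0 and 0.125 s in 5 ms steps."""
    # randint includes both endpoints, so there are 26 possible values.
    return rng.randint(0, MAX_STEPS) * STEP_SECONDS


# Every delay the expression can yield: 0.0, 0.005, ..., 0.125
possible_delays = [n * STEP_SECONDS for n in range(MAX_STEPS + 1)]
```

Because each waiting request draws its delay independently, requests that failed together are unlikely to all retry at the same instant, which is the effect the fixed 0.01 second delay could not give.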
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Thu, 16 Dec 2021 20:02:00 -0500 |
| parents | bef1e42be04c |
| children | b4d0b48b3096 |
--- a/roundup/backends/sessions_dbm.py	Wed Dec 15 23:52:25 2021 -0500
+++ b/roundup/backends/sessions_dbm.py	Thu Dec 16 20:02:00 2021 -0500
@@ -6,7 +6,7 @@
 """
 __docformat__ = 'restructuredtext'
 
-import os, marshal, time
+import os, marshal, time, logging, random
 
 from roundup.anypy.html import html_escape as escape
 
@@ -132,21 +132,24 @@
             dbm = __import__(db_type)
 
         retries_left = 15
+        logger = logging.getLogger('roundup.hyperdb.backend.sessions')
         while True:
             try:
                 handle = dbm.open(path, mode)
                 break
-            except OSError:
+            except OSError as e:
                 # Primarily we want to catch and retry:
                 #   [Errno 11] Resource temporarily unavailable retry
                 # FIXME: make this more specific
+                if retries_left < 10:
+                    logger.warning('dbm.open failed, retrying %s left: %s'%(retries_left,e))
                 if retries_left < 0:
                     # We have used up the retries. Reraise the exception
                     # that got us here.
                     raise
                 else:
-                    # delay retry a bit
-                    time.sleep(0.01)
+                    # stagger retry to try to get around thundering herd issue.
+                    time.sleep(random.randint(0,25)*.005)
                     retries_left = retries_left - 1
                     continue  # the while loop
         return handle
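The staggered retry in this change can be sketched as a standalone helper, which makes the pattern easier to try outside Roundup. This is only an illustration under stated assumptions: `open_with_retry` and its parameters are hypothetical names, the `sleep` and `rng` arguments exist only so the sketch can be tested without real waiting, and the retry accounting is simplified (it stops after exactly `retries` retries, whereas the loop in the diff checks `retries_left < 0`).

```python
import random
import time


def open_with_retry(open_func, retries=15, max_steps=25, step=0.005,
                    sleep=time.sleep, rng=random):
    """Call open_func() until it succeeds or the retries are used up.

    After each OSError, sleep a random multiple of `step` seconds
    (0 to max_steps * step) so that competing callers do not all
    retry at the same moment.
    """
    retries_left = retries
    while True:
        try:
            return open_func()
        except OSError:
            if retries_left <= 0:
                # No retries left: re-raise the error that got us here.
                raise
            retries_left -= 1
            # Randomized delay to spread out the retries.
            sleep(rng.randint(0, max_steps) * step)
```

A caller would wrap the real open call, for example `open_with_retry(lambda: dbm.open(path, mode))`, where `dbm`, `path` and `mode` come from the surrounding code.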
