comparison roundup/backends/sessions_dbm.py @ 6565:2c2dbfc332ba

Try to handle multiple connections better. The session database is a hot spot: when many requests (e.g. 20) arrive at the same time, contention for the session database can get severe. The original code didn't retry session database access when the open failed, which resulted in errors at the client. The second pass delayed 0.01 seconds and retried. That was better, but we still had multi-second stalls. I think the first request got in, everybody else backed up and then retried at the same time, so they stepped on each other again. With logging I would see many counters go all the way down to low single digits, or to -1 indicating failure. This pass uses random.randint to generate delays from 0 to 0.125 seconds in 5 ms increments. This performs better in testing: I rarely saw a counter less than 13 (2 failed retries). Current logging starts after 6 failures and counts down until success or failure.
author John Rouillard <rouilj@ieee.org>
date Thu, 16 Dec 2021 20:02:00 -0500
parents bef1e42be04c
children b4d0b48b3096
--- roundup/backends/sessions_dbm.py @ 6564:21c7c2041a4b
+++ roundup/backends/sessions_dbm.py @ 6565:2c2dbfc332ba
@@ -4,11 +4,11 @@
 Yes, it's called "sessions" - because originally it only defined a session
 class. It's now also used for One Time Key handling too.
 """
 __docformat__ = 'restructuredtext'
 
-import os, marshal, time
+import os, marshal, time, logging, random
 
 from roundup.anypy.html import html_escape as escape
 
 from roundup import hyperdb
 from roundup.i18n import _
@@ -130,25 +130,28 @@
 
         # open the database with the correct module
         dbm = __import__(db_type)
 
         retries_left = 15
+        logger = logging.getLogger('roundup.hyperdb.backend.sessions')
         while True:
             try:
                 handle = dbm.open(path, mode)
                 break
-            except OSError:
+            except OSError as e:
                 # Primarily we want to catch and retry:
                 #   [Errno 11] Resource temporarily unavailable retry
                 # FIXME: make this more specific
+                if retries_left < 10:
+                    logger.warning('dbm.open failed, retrying %s left: %s'%(retries_left,e))
                 if retries_left < 0:
                     # We have used up the retries. Reraise the exception
                     # that got us here.
                     raise
                 else:
-                    # delay retry a bit
-                    time.sleep(0.01)
+                    # stagger retry to try to get around thundering herd issue.
+                    time.sleep(random.randint(0,25)*.005)
                     retries_left = retries_left - 1
                     continue  # the while loop
         return handle
 
     def commit(self):
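
The staggered retry introduced by this change can be sketched in isolation as follows. This is a minimal illustration, not the Roundup code: the helper name open_with_jitter and its parameters (opener, retries, step, max_steps, sleep, rng) are hypothetical and exist only for this example. It also makes two assumptions that differ from the changeset: it retries only on errno.EAGAIN (one possible reading of the "FIXME: make this more specific" comment), and it gives up after exactly `retries` sleeps rather than counting down past zero.

```python
import errno
import random
import time


def open_with_jitter(opener, retries=15, step=0.005, max_steps=25,
                     sleep=time.sleep, rng=random):
    """Call opener() until it succeeds, sleeping a random delay between tries.

    Hypothetical helper, not part of Roundup. Each delay is a multiple of
    `step` seconds, from 0 up to step * max_steps (0.125 s by default),
    so concurrent callers are unlikely to all retry at the same moment.
    """
    retries_left = retries
    while True:
        try:
            return opener()
        except OSError as e:
            # Assumption: only EAGAIN ("Resource temporarily unavailable")
            # is worth retrying; any other error is re-raised immediately.
            if e.errno != errno.EAGAIN or retries_left <= 0:
                raise
            retries_left -= 1
            # Random stagger instead of a fixed delay.
            sleep(rng.randint(0, max_steps) * step)
```

A caller could wrap the open from the diff as open_with_jitter(lambda: dbm.open(path, mode)). The sleep and rng parameters are there only so the delays can be observed or made deterministic in tests.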

Roundup Issue Tracker: http://roundup-tracker.org/