comparison roundup/backends/back_postgresql.py @ 7668:5b41018617f2

fix: out of memory error when importing under postgresql

If you try importing more than 20k items under postgresql you can run
out of memory:

    psycopg2.errors.OutOfMemory: out of shared memory
    HINT: You might need to increase max_locks_per_transaction.

Tuning memory may help; it's unknown at this point. This checkin
forces a commit to the postgres database after 10,000 rows have been
added. This clears out the savepoints for each row and starts a new
transaction.

back_postgresql.py: implement the commit mechanism in
checkpoint_data(). Add two class-level attributes for tracking the
number of savepoints and the limit at which the commit should happen.

roundup_admin.py: implement the pragma and dynamically create the
config item RDBMS_SAVEPOINT_LIMIT used by checkpoint_data(). Also fix
formatting of descriptions when using "pragma list" in verbose mode.

admin_guide.txt, upgrading.txt: document the change and the use of
"pragma savepoint_limit" in roundup-admin for changing the default of
10,000.

test/db_test_base.py: add some more asserts. In the existing
testAdminImportExport, set the savepoint limit to 5 to test the
setting method and so that the commit code will be run by existing
tests. This provides coverage, but does not actually test that the
commit is done every 5 savepoints 8-(. The verification of every 5
savepoints was done manually using a pdb breakpoint just before the
commit.

acknowledgements.txt: add a 2.4.0 section mentioning Norbert, as he
has done a ton of testing with much larger datasets than I can test
with.
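The commit-every-N-savepoints strategy described above can be sketched in a few lines. This is a minimal, illustrative stand-in, not Roundup's actual code: it uses Python's sqlite3 (which also understands SQL SAVEPOINT) instead of psycopg2, since the real backend needs a live Postgres server, and the `ImportConnection` wrapper and `SAVEPOINT_LIMIT` names are hypothetical.

```python
import sqlite3

SAVEPOINT_LIMIT = 5  # stands in for RDBMS_SAVEPOINT_LIMIT (default 10,000)

class ImportConnection:
    """Toy wrapper showing the commit-every-N-savepoints pattern."""

    def __init__(self, conn):
        self.conn = conn
        self.import_savepoint_count = 0

    def checkpoint_data(self, savepoint="importing"):
        # One savepoint per imported row. Postgres retains every
        # savepoint in the open transaction, so they accumulate
        # until a commit.
        self.conn.execute('SAVEPOINT %s' % savepoint)
        self.import_savepoint_count += 1
        if self.import_savepoint_count > SAVEPOINT_LIMIT:
            # Committing releases all accumulated savepoints and
            # starts a fresh transaction, bounding resource usage.
            self.conn.commit()
            self.import_savepoint_count = 0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
wrapped = ImportConnection(conn)
for i in range(12):
    conn.execute("INSERT INTO items VALUES (?)", (i,))
    wrapped.checkpoint_data()
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 12
```

With the limit set to 5 as above, the loop commits after rows 6 and 12, mirroring how the test suite exercises the mechanism with a small limit.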
author John Rouillard <rouilj@ieee.org>
date Thu, 19 Oct 2023 16:11:25 -0400
parents b5062cb5c2a2
children 4af0d235b570
comparison of 7667:08e4399c3ae4 with 7668:5b41018617f2
@@ -150,15 +150,28 @@
     attributes:
       dbtype:
         holds the value for the type of db. It is used by indexer to
         identify the database type so it can import the correct indexer
         module when using native text search mode.
+
+      import_savepoint_count:
+        count the number of savepoints that have been created during
+        import. Once the limit of savepoints is reached, a commit is
+        done and this is reset to 0.
+
     """

     arg = '%s'

     dbtype = "postgres"
+
+    import_savepoint_count = 0
+
+    # Value is set from roundup-admin using db.config["RDBMS_SAVEPOINT_LIMIT"]
+    # or to the default of 10_000 at runtime. Use 0 here to trigger
+    # initialization.
+    savepoint_limit = 0

     # used by some code to switch styles of query
     implements_intersect = 1

     def sql_open_connection(self):
@@ -216,23 +229,52 @@
     def checkpoint_data(self, savepoint="importing"):
         """Create a subtransaction savepoint. Allows recovery/retry
            of operation in exception handler because
            postgres requires a rollback in case of error
            generating exception. Used with
-           restore_connecion_on_error to handle uniqueness
+           restore_connection_on_error to handle uniqueness
            conflict in import_table().
+
+           Savepoints take memory resources. Postgres keeps all
+           savepoints (rather than overwriting) until a
+           commit(). Commit every ~10,000 savepoints to prevent
+           running out of memory on import.
+
+           NOTE: a commit outside of this method will not reset the
+           import_savepoint_count. This can result in an unneeded
+           commit on a new cursor (that has no savepoints) as there is
+           no way to find out if there is a savepoint or how many
+           savepoints are opened on a db connection/cursor.
+
+           Because an import is a one shot deal and not part of a long
+           running daemon (e.g. the roundup-server), I am not too
+           worried about it. It will just slow the import down a tad.
         """
-        # Savepoints take resources. Postgres keeps all
-        # savepoints (rather than overwriting) until a
-        # commit(). If an import fails because of a resource
-        # issue with savepoints, uncomment this line. I
-        # expect it will slow down the import but it should
-        # eliminate any issue with stored savepoints and
-        # resource use.
-        #
-        # self.sql('RELEASE SAVEPOINT %s' % savepoint)
+
         self.sql('SAVEPOINT %s' % savepoint)
+
+        self.import_savepoint_count += 1
+
+        if not self.savepoint_limit:
+            if "RDBMS_SAVEPOINT_LIMIT" in self.config.keys():
+                # note this config option is created on the fly
+                # by admin.py::do_import. It is never listed in
+                # config.ini.
+                self.savepoint_limit = self.config["RDBMS_SAVEPOINT_LIMIT"]
+            else:
+                self.savepoint_limit = 10000
+
+        if self.import_savepoint_count > self.savepoint_limit:
+            # track savepoints and commit every 10000 (or user value)
+            # so we don't run postgres out of memory. An import of a
+            # customer's tracker ran out of memory after importing
+            # ~23000 items with: psycopg2.errors.OutOfMemory: out of
+            # shared memory HINT: You might need to increase
+            # max_locks_per_transaction.
+
+            self.commit()
+            self.import_savepoint_count = 0

     def restore_connection_on_error(self, savepoint="importing"):
         """Postgres leaves a connection/cursor in an unusable state
            after an error. Rollback the transaction to a
            previous savepoint and permit a retry of the
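The removed comment block had suggested an alternative: issue RELEASE SAVEPOINT before each new SAVEPOINT so at most one is ever held, while ROLLBACK TO SAVEPOINT remains available for per-row retry. A minimal sketch of those two savepoint operations, again using sqlite3 as a runnable stand-in for psycopg2 (savepoint semantics differ in minor ways between SQLite and Postgres, so treat this as illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")

# The commented-out alternative: release each savepoint before
# creating the next, so savepoints never accumulate.
for i in range(3):
    conn.execute("SAVEPOINT importing")
    conn.execute("INSERT INTO items VALUES (?)", (i,))
    conn.execute("RELEASE SAVEPOINT importing")  # discard the savepoint
conn.commit()

# ROLLBACK TO SAVEPOINT is what makes per-row retry possible:
conn.execute("SAVEPOINT importing")
conn.execute("INSERT INTO items VALUES (99)")
conn.execute("ROLLBACK TO SAVEPOINT importing")  # undo the bad row
conn.execute("RELEASE SAVEPOINT importing")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 3
```

The rolled-back row never lands, so only the three released rows survive. The checkin chose periodic commits instead of per-row RELEASE, trading a bounded amount of savepoint accumulation for fewer round trips.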

Roundup Issue Tracker: http://roundup-tracker.org/