comparison roundup/backends/back_postgresql.py @ 7668:5b41018617f2
fix: out of memory error when importing under postgresql
If you try importing more than 20k items under postgresql you can run
out of memory:
psycopg2.errors.OutOfMemory: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Tuning memory settings may help, but that is unverified at this point.
This checkin forces a commit to the postgres database after 10,000
rows have been added. This clears out the savepoints for each row and
starts a new transaction.
back_postgresql.py:
Implement commit mechanism in checkpoint_data(). Add two class level
attributes for tracking the number of savepoints and the limit when
the commit should happen.
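The counting-and-commit mechanism can be sketched as follows. The `Database` stand-in and its `sql()`/`commit()` recorders are hypothetical stubs for illustration; the `checkpoint_data()` logic mirrors the changeset:

```python
class Database:
    # class-level attributes, as in the changeset
    import_savepoint_count = 0
    # 0 triggers initialization from config (or the 10,000 default)
    # on the first call to checkpoint_data()
    savepoint_limit = 0

    def __init__(self, config=None):
        self.config = config or {}
        self.statements = []   # record issued SQL (stand-in for real sql())
        self.commits = 0       # count commits (stand-in for real commit())

    def sql(self, statement):
        self.statements.append(statement)

    def commit(self):
        self.commits += 1

    def checkpoint_data(self, savepoint="importing"):
        self.sql('SAVEPOINT %s' % savepoint)
        self.import_savepoint_count += 1

        if not self.savepoint_limit:
            # RDBMS_SAVEPOINT_LIMIT is created on the fly by
            # roundup-admin; it never appears in config.ini
            self.savepoint_limit = self.config.get(
                "RDBMS_SAVEPOINT_LIMIT", 10000)

        if self.import_savepoint_count > self.savepoint_limit:
            # commit to release accumulated savepoints so postgres
            # does not exhaust shared memory during a large import
            self.commit()
            self.import_savepoint_count = 0


db = Database({"RDBMS_SAVEPOINT_LIMIT": 5})
for _ in range(12):
    db.checkpoint_data()
```

With a limit of 5, a commit fires on the 6th and 12th savepoints, resetting the counter each time.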
roundup_admin.py:
Implement the savepoint_limit pragma and dynamically create the
config item RDBMS_SAVEPOINT_LIMIT used by checkpoint_data().
Also fixed formatting of descriptions when using pragma list in
verbose mode.
admin_guide.txt, upgrading.txt:
Document change and use of pragma savepoint_limit in roundup-admin
for changing the default of 10,000.
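For reference, changing the default before a large import might look like the following interactive session (the tracker and export paths are placeholders; exact prompt output may differ):

```
$ roundup-admin -i /path/to/tracker
roundup> pragma savepoint_limit=5000
roundup> import /path/to/export_dir
```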
test/db_test_base.py:
Add some more asserts. In the existing testAdminImportExport, set
the savepoint limit to 5 to exercise the setting mechanism and so
that the commit code runs under the existing tests. This provides
coverage, but does not actually verify that the commit is done
every 5 savepoints 8-(. That behavior was verified manually
using a pdb breakpoint just before the commit.
acknowledgements.txt:
Added 2.4.0 section mentioning Norbert as he has done a ton of
testing with much larger datasets than I can test with.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Thu, 19 Oct 2023 16:11:25 -0400 |
| parents | b5062cb5c2a2 |
| children | 4af0d235b570 |
| 7667:08e4399c3ae4 | 7668:5b41018617f2 |
|---|---|
| 150 attributes: | 150 attributes: |
| 151 dbtype: | 151 dbtype: |
| 152 holds the value for the type of db. It is used by indexer to | 152 holds the value for the type of db. It is used by indexer to |
| 153 identify the database type so it can import the correct indexer | 153 identify the database type so it can import the correct indexer |
| 154 module when using native text search mode. | 154 module when using native text search mode. |
| 155 | |
| 156 import_savepoint_count: | |
| 157 count the number of savepoints that have been created during | |
| 158 import. Once the limit of savepoints is reached, a commit is | |
| 159 done and this is reset to 0. | |
| 160 | |
| 155 """ | 161 """ |
| 156 | 162 |
| 157 arg = '%s' | 163 arg = '%s' |
| 158 | 164 |
| 159 dbtype = "postgres" | 165 dbtype = "postgres" |
| 166 | |
| 167 import_savepoint_count = 0 | |
| 168 | |
| 169 # Value is set from roundup-admin using db.config["RDBMS_SAVEPOINT_LIMIT"] | |
| 170 # or to the default of 10_000 at runtime. Use 0 here to trigger | |
| 171 # initialization. | |
| 172 savepoint_limit = 0 | |
| 160 | 173 |
| 161 # used by some code to switch styles of query | 174 # used by some code to switch styles of query |
| 162 implements_intersect = 1 | 175 implements_intersect = 1 |
| 163 | 176 |
| 164 def sql_open_connection(self): | 177 def sql_open_connection(self): |
| 216 def checkpoint_data(self, savepoint="importing"): | 229 def checkpoint_data(self, savepoint="importing"): |
| 217 """Create a subtransaction savepoint. Allows recovery/retry | 230 """Create a subtransaction savepoint. Allows recovery/retry |
| 218 of operation in exception handler because | 231 of operation in exception handler because |
| 219 postgres requires a rollback in case of error | 232 postgres requires a rollback in case of error |
| 220 generating exception. Used with | 233 generating exception. Used with |
| 221 restore_connecion_on_error to handle uniqueness | 234 restore_connection_on_error to handle uniqueness |
| 222 conflict in import_table(). | 235 conflict in import_table(). |
| 236 | |
| 237 Savepoints take memory resources. Postgres keeps all | |
| 238 savepoints (rather than overwriting) until a | |
| 239 commit(). Commit every ~10,000 savepoints to prevent | |
| 240 running out of memory on import. | |
| 241 | |
| 242 NOTE: a commit outside of this method will not reset the | |
| 243 import_savepoint_count. This can result in an unneeded | |
| 244 commit on a new cursor (that has no savepoints) as there is | |
| 245 no way to find out if there is a savepoint or how many | |
| 246 savepoints are opened on a db connection/cursor. | |
| 247 | |
| 248 Because an import is a one shot deal and not part of a long | |
| 249 running daemon (e.g. the roundup-server), I am not too | |
| 250 worried about it. It will just slow the import down a tad. | |
| 223 """ | 251 """ |
| 224 # Savepoints take resources. Postgres keeps all | 252 |
| 225 # savepoints (rather than overwriting) until a | |
| 226 # commit(). If an import fails because of a resource | |
| 227 # issue with savepoints, uncomment this line. I | |
| 228 # expect it will slow down the import but it should | |
| 229 # eliminate any issue with stored savepoints and | |
| 230 # resource use. | |
| 231 # | |
| 232 # self.sql('RELEASE SAVEPOINT %s' % savepoint) | |
| 233 self.sql('SAVEPOINT %s' % savepoint) | 253 self.sql('SAVEPOINT %s' % savepoint) |
| 254 | |
| 255 self.import_savepoint_count += 1 | |
| 256 | |
| 257 if not self.savepoint_limit: | |
| 258 if "RDBMS_SAVEPOINT_LIMIT" in self.config.keys(): | |
| 259 # note this config option is created on the fly | |
| 260 # by admin.py::do_import. It is never listed in | |
| 261 # config.ini. | |
| 262 self.savepoint_limit = self.config["RDBMS_SAVEPOINT_LIMIT"] | |
| 263 else: | |
| 264 self.savepoint_limit = 10000 | |
| 265 | |
| 266 if self.import_savepoint_count > self.savepoint_limit: | |
| 267 # track savepoints and commit every 10000 (or user value) | |
| 268 # so we don't run postgres out of memory. An import of a | |
| 269 # customer's tracker ran out of memory after importing | |
| 270 # ~23000 items with: psycopg2.errors.OutOfMemory: out of | |
| 271 # shared memory HINT: You might need to increase | |
| 272 # max_locks_per_transaction. | |
| 273 | |
| 274 self.commit() | |
| 275 self.import_savepoint_count = 0 | |
| 234 | 276 |
| 235 def restore_connection_on_error(self, savepoint="importing"): | 277 def restore_connection_on_error(self, savepoint="importing"): |
| 236 """Postgres leaves a connection/cursor in an unusable state | 278 """Postgres leaves a connection/cursor in an unusable state |
| 237 after an error. Rollback the transaction to a | 279 after an error. Rollback the transaction to a |
| 238 previous savepoint and permit a retry of the | 280 previous savepoint and permit a retry of the |
