changeset 7668:5b41018617f2
fix: out of memory error when importing under postgresql
If you try importing more than 20k items under PostgreSQL, you can run
out of memory:

  psycopg2.errors.OutOfMemory: out of shared memory
  HINT: You might need to increase max_locks_per_transaction.

Whether tuning PostgreSQL's memory settings would help is unknown at
this point.
This checkin forces a commit to the PostgreSQL database after 10,000
rows have been added. The commit releases the per-row savepoints and
starts a new transaction.
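The mechanism can be sketched as follows. This is a minimal
illustration only; the class and attribute names here are hypothetical
stand-ins, not the actual back_postgresql.py API (the real logic lives
in checkpoint_data()):

```python
# Minimal sketch of committing every N savepoints during an import.
# All names are illustrative; FakeCursor/FakeConn-style DB-API objects
# are assumed for the cursor and connection.
class ImportCheckpointer:
    def __init__(self, cursor, connection, savepoint_limit=10000):
        self.cursor = cursor              # DB-API cursor for issuing SQL
        self.connection = connection      # DB-API connection for commit()
        self.savepoint_limit = savepoint_limit
        self.count = 0                    # savepoints since last commit
        self.commits = 0                  # commit counter, for illustration

    def checkpoint(self, name="importing"):
        # One savepoint per imported row; postgres keeps every
        # savepoint until commit(), so commit periodically to
        # release them and the memory they hold.
        self.cursor.execute("SAVEPOINT %s" % name)
        self.count += 1
        if self.count > self.savepoint_limit:
            self.connection.commit()
            self.commits += 1
            self.count = 0
```

With a limit of 5, twelve checkpoints would trigger two commits: one
after the sixth savepoint and one after the twelfth.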
back_postgresql.py:
Implement commit mechanism in checkpoint_data(). Add two class level
attributes for tracking the number of savepoints and the limit when
the commit should happen.
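The limit itself is read lazily on the backend side, falling back to
the default when roundup-admin never created the option. Roughly (a
sketch only; a plain dict stands in for roundup's config object):

```python
def effective_savepoint_limit(config, default=10000):
    # RDBMS_SAVEPOINT_LIMIT only exists when roundup-admin created it
    # on the fly for this import run; otherwise use the default.
    # 'config' is a plain dict standing in for roundup's config object.
    if "RDBMS_SAVEPOINT_LIMIT" in config:
        return config["RDBMS_SAVEPOINT_LIMIT"]
    return default
```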
roundup_admin.py:
Implement the savepoint_limit pragma and dynamically create the
config item RDBMS_SAVEPOINT_LIMIT used by checkpoint_data().
Also fixed formatting of descriptions when using pragma list in
verbose mode.
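The pragma-to-config handoff amounts to something like this (a
simplified sketch: a plain dict stands in for roundup's config object,
where the real code calls config.add_option(Option(...)) and then
.set()):

```python
def apply_savepoint_pragma(settings, config, default=10000):
    # Mirror of the check in do_import: only create the dynamic
    # option when the user changed the pragma from its default.
    # 'settings' is the roundup-admin pragma dict; 'config' is a
    # plain dict standing in for roundup's config object.
    limit = settings.get('savepoint_limit', default)
    if limit != default:
        config["RDBMS_SAVEPOINT_LIMIT"] = limit
    return config
```

Skipping the option entirely at the default keeps config.ini and the
normal configuration machinery untouched for the common case.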
admin_guide.txt, upgrading.txt:
Document change and use of pragma savepoint_limit in roundup-admin
for changing the default of 10,000.
test/db_test_base.py:
Add some more asserts. In the existing testAdminImportExport, set
the savepoint limit to 5 to test the setting method and so that the
commit code runs during the existing tests. This provides coverage,
but does not actually verify that the commit is done every 5
savepoints 8-(. The verification of every 5 savepoints was done
manually using a pdb breakpoint just before the commit.
acknowledgements.txt:
Added a 2.4.0 section mentioning Norbert, as he has done a ton of
testing with much larger datasets than I can test with.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Thu, 19 Oct 2023 16:11:25 -0400 |
| parents | 08e4399c3ae4 |
| children | fa5c3c86bcf2 |
| files | CHANGES.txt doc/acknowledgements.txt doc/admin_guide.txt doc/upgrading.txt roundup/admin.py roundup/backends/back_postgresql.py test/db_test_base.py |
| diffstat | 7 files changed, 138 insertions(+), 23 deletions(-) [+] |
--- a/CHANGES.txt	Thu Oct 19 14:07:56 2023 -0400
+++ b/CHANGES.txt	Thu Oct 19 16:11:25 2023 -0400
@@ -59,6 +59,9 @@
 - Fix error handling so failure during import of a non-user item
   doesn't cause a second traceback. (Found by Norbert Schlemmer,
   fix John Rouillard)
+- Handle out of memory error when importing large trackers in
+  PostgreSQL. (Found by Norbert Schlemmer, extensive testing by
+  Norbert, fix John Rouillard)
 
 Features:
--- a/doc/acknowledgements.txt	Thu Oct 19 14:07:56 2023 -0400
+++ b/doc/acknowledgements.txt	Thu Oct 19 16:11:25 2023 -0400
@@ -16,6 +16,24 @@
 .. _`Announcement with changelog for current release.`: announcement.html
 
+2.4
+---
+
+2.4.0
+~~~~~
+
+Maintainer: John Rouillard
+
+Release Manager: John Rouillard
+
+Developer activity by changesets::
+
+  TBD
+
+Other contributers
+
+Norbert Schlemmer
+
 2.3
 ---
--- a/doc/admin_guide.txt	Thu Oct 19 14:07:56 2023 -0400
+++ b/doc/admin_guide.txt	Thu Oct 19 16:11:25 2023 -0400
@@ -962,6 +962,16 @@
    move the new tracker home into its place.
 9. Restart web and email frontends.
 
+If you are importing into PostgreSQL, it autocommits the data every
+10000 objects/rows by default. This can slow down importing, but it
+prevents an out of memory error caused by using a savepoint for each
+object. You can control the commit frequency by using::
+
+  pragma savepoint_limit=20000
+
+to set a higher or lower number in roundup-admin. In this example a
+commit will be done every 20,000 objects/rows. The pragma can also be
+set on the roundup-admin command line as described below.
 
 Moving a Tracker
 ----------------
--- a/doc/upgrading.txt	Thu Oct 19 14:07:56 2023 -0400
+++ b/doc/upgrading.txt	Thu Oct 19 16:11:25 2023 -0400
@@ -141,6 +141,20 @@
 level Roundup customization. If you have, use one of the imports
 above if you plan on running on Python 3.13 (expected in 2024) or
 newer.
 
+Fixing PostgreSQL Out of Memory Errors when Importing Tracker (info)
+--------------------------------------------------------------------
+
+Importing a tracker into PostgreSQL can run out of memory with the
+error::
+
+  psycopg2.errors.OutOfMemory: out of shared memory
+  HINT: You might need to increase max_locks_per_transaction.
+
+before changing your PostgreSQL configuration, try changing the pragma
+``savepoint_limit`` to a lower value. By default it is set to
+``10000``. In some cases this may be too high. See the `administration
+guide`_ for further details.
+
 .. index:: Upgrading; 2.2.0 to 2.3.0
 
 Migrating from 2.2.0 to 2.3.0
--- a/roundup/admin.py	Thu Oct 19 14:07:56 2023 -0400
+++ b/roundup/admin.py	Thu Oct 19 16:11:25 2023 -0400
@@ -35,8 +35,9 @@
 from roundup import date, hyperdb, init, password, token_r
 from roundup import __version__ as roundup_version
 import roundup.instance
-from roundup.configuration import (CoreConfig, NoConfigError, OptionUnsetError,
-                                   OptionValueError, ParsingOptionError, UserConfig)
+from roundup.configuration import (CoreConfig, NoConfigError, Option,
+                                   OptionUnsetError, OptionValueError,
+                                   ParsingOptionError, UserConfig)
 from roundup.i18n import _, get_translation
 from roundup.exceptions import UsageError
 from roundup.anypy.my_input import my_input
@@ -108,6 +109,7 @@
             'display_protected': False,
             'indexer_backend': "as set in config.ini",
             '_reopen_tracker': False,
+            'savepoint_limit': 10000,
             'show_retired': "no",
             '_retired_val': False,
             'verbose': False,
@@ -116,25 +118,29 @@
         }
         self.settings_help = {
             'display_header':
-            _("Have 'display designator[,designator*]' show header inside "
-              " []'s before items. Includes retired/active status."),
+            _("Have 'display designator[,designator*]' show header inside\n"
+              " []'s before items. Includes retired/active status.\n"),
             'display_protected':
-            _("Have 'display designator' and 'specification class' show "
-              "protected fields: creator, id etc."),
+            _("Have 'display designator' and 'specification class' show\n"
+              " protected fields: creator, id etc.\n"),
             'indexer_backend':
-            _("Set indexer to use when running 'reindex' NYI"),
+            _("Set indexer to use when running 'reindex' NYI\n"),
             '_reopen_tracker':
-            _("Force reopening of tracker when running each command."),
+            _("Force reopening of tracker when running each command.\n"),
-            'show_retired': _("Show retired items in table, list etc. One of 'no', 'only', 'both'"),
-            '_retired_val': _("internal mapping for show_retired."),
-            'verbose': _("Enable verbose output: tracing, descriptions..."),
+            'savepoint_limit':
+            _("set the number of rows imported before a database commit is\n"
+              " done. Used only for imports on PostgreSQL.\n"),
+            'show_retired': _("Show retired items in table, list etc. "
+                              "One of 'no', 'only', 'both'\n"),
+            '_retired_val': _("internal mapping for show_retired.\n"),
+            'verbose': _("Enable verbose output: tracing, descriptions...\n"),
-            '_inttest': "Integer valued setting. For testing only.",
-            '_floattest': "Float valued setting. For testing only.",
+            '_inttest': "Integer valued setting. For testing only.\n",
+            '_floattest': "Float valued setting. For testing only.\n",
         }
 
     def get_class(self, classname):
@@ -1049,6 +1055,14 @@
         if hasattr(csv, 'field_size_limit'):
             csv.field_size_limit(self.db.config.CSV_FIELD_SIZE)
 
+        # default value is 10000, only go through this if default
+        # is different.
+        if self.settings['savepoint_limit'] != 10000:
+            self.db.config.add_option(Option(self.db.config,
+                                             "rdbms", "savepoint_limit"))
+            self.db.config.options["RDBMS_SAVEPOINT_LIMIT"].set(
+                self.settings['savepoint_limit'])
+
         # directory to import from
         dir = args[0]
--- a/roundup/backends/back_postgresql.py	Thu Oct 19 14:07:56 2023 -0400
+++ b/roundup/backends/back_postgresql.py	Thu Oct 19 16:11:25 2023 -0400
@@ -152,12 +152,25 @@
     holds the value for the type of db. It is used by indexer
     to identify the database type so it can import the correct
     indexer module when using native text search mode.
+
+    import_savepoint_count:
+       count the number of savepoints that have been created during
+       import. Once the limit of savepoints is reached, a commit is
+       done and this is reset to 0.
+
     """
 
     arg = '%s'
 
     dbtype = "postgres"
 
+    import_savepoint_count = 0
+
+    # Value is set from roundup-admin using db.config["RDBMS_SAVEPOINT_LIMIT"]
+    # or to the default of 10_000 at runtime. Use 0 here to trigger
+    # initialization.
+    savepoint_limit = 0
+
     # used by some code to switch styles of query
     implements_intersect = 1
 
@@ -218,20 +231,49 @@
         of operation in exception handler because postgres
         requires a rollback in case of error generating
         exception. Used with
-        restore_connecion_on_error to handle uniqueness
+        restore_connection_on_error to handle uniqueness
         conflict in import_table().
+
+        Savepoints take memory resources. Postgres keeps all
+        savepoints (rather than overwriting) until a
+        commit(). Commit every ~10,000 savepoints to prevent
+        running out of memory on import.
+
+        NOTE: a commit outside of this method will not reset the
+        import_savepoint_count. This can result in an unneeded
+        commit on a new cursor (that has no savepoints) as there is
+        no way to find out if there is a savepoint or how many
+        savepoints are opened on a db connection/cursor.
+
+        Because an import is a one shot deal and not part of a long
+        running daemon (e.g. the roundup-server), I am not too
+        worried about it. It will just slow the import down a tad.
         """
-        # Savepoints take resources. Postgres keeps all
-        # savepoints (rather than overwriting) until a
-        # commit(). If an import fails because of a resource
-        # issue with savepoints, uncomment this line. I
-        # expect it will slow down the import but it should
-        # eliminate any issue with stored savepoints and
-        # resource use.
-        #
-        # self.sql('RELEASE SAVEPOINT %s' % savepoint)
+
         self.sql('SAVEPOINT %s' % savepoint)
+        self.import_savepoint_count += 1
+
+        if not self.savepoint_limit:
+            if "RDBMS_SAVEPOINT_LIMIT" in self.config.keys():
+                # note this config option is created on the fly
+                # by admin.py::do_import. It is never listed in
+                # config.ini.
+                self.savepoint_limit = self.config["RDBMS_SAVEPOINT_LIMIT"]
+            else:
+                self.savepoint_limit = 10000
+
+        if self.import_savepoint_count > self.savepoint_limit:
+            # track savepoints and commit every 10000 (or user value)
+            # so we don't run postgres out of memory. An import of a
+            # customer's tracker ran out of memory after importing
+            # ~23000 items with: psycopg2.errors.OutOfMemory: out of
+            # shared memory HINT: You might need to increase
+            # max_locks_per_transaction.
+
+            self.commit()
+            self.import_savepoint_count = 0
+
     def restore_connection_on_error(self, savepoint="importing"):
         """Postgres leaves a connection/cursor in an unusable
         state after an error. Rollback the transaction to a
--- a/test/db_test_base.py	Thu Oct 19 14:07:56 2023 -0400
+++ b/test/db_test_base.py	Thu Oct 19 16:11:25 2023 -0400
@@ -3061,6 +3061,7 @@
             self.db.commit()
 
             self.assertEqual(self.db.user.lookup("duplicate"),
                              active_dupe_id)
+            self.assertEqual(self.db.user.is_retired(retired_dupe_id), True)
         finally:
             shutil.rmtree('_test_export')
@@ -3151,12 +3152,25 @@
             self.assertRaises(csv.Error, tool.do_import, ['_test_export'])
 
             self.nukeAndCreate()
+
+            # make sure we have an empty db
+            with self.assertRaises(IndexError) as e:
+                # users 1 and 2 always are created on schema load.
+                # so don't use them.
+                self.db.user.getnode("5").values()
+
             self.db.config.CSV_FIELD_SIZE = 3200
             tool = roundup.admin.AdminTool()
             tool.tracker_home = home
             tool.db = self.db
+            # Force import code to commit when more than 5
+            # savepoints have been created.
+            tool.settings['savepoint_limit'] = 5
             tool.verbose = False
             tool.do_import(['_test_export'])
+
+            # verify the data is loaded.
+            self.db.user.getnode("5").values()
         finally:
             roundup.admin.sys = sys
             shutil.rmtree('_test_export')
