Mercurial > p > roundup > code
view test/html_norm.py @ 7668:5b41018617f2
fix: out of memory error when importing under postgresql
If you try importing more than 20k items under postgresql you can run
out of memory:
psycopg2.errors.OutOfMemory: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Tuning memory may help, it's unknown at this point.
This checkin forces a commit to the postgres database after 10,000
rows have been added. This clears out the savepoints for each row and
starts a new transaction.
back_postgresql.py:
Implement commit mechanism in checkpoint_data(). Add two class level
attributes for tracking the number of savepoints and the limit when
the commit should happen.
roundup_admin.py:
implement pragma and dynamically create the config item
RDBMS_SAVEPOINT_LIMIT used by checkpoint_data.
Also fixed formatting of descriptions when using pragma list in
verbose mode.
admin_guide.txt, upgrading.txt:
Document change and use of pragma savepoint_limit in roundup-admin
for changing the default of 10,000.
test/db_test_base.py:
add some more asserts. In existing testAdminImportExport, set the
savepoint limit to 5 to test setting method and so that the commit
code will be run by existing tests. This provides coverage, but
does not actually test that the commit is done every 5 savepoints
8-(. The verification of every 5 savepoints was done manually
using a pdb breakpoint just before the commit.
acknowledgements.txt:
Added 2.4.0 section mentioning Norbert as he has done a ton of
testing with much larger datasets than I can test with.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Thu, 19 Oct 2023 16:11:25 -0400 |
| parents | 5cadcaa13bed |
| children |
line wrap: on
line source
"""Minimal html parser/normalizer for use in test_templating. When testing markdown -> html conversion libraries, there are gratuitous whitespace changes in generated output that break the tests. Use this to try to normalize the generated HTML into something that tries to preserve the semantic meaning allowing tests to stop breaking. This is not a complete parsing engine. It supports the Roundup issue tracker unit tests so that no third party libraries are needed to run the tests. If you find it useful enjoy. Ideally this would be done by hijacking in some way lxml.html.usedoctest to get a liberal parser that will ignore whitespace. But that means the user has to install lxml to run the tests. Similarly BeautifulSoup could be used to pretty print the html but again, BeautifulSoup would need to be installed to run the tests. """ try: from html.parser import HTMLParser except ImportError: from HTMLParser import HTMLParser # python2 try: from htmlentitydefs import name2codepoint except ImportError: pass # assume running under python3, name2codepoint predefined class NormalizingHtmlParser(HTMLParser): """Handle start/end tags and normalize whitespace in data. Strip doctype, comments when passed in. Implements normalize method that takes input html and returns a normalized string leaving the instance ready for another call to normalize for another string. Note that using this rewrites all attributes parsed by HTMLParser into attr="value" form even though HTMLParser accepts other attribute specification forms. """ debug = False # set to true to enable more verbose output current_normalized_string = "" # accumulate result string preserve_data = False # if inside pre preserve whitespace def handle_starttag(self, tag, attrs): """put tag on new line with attributes. Note valid attributes according to HTMLParser: attrs='single_quote' attrs=noquote attrs="double_quote" """ if self.debug: print("Start tag:", tag) self.current_normalized_string += "\n<%s" % tag for attr in attrs: if self.debug: print(" attr:", attr) self.current_normalized_string += ' %s="%s"' % attr self.current_normalized_string += ">\n" if tag == 'pre': self.preserve_data = True def handle_endtag(self, tag): if self.debug: print("End tag :", tag) self.current_normalized_string += "\n</%s>" % tag if tag == 'pre': self.preserve_data = False def handle_data(self, data): if self.debug: print("Data :", data) if not self.preserve_data: # normalize whitespace remove leading/trailing data = " ".join(data.strip().split()) if data: self.current_normalized_string += "%s" % data def handle_comment(self, data): print("Comment :", data) def handle_decl(self, data): print("Decl :", data) def reset(self): """wrapper around reset with clearing of csef.current_normalized_string and reset of self.preserve_data """ HTMLParser.reset(self) self.current_normalized_string = "" self.preserve_data = False def normalize(self, html): self.feed(html) result = self.current_normalized_string self.reset() return result if __name__ == "__main__": parser = NormalizingHtmlParser() parser.feed('<div class="markup"><p> paragraph text with whitespace\n and more space <pre><span class="f" data-attr="f">text more text</span></pre></div>') print("\n\ntest1", parser.current_normalized_string) parser.reset() parser.feed('''<div class="markup"> <p> paragraph text with whitespace\n and more space <pre><span class="f" data-attr="f">text \n more text</span></pre> </div>''') print("\n\ntest2", parser.current_normalized_string) parser.reset() print("\n\nnormalize", parser.normalize('''<div class="markup"> <p> paragraph text with whitespace\n and more space <pre><span class="f" data-attr="f">text \n more text <</span></pre> </div>'''))
