Mercurial > p > roundup > code
annotate test/html_norm.py @ 8408:e882a5d52ae5
refactor: move RateLimitExceeded to roundup.cgi.exceptions
RateLimitExceeded is an HTTP exception that raises code 429. Move it
to roundup.cgi.exceptions where all the other exceptions that result
in http status codes are located. Also make it inherit from
HTTPException since it is one.
Also add docstrings for all HTTP exceptions and order HTTPExceptions
by status code.
BREAKING CHANGE: if somebody is importing RateLimitExceeded they will
need to change their import. I consider it unlikely anybody is using
RateLimitExceeded. Detectors and extensions are unlikely to raise
RateLimitExceeded. So I am leaving it out of the upgrading doc. Just
doc in change log.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Sun, 10 Aug 2025 21:27:06 -0400 |
| parents | 5cadcaa13bed |
| children |
| rev | line source |
|---|---|
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
1 """Minimal html parser/normalizer for use in test_templating. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
2 |
| 7077 | 3 When testing markdown -> html conversion libraries, there are |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
4 gratuitous whitespace changes in generated output that break the |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
5 tests. Use this to try to normalize the generated HTML into something |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
6 that tries to preserve the semantic meaning allowing tests to stop |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
7 breaking. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
8 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
9 This is not a complete parsing engine. It supports the Roundup issue |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
10 tracker unit tests so that no third party libraries are needed to run |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
11 the tests. If you find it useful enjoy. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
12 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
13 Ideally this would be done by hijacking in some way |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
14 lxml.html.usedoctest to get a liberal parser that will ignore |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
15 whitespace. But that means the user has to install lxml to run the |
| 7077 | 16 tests. Similarly BeautifulSoup could be used to pretty print the html |
| 17 but again, BeautifulSoup would need to be installed to run the | |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
18 tests. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
19 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
20 """ |
| 6996 | 21 try: |
| 22 from html.parser import HTMLParser | |
| 23 except ImportError: | |
| 24 from HTMLParser import HTMLParser # python2 | |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
25 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
26 try: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
27 from htmlentitydefs import name2codepoint |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
28 except ImportError: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
29 pass # assume running under python3, name2codepoint predefined |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
30 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
31 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
32 class NormalizingHtmlParser(HTMLParser): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
33 """Handle start/end tags and normalize whitespace in data. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
34 Strip doctype, comments when passed in. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
35 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
36 Implements normalize method that takes input html and returns a |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
37 normalized string leaving the instance ready for another call to |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
38 normalize for another string. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
39 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
40 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
41 Note that using this rewrites all attributes parsed by HTMLParser |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
42 into attr="value" form even though HTMLParser accepts other |
|
7560
5cadcaa13bed
prevent <newline tag mangling
John Rouillard <rouilj@ieee.org>
parents:
7077
diff
changeset
|
43 attribute specification forms. |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
44 """ |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
45 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
46 debug = False # set to true to enable more verbose output |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
47 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
48 current_normalized_string = "" # accumulate result string |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
49 preserve_data = False # if inside pre preserve whitespace |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
50 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
51 def handle_starttag(self, tag, attrs): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
52 """put tag on new line with attributes. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
53 Note valid attributes according to HTMLParser: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
54 attrs='single_quote' |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
55 attrs=noquote |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
56 attrs="double_quote" |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
57 """ |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
58 if self.debug: print("Start tag:", tag) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
59 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
60 self.current_normalized_string += "\n<%s" % tag |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
61 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
62 for attr in attrs: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
63 if self.debug: print(" attr:", attr) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
64 self.current_normalized_string += ' %s="%s"' % attr |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
65 |
|
7560
5cadcaa13bed
prevent <newline tag mangling
John Rouillard <rouilj@ieee.org>
parents:
7077
diff
changeset
|
66 self.current_normalized_string += ">\n" |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
67 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
68 if tag == 'pre': |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
69 self.preserve_data = True |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
70 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
71 def handle_endtag(self, tag): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
72 if self.debug: print("End tag :", tag) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
73 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
74 self.current_normalized_string += "\n</%s>" % tag |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
75 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
76 if tag == 'pre': |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
77 self.preserve_data = False |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
78 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
79 def handle_data(self, data): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
80 if self.debug: print("Data :", data) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
81 if not self.preserve_data: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
82 # normalize whitespace remove leading/trailing |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
83 data = " ".join(data.strip().split()) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
84 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
85 if data: |
|
7560
5cadcaa13bed
prevent <newline tag mangling
John Rouillard <rouilj@ieee.org>
parents:
7077
diff
changeset
|
86 self.current_normalized_string += "%s" % data |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
87 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
88 def handle_comment(self, data): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
89 print("Comment :", data) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
90 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
91 def handle_decl(self, data): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
92 print("Decl :", data) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
93 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
94 def reset(self): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
95 """wrapper around reset with clearing of csef.current_normalized_string |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
96 and reset of self.preserve_data |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
97 """ |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
98 HTMLParser.reset(self) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
99 self.current_normalized_string = "" |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
100 self.preserve_data = False |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
101 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
102 def normalize(self, html): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
103 self.feed(html) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
104 result = self.current_normalized_string |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
105 self.reset() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
106 return result |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
107 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
108 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
109 if __name__ == "__main__": |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
110 parser = NormalizingHtmlParser() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
111 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
112 parser.feed('<div class="markup"><p> paragraph text with whitespace\n and more space <pre><span class="f" data-attr="f">text more text</span></pre></div>') |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
113 print("\n\ntest1", parser.current_normalized_string) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
114 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
115 parser.reset() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
116 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
117 parser.feed('''<div class="markup"> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
118 <p> paragraph text with whitespace\n and more space |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
119 <pre><span class="f" data-attr="f">text \n more text</span></pre> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
120 </div>''') |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
121 print("\n\ntest2", parser.current_normalized_string) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
122 parser.reset() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
123 print("\n\nnormalize", parser.normalize('''<div class="markup"> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
124 <p> paragraph text with whitespace\n and more space |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
125 <pre><span class="f" data-attr="f">text \n more text <</span></pre> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
126 </div>''')) |
