annotate test/html_norm.py @ 8218:32aaf5dc562b
fix(REST): issue2551383; improve errors for bad json, fix PUT docs
While adding fuzz testing for email addresses via REST
/rest/data/user/1/address, I had an error when setting the address to
the same value it currently had. Traced this to a bug in
userauditor.py. Fixed the bug. Documented in upgrading.txt.
While trying to track down the issue, I realized invalid json was being
accepted without error. So I fixed the code that parses the json to
have it return an error. Also modified some tests that broke (they used
invalid json, or passed a body (e.g. DELETE) when they shouldn't have).
Added tests for bad json to verify the new code.
Fixed test that wasn't initializing the body_file in each loop, so the
test wasn't actually supplying a body.
Also realised PUT documentation was not correct. Output format isn't
quite like GET.
Fuzz tests for email address also added.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Tue, 17 Dec 2024 19:42:46 -0500 |
| parents | 5cadcaa13bed |
| children | |

"""Minimal html parser/normalizer for use in test_templating.

When testing markdown -> html conversion libraries, there are
gratuitous whitespace changes in generated output that break the
tests. Use this to try to normalize the generated HTML into something
that tries to preserve the semantic meaning allowing tests to stop
breaking.

This is not a complete parsing engine. It supports the Roundup issue
tracker unit tests so that no third party libraries are needed to run
the tests. If you find it useful enjoy.

Ideally this would be done by hijacking in some way
lxml.html.usedoctest to get a liberal parser that will ignore
whitespace. But that means the user has to install lxml to run the
tests. Similarly BeautifulSoup could be used to pretty print the html
but again, BeautifulSoup would need to be installed to run the
tests.

"""
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser  # python2

try:
    from html.entities import name2codepoint
except ImportError:
    from htmlentitydefs import name2codepoint  # python2


class NormalizingHtmlParser(HTMLParser):
    """Handle start/end tags and normalize whitespace in data.
    Strip doctype, comments when passed in.

    Implements normalize method that takes input html and returns a
    normalized string leaving the instance ready for another call to
    normalize for another string.

    Note that using this rewrites all attributes parsed by HTMLParser
    into attr="value" form even though HTMLParser accepts other
    attribute specification forms.
    """

    debug = False  # set to true to enable more verbose output

    current_normalized_string = ""  # accumulate result string
    preserve_data = False  # if inside pre preserve whitespace

    def handle_starttag(self, tag, attrs):
        """put tag on new line with attributes.
           Note valid attributes according to HTMLParser:
             attrs='single_quote'
             attrs=noquote
             attrs="double_quote"
        """
        if self.debug:
            print("Start tag:", tag)

        self.current_normalized_string += "\n<%s" % tag

        for attr in attrs:
            if self.debug:
                print("     attr:", attr)
            self.current_normalized_string += ' %s="%s"' % attr

        self.current_normalized_string += ">\n"

        if tag == 'pre':
            self.preserve_data = True

    def handle_endtag(self, tag):
        if self.debug:
            print("End tag  :", tag)

        self.current_normalized_string += "\n</%s>" % tag

        if tag == 'pre':
            self.preserve_data = False

    def handle_data(self, data):
        if self.debug:
            print("Data     :", data)
        if not self.preserve_data:
            # normalize whitespace, remove leading/trailing
            data = " ".join(data.strip().split())

        if data:
            self.current_normalized_string += "%s" % data

    def handle_comment(self, data):
        # comments are reported but not copied into the normalized output
        print("Comment  :", data)

    def handle_decl(self, data):
        # declarations (e.g. doctype) are reported but not copied either
        print("Decl     :", data)

    def reset(self):
        """wrapper around reset with clearing of self.current_normalized_string
           and reset of self.preserve_data
        """
        HTMLParser.reset(self)
        self.current_normalized_string = ""
        self.preserve_data = False

    def normalize(self, html):
        """Return the normalized form of html and reset the parser for reuse."""
        self.feed(html)
        result = self.current_normalized_string
        self.reset()
        return result


if __name__ == "__main__":
    parser = NormalizingHtmlParser()

    parser.feed('<div class="markup"><p> paragraph text with whitespace\n and more space <pre><span class="f" data-attr="f">text more text</span></pre></div>')
    print("\n\ntest1", parser.current_normalized_string)

    parser.reset()

    parser.feed('''<div class="markup">
<p> paragraph text with whitespace\n and more space
<pre><span class="f" data-attr="f">text \n more text</span></pre>
</div>''')
    print("\n\ntest2", parser.current_normalized_string)
    parser.reset()
    print("\n\nnormalize", parser.normalize('''<div class="markup">
<p> paragraph text with whitespace\n and more space
<pre><span class="f" data-attr="f">text \n more text <</span></pre>
</div>'''))
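
As a rough illustration of how the normalizer might be used in a test, the sketch below compares two HTML fragments that differ only in whitespace and attribute quoting. The class in the block is a condensed copy of `NormalizingHtmlParser` (Python 3 only, debug output omitted) so the snippet runs on its own; the variable names `compact`, `spread` and `pre_out` are hypothetical and do not come from the Roundup test suite.

```python
from html.parser import HTMLParser


class NormalizingHtmlParser(HTMLParser):
    """Condensed copy of the class above so this example is standalone."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so the state is set up here
        HTMLParser.reset(self)
        self.current_normalized_string = ""
        self.preserve_data = False

    def handle_starttag(self, tag, attrs):
        self.current_normalized_string += "\n<%s" % tag
        for attr in attrs:
            # every attribute is rewritten into attr="value" form
            self.current_normalized_string += ' %s="%s"' % attr
        self.current_normalized_string += ">\n"
        if tag == "pre":
            self.preserve_data = True

    def handle_endtag(self, tag):
        self.current_normalized_string += "\n</%s>" % tag
        if tag == "pre":
            self.preserve_data = False

    def handle_data(self, data):
        if not self.preserve_data:
            # collapse runs of whitespace outside of <pre>
            data = " ".join(data.strip().split())
        if data:
            self.current_normalized_string += "%s" % data

    def normalize(self, html):
        self.feed(html)
        result = self.current_normalized_string
        self.reset()
        return result


parser = NormalizingHtmlParser()

# Same markup, different whitespace and attribute quoting.
compact = "<div class='markup'><p>some   text</p></div>"
spread = '''<div class="markup">
  <p>
     some text
  </p>
</div>'''

norm_compact = parser.normalize(compact)
norm_spread = parser.normalize(spread)
print(norm_compact == norm_spread)  # → True

# Whitespace inside <pre> is kept as-is.
pre_out = parser.normalize("<pre>a   b</pre>")
print(repr(pre_out))  # → '\n<pre>\na   b\n</pre>'
```

In a unit test, the two normalized strings would typically be passed to an equality assertion instead of being printed, so that whitespace-only differences between markdown backends no longer cause failures.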
