annotate test/html_norm.py @ 8218:32aaf5dc562b

fix(REST): issue2551383; improve errors for bad json, fix PUT docs

While adding fuzz testing for email addresses via REST /rest/data/user/1/address, I had an error when setting the address to the same value it currently had. Traced this to a bug in userauditor.py. Fixed the bug. Documented in upgrading.txt.

While trying to track down the issue, I realized invalid json was being accepted without error. So I fixed the code that parses the json and made it return an error. Also modified some tests that broke (used invalid json, or passed a body (e.g. DELETE) but shouldn't have). Added tests for bad json to verify the new code.

Fixed test that wasn't initializing the body_file in each loop, so the test wasn't actually supplying a body.

Also realised PUT documentation was not correct. Output format isn't quite like GET.

Fuzz tests for email address also added.
author John Rouillard <rouilj@ieee.org>
date Tue, 17 Dec 2024 19:42:46 -0500
parents 5cadcaa13bed
children
rev   changeset     source lines  description
6995  dc83ebff4c90  all others    change test to use html normalizer when comparing html output.
6996  3546f23ea493  21-24         Add support for python2.
7077  5bc36b65d06b  3, 16-17      fix typos in docstring.
7560  5cadcaa13bed  43, 66, 86    prevent <newline tag mangling
All changesets by John Rouillard <rouilj@ieee.org>.

"""Minimal html parser/normalizer for use in test_templating.

When testing markdown -> html conversion libraries, there are
gratuitous whitespace changes in generated output that break the
tests. Use this to try to normalize the generated HTML into something
that tries to preserve the semantic meaning, allowing tests to stop
breaking.

This is not a complete parsing engine. It supports the Roundup issue
tracker unit tests so that no third party libraries are needed to run
the tests. If you find it useful, enjoy.

Ideally this would be done by hijacking in some way
lxml.html.usedoctest to get a liberal parser that will ignore
whitespace. But that means the user has to install lxml to run the
tests. Similarly BeautifulSoup could be used to pretty print the html
but again, BeautifulSoup would need to be installed to run the
tests.

"""
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser  # python2

try:
    from htmlentitydefs import name2codepoint
except ImportError:
    pass  # assume running under python3, name2codepoint predefined


class NormalizingHtmlParser(HTMLParser):
    """Handle start/end tags and normalize whitespace in data.
    Strip doctype, comments when passed in.

    Implements normalize method that takes input html and returns a
    normalized string leaving the instance ready for another call to
    normalize for another string.


    Note that using this rewrites all attributes parsed by HTMLParser
    into attr="value" form even though HTMLParser accepts other
    attribute specification forms.
    """

    debug = False  # set to true to enable more verbose output

    current_normalized_string = ""  # accumulate result string
    preserve_data = False  # if inside pre preserve whitespace

    def handle_starttag(self, tag, attrs):
        """put tag on new line with attributes.
        Note valid attributes according to HTMLParser:
          attrs='single_quote'
          attrs=noquote
          attrs="double_quote"
        """
        if self.debug: print("Start tag:", tag)

        self.current_normalized_string += "\n<%s" % tag

        for attr in attrs:
            if self.debug: print(" attr:", attr)
            self.current_normalized_string += ' %s="%s"' % attr

        self.current_normalized_string += ">\n"

        if tag == 'pre':
            self.preserve_data = True

    def handle_endtag(self, tag):
        if self.debug: print("End tag :", tag)

        self.current_normalized_string += "\n</%s>" % tag

        if tag == 'pre':
            self.preserve_data = False

    def handle_data(self, data):
        if self.debug: print("Data :", data)
        if not self.preserve_data:
            # normalize whitespace, remove leading/trailing
            data = " ".join(data.strip().split())

        if data:
            self.current_normalized_string += "%s" % data

    def handle_comment(self, data):
        print("Comment :", data)

    def handle_decl(self, data):
        print("Decl :", data)

    def reset(self):
        """wrapper around reset with clearing of self.current_normalized_string
        and reset of self.preserve_data
        """
        HTMLParser.reset(self)
        self.current_normalized_string = ""
        self.preserve_data = False

    def normalize(self, html):
        self.feed(html)
        result = self.current_normalized_string
        self.reset()
        return result


if __name__ == "__main__":
    parser = NormalizingHtmlParser()

    parser.feed('<div class="markup"><p> paragraph text with whitespace\n and more space <pre><span class="f" data-attr="f">text more text</span></pre></div>')
    print("\n\ntest1", parser.current_normalized_string)

    parser.reset()

    parser.feed('''<div class="markup">
<p> paragraph text with whitespace\n and more space
<pre><span class="f" data-attr="f">text \n more text</span></pre>
</div>''')
    print("\n\ntest2", parser.current_normalized_string)
    parser.reset()
    print("\n\nnormalize", parser.normalize('''<div class="markup">
<p> paragraph text with whitespace\n and more space
<pre><span class="f" data-attr="f">text \n more text &lt;</span></pre>
</div>'''))
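
A short usage sketch may help show what the normalizer is for. The block below is not part of test/html_norm.py: `WhitespaceNormalizer`, `out` and `in_pre` are hypothetical names in a condensed, Python 3 only re-implementation of the approach in the listing (tags on their own lines, attributes rewritten as attr="value", whitespace collapsed outside `<pre>`). It is re-implemented here only so the example runs on its own; a real test would presumably import NormalizingHtmlParser from the module instead.

```python
from html.parser import HTMLParser


class WhitespaceNormalizer(HTMLParser):
    """Hypothetical condensed stand-in for NormalizingHtmlParser."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so per-parse state lives here.
        HTMLParser.reset(self)
        self.out = ""
        self.in_pre = False

    def handle_starttag(self, tag, attrs):
        # Emit the tag on its own line with attributes as name="value".
        self.out += "\n<%s" % tag
        for name, value in attrs:
            self.out += ' %s="%s"' % (name, value)
        self.out += ">\n"
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        self.out += "\n</%s>" % tag
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        if not self.in_pre:
            # Collapse runs of whitespace and trim both ends.
            data = " ".join(data.split())
        if data:
            self.out += data

    def normalize(self, html):
        # Parse, capture the result, then leave the parser ready for reuse.
        self.feed(html)
        result = self.out
        self.reset()
        return result


# Same markup, different whitespace and attribute quoting.
compact = '<div class="x"><p>hello   world</p></div>'
spread = "<div class='x'>\n  <p>\n hello\n world </p>\n</div>\n"

normalizer = WhitespaceNormalizer()
normalized_compact = normalizer.normalize(compact)
normalized_spread = normalizer.normalize(spread)

# The raw strings differ, but their normalized forms compare equal.
print(compact == spread)
print(normalized_compact == normalized_spread)

# Whitespace inside <pre> is left untouched.
normalized_pre = normalizer.normalize("<pre>a   b</pre>")
print(repr(normalized_pre))
```

In a unittest-style test the same idea would likely appear as an assertEqual on the two normalized strings, so that a markdown library changing its whitespace or attribute quoting does not fail the comparison.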

Roundup Issue Tracker: http://roundup-tracker.org/