Mercurial > p > roundup > code
annotate test/html_norm.py @ 7853:03c1b7ae3a68
issue2551328/issue2551264 unneeded next link and total_count incorrect
Fix: issue2551328 - REST results show next link if number of
results is a multiple of page size. (Found by members of
team 3 in the UMass-Boston CS682 Spring 2024 class.)
issue2551264 - REST X-Total-Count header and @total_size
count incorrect when paginated
These issues arose because we retrieved the exact number of rows
from the database as requested by the user using the @page_size
parameter. With this changeset, we retrieve up to 10 million + 1
rows from the database. If the total number of rows exceeds 10
million, we set the total_count indicators to -1 as an invalid
size. (The max number of requested rows (default 10 million +1)
can be modified by the admin through interfaces.py.)
By retrieving more data than necessary, we can calculate the
total count by adding @page_index*@page_size to the number of
rows returned by the query.
Furthermore, since we return more than @page_size rows, we can
determine the existence of a row at @page_size+1 and use that
information to determine if a next link should be
provided. Previously, a next link was returned if @page_size rows
were retrieved.
This change does not guarantee that the user will get @page_size
rows returned. Access policy filtering occurs after the rows are
returned, and discards rows inaccessible by the user.
Using the current @page_index/@page_size it would be difficult to
have the roundup code refetch data and make sure that a full
@page_size set of rows is returned. E.G. @page_size=100 and 5 of
them are dropped due to access restrictions. We then fetch 10
items and add items 1-4 and 6 (5 is inaccessible). There is no
way to calculate the new database offset at:
@page_index*@page_size + 6 from the URL. We would need to add an
@page_offset=6 or something.
This could work since the client isn't adding 1 to @page_index to
get the next page. Thanks to HATEOAS, the client just uses the
'next' url. But I am not going to cross that bridge without a
concrete use case.
This can also be handled client side by merging a short response
with the next response and re-paginating client side.
Also added extra index markers to the docs to highlight use of
interfaces.py.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Mon, 01 Apr 2024 09:57:16 -0400 |
| parents | 5cadcaa13bed |
| children |
| rev | line source |
|---|---|
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
1 """Minimal html parser/normalizer for use in test_templating. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
2 |
| 7077 | 3 When testing markdown -> html conversion libraries, there are |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
4 gratuitous whitespace changes in generated output that break the |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
5 tests. Use this to try to normalize the generated HTML into something |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
6 that tries to preserve the semantic meaning allowing tests to stop |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
7 breaking. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
8 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
9 This is not a complete parsing engine. It supports the Roundup issue |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
10 tracker unit tests so that no third party libraries are needed to run |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
11 the tests. If you find it useful enjoy. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
12 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
13 Ideally this would be done by hijacking in some way |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
14 lxml.html.usedoctest to get a liberal parser that will ignore |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
15 whitespace. But that means the user has to install lxml to run the |
| 7077 | 16 tests. Similarly BeautifulSoup could be used to pretty print the html |
| 17 but again, BeautifulSoup would need to be installed to run the | |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
18 tests. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
19 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
20 """ |
| 6996 | 21 try: |
| 22 from html.parser import HTMLParser | |
| 23 except ImportError: | |
| 24 from HTMLParser import HTMLParser # python2 | |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
25 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
26 try: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
27 from htmlentitydefs import name2codepoint |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
28 except ImportError: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
29 pass # assume running under python3, name2codepoint predefined |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
30 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
31 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
32 class NormalizingHtmlParser(HTMLParser): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
33 """Handle start/end tags and normalize whitespace in data. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
34 Strip doctype, comments when passed in. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
35 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
36 Implements normalize method that takes input html and returns a |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
37 normalized string leaving the instance ready for another call to |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
38 normalize for another string. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
39 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
40 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
41 Note that using this rewrites all attributes parsed by HTMLParser |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
42 into attr="value" form even though HTMLParser accepts other |
|
7560
5cadcaa13bed
prevent <newline tag mangling
John Rouillard <rouilj@ieee.org>
parents:
7077
diff
changeset
|
43 attribute specification forms. |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
44 """ |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
45 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
46 debug = False # set to true to enable more verbose output |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
47 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
48 current_normalized_string = "" # accumulate result string |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
49 preserve_data = False # if inside pre preserve whitespace |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
50 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
51 def handle_starttag(self, tag, attrs): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
52 """put tag on new line with attributes. |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
53 Note valid attributes according to HTMLParser: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
54 attrs='single_quote' |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
55 attrs=noquote |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
56 attrs="double_quote" |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
57 """ |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
58 if self.debug: print("Start tag:", tag) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
59 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
60 self.current_normalized_string += "\n<%s" % tag |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
61 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
62 for attr in attrs: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
63 if self.debug: print(" attr:", attr) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
64 self.current_normalized_string += ' %s="%s"' % attr |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
65 |
|
7560
5cadcaa13bed
prevent <newline tag mangling
John Rouillard <rouilj@ieee.org>
parents:
7077
diff
changeset
|
66 self.current_normalized_string += ">\n" |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
67 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
68 if tag == 'pre': |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
69 self.preserve_data = True |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
70 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
71 def handle_endtag(self, tag): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
72 if self.debug: print("End tag :", tag) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
73 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
74 self.current_normalized_string += "\n</%s>" % tag |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
75 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
76 if tag == 'pre': |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
77 self.preserve_data = False |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
78 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
79 def handle_data(self, data): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
80 if self.debug: print("Data :", data) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
81 if not self.preserve_data: |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
82 # normalize whitespace remove leading/trailing |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
83 data = " ".join(data.strip().split()) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
84 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
85 if data: |
|
7560
5cadcaa13bed
prevent <newline tag mangling
John Rouillard <rouilj@ieee.org>
parents:
7077
diff
changeset
|
86 self.current_normalized_string += "%s" % data |
|
6995
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
87 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
88 def handle_comment(self, data): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
89 print("Comment :", data) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
90 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
91 def handle_decl(self, data): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
92 print("Decl :", data) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
93 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
94 def reset(self): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
95 """wrapper around reset with clearing of csef.current_normalized_string |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
96 and reset of self.preserve_data |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
97 """ |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
98 HTMLParser.reset(self) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
99 self.current_normalized_string = "" |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
100 self.preserve_data = False |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
101 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
102 def normalize(self, html): |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
103 self.feed(html) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
104 result = self.current_normalized_string |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
105 self.reset() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
106 return result |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
107 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
108 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
109 if __name__ == "__main__": |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
110 parser = NormalizingHtmlParser() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
111 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
112 parser.feed('<div class="markup"><p> paragraph text with whitespace\n and more space <pre><span class="f" data-attr="f">text more text</span></pre></div>') |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
113 print("\n\ntest1", parser.current_normalized_string) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
114 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
115 parser.reset() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
116 |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
117 parser.feed('''<div class="markup"> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
118 <p> paragraph text with whitespace\n and more space |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
119 <pre><span class="f" data-attr="f">text \n more text</span></pre> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
120 </div>''') |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
121 print("\n\ntest2", parser.current_normalized_string) |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
122 parser.reset() |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
123 print("\n\nnormalize", parser.normalize('''<div class="markup"> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
124 <p> paragraph text with whitespace\n and more space |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
125 <pre><span class="f" data-attr="f">text \n more text <</span></pre> |
|
dc83ebff4c90
change test to use html normalizer when comparing html output.
John Rouillard <rouilj@ieee.org>
parents:
diff
changeset
|
126 </div>''')) |
