Mercurial > p > roundup > code
diff roundup/rest.py @ 7853:03c1b7ae3a68
issue2551328/issue2551264 unneeded next link and total_count incorrect
Fix: issue2551328 - REST results show next link if number of
results is a multiple of page size. (Found by members of
team 3 in the UMass-Boston CS682 Spring 2024 class.)
issue2551264 - REST X-Total-Count header and @total_size
count incorrect when paginated
These issues arose because we retrieved the exact number of rows
from the database as requested by the user using the @page_size
parameter. With this changeset, we retrieve up to 10 million + 1
rows from the database. If the total number of rows exceeds 10
million, we set the total_count indicators to -1 as an invalid
size. (The max number of requested rows (default 10 million +1)
can be modified by the admin through interfaces.py.)
By retrieving more data than necessary, we can calculate the
total count by adding @page_index*@page_size to the number of
rows returned by the query.
Furthermore, since we return more than @page_size rows, we can
determine the existence of a row at @page_size+1 and use that
information to determine if a next link should be
provided. Previously, a next link was returned if @page_size rows
were retrieved.
This change does not guarantee that the user will get @page_size
rows returned. Access policy filtering occurs after the rows are
returned, and discards rows inaccessible by the user.
Using the current @page_index/@page_size it would be difficult to
have the roundup code refetch data and make sure that a full
@page_size set of rows is returned. E.G. @page_size=100 and 5 of
them are dropped due to access restrictions. We then fetch 10
items and add items 1-4 and 6 (5 is inaccessible). There is no
way to calculate the new database offset at:
@page_index*@page_size + 6 from the URL. We would need to add an
@page_offset=6 or something.
This could work since the client isn't adding 1 to @page_index to
get the next page. Thanks to HATEOAS, the client just uses the
'next' url. But I am not going to cross that bridge without a
concrete use case.
This can also be handled client side by merging a short response
with the next response and re-paginating client side.
Also added extra index markers to the docs to highlight use of
interfaces.py.
| author | John Rouillard <rouilj@ieee.org> |
|---|---|
| date | Mon, 01 Apr 2024 09:57:16 -0400 |
| parents | 216662fbaaee |
| children | 171ff2e487df |
line wrap: on
line diff
--- a/roundup/rest.py Sun Mar 31 01:49:07 2024 -0400 +++ b/roundup/rest.py Mon Apr 01 09:57:16 2024 -0400 @@ -440,6 +440,10 @@ api_version = None + # allow 10M row response - can change using interfaces.py + # limit is 1 less than this size. + max_response_row_size = 10000001 + def __init__(self, client, db): self.client = client self.db = db @@ -795,7 +799,7 @@ exact_props = {} page = { 'size': None, - 'index': 1 # setting just size starts at page 1 + 'index': 1, # setting just size starts at page 1 } verbose = 1 display_props = set() @@ -910,12 +914,26 @@ l.append(sort) if exact_props: kw['exact_match_spec'] = exact_props - if page['size'] is not None and page['size'] > 0: - kw['limit'] = page['size'] + if page['size'] is None: + kw['limit'] = self.max_response_row_size + elif page['size'] > 0: + if page['size'] >= self.max_response_row_size: + raise UsageError(_( + "Page size %(page_size)s must be less than admin " + "limit on query result size: %(max_size)s.") % { + "page_size": page['size'], + "max_size": self.max_response_row_size, + }) + kw['limit'] = self.max_response_row_size if page['index'] is not None and page['index'] > 1: kw['offset'] = (page['index'] - 1) * page['size'] obj_list = class_obj.filter(None, *l, **kw) + # Have we hit the max number of returned rows? + # If so there may be more data that the client + # has to explicitly page through using offset/@page_index. + overflow = len(obj_list) == self.max_response_row_size + # Note: We don't sort explicitly in python. The filter implementation # of the DB already sorts by ID if no sort option was given. @@ -930,7 +948,7 @@ for item_id in obj_list: r = {} if self.db.security.hasPermission( - 'View', uid, class_name, itemid=item_id, property='id' + 'View', uid, class_name, itemid=item_id, property='id', ): r = {'id': item_id, 'link': class_path + item_id} if display_props: @@ -942,14 +960,30 @@ result_len = len(result['collection']) + if not overflow: # noqa: SIM108 - no nested ternary + # add back the number of items in the offset. + total_len = kw['offset'] + result_len if 'offset' in kw \ + else result_len + else: + # we have hit the max number of rows configured to be + # returned. We hae no idea how many rows can match. We + # could use 0 as the sentinel, but a filter could match 0 + # rows. So return -1 indicating we exceeded the result + # max size on this query. + total_len = -1 + + # truncate result['collection'] to page size + if page['size'] is not None and page['size'] > 0: + result['collection'] = result['collection'][:page['size']] + # pagination - page_index from 1...N if page['size'] is not None and page['size'] > 0: result['@links'] = {} for rel in ('next', 'prev', 'self'): if rel == 'next': # if current index includes all data, continue - if page['size'] > result_len: continue # noqa: E701 - index = page['index']+1 + if page['size'] >= result_len: continue # noqa: E701 + index = page['index'] + 1 if rel == 'prev': if page['index'] <= 1: continue # noqa: E701 index = page['index'] - 1 @@ -964,8 +998,8 @@ for field in input.value if field.name != "@page_index"])}) - result['@total_size'] = result_len - self.client.setHeader("X-Count-Total", str(result_len)) + result['@total_size'] = total_len + self.client.setHeader("X-Count-Total", str(total_len)) self.client.setHeader("Allow", "OPTIONS, GET, POST") return 200, result
