diff roundup/rest.py @ 7853:03c1b7ae3a68

issue2551328/issue2551264 unneeded next link and total_count incorrect Fix: issue2551328 - REST results show next link if number of results is a multiple of page size. (Found by members of team 3 in the UMass-Boston CS682 Spring 2024 class.) issue2551264 - REST X-Total-Count header and @total_size count incorrect when paginated These issues arose because we retrieved the exact number of rows from the database as requested by the user using the @page_size parameter. With this changeset, we retrieve up to 10 million + 1 rows from the database. If the total number of rows exceeds 10 million, we set the total_count indicators to -1 as an invalid size. (The max number of requested rows (default 10 million +1) can be modified by the admin through interfaces.py.) By retrieving more data than necessary, we can calculate the total count by adding @page_index*@page_size to the number of rows returned by the query. Furthermore, since we return more than @page_size rows, we can determine the existence of a row at @page_size+1 and use that information to determine if a next link should be provided. Previously, a next link was returned if @page_size rows were retrieved. This change does not guarantee that the user will get @page_size rows returned. Access policy filtering occurs after the rows are returned, and discards rows inaccessible by the user. Using the current @page_index/@page_size it would be difficult to have the roundup code refetch data and make sure that a full @page_size set of rows is returned. E.G. @page_size=100 and 5 of them are dropped due to access restrictions. We then fetch 10 items and add items 1-4 and 6 (5 is inaccessible). There is no way to calculate the new database offset at: @page_index*@page_size + 6 from the URL. We would need to add an @page_offset=6 or something. This could work since the client isn't adding 1 to @page_index to get the next page. Thanks to HATEOAS, the client just uses the 'next' url. But I am not going to cross that bridge without a concrete use case. This can also be handled client side by merging a short response with the next response and re-paginating client side. Also added extra index markers to the docs to highlight use of interfaces.py.
author John Rouillard <rouilj@ieee.org>
date Mon, 01 Apr 2024 09:57:16 -0400
parents 216662fbaaee
children 171ff2e487df
line wrap: on
line diff
--- a/roundup/rest.py	Sun Mar 31 01:49:07 2024 -0400
+++ b/roundup/rest.py	Mon Apr 01 09:57:16 2024 -0400
@@ -440,6 +440,10 @@
 
     api_version = None
 
+    # allow 10M row response - can change using interfaces.py
+    # limit is 1 less than this size.
+    max_response_row_size = 10000001
+
     def __init__(self, client, db):
         self.client = client
         self.db = db
@@ -795,7 +799,7 @@
         exact_props = {}
         page = {
             'size': None,
-            'index': 1   # setting just size starts at page 1
+            'index': 1,   # setting just size starts at page 1
         }
         verbose = 1
         display_props = set()
@@ -910,12 +914,26 @@
             l.append(sort)
         if exact_props:
             kw['exact_match_spec'] = exact_props
-        if page['size'] is not None and page['size'] > 0:
-            kw['limit'] = page['size']
+        if page['size'] is None:
+            kw['limit'] = self.max_response_row_size
+        elif page['size'] > 0:
+            if page['size'] >= self.max_response_row_size:
+                raise UsageError(_(
+                    "Page size %(page_size)s must be less than admin "
+                    "limit on query result size: %(max_size)s.") % {
+                        "page_size": page['size'],
+                        "max_size": self.max_response_row_size,
+                    })
+            kw['limit'] = self.max_response_row_size
             if page['index'] is not None and page['index'] > 1:
                 kw['offset'] = (page['index'] - 1) * page['size']
         obj_list = class_obj.filter(None, *l, **kw)
 
+        # Have we hit the max number of returned rows?
+        # If so there may be more data that the client
+        # has to explicitly page through using offset/@page_index.
+        overflow = len(obj_list) == self.max_response_row_size
+
         # Note: We don't sort explicitly in python. The filter implementation
         # of the DB already sorts by ID if no sort option was given.
 
@@ -930,7 +948,7 @@
         for item_id in obj_list:
             r = {}
             if self.db.security.hasPermission(
-                'View', uid, class_name, itemid=item_id, property='id'
+                'View', uid, class_name, itemid=item_id, property='id',
             ):
                 r = {'id': item_id, 'link': class_path + item_id}
             if display_props:
@@ -942,14 +960,30 @@
 
         result_len = len(result['collection'])
 
+        if not overflow:  # noqa: SIM108  - no nested ternary
+            # add back the number of items in the offset.
+            total_len = kw['offset'] + result_len if 'offset' in kw \
+                else result_len
+        else:
+            # we have hit the max number of rows configured to be
+            # returned.  We hae no idea how many rows can match. We
+            # could use 0 as the sentinel, but a filter could match 0
+            # rows.  So return -1 indicating we exceeded the result
+            # max size on this query.
+            total_len = -1
+
+        # truncate result['collection'] to page size
+        if page['size'] is not None and page['size'] > 0:
+            result['collection'] = result['collection'][:page['size']]
+
         # pagination - page_index from 1...N
         if page['size'] is not None and page['size'] > 0:
             result['@links'] = {}
             for rel in ('next', 'prev', 'self'):
                 if rel == 'next':
                     # if current index includes all data, continue
-                    if page['size'] > result_len: continue  # noqa: E701
-                    index = page['index']+1
+                    if page['size'] >= result_len: continue  # noqa: E701
+                    index = page['index'] + 1
                 if rel == 'prev':
                     if page['index'] <= 1: continue  # noqa: E701
                     index = page['index'] - 1
@@ -964,8 +998,8 @@
                                      for field in input.value
                                      if field.name != "@page_index"])})
 
-        result['@total_size'] = result_len
-        self.client.setHeader("X-Count-Total", str(result_len))
+        result['@total_size'] = total_len
+        self.client.setHeader("X-Count-Total", str(total_len))
         self.client.setHeader("Allow", "OPTIONS, GET, POST")
         return 200, result
 

Roundup Issue Tracker: http://roundup-tracker.org/