
When paging to one of the last pages, MySQL still fully scans all the preceding records (on disk). One common solution is to remember the ID of the last row of the previous page and filter with a WHERE clause (WHERE id > previous ID LIMIT 0, 20). But if the SQL includes GROUP BY and SUM, this method does not work, because the sums come out wrong. Does anyone have a solution? I know that if you want better performance you would not use MySQL, but I would still like to know whether there is a way to optimize this.
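
For a plain listing without GROUP BY, that keyset approach might look like the sketch below (the value 123 is only a placeholder for the last id of the previous page):

-- Keyset pagination on a non-aggregated query: continue after the last id
-- delivered on the previous page (123 is an assumed placeholder value).
SELECT id, productID, locationID, qty, create_date
FROM inventory
WHERE id > 123
ORDER BY id
LIMIT 20;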

For Example:

SELECT 
    productID, 
    locationID, 
    SUM(qty) AS total_qty
FROM 
    inventory
WHERE 
    create_date BETWEEN 'yyyy-mm-dd' AND 'yyyy-mm-dd'
GROUP BY 
    productID, 
    locationID
ORDER BY 
    productID, 
    locationID
LIMIT 100, 20;

Assuming the inventory log table has the following sample data:

id  productID  locationID  qty  create_date
1   101        1            10  2024-01-01 10:00:00
2   101        1            -2  2024-01-02 12:00:00
3   102        1             5  2024-01-03 14:00:00
4   101        2            20  2024-01-01 09:00:00
5   101        1             3  2024-01-04 16:00:00
6   102        1            -1  2024-01-05 18:00:00

With the given SQL, the result will be:

SELECT 
    productID, 
    locationID, 
    SUM(qty) AS total_qty
FROM 
    inventory
WHERE 
    create_date BETWEEN '2024-01-01' AND '2024-01-04 23:59:59'
GROUP BY 
    productID, 
    locationID
ORDER BY 
    productID, 
    locationID;

Result:
productID  locationID  total_qty
101        1           11
101        2           20
102        1            5

In reality there will be far more than 3 result rows, so I would like to paginate with LIMIT. But I found that with LIMIT ... OFFSET, MySQL still scans all the records before the offset on disk. And it does not seem possible to filter with keyset pagination, because the id is neither sequential in nor related to the grouped result.

5 Comments
  • It won't necessarily scan the rows on disk. It'll scan them, which is bad, but they may be in the buffer pool in RAM. Assuming the table uses the default storage engine, InnoDB. Commented May 26, 2024 at 6:46
  • The pagination looks faulty. You group by locationID and productID, but only order by productID. Let's say product P30 has three locations L1, L2, L3. Your last rows with limit 100, 20 may be (P29,L3), (P30,L1). Then, in the next pass with limit 120, 20, the first rows may be (P30,L3), (P30,L1), (P31,L2), and you will have gotten (P30,L1) twice with (P30,L2) missing from your result, because you can get the P30 rows in any order: one time as L1, L2, L3, another time as L3, L1, L2, or any other order; it is undefined. Commented May 26, 2024 at 7:12
  • Worse, your query seems invalid. You group by locationID and productID, but select create_date. Which date would that be when there are many rows per group? You would have to select, say, MIN(create_date) or MAX(create_date) to make this a valid aggregation. This indicates that you are working in MySQL's notorious cheat mode, which was even the default in old versions. In MySQL, always work with sql_mode = 'ONLY_FULL_GROUP_BY' in order to be warned about invalid group-by queries rather than getting arbitrary results. And maybe you want to upgrade to a version where this mode is the default. Commented May 26, 2024 at 7:17
  • Maybe I did not show the example correctly. Let's say I have an 'Inventory Log Table'. It records (id, productID, locationID, qty, create_date). A given productID may exist in different locationIDs. Since each row is saved as a log record, qty can be a positive or a negative number. Now I want to determine how many of each product are in each location over a specified period of time. Shouldn't my SQL be correct? Commented May 26, 2024 at 10:20
  • "But there will not only 3 records, so I would like to do pagination with limit." when you have only 3 records you never need pagination.... Commented May 26, 2024 at 13:51

2 Answers

2

You may consider adding the following covering index:

CREATE INDEX idx2 ON inventory (create_date, productID, locationID, qty);

This index, if used, should let MySQL discard any records with out-of-range create_date values. After that, MySQL can scan the matching index entries and aggregate by the 2 columns in the GROUP BY clause. The index contains those 2 grouping columns as well as qty, so the query can be answered from the index alone.

As @Thorsten has pointed out in his comment below, your current query is invalid (at least with ONLY_FULL_GROUP_BY mode enabled), as it includes create_date in the select clause while omitting it from the GROUP BY clause. Here is one possible corrected version:

SELECT productID, locationID, SUM(qty)
FROM inventory
WHERE create_date BETWEEN 'yyyy-mm-dd' and 'yyyy-mm-dd'
GROUP BY productID, locationID
ORDER BY productID, locationID
LIMIT 100, 20;
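
To check whether the optimizer actually uses the suggested index, you can prefix the query with EXPLAIN. The sketch below plugs in the sample date range from the question; the exact plan depends on your data distribution and MySQL version:

-- Inspect the execution plan for the range scan / aggregation.
EXPLAIN
SELECT productID, locationID, SUM(qty) AS total_qty
FROM inventory
WHERE create_date BETWEEN '2024-01-01' AND '2024-01-04 23:59:59'
GROUP BY productID, locationID
ORDER BY productID, locationID
LIMIT 100, 20;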

4 Comments

create_date is missing from your index. (As is, this column renders the query invalid, so I assume they are working in cheat mode, where MySQL turns this into ANY_VALUE(create_date). Anyway, we need it in the index, if we want a covering index.)
@ThorstenKettner You mean to say, the OP's query is missing create_date from the GROUP BY clause. I should have detected this, but, assuming the OP can fix the query, it doesn't make my answer less valid.
@ThorstenKettner - Filtering happens before grouping, so your comment about only_full_group_by seems irrelevant.
@Rick James: There was create_date in the select list of the original query. It was altered after my comment.
1

OFFSET is the real problem. The processing has to skip over 100 rows before getting and delivering the desired 20. But because of the WHERE and GROUP BY, there is no easy way to avoid the overhead.

I assume you have millions of rows, correct? (Otherwise, I would consider the performance problem minimal.)

I recommend two techniques for speeding up this query -- one for shrinking the table and one for avoiding OFFSET.

What are the date ranges like? Weekly? Daily? Arbitrary? How many rows are there per week/day/whatever? If there are at least 10 rows (on average) per date-range unit, it is worth building and maintaining a Summary Table.
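
As a rough sketch of that idea (the table and column names below are made up for illustration), a daily summary table could be maintained like this and then queried instead of the raw log:

-- Hypothetical daily summary table; one row per (day, product, location).
CREATE TABLE inventory_daily_summary (
    summary_date DATE NOT NULL,
    productID    INT  NOT NULL,
    locationID   INT  NOT NULL,
    total_qty    INT  NOT NULL,
    PRIMARY KEY (summary_date, productID, locationID)
);

-- Nightly job: re-aggregate yesterday's log rows into the summary table.
INSERT INTO inventory_daily_summary (summary_date, productID, locationID, total_qty)
SELECT DATE(create_date), productID, locationID, SUM(qty)
FROM inventory
WHERE create_date >= CURRENT_DATE - INTERVAL 1 DAY
  AND create_date <  CURRENT_DATE
GROUP BY DATE(create_date), productID, locationID
ON DUPLICATE KEY UPDATE total_qty = VALUES(total_qty);

The report query would then sum total_qty from the summary table over the requested days, which leaves far fewer rows to group and page through.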

To get rid of OFFSET (and its flaws), do pagination via "remember where you left off": Pagination
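
Applied to the query in the question, "remember where you left off" could key on the two GROUP BY / ORDER BY columns instead of id. Because the filter is on exactly the grouping columns, it only drops whole groups that were already delivered, so the remaining sums stay correct. A sketch, where (101, 2) stands in for the (productID, locationID) of the last row on the previous page:

SELECT productID, locationID, SUM(qty) AS total_qty
FROM inventory
WHERE create_date BETWEEN '2024-01-01' AND '2024-01-04 23:59:59'
  AND (productID, locationID) > (101, 2)   -- last group shown on the previous page
GROUP BY productID, locationID
ORDER BY productID, locationID
LIMIT 20;

Note that the ORDER BY has to cover both key columns; otherwise the page boundaries are not deterministic, as pointed out in the comments above.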

