Speed up SQL query with huge amount of data

Question

I have this SQL query

SELECT * 
FROM `storages_remains` 
WHERE `FREE` > 0 
  AND `CATALOG_ID` IN (...)

CATALOG_ID is not unique, because table may contains multistorages data.

CATALOG_ID + STORAGE_ID are unique, BUT I can't specify certain storage in that query, it should work for all.

FREE is an integer value of storages remains.

What can I do to optimize this SQL query?

I have an idea to cache something like map

product_id => storage_id => is_avail:bool

is are there any SQL solutions?

From the asker's comment, note 5000 values in the IN:

Table contains about i 1 billion rows in average. Its updates everyday. I expect about 2thousand in average output. Currently when i query with 5 thousands of product its take about 300ms. In in clause will be about 1-5 thousands ids

Currently table have following indexes

All indexes was added by another programmers and i cant clearly understand how does free index works becouse FREE its just a number.

Is any way to create logical virtual column like avail = free>0 : true :false?

Currently query for example

EXPLAIN SELECT * FROM `s4y_catalog_storages_remains` WHERE `FREE`>0 AND CATALOG_ID IN('0d7bd57b-878a-11ed-96e5-001b21e8ace8','fa2f272f-ca6d-11ea-809e-a0369f4567aa','7661e585-5c7d-11e5-a9c4-a0369f4567aa')

Returns following info

Explain plan from sequlize row query:

 [
  [
    {
      id: 1,
      select_type: 'SIMPLE',
      table: 's4y_catalog_storages_remains',
      partitions: null,
      type: 'range',
      possible_keys: 'CATALOG_ID,CATALOG_ID2,FREE,CATALOG+STORAGE+FREE',
      key: 'CATALOG_ID',
      key_len: '110',
      ref: null,
      rows: 13050,
      filtered: 16.95,
      Extra: 'Using index condition; Using where'
    }
  ],
  [
    {
      id: 1,
      select_type: 'SIMPLE',
      table: 's4y_catalog_storages_remains',
      partitions: null,
      type: 'range',
      possible_keys: 'CATALOG_ID,CATALOG_ID2,FREE,CATALOG+STORAGE+FREE',
      key: 'CATALOG_ID',
      key_len: '110',
      ref: null,
      rows: 13050,
      filtered: 16.95,
      Extra: 'Using index condition; Using where'
    }
  ]
]

How many rows does the table have? How many rows [in average] do you expect to be selected by the second criteria CATALOG_ID IN (...) by itself? — The Impaler
– The Impaler, Commented Apr 5, 2024 at 15:05
How many values [in average] do you expect to have inside the IN (...) clause? — The Impaler
– The Impaler, Commented Apr 5, 2024 at 15:14
@TheImpaler innodb_version 5.7.42 .Table contains about i 1 billion rows in average. Its updates everyday. I expect about 2thousand in average output. Currently when i query with 5 thousands of product its take about 300ms. In in clause will be about 1-5 thousands ids — Sergei Illarionov
– Sergei Illarionov, Commented Apr 5, 2024 at 15:20
Can you add an Explain of your query to your question please? run explain select..... and add the result here. — Jorge Campos
– Jorge Campos, Commented Apr 5, 2024 at 16:18

tpdi · Accepted Answer · 2024-04-06 01:19:54Z

0

Run showplan, then put an index on catalog_id. Run showplan and see if it's better. If not, try...
Put an index on catalog_id and free. Run showplan. If not, try...
If many rows are free = 0, put an index on catalog_id and free > 0. Run showplan. (This is basically the cache you wanted.) If not, try...
IN often results in multiple index lookups or falling back to a table scan. You have an IN with 5000 ids? Yeah, you're either doing 5000 queries or a table scan. And that's huge, IN isn't meant to be used that way, and it may actually be slowing down at the query parser. Sometimes a join is faster. Create a table of the catalog ids in your IN, then join to that table in your query. Note that you'll need the index from solution 1. Run showplan.

answered Apr 6, 2024 at 1:19

tpdi

35.3k11 gold badges83 silver badges123 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Sergei Illarionov Over a year ago

How can i add index with free>0 ,in indexes i see there is just free index,but seeems its useless. The origin in cames from map of products map in node js. Thats products may came from cache or from source huge query having a lot of joins. I cant add another joins to that query becouse my current query works only for case that neeed to filter products by avail,when source array not filtered by awail and may be cached

Jorge Campos Over a year ago

To add to this answer, that IN part is exactly why some other RDBMs limits the amount of items in an IN clause, even mysql, though it is configurable.

Sergei Illarionov Over a year ago

@JorgeCampos can you explain how should i add the free>0 index and prouduct_id. Doest i need to create specific boolean column and update it every time when some data in the table inserted/updated. Or maybe i just need to add the free + product_id index?

Sergei Illarionov Over a year ago

@JorgeCampos i created virtual column avail and added the index on avail and product id.But seems it doesnot give significant impact. Feels like there is no way to optimize huge IN() query in sql except case when it contains unique keys.

Collectives™ on Stack Overflow

Speed up SQL query with huge amount of data

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related