0

I have this SQL query

SELECT * 
FROM `storages_remains` 
WHERE `FREE` > 0 
  AND `CATALOG_ID` IN (...)

CATALOG_ID is not unique, because table may contains multistorages data.

CATALOG_ID + STORAGE_ID are unique, BUT I can't specify certain storage in that query, it should work for all.

FREE is an integer value of storages remains.

What can I do to optimize this SQL query?

I have an idea to cache something like map

product_id => storage_id => is_avail:bool

is are there any SQL solutions?

From the asker's comment, note 5000 values in the IN:

Table contains about i 1 billion rows in average. Its updates everyday. I expect about 2thousand in average output. Currently when i query with 5 thousands of product its take about 300ms. In in clause will be about 1-5 thousands ids

Currently table have following indexes enter image description here

All indexes was added by another programmers and i cant clearly understand how does free index works becouse FREE its just a number.

Is any way to create logical virtual column like avail = free>0 : true :false?

Currently query for example

EXPLAIN SELECT * FROM `s4y_catalog_storages_remains` WHERE `FREE`>0 AND CATALOG_ID IN('0d7bd57b-878a-11ed-96e5-001b21e8ace8','fa2f272f-ca6d-11ea-809e-a0369f4567aa','7661e585-5c7d-11e5-a9c4-a0369f4567aa')

Returns following info enter image description here

Explain plan from sequlize row query:

 [
  [
    {
      id: 1,
      select_type: 'SIMPLE',
      table: 's4y_catalog_storages_remains',
      partitions: null,
      type: 'range',
      possible_keys: 'CATALOG_ID,CATALOG_ID2,FREE,CATALOG+STORAGE+FREE',
      key: 'CATALOG_ID',
      key_len: '110',
      ref: null,
      rows: 13050,
      filtered: 16.95,
      Extra: 'Using index condition; Using where'
    }
  ],
  [
    {
      id: 1,
      select_type: 'SIMPLE',
      table: 's4y_catalog_storages_remains',
      partitions: null,
      type: 'range',
      possible_keys: 'CATALOG_ID,CATALOG_ID2,FREE,CATALOG+STORAGE+FREE',
      key: 'CATALOG_ID',
      key_len: '110',
      ref: null,
      rows: 13050,
      filtered: 16.95,
      Extra: 'Using index condition; Using where'
    }
  ]
]
8
  • 1
    How many rows does the table have? How many rows [in average] do you expect to be selected by the second criteria CATALOG_ID IN (...) by itself? Commented Apr 5, 2024 at 15:05
  • 2
    What database are you using (and version of it)? Commented Apr 5, 2024 at 15:06
  • 1
    How many values [in average] do you expect to have inside the IN (...) clause? Commented Apr 5, 2024 at 15:14
  • @TheImpaler innodb_version 5.7.42 .Table contains about i 1 billion rows in average. Its updates everyday. I expect about 2thousand in average output. Currently when i query with 5 thousands of product its take about 300ms. In in clause will be about 1-5 thousands ids Commented Apr 5, 2024 at 15:20
  • 2
    Can you add an Explain of your query to your question please? run explain select..... and add the result here. Commented Apr 5, 2024 at 16:18

1 Answer 1

0
  1. Run showplan, then put an index on catalog_id. Run showplan and see if it's better. If not, try...

  2. Put an index on catalog_id and free. Run showplan. If not, try...

  3. If many rows are free = 0, put an index on catalog_id and free > 0. Run showplan. (This is basically the cache you wanted.) If not, try...

  4. IN often results in multiple index lookups or falling back to a table scan. You have an IN with 5000 ids? Yeah, you're either doing 5000 queries or a table scan. And that's huge, IN isn't meant to be used that way, and it may actually be slowing down at the query parser. Sometimes a join is faster. Create a table of the catalog ids in your IN, then join to that table in your query. Note that you'll need the index from solution 1. Run showplan.

Sign up to request clarification or add additional context in comments.

4 Comments

How can i add index with free>0 ,in indexes i see there is just free index,but seeems its useless. The origin in cames from map of products map in node js. Thats products may came from cache or from source huge query having a lot of joins. I cant add another joins to that query becouse my current query works only for case that neeed to filter products by avail,when source array not filtered by awail and may be cached
To add to this answer, that IN part is exactly why some other RDBMs limits the amount of items in an IN clause, even mysql, though it is configurable.
@JorgeCampos can you explain how should i add the free>0 index and prouduct_id. Doest i need to create specific boolean column and update it every time when some data in the table inserted/updated. Or maybe i just need to add the free + product_id index?
@JorgeCampos i created virtual column avail and added the index on avail and product id.But seems it doesnot give significant impact. Feels like there is no way to optimize huge IN() query in sql except case when it contains unique keys.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.