2

(edited) For more details about the app it self, please, also see: Simple but heavy application consuming a lot of resources. How to Optimize? (The adopted solution was use both joins and fulltext search)

I have the following query running up to roughly 500.000 rows in 25 seconds. If I remove the ORDER, it takes 0.5 seconds.

Fisrt test

Keeping the ORDER and removing all t. and tu. columns, the query takes 7 seconds.

Second test

If I add or remove an INDEX to the i.created_at field the response time remain the same.

QUERY:

**EDITED: I'VE NOTICED THAT BOTH GROUP BY AND ORDER BY SLOW DOWN THE QUERY (I've also achieve a little gain in the query changing the joins. The gain was to 10secs, but at all, the problem remains). With the modification, the EXPLAIN have stopped to return filesort, but stills returning "using temporary" **

SELECT SQL_NO_CACHE
        DISTINCT `i`.`id`, 
        `i`.`entity`, 
        `i`.`created_at`, 
        `i`.`collected_at`, 

        `t`.`status_id` AS  `twt_status_id`, 
        `t`.`user_id` AS `twt_user_id`, 
        `t`.`content` AS `twt_content`, 
        `tu`.`id` AS `twtu_id`, 
        `tu`.`screen_name` AS `twtu_screen_name`, 
        `tu`.`profile_image` AS `twtu_profile_image`


        FROM `mtrt_items` AS `i`

        LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id

        LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
        LEFT JOIN `twt_users` AS `tu` ON u.id = tu.id

        INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
        INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
        INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
        INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
        INNER JOIN `account_clients` AS `c` ON g.client_id =c.id                

    ORDER BY `i`.`created_at` DESC 
    LIMIT 100 OFFSET 0

Here is the EXPLAIN (EDITED):

+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
| id | select_type | table | type   | possible_keys      | key       | key_len | ref                    | rows | Extra                        |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+
|  1 | SIMPLE      | c     | index  | PRIMARY            | PRIMARY   | 4       | NULL                   |    1 | Using index; Using temporary |
|  1 | SIMPLE      | g     | ref    | PRIMARY,client_id  | client_id | 4       | clubr_new.c.id         |    3 | Using index                  |
|  1 | SIMPLE      | sg    | ref    | group_id,search_id | group_id  | 4       | clubr_new.g.id         |    1 | Using index                  |
|  1 | SIMPLE      | s     | eq_ref | PRIMARY            | PRIMARY   | 4       | clubr_new.sg.search_id |    1 | Using index                  |
|  1 | SIMPLE      | r     | ref    | search_id,item_id  | search_id | 4       | clubr_new.s.id         | 4359 | Using where                  |
|  1 | SIMPLE      | i     | eq_ref | PRIMARY            | PRIMARY   | 8       | clubr_new.r.item_id    |    1 |                              |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY            | PRIMARY   | 8       | clubr_new.i.user_id    |    1 | Using index                  |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY            | PRIMARY   | 4       | clubr_new.i.id         |    1 |                              |
|  1 | SIMPLE      | tu    | eq_ref | PRIMARY            | PRIMARY   | 8       | clubr_new.u.id         |    1 |                              |
+----+-------------+-------+--------+--------------------+-----------+---------+------------------------+------+------------------------------+

Here is the mtrt_items table:

+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| Field        | Type                                                  | Null | Key | Default | Extra          |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+
| id           | bigint(20)                                            | NO   | PRI | NULL    | auto_increment |
| entity       | enum('twitter','facebook','youtube','flickr','orkut') | NO   | MUL | NULL    |                |
| user_id      | bigint(20)                                            | NO   | MUL | NULL    |                |
| created_at   | datetime                                              | NO   | MUL | NULL    |                |
| collected_at | datetime                                              | NO   |     | NULL    |                |
+--------------+-------------------------------------------------------+------+-----+---------+----------------+

 CREATE TABLE `mtrt_items` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `entity` enum('twitter','facebook','youtube','flickr','orkut') COLLATE utf8_unicode_ci NOT NULL,
  `user_id` bigint(20) NOT NULL,
  `created_at` datetime NOT NULL,
  `collected_at` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `mtrt_user_id` (`user_id`),
  KEY `entity` (`entity`),
  KEY `created_at` (`created_at`),
  CONSTRAINT `mtrt_items_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `mtrt_users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=309650 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The twt_tweets_content is MyISAM and is also used for fulltext searches:

+-----------+--------------+------+-----+---------+-------+
| Field     | Type         | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| id        | int(11)      | NO   | PRI | NULL    |       |
| user_id   | int(11)      | NO   | MUL | NULL    |       |
| status_id | varchar(100) | NO   | MUL | NULL    |       |
| content   | varchar(200) | NO   | MUL | NULL    |       |
+-----------+--------------+------+-----+---------+-------+
10
  • Just to be sure, you're doing RESET QUERY CACHE before each benchmark. right? Commented Aug 11, 2011 at 2:20
  • what happens when you try ORDER BY i.id` DESC` ? Generally it would give you the exact same result. id is auto increment, so an older id means that the record was created further in the past. Commented Aug 11, 2011 at 2:21
  • @spinning_plate Actually, I use SELECT SQL_NO_CACHE.... Commented Aug 11, 2011 at 2:22
  • @Sabeen-Malik The same time actually, it's strange. I've notice that using the EXPLAIN on phpmyadmin, the first row of the result posted in this question has the following: Using index; Using temporary; Using filesort not just using index. Do you know why? Commented Aug 11, 2011 at 2:28
  • Stupid idea, but if you wrap the whole thing in SELECT * FROM (your query without ordering) alias ORDER BY alias.id do you get the 0.05 second run time? Commented Aug 11, 2011 at 2:33

2 Answers 2

6

Instead of placing the Order By into the main query, wrap it, like so:

SELECT * FROM (   
  ... your query
) ORDER BY `created at`

Take a look at the query plan. You will find that in your case, the sort is performed on your table mtrt_items before the outer join is performed. In the rewrite I've partially provided, the sort is applied after the outer joins, and is applied on a much smaller set.

UPDATE

Assuming that the LIMIT is being applied to a large set (500,000?), it looks like you can perform the top before doing any of the joins.

SELECT * from (
    SELECT 
    `id`, ... `created_at`, ...
    ORDER BY `i`.`created_at` DESC 
    LIMIT 100 OFFSET 0) as i

    LEFT JOIN `mtrt_users` AS `u` ON i.user_id =u.id

    LEFT JOIN `twt_tweets_content` AS `t` ON t.id =i.id
    LEFT JOIN `twt_users` AS `tu` ON t.user_id = tu.id

    INNER JOIN `mtrt_items_searches` AS `r` ON i.id =r.item_id
    INNER JOIN `mtrt_searches` AS `s` ON s.id =r.search_id
    INNER JOIN `mtrt_searches_groups` AS `sg` ON sg.search_id =s.id
    INNER JOIN `mtrt_search_groups` AS `g` ON sg.group_id =g.id
    INNER JOIN `account_clients` AS `c` ON g.client_id =c.id                

GROUP BY i.id
Sign up to request clarification or add additional context in comments.

9 Comments

As I said in the other comment... same time.
Also, check that the explain returns "Using index; Using temporary; Using filesort". Maybe the problem?
All filesort means is that it's sorting. It's a bad name. It's effectively a quicksort. What's irritating about this is that, if it is as you say, then the sort is really working to sort a huge set from which it will only return 100 items. If the table from which it pulls 100 items is, say, 500,000 items, then this is why everything slows down on you.
Not sure if I understood you. But in this case I have result of ~300k of items and ~150k of users, totalizing ~450k.
Unfortunately, the large set is from one test account, it will have more than one, each one with approximately ~100k+ per month. Then, the large set will be ~5M+ in just one semester.
|
0

Don't include the VARCHAR/TEXT fields in your initial query. This will create the TEMPORARY table required for the sorting, using the MEMORY engine and this will increase the efficiency dramatically. You can collect the text fields later using another query, without any sorting, simply with a condition on the PRIMARY KEY field and merge the data in your script (assuming that you are using one).
Also get rid of any JOINs (INNER or OUTER) that you don't actually take any data from.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.