How to optimize a count SQL query on a big table_问答_开发者

I have a big table on mysql (innodb) which contains products assets (13 millions of rows). Here a little schema of my database :

product <-many2one-- file_item --one2many--> family --many2one--> download_type

The *file_item* table is the big table with millions of rows. I try to count products by download types with the following sql query :

select t.name as type, 
count(p.product_id) as n 
from file_item p 
inner join family f on f.id = p.family_id 
inner join type t on f.id_type = t.id 
group by t.id order by t.name;

There are 3 indexes on *file_item* table:

开发者_JAVA百科

product_family_idx (product_id, family_id)
family_idx (family_id)
product_idx (product_id) Explain output :

+----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+
| id | select_type | table | type   | possible_keys                     | key     | key_len | ref               | rows     | Extra                           |
+----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+
|  1 | SIMPLE      | p     | ALL    | FAMILY_IDX,PRODUCT_FAMILY_IDX     | NULL    | NULL    | NULL              | 13862870 | Using temporary; Using filesort | 
|  1 | SIMPLE      | f     | eq_ref | PRIMARY,TYPE_ID                   | PRIMARY | 4       | MEDIA.p.FAMILY_IDX|        1 |                                 | 
|  1 | SIMPLE      | t     | eq_ref | PRIMARY                           | PRIMARY | 4       | MEDIA.f.TYPE_ID   |        1 |                                 | 
+----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+

The query takes more than 1 hour to return the results. Please how I can optimize the query ?!

Here is your original query:

select t.name as type,  
count(p.product_id) as n  
from file_item p  
inner join family f on f.id = p.family_id  
inner join type t on f.id_type = t.id  
group by t.id order by t.name;

You will need to make two major changes:

MAJOR CHANGE # 1 : Refactor the Query

SELECT A.ProductCount,B.name type
FROM
(
    SELECT id_type id,COUNT(1) ProductCount
    FROM
    (
        SELECT p.id_type
        FROM (SELECT family_id,id_type FROM file_item) p
        INNER JOIN (SELECT id FROM family) f on f.id = p.family_id
    ) AA
    GROUP BY id_type
) A
INNER JOIN type B USING (id)
ORDER BY B.name;

MAJOR CHANGE # 2 : Create Indexes That Will Support the Refactored Query

ALTER TABLE file_item ADD INDEX family_type_idx (family_id,id_type);

Give it a Try !!!

Lets decompose the query into parts:

First, fetch each row of file_item => 13M rows
For each returned row, fetch a row of family matching f.id = p.family_id. => 13M fetches, 13M rows
For each returned row, fetch a row of type matching f.id_type = t.id. => 13M fetches, 13M rows
Group by type.id => 10 rows
Sort by type.name => 10 rows to sort

As you can see, your query needs fetch 13M rows from family and 13M rows from type.

You should start be reducing the number of row fetches needed to execute the query:

Assuming that f.id_type is a non-NULL foreign key, you can change the inner join type t to a left join type t. Then, change group by t.id to group by f.id_type.

Grouping on the f table instead of the t table and changing the inner join to a left join allows MySQL to execute the group by before fetching rows from t.

group by drastically reduces the number of rows, so this drastically reduce the number of fetches from t too:

First, fetch each row of file_item => 13M rows
For each returned row, fetch a row of family matching f.id = p.family_id. => 13M fetches, 13M rows
Group by type.id => 10 rows
For each returned row, fetch a row of type matching f.id_type = t.id. => 10 fetches, 10 rows
Sort by type.name => 10 rows to sort

The result is that the query already fetches 13M less rows.

You can reduce that even more by denormalizing the schema a little:

If you add a family_type_id column in file_item, you could rewrite your query like this:

SELECT count(1)
FROM file_item p
JOIN type t ON t.id = p.family_type_id
GROUP BY p.family_type_id
ORDER BY t.name

With an index on file_item.family_type_id, this query should execute in milliseconds.

How to optimize a count SQL query on a big table

精彩评论

关注公众号

热门标签

图文推荐

How to optimize a count SQL query on a big table

更多 问答 相关资讯：

精彩评论

关注公众号

热门标签

图文推荐

更多问答相关资讯：