开发者

How to optimize a count SQL query on a big table

开发者 https://www.devze.com 2023-03-31 00:12 出处:网络
I have a big table on mysql (innodb) which contains products assets (13 millions of rows). Here a little schema of my database :

I have a big table on mysql (innodb) which contains products assets (13 millions of rows). Here a little schema of my database :

product <-many2one-- file_item --one2many--> family --many2one--> download_type

The *file_item* table is the big table with millions of rows. I try to count products by download types with the following sql query :

select t.name as type, 
count(p.product_id) as n 
from file_item p 
inner join family f on f.id = p.family_id 
inner join type t on f.id_type = t.id 
group by t.id order by t.name;

There are 3 indexes on *file_item* table:

    开发者_JAVA百科
  • product_family_idx (product_id, family_id)
  • family_idx (family_id)
  • product_idx (product_id) Explain output :
+----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+
| id | select_type | table | type   | possible_keys                     | key     | key_len | ref               | rows     | Extra                           |
+----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+
|  1 | SIMPLE      | p     | ALL    | FAMILY_IDX,PRODUCT_FAMILY_IDX     | NULL    | NULL    | NULL              | 13862870 | Using temporary; Using filesort | 
|  1 | SIMPLE      | f     | eq_ref | PRIMARY,TYPE_ID                   | PRIMARY | 4       | MEDIA.p.FAMILY_IDX|        1 |                                 | 
|  1 | SIMPLE      | t     | eq_ref | PRIMARY                           | PRIMARY | 4       | MEDIA.f.TYPE_ID   |        1 |                                 | 
+----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+

The query takes more than 1 hour to return the results. Please how I can optimize the query ?!


Here is your original query:

select t.name as type,  
count(p.product_id) as n  
from file_item p  
inner join family f on f.id = p.family_id  
inner join type t on f.id_type = t.id  
group by t.id order by t.name; 

You will need to make two major changes:

MAJOR CHANGE # 1 : Refactor the Query

SELECT A.ProductCount,B.name type
FROM
(
    SELECT id_type id,COUNT(1) ProductCount
    FROM
    (
        SELECT p.id_type
        FROM (SELECT family_id,id_type FROM file_item) p
        INNER JOIN (SELECT id FROM family) f on f.id = p.family_id
    ) AA
    GROUP BY id_type
) A
INNER JOIN type B USING (id)
ORDER BY B.name;

MAJOR CHANGE # 2 : Create Indexes That Will Support the Refactored Query

ALTER TABLE file_item ADD INDEX family_type_idx (family_id,id_type);

Give it a Try !!!


Lets decompose the query into parts:

  1. First, fetch each row of file_item => 13M rows
  2. For each returned row, fetch a row of family matching f.id = p.family_id. => 13M fetches, 13M rows
  3. For each returned row, fetch a row of type matching f.id_type = t.id. => 13M fetches, 13M rows
  4. Group by type.id => 10 rows
  5. Sort by type.name => 10 rows to sort

As you can see, your query needs fetch 13M rows from family and 13M rows from type.

You should start be reducing the number of row fetches needed to execute the query:

Assuming that f.id_type is a non-NULL foreign key, you can change the inner join type t to a left join type t. Then, change group by t.id to group by f.id_type.

Grouping on the f table instead of the t table and changing the inner join to a left join allows MySQL to execute the group by before fetching rows from t.

group by drastically reduces the number of rows, so this drastically reduce the number of fetches from t too:

  1. First, fetch each row of file_item => 13M rows
  2. For each returned row, fetch a row of family matching f.id = p.family_id. => 13M fetches, 13M rows
  3. Group by type.id => 10 rows
  4. For each returned row, fetch a row of type matching f.id_type = t.id. => 10 fetches, 10 rows
  5. Sort by type.name => 10 rows to sort

The result is that the query already fetches 13M less rows.

You can reduce that even more by denormalizing the schema a little:

If you add a family_type_id column in file_item, you could rewrite your query like this:

SELECT count(1)
FROM file_item p
JOIN type t ON t.id = p.family_type_id
GROUP BY p.family_type_id
ORDER BY t.name

With an index on file_item.family_type_id, this query should execute in milliseconds.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号