
Indexing for quick JOINs between a regularly truncated table and a large table in MySQL

Developer https://www.devze.com 2023-04-12 06:49 Source: Internet

I have two tables that get joined regularly.

Table One is about 1 Million rows and grows daily. Table Two is always about 200k less than Table One. Furthermore, Table Two is truncated and repopulated every night from a report downloaded from an outside service. The UPDATE..JOIN query I use isn't too speedy, so I'm looking for a possible remedy.

Table One's structure:

#I grow daily and currently am around 1 million rows.
CREATE TABLE table_one(
 id INT NOT NULL AUTO_INCREMENT,
 sku VARCHAR(30), 
 other_one VARCHAR(30),
 PRIMARY KEY(id)
);

Table Two's structure:

#I get truncated every night and have about 200k fewer rows than Table One.
CREATE TABLE table_two(
 id INT NOT NULL AUTO_INCREMENT,
 sku VARCHAR(30), 
 other_two INT,
 PRIMARY KEY(id)
);

Note that the other_one and other_two fields are just there to demonstrate that each table has fields (mostly varchar) beyond id and sku; there are actually many different columns on each table. I'm not sure it matters, but SKU is unique on Table Two and only unique about 95% of the time on Table One. Because of this, uniqueness is not enforced on either table in MySQL.

So here is my workflow and question:

1) A bunch of new rows get added to Table One during the day.

2) Each night Table Two is truncated (all rows deleted)

3) A report is downloaded from a third party as a CSV flatfile. That report is then loaded into Table Two using a LOAD DATA LOCAL INFILE command.

4) 3 queries are run that update Table One data and involve a JOIN. They all look very similar to this:

UPDATE table_one t1
JOIN table_two t2 ON t2.sku = t1.sku
SET t1.other_one = 'Other two was greater than zero!'
WHERE t1.other_one IS NULL AND t2.other_two > 0;

With the number of rows I have, JOINs between these two tables seem to take quite a bit of time. With 3 heavy update queries, would it be best to create an index on these tables? The issue is that the index on Table Two would most likely have to be recreated each night when the table gets repopulated. I don't know how this might affect population speed, nor do I know which type of index I should use.


You certainly want indexes on both tables, on the sku column used in the JOIN. On Table Two, drop the index before truncating the table and reloading the data. Once the data has been reloaded, re-create the index.
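
As a rough sketch of that nightly sequence, something like the following could work. The index names (idx_t1_sku, idx_t2_sku), the file path, the CSV delimiters, the header row, and the column list are all assumptions for illustration, not details from the question; adjust them to the real report. EXPLAIN on an UPDATE needs MySQL 5.6 or later.

```sql
-- One-time setup: index the join column on the large table.
-- (idx_t1_sku is a hypothetical name.)
CREATE INDEX idx_t1_sku ON table_one (sku);

-- Nightly job for table_two.
-- Drop the index so the bulk load does not have to update it for every row.
-- This statement fails if the index does not exist yet, so skip it on the first run.
DROP INDEX idx_t2_sku ON table_two;

TRUNCATE TABLE table_two;

-- Hypothetical path and CSV layout (comma-separated, one header line).
LOAD DATA LOCAL INFILE '/path/to/report.csv'
INTO TABLE table_two
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(sku, other_two);

-- Rebuild the index in one pass over the freshly loaded data.
CREATE INDEX idx_t2_sku ON table_two (sku);

-- Optional check: confirm the join uses the sku indexes instead of full scans.
EXPLAIN
UPDATE table_one t1
JOIN table_two t2 ON t2.sku = t1.sku
SET t1.other_one = 'Other two was greater than zero!'
WHERE t1.other_one IS NULL AND t2.other_two > 0;
```

A plain (non-unique) index is used here because the question says uniqueness is not enforced on either table. Since sku is reported to be unique in Table Two, a UNIQUE index there would be an option as well, at the cost of the load failing on any duplicate.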
