开发者

Optimizing order of WHERE clause evaluation in DB2

开发者 https://www.devze.com 2023-03-23 14:55 出处:网络
I\'m working with two extremely large tables (A is ~20E6 rows, B is ~65E3 rows) and I have very elaborate where clauses to get just the items I need. One thing that could speed it up is to first evalu

I'm working with two extremely large tables (A is ~20E6 rows, B is ~65E3 rows) and I have very elaborate where clauses to get just the items I need. One thing that could speed it up is to first evaluate the date the entry was created (info goes back 10+ years, only need last 3 years); this would eliminate most of the data before doing expensive string processing. Is there a way to force the evaluation of the WHERE date > cutoff_date to happen before the other clauses?

Here is what my query looks like.

SELECT A.C1, A.C2, 
SUM(CASE WHEN B.C1 = '[condition 1]' OR ...  [con开发者_如何学Cdition n] THEN 1 ELSE NULL END) 
    AS SUM_OF_B_C1, ... [other sums]
FROM DB.TABLE_A A LEFT JOIN DB.TABLE_B AS B ON B.COMMON_COL = A.COMMON_COL
WHERE B.DATE > DATE('[cutoff date]')
    AND [complex substr conditions]
    AND NOT [other complex string conditions]
GROUP BY A.C1, A.C2

One problem, is that I only have read permission for the DB. Looking through Control Center, it seems that there is no index on either tables A or B, so that optimization route is out.

Any help you could provide would be great; my small test queries are taking about 5 minutes and in product queries will be about 20 times larger.


Trying to out-think modern query optimizers isn't usually very productive. I think you should spend your time doing two other things.

  1. Find out what business process lets you get indexes created on the columns that should have indexes.
  2. Learn how to run and interpret the output of EXPLAIN PLAN. This lets you approach management with numbers that show your code is dead slow not because you're incompetent, but because they are DB2 is forced to do table scans all over the place.

Tables that have no indexes at all are often generated by programs either to simplify querying and reporting or to reduce the load on the main database server. Find out who's responsible for those programs. Befriend them. Buy them a pizza. Think "social engineering".


You can use a WITH clause in your SELECT statement:

WITH
B2(col1, col2, ... ) AS (
    SELECT col1, col2, ...
    FROM DB.TABLE_B AS B
    WHERE B.DATE > DATE('[cutoff date]')
)
SELECT A.C1, A.C2, 
SUM(CASE WHEN B2.C1 = '[condition 1]' OR ...  [condition n] THEN 1 ELSE NULL END) 
    AS SUM_OF_B_C1, ... [other sums]
FROM DB.TABLE_A A LEFT JOIN B2 ON B2.COMMON_COL = A.COMMON_COL
WHERE 
    AND [complex substr conditions]
    AND NOT [other complex string conditions]
GROUP BY A.C1, A.C2
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号