开发者

speed up SQL Query

开发者 https://www.devze.com 2023-04-11 07:09 出处:网络
I have a query which is taking some serious time to execute on anything older than the past, say, hours worth of data.This is going to create a view which will be used for datamining, so the expectati

I have a query which is taking some serious time to execute on anything older than the past, say, hours worth of data. This is going to create a view which will be used for datamining, so the expectations are that it would be able to search back weeks or months of data and return in a reasonable a开发者_StackOverflowmount of time (even a couple minutes is fine... I ran for a date range of 10/3/2011 12:00pm to 10/3/2011 1:00pm and it took 44 minutes!)

The problem is with the two LEFT OUTER JOINs in the bottom. When I take those out, it can run in about 10 seconds. However, those are the bread and butter of this query.

This is all coming from one table. The ONLY thing this query returns differently than the original table is the column xweb_range. xweb_range is a calculated field column (range) which will only use the values from [LO,LC,RO,RC]_Avg where their corresponding [LO,LC,RO,RC]_Sensor_Alarm = 0 (do not include in range calculation if sensor alarm = 1)

WITH Alarm (sub_id, 
LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm) AS (
SELECT sub_id, LO_Avg, LO_Sensor_Alarm, LC_Avg, LC_Sensor_Alarm, RO_Avg, RO_Sensor_Alarm, RC_Avg, RC_Sensor_Alarm 
FROM dbo.some_table
where sub_id <> '0'
)
, AddRowNumbers AS (
SELECT  rowNumber = ROW_NUMBER() OVER (ORDER BY LO_Avg)
    , sub_id
    , LO_Avg, LO_Sensor_Alarm
    , LC_Avg, LC_Sensor_Alarm
    , RO_Avg, RO_Sensor_Alarm
    , RC_Avg, RC_Sensor_Alarm
FROM Alarm
)
, UnPivotColumns AS (
SELECT rowNumber, value = LO_Avg FROM AddRowNumbers WHERE LO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, LC_Avg FROM AddRowNumbers WHERE LC_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RO_Avg FROM AddRowNumbers WHERE RO_Sensor_Alarm = 0
UNION ALL SELECT rowNumber, RC_Avg FROM AddRowNumbers WHERE RC_Sensor_Alarm = 0
)
SELECT rowNumber.sub_id
   , cds.equipment_id
   , cds.read_time
   , cds.LC_Avg
   , cds.LC_Dev
   , cds.LC_Ref_Gap
   , cds.LC_Sensor_Alarm
   , cds.LO_Avg
   , cds.LO_Dev
   , cds.LO_Ref_Gap
   , cds.LO_Sensor_Alarm
   , cds.RC_Avg
   , cds.RC_Dev
   , cds.RC_Ref_Gap
   , cds.RC_Sensor_Alarm
   , cds.RO_Avg
   , cds.RO_Dev
   , cds.RO_Ref_Gap
   , cds.RO_Sensor_Alarm
   , COALESCE(range1.range, range2.range) AS xweb_range
FROM   AddRowNumbers rowNumber
   LEFT OUTER JOIN (SELECT rowNumber, range = MAX(value) - MIN(value) FROM UnPivotColumns GROUP BY rowNumber HAVING COUNT(*) > 1) range1 ON range1.rowNumber = rowNumber.rowNumber
   LEFT OUTER JOIN (SELECT rowNumber, range = AVG(value) FROM UnPivotColumns     GROUP BY rowNumber HAVING COUNT(*) = 1) range2 ON range2.rowNumber = rowNumber.rowNumber
   INNER JOIN dbo.some_table cds
   ON rowNumber.sub_id = cds.sub_id


It's difficult to understand exactly what your query is trying to do without knowing the domain. However, it seems to me like your query is simply trying to find, for each row in dbo.some_table where sub_id is not 0, the range of the following columns in the record (or, if only one matches, that single value):

  • LO_AVG when LO_SENSOR_ALARM=0
  • LC_AVG when LC_SENSOR_ALARM=0
  • RO_AVG when RO_SENSOR_ALARM=0
  • RC_AVG when RC_SENSOR_ALARM=0

You constructed this query assigning each row a sequential row number, unpivoted the _AVG columns along with their row number, computed the range aggregate grouping by row number and then joining back to the original records by row number. CTEs don't materialize results (nor are they indexed, as discussed in the comments). So each reference to AddRowNumbers is expensive, because ROW_NUMBER() OVER (ORDER BY LO_Avg) is a sort.

Instead of cutting this table up just to join it back together by row number, why not do something like:

SELECT cds.sub_id
   , cds.equipment_id
   , cds.read_time
   , cds.LC_Avg
   , cds.LC_Dev
   , cds.LC_Ref_Gap
   , cds.LC_Sensor_Alarm
   , cds.LO_Avg
   , cds.LO_Dev
   , cds.LO_Ref_Gap
   , cds.LO_Sensor_Alarm
   , cds.RC_Avg
   , cds.RC_Dev
   , cds.RC_Ref_Gap
   , cds.RC_Sensor_Alarm
   , cds.RO_Avg
   , cds.RO_Dev
   , cds.RO_Ref_Gap
   , cds.RO_Sensor_Alarm

   --if the COUNT is 0, xweb_range will be null (since MAX will be null), if it's 1, then use MAX, else use MAX - MIN (as per your example)
   , (CASE WHEN stats.[Count] < 2 THEN stats.[MAX] ELSE stats.[MAX] - stats.[MIN] END) xweb_range

FROM dbo.some_table cds

    --cross join on the following table derived from values in cds - it will always contain 1 record per row of cds
    CROSS APPLY
    (
        SELECT COUNT(*), MIN(Value), MAX(Value)
        FROM
        (
            --construct a table using the column values from cds we wish to aggregate
            VALUES (LO_AVG, LO_SENSOR_ALARM),
                   (LC_AVG, LC_SENSOR_ALARM),
                   (RO_AVG, RO_SENSORALARM),
                   (RC_AVG, RC_SENSOR_ALARM)


        ) x (Value, Sensor_Alarm) --give a name to the columns for _AVG and _ALARM
        WHERE Sensor_Alarm = 0 --filter our constructed table where _ALARM=0

    ) stats([Count], [Min], [Max]) --give our derived table and its columns some names

WHERE cds.sub_id <> '0' --this is a filter carried over from the first CTE in your example
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号