So I have a set of data that includes a column of dates and an ID column. Here's a sample of the date data:
5/30/10 12:00 AM
5/30/10 12:01 AM
5/30/10 1:59 AM
5/30/10 1:59 AM
5/30/10 3:58 AM
5/30/10 3:58 AM
5/30/10 5:57 AM
5/30/10 6:57 AM
5/30/10 7:56 AM
5/30/10 7:56 AM
5/30/10 9:55 AM
5/30/10 11:54 AM
What I'd like to do is create buckets for these rows based on a parameter like "2 hours". These two hour windows would start at the earliest date in the dataset, but would jump to the next starting time as you scanned through the list. For example, the expected output of "buckets" for my list would be:
5/30/10 12:00 AM 1
5/30/10 12:01 AM 1
5/30/10 1:59 AM 1
5/30/10 1:59 AM 1
5/30/10 3:58 AM 2
5/30/10 3:58 AM 2
5/30/10 5:57 AM 2
5/30/10 6:57 AM 开发者_运维百科3
5/30/10 7:56 AM 3
5/30/10 7:56 AM 3
5/30/10 8:55 AM 3
5/30/10 11:54 AM 4
So you can see that when I get to 3:58 AM, it's in the 2nd group because it's more than 2 hours past 12:00 AM. However 5:57 AM is still in the 2nd group even though it's more than 4 hours past 12:00 AM because the 2nd group base time is 3:58 AM, not 2:00 AM.
I've tried to create the grouping column by using a partition function like this:
FLOOR(DATEDIFF(SECOND, t.BaseCreateDate, t.CreateDate) / t.DedupWindow)
Where BaseCreateDate is the earliest date in my set, CreateDate is the data I listed, and DedupWindow is the 2 hours. However, that gives me fixed 2 hour windows and I can't seem to find math that resets the base as needed through the data.
I have this working in a cursor, but for a couple of reasons I'd like to get it working set based.
Nasty but works http://cloudexchange.cloudapp.net/stackoverflow/q/2281
Aggregates are not allowed in the recursive part of a CTE.
DECLARE @t AS TABLE (dt datetime) ;
INSERT INTO @t
VALUES ('5/30/10 12:00 AM')
INSERT INTO @t
VALUES ('5/30/10 12:01 AM')
INSERT INTO @t
VALUES ('5/30/10 1:59 AM')
INSERT INTO @t
VALUES ('5/30/10 1:59 AM')
INSERT INTO @t
VALUES ('5/30/10 3:58 AM')
INSERT INTO @t
VALUES ('5/30/10 3:58 AM')
INSERT INTO @t
VALUES ('5/30/10 5:57 AM')
INSERT INTO @t
VALUES ('5/30/10 6:57 AM')
INSERT INTO @t
VALUES ('5/30/10 7:56 AM')
INSERT INTO @t
VALUES ('5/30/10 7:56 AM')
INSERT INTO @t
VALUES ('5/30/10 8:55 AM')
INSERT INTO @t
VALUES ('5/30/10 11:54 AM') ;
WITH CTE
AS (
SELECT dt
,1 AS bucket
FROM @t
WHERE dt = (
SELECT MIN(dt)
FROM @t
)
UNION ALL
SELECT t.dt -- Cannot use aggregates here
,CTE.bucket + 1 AS bucket
FROM CTE
INNER JOIN @t AS t
ON t.dt >= DATEADD(HOUR, 2, CTE.dt)
),
X AS (
SELECT dt
,bucket
,ROW_NUMBER() OVER (PARTITION BY BUCKET ORDER BY dt) AS bucket_start
FROM CTE
),
Y AS (
SELECT *
FROM X
WHERE bucket_start = 1
)
SELECT t.*
,(
SELECT MAX(bucket)
FROM Y
WHERE Y.dt <= t.dt
) AS bucket
FROM @t AS t
Edit: Here's quite a ridiculous solution that appears to bring back the right results though.
http://cloudexchange.cloudapp.net/stackoverflow/q/2282
WITH DATE AS
(
SELECT '20100530 00:00:00.000' AS CreateDate UNION ALL
SELECT '20100530 00:01:00.000' UNION ALL
SELECT '20100530 01:59:00.000' UNION ALL
SELECT '20100530 01:59:00.000' UNION ALL
SELECT '20100530 03:58:00.000' UNION ALL
SELECT '20100530 03:58:00.000' UNION ALL
SELECT '20100530 05:57:00.000' UNION ALL
SELECT '20100530 06:57:00.000' UNION ALL
SELECT '20100530 07:56:00.000' UNION ALL
SELECT '20100530 07:56:00.000' UNION ALL
SELECT '20100530 08:55:00.000' UNION ALL
SELECT '20100530 11:54:00.000'
),PossibleBuckets As
(
SELECT earlier.CreateDate, MIN(later.CreateDate) AS NextBucket
FROM Date AS earlier INNER JOIN
Date AS later ON later.CreateDate > earlier.CreateDate
and later.CreateDate > DATEADD(hour, 2, earlier.CreateDate)
GROUP BY earlier.CreateDate
),
dates AS (
SELECT TOP(1) CreateDate, NextBucket, 1 AS rank
FROM PossibleBuckets
ORDER BY CreateDate
UNION ALL
SELECT PB.CreateDate, PB.NextBucket, Rank + 1 AS Rank
FROM PossibleBuckets PB
JOIN dates on dates.NextBucket = PB.CreateDate
)
SELECT d.CreateDate, COALESCE(x.Rank, (SELECT MAX(Rank)+1 FROM dates), 1) AS Rank
FROM date d left join dates x on d.CreateDate >= x.CreateDate AND d.CreateDate < x.NextBucket
精彩评论