开发者

allocating content to fixed size buckets without looping in SQL Server

开发者 https://www.devze.com 2023-04-04 18:51 出处:网络
I am working in SQL Server 2008 R2 with a priority ordered set of content that must be assigned to a set of buckets to achieve a content specified value.Each item in the content list is related to nod

I am working in SQL Server 2008 R2 with a priority ordered set of content that must be assigned to a set of buckets to achieve a content specified value. Each item in the content list is related to nodes within a ragged tree hierarchy (the buckets). Each bucket has a value assigned to it and can hold a fixed quantity of content.

I am trying to allocate content in priority order to the buckets that they relate to (or any parent/grandparent up the tree from related content). I must start with the highest bucket value (with empty spaces) and stop only when the bucket values match or exceed my content value.

Hopefully my crude example will help. Assuming the B’s are buckets that can each hold 2 pieces of content and C’s are content. The bracketed numbers are the bucket value and required content value.

allocating content to fixed size buckets without looping in SQL Server

C1 would result in being allocated to B1 (highest value in B1’s tree) and B4 to give it a total value of 7. Both B1 an B4 now only have one slot remaining.

C2 would be allocated B1 and B5 leaving no slots in B1 and 1 slot in B2.

C3 would not be able to use B1 as there are no slots available, so would result in B2, B5 and B9 leaving no slots in B5 and one slot in B2 / B5.

And so on...

I can see how to achieve this iteratively by creating a list of all buckets and their relationship with all child / grand child buckets. Looping though content one at a time, assigning its' buckets and reducing the remaining bucket spaces. The reason I feel that it needs to be a loop is due to the unknown number of spaces remaining in each bucket based on processing all higher priority content.

But looping through content one at a time feels intrinsically wrong and there must be a more efficient way to solve this allocation problem – ideally in one pass…

Example SQL Server code (to match the above diagram)

--core table/fields 
CREATE TABLE Bucket 
(
    Id int,
    Name varchar(3),
    BucketValue int,
    SlotRemaining int --only required for my solution to hold number of slots left to fill

)

CREATE TABLE BucketParent
(
    ChildBucketId int,
    ParentBucketId int
)

CREATE TABLE Content
(
    Id int,             
    Name varchar(3),
    ContentValue int,
    AllocationState int, --only required for my solution to identify content that still needs processing
                        --1=unprocessed, 2=Complete
    Priority int        --order to work through content 1=most imnportant
)

CREATE TABLE ContentBucket
(
    ContentId int,
    BucketId int
)
Go

CREATE TABLE ContentPriorityBucket -- table to record my allocation of content to the most valuable bucket
(
    ContentId int,
    BucketId int
)
Go

--test data to match example (wish id made it smaller now :)
INSERT INTO Bucket Values (1,'B1', 4, null)
INSERT INTO Bucket Values (2,'B2', 5, null)
INSERT INTO Bucket Values (3,'B3', 4, null)
INSERT INTO Bucket Values (4,'B4', 3, null)
INSERT INTO Bucket Values (5,'B5', 3, null)
INSERT INTO Bucket Values (6,'B6', 3, null)
INSERT INTO Bucket Values (7,'B7', 4, null)
INSERT INTO Bucket Values (8,'B8', 2, null)
INSERT INTO Bucket Values (9,'B9', 1, null)
INSERT INTO Bucket Values (10,'B10', 2, null)
INSERT INTO Bucket Values (11,'B11', 1, null)

INSERT INTO BucketParent Values (8, 4)
INSERT INTO BucketParent Values (4, 1)
INSERT INTO BucketParent Values (9, 5)
INSERT INTO BucketParent Values (5, 1)
INSERT INTO BucketParent Values (5, 2)
INSERT INTO BucketParent Values (10, 5)
INSERT INTO BucketParent Values (10, 6)
INSERT INTO BucketParent Values (6, 2)
INSERT INTO BucketParent Values (6, 3)
INSERT INTO BucketParent Values (11, 6)
INSERT INTO BucketParent Values (11, 7)
INSERT INTO BucketParent Values (7, 3)

INSERT INTO Content Values (1,'C1', 5, null, 1)
INSERT INTO Content Values (2,'C2', 8, null, 2)
INSERT INTO Content Values (3,'C3', 9, null, 3)
INSERT INTO Content Values (4,'C4', 10, null, 4)

INSERT INTO ContentBucket Values (1,8)
INSERT INTO ContentBucket Values (1,4)
INSERT INTO ContentBucket Values (2,9)
INSERT INTO ContentBucket Values (3,9)
INSERT INTO ContentBucket Values (4,10)
INSERT INTO ContentBucket Values (4,7)
GO

--Iterative solution that I am trying to improve on
UPDATE  Bucket 
SET     SlotRemaining = 2 --clear previous run and allocate maximum bucket size

UPDATE  Content
SET     AllocationState = 1 --set state to unprocessed

--Clear last run
TRUNCATE Table ContentPriorityBucket

GO 

DECLARE @ContentToProcess int = 0
DECLARE @CurrentContent int 
DECLARE @CurrentContentValue int 

SELECT  @ContentToProcess = COUNT(id) FROM Content WHERE AllocationState =1

WHILE (@ContentToProcess > 0)
BEGIN 
    -- get next content to process
    SELECT  Top(1) @CurrentContent = ID,
            @CurrentContentValue = ContentValue
    FROM    Content 
    WHERE   AllocationState =1 
    ORDER BY Priority; 

    WITH    BucketList (Id, BucketValue, SlotRemaining)
    as
    (
        -- list buckets related to content
        SELECT      b.Id
                    ,b.BucketValue
                    ,b.SlotRemaining
        FROM        ContentBucket cb 
        INNER JOIN  Bucket b on cb.BucketId = b.Id
        WHERE       cb.ContentId = @CurrentContent
        -- need to pull back all buckets (even those that are full as they may have empty parents)
        UNION ALL
        SELECT      b.Id
                    ,b.BucketValue
                    ,b.SlotRemaining
        FROM        BucketList bl
        INNER JOIN  BucketParent bp on bl.Id = bp.ChildBucketId
        INNER JOIN  Bucket b on bp.ParentBucketId = b.Id
    ),
    DistinctBucketList (Id, BucketValue, SlotRemaining)
    as
    (
        --dedupe buckets
        SELECT  distinct Id
                , BucketValue
                , SlotRemaining
        FROM    BucketList
    ),
    BucketListOrdered (Id, BucketValue, RowOrder)
    as
    (
        --order buckets
        SELECT      Id
                    ,BucketValue
                    ,ROW_NUMBER() OVER (ORDER BY BucketValue desc, Id)-- added id to get consistant result if two buckets have same value
        FROM        DistinctBucketList
        WHERE       SlotRemaining >0
    ),
    CulmativeBucketListWithinRequiredValue (Id, RowOrder, CulmativeBucketValue, RequiredBucket)
    as
    (
            -- this will mark all buckets up to the bucket value, but will be 1 bucket short
            SELECT      blo.Id
                        ,blo.RowOrder
                        ,SUM(blc.BucketValue) CulmativeBucketValue
                        ,CASE 
                            WHEN SUM(blc.BucketValue) <=@CurrentContentValue THEN 1
                            ELSE 0 
                        END RequiredBucket
            FROM        BucketListOrdered blo
            LEFT  JOIN  BucketListOrdered blc ON blc.RowOrder  <= blo.RowOrder
            GROUP BY    blo.Id, blo.RowOrder
    )
    -- this will identify all buckets required to top content value
    INSERT INTO Co开发者_StackOverflow社区ntentPriorityBucket
    SELECT      @CurrentContent
                ,b.Id
    FROM        CulmativeBucketListWithinRequiredValue b
    WHERE       b.RowOrder <= (SELECT Max(RowOrder) + 1 FROM CulmativeBucketListWithinRequiredValue WHERE RequiredBucket =1)

    --reduce all used bucket sizes by 1 (could alternatively determine this from ContentPriorityBucket)
    UPDATE      Bucket
    SET         SlotRemaining = SlotRemaining -1
    WHERE       id in (SELECT BucketId FROM ContentPriorityBucket WHERE ContentId = @CurrentContent)

    -- update processed bucket
    UPDATE      Content
    SET         AllocationState = 2
    WHERE       @CurrentContent = Id 

    SELECT      @ContentToProcess = COUNT(id) FROM Content WHERE AllocationState =1
END

SELECT ContentId, BucketId  FROM ContentPriorityBucket

/*
DROP TABLE Bucket 
DROP TABLE BucketParent
DROP TABLE Content
DROP TABLE ContentBucket
DROP TABLE ContentPriorityBucket 
*/


There are a couple points to make about this problem.

First, generalized bin-packing is a NP-Complete problem, and therefore cannot be solved in general in a single pass. This specific bin-packing, since it is an ordered packing, may be different, but the issue of the complexity of the problem remains; it's certainly not O(1), so it may need a loop no matter what.

1-pass non-looping solutions for this seem like they should not be possible; it looks like a problem that isn't made for set-based solutions. You could create a table-valued CLR function, which could find the bucket that each item fits into. Otherwise, keeping the looping solution would be fine. (If you post the code, it might be easier to see if there are improvements possible.)

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号