Ordered insertion at next unused index, generic SQL

Developer https://www.devze.com 2023-02-06 11:50 Source: Internet
There have been various similar questions, but they either referred to a too specific DB or assumed unsorted data.

In my case, the SQL should be portable if possible. The index column in question is a clustered PK containing a timestamp.

99% of the time, the timestamp is larger than the previously inserted value. On rare occasions, however, it can be smaller or collide with an existing value.

I'm currently using this code to insert new values:

IF NOT EXISTS (SELECT * FROM Foo WHERE Timestamp = @ts) BEGIN
    -- @ts is free: insert it as-is
    INSERT INTO Foo (Timestamp) VALUES (@ts);
END
ELSE BEGIN
    -- @ts is taken: insert the closest unused smaller value instead
    INSERT INTO Foo (Timestamp) VALUES (
    (SELECT Max (t1.Timestamp) - 1
    FROM Foo t1
    WHERE t1.Timestamp <= @ts  -- <= rather than <, so that @ts - 1 itself can be chosen
    AND NOT EXISTS (select * from Foo t2 where t2.Timestamp = t1.Timestamp - 1))
    );
END;

If the value is not yet in use, just insert it. Otherwise, find the closest free value below it using a NOT EXISTS check.
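To make the behaviour concrete, here is a minimal sketch of the same logic, run against an in-memory SQLite database from Python. The column name `ts`, the function name `insert_at_free_slot`, and the use of SQLite are assumptions for illustration only. Note that the sketch compares with `<=` rather than `<`, so that `ts - 1` itself counts as a candidate when it is free. It is single-connection code and does nothing about concurrent writers.

```python
import sqlite3

def insert_at_free_slot(conn, ts):
    """Insert ts, or the closest unused smaller value if ts is already taken.

    Returns the value that was actually inserted.
    """
    taken = conn.execute("SELECT 1 FROM Foo WHERE ts = ?", (ts,)).fetchone()
    if taken is None:
        value = ts
    else:
        # Among stored values t <= ts whose predecessor t - 1 is unused,
        # take the largest one; t - 1 is then the closest free value below ts.
        value = conn.execute(
            """
            SELECT MAX(t1.ts) - 1
            FROM Foo AS t1
            WHERE t1.ts <= ?
              AND NOT EXISTS (SELECT 1 FROM Foo AS t2 WHERE t2.ts = t1.ts - 1)
            """,
            (ts,),
        ).fetchone()[0]
    conn.execute("INSERT INTO Foo (ts) VALUES (?)", (value,))
    return value
```

With a table created as `CREATE TABLE Foo (ts INTEGER PRIMARY KEY)`, inserting the same value repeatedly walks downwards one step at a time, which also shows why the cost grows as collisions cluster around one timestamp.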

I am a novice when it comes to databases, so I'm not sure if there is a better way. I'm open to any ideas to make the code simpler and/or faster (around 100-1000 insertions per second), or to use a different approach altogether.

Edit: Thank you for your comments and answers so far.

To explain the nature of my case: the timestamp is the only value ever used to sort the data, and minor inconsistencies can be neglected. There are no FK relationships.

However, I agree that my approach is flawed, outweighing the reasons to use the presented idea in the first place. If I understand correctly, a simple way to fix the design is to have a regular, autoincremented PK column in combination with the known (and renamed) timestamp column, which will be clustered.

From a performance POV, I don't see how this could be worse than the initial approach. It also simplifies the code a lot.


This method is a prescription for disaster. In the first place you will have race conditions which will cause user annoyance when their insert won't work. Even worse, if you are adding to another table using that value as the foreign key and the whole thing is not in one transaction, you may be adding child data to the wrong record.

Further, looking for the lowest unused value is a recipe for further data integrity messes if you have not properly set up foreign key relationships and have deleted a record without deleting all of its child records. Now you have just joined the new record to child records which don't belong to it.

This manual method is flawed and unreliable. All the major databases have a way to create an autogenerated value. Use that instead; the problems have been worked out and tested.

Timestamp, BTW, is a SQL Server reserved word and should never be used as a field name.


If you can't guarantee that your PK values are unique, then it's not a good PK candidate. Especially if it's a timestamp - I'm sure Goldman Sachs would love it if their high-frequency trading programs could cause collisions on an insert and get inserted 1 microsecond earlier because the system fiddles the timestamp of their trade.

Since you can't guarantee uniqueness of the timestamps, a better choice would be to use a plain-jane auto-increment int/bigint column, which takes care of the collision problem, gives you a nice method of getting insertion order, and you can still sort on the timestamp field to get a nice straight timeline if need be.
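As a rough sketch of that layout, the following uses Python's built-in `sqlite3` module with an in-memory database. The names `id`, `event_time`, `insert_event` and `timeline` are made up for illustration, and `AUTOINCREMENT` is SQLite syntax; other engines spell the same idea differently (identity columns, sequences, `AUTO_INCREMENT`), so that one keyword is not portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Foo (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, records insertion order
        event_time INTEGER NOT NULL                    -- the former PK; duplicates are now allowed
    );
    CREATE INDEX Foo_event_time ON Foo (event_time);
    """
)

def insert_event(conn, event_time):
    """Plain insert: no existence check and no search for a free value."""
    cur = conn.execute("INSERT INTO Foo (event_time) VALUES (?)", (event_time,))
    return cur.lastrowid

def timeline(conn):
    """Rows in time order; ties are broken by insertion order."""
    return conn.execute(
        "SELECT id, event_time FROM Foo ORDER BY event_time, id"
    ).fetchall()
```

Colliding or out-of-order timestamps need no special handling here: every insert is a single statement, and the timestamp is stored unmodified.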


One idea would be to add a surrogate identity/autonumber/sequence key, so the primary key becomes (timestamp, newkey).

This way, you preserve row order and uniqueness without code.

To run the code above safely under concurrency, you'd need to fiddle with lock granularity and concurrency hints, or use TRY/CATCH to retry with the alternate value (SQL Server). Either way, portability is lost. And under heavy load you'd have to keep retrying, because the alternate value may already exist as well.
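The retry idea can be sketched as follows, again with Python's `sqlite3` module rather than SQL Server's TRY/CATCH. This is a simplification, not the exact code discussed above: instead of querying for the alternate value, it lets the primary key constraint reject duplicates and simply steps down by one on each failure. The function name, the `ts` column and the `max_attempts` limit are all assumptions.

```python
import sqlite3

def insert_with_retry(conn, ts, max_attempts=1000):
    """Try ts, then ts - 1, ts - 2, ... until an INSERT succeeds.

    Returns the value that was actually inserted.
    """
    candidate = ts
    for _ in range(max_attempts):
        try:
            with conn:  # commits on success, rolls back if the INSERT fails
                conn.execute("INSERT INTO Foo (ts) VALUES (?)", (candidate,))
            return candidate
        except sqlite3.IntegrityError:
            # Primary key collision: someone already holds this value.
            candidate -= 1
    raise RuntimeError("no free value found within max_attempts")
```

The loop makes the cost visible: each collision is a failed statement plus another round trip, so a burst of inserts with the same timestamp degrades quickly.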


A Timestamp as a key? Really? Every time a row is updated, its timestamp is modified. The SQL Server timestamp data type is intended for use in versioning rows. It is not the same as the ANSI/ISO SQL timestamp, which is the equivalent of SQL Server's datetime data type.

As far as "sorting" on a timestamp column goes: the only thing that is guaranteed with a timestamp is that every time a row is inserted or updated, it gets a new timestamp value, and that value is a unique 8-octet binary value, different from the previous value assigned to the row, if any. There is no guarantee that the value has any correlation to the system clock.
