How to properly reserve identity values for usage in a database?_问答_开发者

We have some code in which we need to maintain our own identity (PK) column in SQL. We have a table in which we bulk insert data, but we add data to related tables before the bulk insert is done, thus we can no开发者_StackOverflow中文版t use an IDENTITY column and find out the value up front.

The current code is selecting the MAX value of the field and incrementing it by 1. Although there is a highly unlikely chance that two instances of our application will be running at the same time, it is still not thread-safe (not to mention that it goes to the database everytime).

I am using the ADO.net entity model. How would I go about 'reserving' a range of id's to use, and when that range runs out, grab a new block to use, and guarantee that the same range will not be used.

use more universal unique identifier data type like UNIQUEIDENTIFIER (UUID) instead of INTEGER. In this case you can basically create it on the client side, pass it to the SQL and do not have to worry about it. The disadvantage is that, of course, the size of this field.
create a simple table in the database CREATE TABLE ID_GEN (ID INTEGER IDENTITY), and use this as a factory to give you the identifiers. Ideally you would create a stored procedure (or function), to which you would pass the number of identifiers you need. The stored procedure will then insert this number of rows (empty) into this ID_GEN table and will return you all new ID's, which you can use in your code. Obviously, your original tables will not have the IDENTITY anymore.
create your own variation of the ID_Factory above.

I would choose simplicity (UUID) if you are not constrained otherwise.

If it's viable to change the structure of the table, then perhaps use a uniqueidentifier for the PK instead along with newid() [SQL] or Guid.NewGuid() [C#] in your row generation code.

From Guid.NewGuid() doco:

There is a very low probability that the value of the new Guid is all zeroes or equal to any other Guid.

Why are you using ADO.net Entity Framework to do what sounds like ETL work? (See critique of ADO.NET Entity Framework and ORM in general below. It is rant free).

Why use ints at all? Using a uniqueidentifier would solve the "multiple instances of the application running" issue.

Using a uniqueidentifier as a column default will be slower than using an int IDENTITY... it takes more time to generate a guid than an int. A guid will also be larger (16 byte) than an int (4 bytes). Try this first and if it results in acceptable performance, run with it.

If the delay introduced by generating a guid on each row insert it unacceptable, create guids in bulk (or on another server) and cache them in a table.

Sample TSQL code:

CREATE TABLE testinsert
 (
  date_generated datetime   NOT NULL DEFAULT GETDATE(), 
  guid   uniqueidentifier NOT NULL, 
  TheValue  nvarchar(255)  NULL
 )
GO

CREATE TABLE guids 
 (
  guid   uniqueidentifier NOT NULL DEFAULT newid(), 
  used   bit     NOT NULL DEFAULT 0, 
  date_generated datetime   NOT NULL DEFAULT GETDATE(), 
  date_used  datetime   NULL
 )
GO

CREATE PROCEDURE GetGuid
 @guid uniqueidentifier OUTPUT
AS
BEGIN
 SET NOCOUNT ON
 DECLARE @return int = 0

 BEGIN TRY
  BEGIN TRANSACTION
   SELECT TOP 1 @guid = guid FROM guids WHERE used = 0

   IF @guid IS NOT NULL
    UPDATE guids
    SET 
     used = 1, 
     date_used = GETDATE()
    WHERE guid = @guid
   ELSE
    BEGIN
     SET @return = -1
     PRINT 'GetGuid Error: No Unused guids are available'
    END
  COMMIT TRANSACTION
 END TRY

 BEGIN CATCH
  SET @return = ERROR_NUMBER() -- some error occurred
  SET @guid = NULL
  PRINT 'GetGuid Error: ' + CAST(ERROR_NUMBER() as varchar) + CHAR(13) + CHAR(10) + ERROR_MESSAGE()
  ROLLBACK
 END CATCH

 RETURN @return
END
GO

CREATE PROCEDURE InsertIntoTestInsert
 @TheValue nvarchar(255)
AS
 BEGIN
  SET NOCOUNT ON
  DECLARE @return int = 0

  DECLARE @guid uniqueidentifier
  DECLARE @getguid_return int

  EXEC @getguid_return = GetGuid @guid OUTPUT

  IF @getguid_return = 0 
   BEGIN
    INSERT INTO testinsert(guid, TheValue) VALUES (@guid, @TheValue)
   END
  ELSE
   SET @return = -1

  RETURN @return
 END
GO

-- generate the guids
INSERT INTO guids(used) VALUES (0)
INSERT INTO guids(used) VALUES (0)

--Insert data through the stored proc
EXEC InsertIntoTestInsert N'Foo 1'
EXEC InsertIntoTestInsert N'Foo 2'
EXEC InsertIntoTestInsert N'Foo 3' -- will fail, only two guids were created

-- look at the inserted data
SELECT * FROM testinsert

-- look at the guids table
SELECT * FROM guids

The fun question is... how do you map this to ADO.Net's Entity Framework?

This is a classic problem that started in the early days of ORM (Object Relational Mapping).

If you use relational-database best practices (never allow direct access to base tables, only allow data manipulation through views and stored procedures), then you add headcount (someone capable and willing to write not only the database schema, but also all the views and stored procedures that form the API) and introduce delay (the time to actually write this stuff) to the project.

So everyone cuts this and people write queries directly against a normalized database, which they don't understand... thus the need for ORM, in this case, the ADO.NET Entity Framework.

ORM scares the heck out of me. I've seen ORM tools generate horribly inefficient queries which bring otherwise performant database servers to their knees. What was gained in programmer productivity was lost in end-user waiting and DBA frustration.

The Hi/Lo algorithm may be of interest to you:

What's the Hi/Lo algorithm?

Two clients could reserve the same block of id's.

There is no solution short of serializing your inserts by locking.

See Locking Hints in MSDN.

I fyou have a lot of child tables you might not want to change the PK. PLus the integer filedsa relikely to perform better in joins. But you could still add a GUID field and populate it in the bulk insert with pre-generated values. Then you could leave the identity insert alone (almost alawys a bad idea to turn it off) and use the GUID values you pre-generated to get back the Identity values you just inserted for the insert into child tables.

If you use a regular set-based insert (one with the select clause instead of the values clause) instead of a bulk insert, you could use the output clause to get the identities back for the rows if you are using SQL Server 2008.

The most general solution is generate client identifiers that never across with database identifiers - usually it is negative values, then update identifiers with identifier generated by database on inserting.

This way is safe to use in application with many users inserts the data simultaneously. Any other ways except GUIDs are not multiuser-safe.

But if you have that rare case when entity's primary key is required to be known before entity is saved to database, and it is impossible to use GUID, you may use identifier generation algorithm which are prevent identifier overlapping. The most simple is assigning a unique identifier prefix for each connected client, and prepend it to each identifier generated by this client.

If you are using ADO.NET Entity Framework, you probably should not worry about identifier generation: EF generates identifiers by itself, just mark primary key of the entity as IsDbGenerated=true.

Strictly saying, entity framework as other ORM does not require identifier for objects are not saved to database yet, it is enought object reference for correctly operating with new entities. Actual primary key value is required only on updating/deleting entity, and on updating/deleting/inserting entity that references new entity, e.i. in cases when actual primary key value is about to be written in database. If entity is new, it is impossible to save other entites that are referenced new entity until new entity is not saved to database, and ORMs maintains specific order of entities saving which take references map into account.