开发者

Is COMB GUID a good idea with Rails 3.1 if I use GUIDs for primary keys?

开发者 https://www.devze.com 2023-04-12 20:17 出处:网络
I\'m using Rails 3.1 with PostgreSQL 8.4. Let\'s assume I want/need to use GUID primary keys. One potential drawback is index fragmentation. In MS SQL, a recommended solution for that is to use specia

I'm using Rails 3.1 with PostgreSQL 8.4. Let's assume I want/need to use GUID primary keys. One potential drawback is index fragmentation. In MS SQL, a recommended solution for that is to use special sequential GUIDs. One approach to sequential GUIDs is the COMBination GUID that substitutes a 6-byte timestamp for the MAC address portion at the end of the GUID. This has some mainstream adoption: COMBs are available natively in NHibernate (NHibernate/Id/GuidCombGenerator.cs).

I think I've figured out how to create COMB GUIDs in Rails (with the help of the UUIDTools 2.1.2 gem), but it leaves some unanswered qu开发者_如何学Cestions:

  • Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?
  • Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?
  • Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?

Thanks for your thoughts.


create_contacts.rb migration

class CreateContacts < ActiveRecord::Migration
  def up
    create_table :contacts, :id => false do |t|
      t.column :id, :uuid, :null => false # manually create :id with underlying DB type UUID
      t.string :first_name
      t.string :last_name
      t.string :email

      t.timestamps
    end
    execute "ALTER TABLE contacts ADD PRIMARY KEY (id);"
  end

    # Can't use reversible migration because it will try to run 'execute' again
  def down
    drop_table :contacts # also drops primary key
  end
end

/app/models/contact.rb

class Contact < ActiveRecord::Base
  require 'uuid_helper' #rails 3 does not autoload from lib/*
  include UUIDHelper

  set_primary_key :id
end

/lib/uuid_tools.rb

require 'uuidtools'

module UUIDHelper
  def self.included(base)
    base.class_eval do
      include InstanceMethods
      attr_readonly :id       # writable only on a new record
      before_create :set_uuid
    end
  end

  module InstanceMethods
  private
    def set_uuid
      # MS SQL syntax:  CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

      # Get current Time object
      utc_timestamp = Time.now.utc

      # Convert to integer with milliseconds:  (Seconds since Epoch * 1000) + (6-digit microsecond fraction / 1000)
      utc_timestamp_with_ms_int = (utc_timestamp.tv_sec * 1000) + (utc_timestamp.tv_usec / 1000)

      # Format as hex, minimum of 12 digits, with leading zero.  Note that 12 hex digits handles to year 10889 (*).
      utc_timestamp_with_ms_hexstring = "%012x" % utc_timestamp_with_ms_int

      # If we supply UUIDTOOLS with a MAC address, it will use that rather than retrieving from system.
      # Use a regular expression to split into array, then insert ":" characters so it "looks" like a MAC address.
      UUIDTools::UUID.mac_address = (utc_timestamp_with_ms_hexstring.scan /.{2}/).join(":")

      # Generate Version 1 UUID (see RFC 4122).
      comb_guid = UUIDTools::UUID.timestamp_create().to_s 

      # Assign generted COMBination GUID to .id
      self.id = comb_guid

      # (*) A note on maximum time handled by 6-byte timestamp that includes milliseconds:
      # If utc_timestamp_with_ms_hexstring = "FFFFFFFFFFFF" (12 F's), then 
      # Time.at(Float(utc_timestamp_with_ms_hexstring.hex)/1000).utc.iso8601(10) = "10889-08-02T05:31:50.6550292968Z".
    end
  end
end


  • Does PostgreSQL suffer from index fragmentation when the PRIMARY KEY is type UUID?

Yes, it's to be expected. But if you're going to use the COMB strategy that won't happen. The rows will be always in order(that's not entirely true, but bear with me).

Also, the performance between native pgsql UUID vs VARCHAR is not all that different. Another point to consider.

  • Is fragmentation avoided if the low-order 6 bytes of the GUID are sequential?

In my test I've found that UUID1(RFC 4122) is sequential, there's already a timestamp added in the generated uuid. But yes, adding a timestamp in the last 6 bytes will reassure that ordering. That's what I did anyway, because apparently the timestamp already present is not guarantee of order. More about COMB here

  • Is the COMB GUID as implemented below an acceptable, reliable way to create sequential GUIDs in Rails?

I'm not using rails, but I'll show you how I did it in django:

import uuid, time

def uuid1_comb(obj):
    return uuid.uuid1(node=int(time.time() * 1000))

Where node is a 48-bit positive integer identifying the hardware address.

About your implementation, one of the main advantages of using uuid's is that you can safely generate them outside the database, so, using a helper class is one valid way to do it. You can always use an external service for uuid generation like snowflake, but it may be premature optimizacion at this point.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号