Enforcing uniqueness on PostgreSQL table column after non-unique values already inserted

Developer (https://www.devze.com), 2023-01-08 08:46. Source: the web

I already have 80 million records inserted into a table, but need to ensure a few columns are jointly unique. However, the columns already contain non-unique data, so ALTER TABLE doesn't work.

I'd like either a query that will let me easily delete records that are non-unique, while keeping one of them, or one that will allow me to load the data from the current table into a new one, while filtering for uniqueness.

The query you're looking for is:

select distinct on (my_unique_1, my_unique_2) * from my_table;

This returns one row for each distinct combination of the columns listed in distinct on, namely the first row of each group. Without order by there is no reliable order in which rows are returned, so which row counts as the first is unpredictable. For that reason distinct on is rarely used without order by.

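Combined with order by, you can choose which row comes first in each group. The order by list has to start with the distinct on columns; any further columns decide the winner. This keeps the row with the greatest last_update_date: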

 select distinct on (my_unique_1, my_unique_2) * 
 from my_table order by my_unique_1, my_unique_2, last_update_date desc;

Now you can select this into a new table:

 create table my_new_table as
 select distinct on (my_unique_1, my_unique_2) * 
 from my_table order by my_unique_1, my_unique_2, last_update_date desc;
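
Keep in mind that create table ... as copies only the column definitions and the data; indexes, constraints and defaults of my_table are not carried over. A minimal sketch of finishing the switch, assuming nothing else writes to my_table in the meantime (my_table_uniq and my_table_old are hypothetical names chosen for illustration):

 begin;
 -- Enforce uniqueness on the deduplicated copy; this fails if duplicates remain.
 alter table my_new_table
     add constraint my_table_uniq unique (my_unique_1, my_unique_2);
 -- Swap the tables; the original is kept under a new name as a fallback.
 alter table my_table rename to my_table_old;
 alter table my_new_table rename to my_table;
 commit;

Views and foreign keys that referenced the original table keep pointing at my_table_old after the rename, so they would have to be recreated against the new table.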

Or you can use it in a delete, assuming row_id is the primary key:

 delete from my_table where row_id not in (
     select distinct on (my_unique_1, my_unique_2) row_id 
     from my_table order by my_unique_1, my_unique_2, last_update_date desc);
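
On a table of this size, not in over a large subquery may be slow. An alternative sketch, under the same assumption that row_id is the primary key, numbers the rows inside each group with row_number() and deletes everything except the first one, then checks the result and adds the constraint (my_table_uniq is a hypothetical constraint name):

 -- Delete every row except the newest one in each (my_unique_1, my_unique_2) group.
 delete from my_table t
 using (
     select row_id,
            row_number() over (
                partition by my_unique_1, my_unique_2
                order by last_update_date desc
            ) as rn
     from my_table
 ) d
 where t.row_id = d.row_id
   and d.rn > 1;

 -- Should return no rows once the duplicates are gone.
 select my_unique_1, my_unique_2, count(*)
 from my_table
 group by my_unique_1, my_unique_2
 having count(*) > 1;

 -- With the duplicates removed, the constraint can be added.
 alter table my_table
     add constraint my_table_uniq unique (my_unique_1, my_unique_2);

Note that a unique constraint treats null values as distinct from each other, while distinct on and partition by group them together, so rows with nulls in these columns may need separate handling.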