I already have 80 million records in a table, but I need to ensure that a few columns are jointly unique. However, the columns already contain non-unique data, so adding a unique constraint with ALTER TABLE fails.
I'd like either a query that will let me easily delete records that are non-unique while keeping one of them, or one that will allow me to load the data from the current table into a new one while filtering for uniqueness.
The query you're looking for is:
select distinct on (my_unique_1, my_unique_2) * from my_table;
This selects one row for each combination of the columns listed in distinct on: the first row of each group. It's rarely used without order by, since otherwise there is no reliable order in which the rows are returned, and so no telling which row is the first.
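To see why the order matters, here is a minimal pure-Python sketch of the distinct on rule: keep the first row seen for each key combination. It only emulates the semantics on a hypothetical in-memory list of rows and is not how PostgreSQL implements it; the helper name `distinct_on` and the sample data are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Sequence, Set, Tuple

# A row is modelled as a dict of column name -> value (hypothetical stand-in for my_table).
Row = Dict[str, Any]


def distinct_on(rows: Iterable[Row], keys: Sequence[str]) -> List[Row]:
    """Keep the first row seen for each combination of `keys`.

    Emulates DISTINCT ON applied to rows in whatever order they arrive.
    """
    seen: Set[Tuple[Any, ...]] = set()
    result: List[Row] = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:  # first row of this group wins
            seen.add(key)
            result.append(row)
    return result


# Hypothetical data: rows 1 and 2 are duplicates on (my_unique_1, my_unique_2).
rows = [
    {"row_id": 1, "my_unique_1": "a", "my_unique_2": 1},
    {"row_id": 2, "my_unique_1": "a", "my_unique_2": 1},
    {"row_id": 3, "my_unique_1": "b", "my_unique_2": 2},
]
keys = ["my_unique_1", "my_unique_2"]

print([r["row_id"] for r in distinct_on(rows, keys)])            # [1, 3]
print([r["row_id"] for r in distinct_on(reversed(rows), keys)])  # [3, 2]
```

Feeding the same rows in a different order keeps a different duplicate (row 1 versus row 2), which is why distinct on is almost always paired with order by.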
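Combined with order by, you can choose which row comes first in each group. The following keeps, for each combination, the row with the greatest last_update_date (PostgreSQL requires the leftmost order by columns to match the distinct on columns):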
select distinct on (my_unique_1, my_unique_2) *
from my_table order by my_unique_1, my_unique_2, last_update_date desc;
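The effect of that order by can be sketched in pure Python: sort by the key columns ascending and last_update_date descending, then keep the first row of each group. This is only an in-memory emulation under the assumption that rows are dicts; the helper `latest_per_key` and the sample dates are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def latest_per_key(rows: List[Row], keys: Sequence[str], ts: str) -> List[Row]:
    """Emulate:
        select distinct on (<keys>) * from ... order by <keys>, <ts> desc
    """
    # Two stable sorts: newest first, then by key columns. Python's sort is
    # stable, so rows sharing a key stay in newest-first order.
    ordered = sorted(rows, key=lambda r: r[ts], reverse=True)
    ordered.sort(key=lambda r: tuple(r[k] for k in keys))
    result: List[Row] = []
    last_key: Optional[Tuple[Any, ...]] = None
    for row in ordered:
        key = tuple(row[k] for k in keys)
        if key != last_key:  # first (newest) row of a new group
            result.append(row)
            last_key = key
    return result


# Hypothetical data: two duplicates per key combination.
rows = [
    {"row_id": 1, "my_unique_1": "a", "my_unique_2": 1, "last_update_date": date(2020, 1, 5)},
    {"row_id": 2, "my_unique_1": "a", "my_unique_2": 1, "last_update_date": date(2020, 3, 1)},
    {"row_id": 3, "my_unique_1": "b", "my_unique_2": 2, "last_update_date": date(2020, 2, 9)},
    {"row_id": 4, "my_unique_1": "b", "my_unique_2": 2, "last_update_date": date(2020, 1, 1)},
]
keys = ["my_unique_1", "my_unique_2"]

print([r["row_id"] for r in latest_per_key(rows, keys, "last_update_date")])  # [2, 3]
```

Unlike the unordered case, the result no longer depends on the input order, as long as last_update_date has no ties within a group.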
Now you can select this into a new table:
create table my_new_table as
select distinct on (my_unique_1, my_unique_2) *
from my_table order by my_unique_1, my_unique_2, last_update_date desc;
Or you can use it in a delete, assuming row_id is a primary key:
delete from my_table where row_id not in (
select distinct on (my_unique_1, my_unique_2) row_id
from my_table order by my_unique_1, my_unique_2, last_update_date desc);
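The delete works in two steps: the subquery produces the row_id of the row to keep in each group, and every other row is removed. A minimal in-memory sketch of those two steps follows; it assumes rows are dicts with a unique row_id, the helper names are hypothetical, and picking the newest row via a running maximum stands in for the sorted distinct on subquery (the two agree when there are no timestamp ties within a group).

```python
from datetime import date
from typing import Any, Dict, List, Sequence, Set, Tuple

Row = Dict[str, Any]


def ids_to_keep(rows: List[Row], keys: Sequence[str], ts: str) -> Set[int]:
    """row_id of the newest row per key: the set the subquery returns."""
    best: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        # Replace the current candidate only if this row is strictly newer.
        if key not in best or row[ts] > best[key][ts]:
            best[key] = row
    return {row["row_id"] for row in best.values()}


def delete_duplicates(rows: List[Row], keys: Sequence[str], ts: str) -> List[Row]:
    """Emulate: delete from my_table where row_id not in (<ids_to_keep>)."""
    keep = ids_to_keep(rows, keys, ts)
    return [row for row in rows if row["row_id"] in keep]


# Hypothetical data: rows 1 and 4 are the older duplicates.
rows = [
    {"row_id": 1, "my_unique_1": "a", "my_unique_2": 1, "last_update_date": date(2020, 1, 5)},
    {"row_id": 2, "my_unique_1": "a", "my_unique_2": 1, "last_update_date": date(2020, 3, 1)},
    {"row_id": 3, "my_unique_1": "b", "my_unique_2": 2, "last_update_date": date(2020, 2, 9)},
    {"row_id": 4, "my_unique_1": "b", "my_unique_2": 2, "last_update_date": date(2020, 1, 1)},
]
keys = ["my_unique_1", "my_unique_2"]

print([r["row_id"] for r in delete_duplicates(rows, keys, "last_update_date")])  # [2, 3]
```

After the duplicates are gone, each key combination appears once, so adding the unique constraint should no longer fail.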