I am trying to write a query to remove duplicate records from the following table (valid_columns) and keep only the record with the lowest col_order for each name.
For example, in the following table I would like to remove the duplicate rows (region 2, region 3, and job 3) and keep the records with the lowest col_order.
E.g. The input table, valid_columns looks like this:
name col_order
-------------
job 1
job 3
status 2
cust 2
county 1
state 1
region 1
region 2
region 3
so 4
Desired Output:
name col_order
-------------
job 1
status 2
cust 2
county 1
state 1
region 1
so 4
I am trying to fix a bug and I can't figure out the SQL. The current code uses a DELETE statement with a subquery, and looks like this:
-- 3) Remove duplicated columns
DELETE
FROM valid_columns
WHERE NOT ( col_order = ( SELECT TOP 1 col_order
FROM valid_columns firstValid
WHERE name = firstValid.name
AND col_order = firstValid.col_order
ORDER BY col_order ASC ))
However, this only returns the following, which is incorrect:
name col_order
-------------
job 1
county 1
state 1
region 1
Many thanks
DELETE FROM t1
FROM valid_columns t1
WHERE col_order >
(SELECT MIN(col_order) from valid_columns t2 WHERE t1.name = t2.name)
EDIT: can be simplified to this:
DELETE FROM valid_columns
WHERE col_order >
(SELECT MIN(col_order) from valid_columns t2 WHERE valid_columns.name = t2.name)
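Since T-SQL itself can't be run inline here, a minimal sketch of the same correlated-subquery DELETE using Python with an in-memory SQLite database (the table and data mirror the question; the DELETE is the simplified query above, which SQLite also accepts):

```python
import sqlite3

# Hypothetical in-memory copy of valid_columns with the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_columns (name TEXT, col_order INTEGER)")
rows = [("job", 1), ("job", 3), ("status", 2), ("cust", 2),
        ("county", 1), ("state", 1), ("region", 1), ("region", 2),
        ("region", 3), ("so", 4)]
conn.executemany("INSERT INTO valid_columns VALUES (?, ?)", rows)

# Same correlated-subquery DELETE as above: drop every row whose
# col_order is above the minimum col_order for its name.
conn.execute("""
    DELETE FROM valid_columns
    WHERE col_order > (SELECT MIN(col_order)
                       FROM valid_columns t2
                       WHERE valid_columns.name = t2.name)
""")

remaining = sorted(conn.execute("SELECT name, col_order FROM valid_columns"))
```

After the DELETE, each name keeps exactly one row with its minimum col_order, matching the desired output in the question.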
The DELETE statement can have a FROM clause, which lets you delete records based on the values of related records in a second table. In this case the FROM is not strictly required (I sometimes use it just to alias the table name because I don't like the extra typing):
DELETE FROM TableA
FROM TableA
JOIN TableB On TableA.CriteriaA = TableB.CriteriaA
You could also try this example (might be faster if you have to do this a lot):
DELETE FROM valid_columns
WHERE EXISTS
(SELECT * FROM valid_columns t1
WHERE t1.name = valid_columns.name AND valid_columns.col_order > t1.col_order);
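To confirm the EXISTS form behaves the same way, here is a quick check in Python/SQLite (a smaller hypothetical data set; the DELETE is the EXISTS query above verbatim, which SQLite also supports):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_columns (name TEXT, col_order INTEGER)")
conn.executemany("INSERT INTO valid_columns VALUES (?, ?)",
                 [("job", 1), ("job", 3), ("region", 1), ("region", 2),
                  ("region", 3), ("so", 4)])

# EXISTS form: delete any row for which another row with the same name
# and a lower col_order exists.
conn.execute("""
    DELETE FROM valid_columns
    WHERE EXISTS (SELECT 1 FROM valid_columns t1
                  WHERE t1.name = valid_columns.name
                    AND valid_columns.col_order > t1.col_order)
""")

kept = sorted(conn.execute("SELECT name, col_order FROM valid_columns"))
```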
-- Test table
declare @T table(Name varchar(10), col_order int)
-- Sample data
insert into @T
select 'job', 1 union all
select 'job', 3 union all
select 'status', 2 union all
select 'cust', 2 union all
select 'county', 1 union all
select 'state', 1 union all
select 'region', 1 union all
select 'region', 2 union all
select 'region', 3 union all
select 'so', 4
-- Delete using CTE and row_number()
;with cte as
(
select row_number() over(partition by Name order by col_order) as rn
from @T
)
delete from cte
where rn > 1
-- Result
select *
from @T
Or with a subquery instead of a CTE:
delete vc
from (select row_number() over(partition by Name order by col_order) as rn
from valid_columns) as vc
where vc.rn > 1
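The ROW_NUMBER() idea can also be checked outside SQL Server. SQLite (3.25+ for window functions) can't DELETE through a CTE or derived table, so this adaptation deletes by rowid instead; the PARTITION BY / ORDER BY logic is unchanged from the T-SQL version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_columns (name TEXT, col_order INTEGER)")
conn.executemany("INSERT INTO valid_columns VALUES (?, ?)",
                 [("job", 1), ("job", 3), ("status", 2), ("cust", 2),
                  ("county", 1), ("state", 1), ("region", 1), ("region", 2),
                  ("region", 3), ("so", 4)])

# Number rows per name by col_order, then delete everything after the
# first row in each partition (rn > 1), keyed by SQLite's rowid.
conn.execute("""
    DELETE FROM valid_columns
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY name
                                      ORDER BY col_order) AS rn
            FROM valid_columns)
        WHERE rn > 1)
""")

survivors = sorted(conn.execute("SELECT name, col_order FROM valid_columns"))
```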
Try this (you can replace the DELETE with a SELECT first to make sure you get the right rows before you delete). Note that T-SQL requires the alias form DELETE t1 FROM ... rather than an alias directly after DELETE FROM:
DELETE t1
FROM [valid_columns] t1
WHERE col_order > (SELECT MIN(col_order) from [valid_columns] t2
WHERE t1.name = t2.name)
This should do what you need (the comparison must be against MIN, not MAX, since you want to keep the lowest col_order, and T-SQL needs the DELETE a FROM ... form to alias the target table):
DELETE a
FROM valid_columns a
WHERE a.col_order > (SELECT MIN(col_order)
FROM valid_columns b
WHERE a.name = b.name);
I recommend you make a backup of the data before testing it, though.
Delete records using a binary checksum (this works in any SQL Server version):
CREATE TABLE #t1(ID INT NULL, VALUE VARCHAR(2))
INSERT INTO #t1(ID, VALUE) VALUES (1,'aa')
INSERT INTO #t1(ID, VALUE) VALUES (2,'bb')
INSERT INTO #t1(ID, VALUE) VALUES (1,'aa')
INSERT INTO #t1(ID, VALUE) VALUES (1,'aa')
INSERT INTO #t1(ID, VALUE) VALUES (3,'cc')
INSERT INTO #t1(ID, VALUE) VALUES (3,'cc')
GO
-- The columns passed to BINARY_CHECKSUM() are the ones compared for duplicates;
-- to compare the full row, use BINARY_CHECKSUM(*) instead
-- For SQL Server 2000+: a loop
-- save checksums and rowcounts for duplicates
SELECT BINARY_CHECKSUM(ID, VALUE) AS ChkSum, COUNT(*) AS Cnt
INTO #t2
FROM #t1
GROUP BY BINARY_CHECKSUM(ID, VALUE) HAVING COUNT(*)>1
DECLARE @ChkSum BIGINT, @rc INT
-- get the first checksum and set the rowcount to the count - 1
-- because we want to leave one duplicate
SELECT TOP 1 @ChkSum = ChkSum, @rc = Cnt-1 FROM #t2
WHILE EXISTS (SELECT * FROM #t2)
BEGIN
-- rowcount is one less than the duplicate rows count
SET ROWCOUNT @rc
DELETE FROM #t1 WHERE BINARY_CHECKSUM(ID, VALUE) = @ChkSum
-- remove the processed duplicate from the checksum table
DELETE #t2 WHERE ChkSum = @ChkSum
-- select the next duplicate rows to delete
SELECT TOP 1 @ChkSum = ChkSum, @rc = Cnt-1 FROM #t2
END
SET ROWCOUNT 0
GO
SELECT * FROM #t1
-- for SQL Server 2005+ a cool CTE
;WITH Numbered
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ChkSum ORDER BY ChkSum) AS RN, *
FROM (
SELECT BINARY_CHECKSUM(ID, VALUE) AS ChkSum
FROM #t1
) t
)
DELETE FROM Numbered WHERE RN > 1;
GO
SELECT * FROM #t1
DROP TABLE #t1;
DROP TABLE #t2;
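The checksum loop boils down to: hash each row, count occurrences, and keep only one copy per hash value. A minimal sketch of the same idea in plain Python, with hash() standing in for BINARY_CHECKSUM (and sharing its caveat that a hash collision could conflate two distinct rows):

```python
# Sample data mirroring the #t1 temp table above.
rows = [(1, "aa"), (2, "bb"), (1, "aa"), (1, "aa"), (3, "cc"), (3, "cc")]

seen_counts = {}
deduped = []
for row in rows:
    key = hash(row)  # stand-in for BINARY_CHECKSUM(ID, VALUE)
    if key not in seen_counts:
        deduped.append(row)  # first occurrence survives, extras are dropped
    seen_counts[key] = seen_counts.get(key, 0) + 1
```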
Or you could iterate through the table using a cursor and insert the first value encountered for each name into a temp table (make sure the temp table has a unique constraint on the name column).
Edit: I've included a code snippet for convenience...
declare @Ti table(name varchar(10), col_order int);
declare @Tf table(name varchar(10) unique not null, col_order int not null);
declare @name varchar(10);
declare @col_order int;
-- Sample data
insert into @Ti
select 'job', 1 union all
select 'job', 3 union all
select 'status', 2 union all
select 'cust', 2 union all
select 'county', 1 union all
select 'state', 1 union all
select 'region', 1 union all
select 'region', 2 union all
select 'region', 3 union all
select 'so', 4
select * from @Ti
declare i cursor for
select * from @Ti;
open i;
fetch next from i into @name, @col_order;
while @@FETCH_STATUS = 0
begin
if not exists( select * from @Tf where name = @name )
begin
insert into @Tf(name, col_order)
select @name, @col_order;
end
fetch next from i into @name, @col_order;
end
close i;
deallocate i;
select * from @Tf;
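The cursor loop is a first-seen-wins pass, sketched here in plain Python. One caveat worth noting: since the cursor's SELECT has no ORDER BY, "first seen" only equals "lowest col_order" if the rows happen to arrive in that order, so this sketch sorts by col_order first to guarantee the desired result:

```python
# Sample data mirroring the @Ti table variable above.
source = [("job", 1), ("job", 3), ("status", 2), ("cust", 2),
          ("county", 1), ("state", 1), ("region", 1), ("region", 2),
          ("region", 3), ("so", 4)]

result = {}
for name, col_order in sorted(source, key=lambda r: r[1]):
    if name not in result:  # the "if not exists" check from the cursor loop
        result[name] = col_order
```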