How to delete duplicate records from a table using sql query

rahuindo · New User Joined: 09 Apr 2008 Posts: 83 Location: Chennai

Hi,
I have duplicate records in a table and i want to retain the latest record and delete all the older duplicate records. Can i achieve it through a single sql query??
Thanks in advance..

Craq Giegerich · Posted: Sat May 03, 2008 6:26 pm

What defines duplicate and what defines latest?

Help-Me-Out · New User Joined: 09 Dec 2006 Posts: 56 Location: Pune

Hi,
try this

DELETE FROM tab_EMP
where EMP_ID IN (SELECT EMP_ID FROM TAB_EMP
GROUP BY EMP_ID
HAVING COUNT(*) > 1)

dbzTHEdinosauer · Posted: Mon May 05, 2008 5:02 pm

help-me-out,

Have you tested your SQL?

looks like it will delete any emp_id that exists more that once in the table.

The OP wants to retain one of the dups, apparently one with the oldest 'date'.

and since the OP has not responed since saturday, have to assume he is as bored with this question as i am.

rahuindo · New User Joined: 09 Apr 2008 Posts: 83 Location: Chennai

Hi All,
Thanks for your replies..i was out of town and hence sorry for the late replies.
The oldest record means the records which were the first one created. Later on, due to some problems in the application, duplicate records got inserted and hence these are the latest records.
help-me-out..the query will delete all the records which i do not want!!
i think the only way out would be to write a small program to delete these duplicate records. I think dbzTHEdinosauer and Craq Giegerich will agree..whats your take??

dick scherrer · Posted: Tue May 06, 2008 8:58 am

Hello rahuindo,

I may be mis-reading your posts, but i read from the first post as wanting to "retain the latest" and from the last post, it is the later rows that are duplicates and should be removed.

In neither case will the sql posted by Sandy come even close. As has been noted, all duplicates would be deleted and there is no menton of a date consideration in that code. I suppose if all of them were deleted, there would be no need to check the date. . .

I'm probably old-fashioned (i really like having a backup before i do surgery on some data), but i'd probably unload the table (which would be my backup), sort the records in the proper sequence (id/date), and then discard the rows i no longer needed creating a new file i'd use to reload the table.

rahuindo · New User Joined: 09 Apr 2008 Posts: 83 Location: Chennai

Hi dick..
oops.i wrote the opposite..actually i want to retain the latest records only..i have a small doubt to the solution you have provided. after sorting the records in the proper sequence, how do i delete the rows i no longer need..will it be manual because there is no sync between the dates of the insertion of the duplicate records and the original records. and then there are more than thousand records having a duplicate records.

dick scherrer · Posted: Tue May 06, 2008 10:58 am

Hello rahuindo,

rahuindo · New User Joined: 09 Apr 2008 Posts: 83 Location: Chennai

thanks dick..i think this will be the best option..

dick scherrer · Posted: Tue May 06, 2008 1:39 pm

You're welcome - good luck

sainathvinod · New User Joined: 01 Apr 2008 Posts: 11 Location: Chennai

If you are particular about doing it in a querry........ just try this out.................its working 4 me:-

DELETE FROM TAB_EMP
WHERE EMP_ID IN
(SELECT T1.EMP_ID FROM TAB_EMP T1
WHERE T1.DATE <
(
SELECT MAX(T2.DATE) FROM TAB_EMP T2
WHERE T1.EMP_ID = T2.EMP_ID
)
GROUP BY T1.EMP_ID
HAVING COUNT(*) >= 1)

rahuindo · New User Joined: 09 Apr 2008 Posts: 83 Location: Chennai

Hey Sai..thanks i will give a try..this is another option which i can suggest.
Thanks a lot to all of you for all your support.

dick scherrer · Posted: Wed May 07, 2008 5:16 am

Hello,

No matter how you delete the unneeded rows, you still need a backup.

An audit trail of what you deleted would be appropriate also.

rahuindo · New User Joined: 09 Apr 2008 Posts: 83 Location: Chennai

Hi all,
I deleted the records with the following query:
DELETE FROM empl A
WHERE A.empl_id IN
( SELECT B.empl_id FROM empl B
GROUP BY B.empl_id
HAVING COUNT(B.empl_id) > 1 )
AND A.CRT_TS <
( SELECT MAX(C.CRT_TS) FROM empl C
WHERE C.empl= A.empl) ;
Before deleting i unloaded the table and kept a backup. The query deleted all the un-needed records.

dick scherrer · Posted: Sat May 10, 2008 4:52 am

Thank you for posting your solution

d