View previous topic :: View next topic
|
Author |
Message |
rahuindo
New User
Joined: 09 Apr 2008 Posts: 83 Location: Chennai
|
|
|
|
Hi,
I have duplicate records in a table and i want to retain the latest record and delete all the older duplicate records. Can i achieve it through a single sql query??
Thanks in advance.. |
|
Back to top |
|
|
Craq Giegerich
Senior Member
Joined: 19 May 2007 Posts: 1512 Location: Virginia, USA
|
|
|
|
What defines duplicate and what defines latest? |
|
Back to top |
|
|
Help-Me-Out
New User
Joined: 09 Dec 2006 Posts: 56 Location: Pune
|
|
|
|
Hi,
try this
DELETE FROM tab_EMP
where EMP_ID IN (SELECT EMP_ID FROM TAB_EMP
GROUP BY EMP_ID
HAVING COUNT(*) > 1) |
|
Back to top |
|
|
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
|
|
|
|
help-me-out,
Have you tested your SQL?
looks like it will delete any emp_id that exists more that once in the table.
The OP wants to retain one of the dups, apparently one with the oldest 'date'.
and since the OP has not responed since saturday, have to assume he is as bored with this question as i am. |
|
Back to top |
|
|
rahuindo
New User
Joined: 09 Apr 2008 Posts: 83 Location: Chennai
|
|
|
|
Hi All,
Thanks for your replies..i was out of town and hence sorry for the late replies.
The oldest record means the records which were the first one created. Later on, due to some problems in the application, duplicate records got inserted and hence these are the latest records.
help-me-out..the query will delete all the records which i do not want!!
i think the only way out would be to write a small program to delete these duplicate records. I think dbzTHEdinosauer and Craq Giegerich will agree..whats your take?? |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
|
|
|
|
Hello rahuindo,
I may be mis-reading your posts, but i read from the first post as wanting to "retain the latest" and from the last post, it is the later rows that are duplicates and should be removed.
In neither case will the sql posted by Sandy come even close. As has been noted, all duplicates would be deleted and there is no menton of a date consideration in that code. I suppose if all of them were deleted, there would be no need to check the date. . .
I'm probably old-fashioned (i really like having a backup before i do surgery on some data), but i'd probably unload the table (which would be my backup), sort the records in the proper sequence (id/date), and then discard the rows i no longer needed creating a new file i'd use to reload the table. |
|
Back to top |
|
|
rahuindo
New User
Joined: 09 Apr 2008 Posts: 83 Location: Chennai
|
|
|
|
Hi dick..
oops.i wrote the opposite..actually i want to retain the latest records only..i have a small doubt to the solution you have provided. after sorting the records in the proper sequence, how do i delete the rows i no longer need..will it be manual because there is no sync between the dates of the insertion of the duplicate records and the original records. and then there are more than thousand records having a duplicate records. |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
|
|
|
|
Hello rahuindo,
No - that would take too long and be too error-prone
Quote: |
i have a small doubt to the solution you have provided. after sorting the records in the proper sequence, how do i delete the rows i no longer need |
Once the records are sorted by "the id" (ascending) and "the date" descending, you could probaly use your sort product or a bit of cobol to code to read the sorted data and discard all but the first record for each id. The resulting output from that step will be the ones you want to reload.
Quote: |
because there is no sync between the dates of the insertion of the duplicate records and the original records |
This should not be an issue (if i understand the requirement correctly) because what will retained is the "single, newest" records that meets your rule. All of the older duplicates will be discarded. |
|
Back to top |
|
|
rahuindo
New User
Joined: 09 Apr 2008 Posts: 83 Location: Chennai
|
|
|
|
thanks dick..i think this will be the best option.. |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
|
|
|
|
You're welcome - good luck |
|
Back to top |
|
|
sainathvinod
New User
Joined: 01 Apr 2008 Posts: 11 Location: Chennai
|
|
|
|
If you are particular about doing it in a querry........ just try this out.................its working 4 me:-
DELETE FROM TAB_EMP
WHERE EMP_ID IN
(SELECT T1.EMP_ID FROM TAB_EMP T1
WHERE T1.DATE <
(
SELECT MAX(T2.DATE) FROM TAB_EMP T2
WHERE T1.EMP_ID = T2.EMP_ID
)
GROUP BY T1.EMP_ID
HAVING COUNT(*) >= 1) |
|
Back to top |
|
|
rahuindo
New User
Joined: 09 Apr 2008 Posts: 83 Location: Chennai
|
|
|
|
Hey Sai..thanks i will give a try..this is another option which i can suggest.
Thanks a lot to all of you for all your support. |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
|
|
|
|
Hello,
No matter how you delete the unneeded rows, you still need a backup.
An audit trail of what you deleted would be appropriate also. |
|
Back to top |
|
|
rahuindo
New User
Joined: 09 Apr 2008 Posts: 83 Location: Chennai
|
|
|
|
Hi all,
I deleted the records with the following query:
DELETE FROM empl A
WHERE A.empl_id IN
( SELECT B.empl_id FROM empl B
GROUP BY B.empl_id
HAVING COUNT(B.empl_id) > 1 )
AND A.CRT_TS <
( SELECT MAX(C.CRT_TS) FROM empl C
WHERE C.empl= A.empl) ;
Before deleting i unloaded the table and kept a backup. The query deleted all the un-needed records. |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
|
|
|
|
Thank you for posting your solution
d |
|
Back to top |
|
|
|