IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

How to delete duplicate records from a table using sql query


IBM Mainframe Forums -> DB2
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
rahuindo

New User


Joined: 09 Apr 2008
Posts: 83
Location: Chennai

PostPosted: Sat May 03, 2008 5:06 pm
Reply with quote

Hi,
I have duplicate records in a table and i want to retain the latest record and delete all the older duplicate records. Can i achieve it through a single sql query??
Thanks in advance..
Back to top
View user's profile Send private message
Craq Giegerich

Senior Member


Joined: 19 May 2007
Posts: 1512
Location: Virginia, USA

PostPosted: Sat May 03, 2008 6:26 pm
Reply with quote

What defines duplicate and what defines latest?
Back to top
View user's profile Send private message
Help-Me-Out

New User


Joined: 09 Dec 2006
Posts: 56
Location: Pune

PostPosted: Mon May 05, 2008 4:56 pm
Reply with quote

Hi,
try this

DELETE FROM tab_EMP
where EMP_ID IN (SELECT EMP_ID FROM TAB_EMP
GROUP BY EMP_ID
HAVING COUNT(*) > 1)
Back to top
View user's profile Send private message
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

PostPosted: Mon May 05, 2008 5:02 pm
Reply with quote

help-me-out,

Have you tested your SQL?

looks like it will delete any emp_id that exists more that once in the table.

The OP wants to retain one of the dups, apparently one with the oldest 'date'.

and since the OP has not responed since saturday, have to assume he is as bored with this question as i am.
Back to top
View user's profile Send private message
rahuindo

New User


Joined: 09 Apr 2008
Posts: 83
Location: Chennai

PostPosted: Tue May 06, 2008 8:46 am
Reply with quote

Hi All,
Thanks for your replies..i was out of town and hence sorry for the late replies.
The oldest record means the records which were the first one created. Later on, due to some problems in the application, duplicate records got inserted and hence these are the latest records.
help-me-out..the query will delete all the records which i do not want!!
i think the only way out would be to write a small program to delete these duplicate records. I think dbzTHEdinosauer and Craq Giegerich will agree..whats your take??
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Tue May 06, 2008 8:58 am
Reply with quote

Hello rahuindo,

I may be mis-reading your posts, but i read from the first post as wanting to "retain the latest" and from the last post, it is the later rows that are duplicates and should be removed.

In neither case will the sql posted by Sandy come even close. As has been noted, all duplicates would be deleted and there is no menton of a date consideration in that code. I suppose if all of them were deleted, there would be no need to check the date. . .

I'm probably old-fashioned (i really like having a backup before i do surgery on some data), but i'd probably unload the table (which would be my backup), sort the records in the proper sequence (id/date), and then discard the rows i no longer needed creating a new file i'd use to reload the table.
Back to top
View user's profile Send private message
rahuindo

New User


Joined: 09 Apr 2008
Posts: 83
Location: Chennai

PostPosted: Tue May 06, 2008 9:59 am
Reply with quote

Hi dick..
oops.i wrote the opposite..actually i want to retain the latest records only..i have a small doubt to the solution you have provided. after sorting the records in the proper sequence, how do i delete the rows i no longer need..will it be manual because there is no sync between the dates of the insertion of the duplicate records and the original records. and then there are more than thousand records having a duplicate records.
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Tue May 06, 2008 10:58 am
Reply with quote

Hello rahuindo,

Quote:
will it be manual
No - that would take too long and be too error-prone icon_wink.gif

Quote:
i have a small doubt to the solution you have provided. after sorting the records in the proper sequence, how do i delete the rows i no longer need
Once the records are sorted by "the id" (ascending) and "the date" descending, you could probaly use your sort product or a bit of cobol to code to read the sorted data and discard all but the first record for each id. The resulting output from that step will be the ones you want to reload.

Quote:
because there is no sync between the dates of the insertion of the duplicate records and the original records
This should not be an issue (if i understand the requirement correctly) because what will retained is the "single, newest" records that meets your rule. All of the older duplicates will be discarded.
Back to top
View user's profile Send private message
rahuindo

New User


Joined: 09 Apr 2008
Posts: 83
Location: Chennai

PostPosted: Tue May 06, 2008 1:04 pm
Reply with quote

thanks dick..i think this will be the best option..
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Tue May 06, 2008 1:39 pm
Reply with quote

You're welcome - good luck icon_smile.gif
Back to top
View user's profile Send private message
sainathvinod

New User


Joined: 01 Apr 2008
Posts: 11
Location: Chennai

PostPosted: Tue May 06, 2008 2:40 pm
Reply with quote

If you are particular about doing it in a querry........ just try this out.................its working 4 me:-

DELETE FROM TAB_EMP
WHERE EMP_ID IN
(SELECT T1.EMP_ID FROM TAB_EMP T1
WHERE T1.DATE <
(
SELECT MAX(T2.DATE) FROM TAB_EMP T2
WHERE T1.EMP_ID = T2.EMP_ID
)
GROUP BY T1.EMP_ID
HAVING COUNT(*) >= 1)
Back to top
View user's profile Send private message
rahuindo

New User


Joined: 09 Apr 2008
Posts: 83
Location: Chennai

PostPosted: Tue May 06, 2008 2:51 pm
Reply with quote

Hey Sai..thanks i will give a try..this is another option which i can suggest.
Thanks a lot to all of you for all your support.
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Wed May 07, 2008 5:16 am
Reply with quote

Hello,

No matter how you delete the unneeded rows, you still need a backup.

An audit trail of what you deleted would be appropriate also.
Back to top
View user's profile Send private message
rahuindo

New User


Joined: 09 Apr 2008
Posts: 83
Location: Chennai

PostPosted: Fri May 09, 2008 11:32 am
Reply with quote

Hi all,
I deleted the records with the following query:
DELETE FROM empl A
WHERE A.empl_id IN
( SELECT B.empl_id FROM empl B
GROUP BY B.empl_id
HAVING COUNT(B.empl_id) > 1 )
AND A.CRT_TS <
( SELECT MAX(C.CRT_TS) FROM empl C
WHERE C.empl= A.empl) ;
Before deleting i unloaded the table and kept a backup. The query deleted all the un-needed records.
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

PostPosted: Sat May 10, 2008 4:52 am
Reply with quote

Thank you for posting your solution icon_smile.gif

d
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DB2

 


Similar Topics
Topic Forum Replies
No new posts Need help with ADABAS query (COBOL-AD... All Other Mainframe Topics 0
No new posts combine multiple unique records into ... DFSORT/ICETOOL 2
No new posts To Omit records based n SORT condition DFSORT/ICETOOL 6
No new posts Unload and Load ISPF Table TSO/ISPF 4
No new posts ISPF Table to add a new column TSO/ISPF 1
Search our Forums:

Back to Top