IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Deleting duplicates


IBM Mainframe Forums -> DB2
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Tue Jul 04, 2006 9:27 am
Reply with quote

IS it possible to delete one of the duplicate row in a table using sql query with out using cursors and without creating any views?
Back to top
View user's profile Send private message
Hanfur

Active User


Joined: 21 Jun 2006
Posts: 104

PostPosted: Tue Jul 04, 2006 12:38 pm
Reply with quote

Its a quiet popular post but posting answres once again..

Assuming your table name as EMP and have an EMPID column as repeating..


DELETE FROM EMP WHERE
EMPID IN( SELECT EMPID FROM EMP
GROUP BY(EMPID) HAVING COUNT(*)>1)

This would delete the all the duplicate EMPID row from EMP Table.

2.If u want to retain atleast one entry of the duplicate EMPID 's in Table
then issue


DELETE FROM EMP A WHERE EMPID <>
( SELECT MIN(EMPID) FROM EMP B
WHERE B.EMPID=A.EMPID)

-Han
Back to top
View user's profile Send private message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Tue Jul 04, 2006 11:02 pm
Reply with quote

Han am bit confused.. in the subquery how can we find the duplicate copy using min(empid) as emp id is a duplicate..
Back to top
View user's profile Send private message
Hanfur

Active User


Joined: 21 Jun 2006
Posts: 104

PostPosted: Wed Jul 05, 2006 12:24 pm
Reply with quote

Try this..Iam assuming EMPID and NAME would come mulitple times(DUPLICATE)..

DELETE FROM EMP A
WHERE ROWID <>
(SELECT MIN(ROWID) FROM EMP B
WHERE A.EMPID = B.EMPID
AND A.NAME = B.NAME



ROWID could be created at the time when create the Table.
using the statement.

MYROWID ROWID NOT NULL WITH DEFAULT GENERATED ALWAYS in Table defenition.

Here MYROWID would be the column Name for ROWID and as on when you insert NAME to Table ;DB2 automatically populate MYROWID with unique value to this.

Hope your confusion is over by now...
-Han.
Back to top
View user's profile Send private message
parikshit123

Active User


Joined: 01 Jul 2005
Posts: 269
Location: India

PostPosted: Wed Jul 05, 2006 12:36 pm
Reply with quote

Hi,

I think, following query will delete the duplicates on column1 and column2.


delete table_a
where rowid not in
(select min(rowid) from table_a
group by column1, column2);
Back to top
View user's profile Send private message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Wed Jul 05, 2006 1:29 pm
Reply with quote

HI han and Parshikthi..
thanks for yor super support...
Back to top
View user's profile Send private message
Rameshs

New User


Joined: 15 Jun 2005
Posts: 53
Location: India, Chennai

PostPosted: Wed Jul 05, 2006 2:30 pm
Reply with quote

Deleting Duplicate Records from a Table
It's simple by ROW_NUMBER function available from db2 v8.1 :-)))))

Assume you have a table TRANS(TRANSNO integer,item char(10)) with the
following records:

TRANSNO ITEM
----------- ------------------------------
1 TEST1
1 TEST1
2 Test2
2 Test2
3 Test3
4 Test4
5 Test5
5 Test5
4 Test4
5 Test5
10 record(s) selected.

If you want to delete, all duplicate records from the table, the
following steps may be used:

1) create view del_tmp as select a,b ,row_number() over(partition by
TRANSNO,ITEM) as row# from TRANS

select * from del_tmp


TRANS ITEM ROW#
----------- ------------------------------ --------------------
1 Test1 1
1 Test1 2
2 Test2 1
2 Test2 2
3 Test3 1
4 Test4 1
4 Test4 2
5 Test5 1
5 Test5 2
5 Test5 3

10 record(s) selected.


2) delete from del_tmp where row# >= 2

This will eliminate duplicate rows from the base table TRANS.

select * from TRANS

TRANSNO ITEM
----------- ------------------------------
1 Test1
2 Test2
3 Test3
4 Test4
5 Test5

5 record(s) selected.

Explanation:
ROW_NUMBER is an OLAP function which tells DB2 to create an incremental
number column.

Let's review the syntax of this OLAP specification:
syntax1:Row_Number() Over (Order By col_name)

The OVER keyword instructs DB2 what column or columns should be used to
control the numbering order
(e.g)
Select ProductName,
Row_Number() Over (Order By ProductName)
As ProductNumber
From Products
Order By ProductName
Result:

Product productno

A 1
B 2
C 3
D 4


Syntax2:Row_Number() Over (partition By col_name)

The partition keyword which tells DB2 how to reset the numbering by a
change in value or values.

(e.g) Above
Back to top
View user's profile Send private message
Hanfur

Active User


Joined: 21 Jun 2006
Posts: 104

PostPosted: Wed Jul 05, 2006 3:28 pm
Reply with quote

Parikshit,
Is that possible to use the GROUPBY COLUMN1 ,COLUMN2 with out any referece of COLUMN1,COLUMN2 in the SELECT clause?

-Han.
Back to top
View user's profile Send private message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Thu Jul 06, 2006 10:12 am
Reply with quote

We can use aggregate functions wih any fields in the select clause while using the group by clause HAN.


But your query will run based on joins, Parshikts will take it in IN clause.
Correct me if i am wrong..

Parshikt and HAN,
IN my mainframes, subquery put up inside a IN clause tooks enormous amount of time to run.. DO you know the reason. did u face such type of scenarios
Thanks in advance
Muthu
Back to top
View user's profile Send private message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Thu Jul 06, 2006 10:16 am
Reply with quote

HI ramesh,
we are left with version 7 DB2 , and the row_numer is not recogniized..............
thank you
Back to top
View user's profile Send private message
Hanfur

Active User


Joined: 21 Jun 2006
Posts: 104

PostPosted: Thu Jul 06, 2006 5:17 pm
Reply with quote

Muth,

Reason for that could be cos the NOT IN (Negation generally) converts
a stage1 predicate to a stage2 predicate degrading peformance.

So performance point of view its always better to go stage1 predicates which would improve performance by using IN instead of NOT IN..



-Han.
Back to top
View user's profile Send private message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Sun Jul 09, 2006 5:18 pm
Reply with quote

HI han
Could you please give me any material on performance tuning in JCL... I am very intrested in that.. so that i can understand stage 1 and stage 2 predicates and all

Regards,
Muthu
Back to top
View user's profile Send private message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Sun Jul 09, 2006 5:19 pm
Reply with quote

HI han
Could you please give me any material on performance tuning in SQL... I am very intrested in that.. so that i can understand stage 1 and stage 2 predicates and all

Regards,
Muthu
Back to top
View user's profile Send private message
Hanfur

Active User


Joined: 21 Jun 2006
Posts: 104

PostPosted: Mon Jul 10, 2006 11:31 am
Reply with quote

Muth,
Plenty of manuals are available on net..

Check the title SQL tuning by Dan Tow too .In this book a mathematical approach is given on tuning queries based on his experience..

-Han.
Back to top
View user's profile Send private message
Muthukumar.PLindia

New User


Joined: 03 Jul 2006
Posts: 51
Location: Chennai

PostPosted: Mon Jul 10, 2006 11:57 am
Reply with quote

THanks for all..


Han,
A special thanks to you for coming all the way through the discussion..

Muthu
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DB2

 


Similar Topics
Topic Forum Replies
No new posts Deleting a Tape file JCL & VSAM 14
No new posts CA7 deleting files in steps after job... CA Products 4
No new posts How to remove block of duplicates DFSORT/ICETOOL 8
This topic is locked: you cannot edit posts or make replies. Compare files with duplicates in one ... DFSORT/ICETOOL 11
No new posts Merging 2 files but ignore duplicate... DFSORT/ICETOOL 1
Search our Forums:

Back to Top