View previous topic :: View next topic
|
Author |
Message |
himanshupant
New User
Joined: 21 Mar 2007 Posts: 46 Location: India
|
|
|
|
Hello
I have a query ( bad pun)
Suppose there is a table with following rows
Code: |
Branch No Account No Date TYPE AMOUNT
11111 22222 2010-12-01 C 35
11111 22222 2010-12-01 C 10
11111 22222 2011-01-01 C 10
11111 22222 2011-01-01 C 10
11111 22222 2011-02-01 C 10
11111 22222 2011-03-01 C 30
11111 22222 2011-04-01 D 10 |
We want to delete all duplicate rows in a way that
1) For a given DATE there should only be ONE C type record.Furthermore that C record should be the one which has the least AMOUNT values amongst all the C records for that particular branch and account on that particular day.
2) In case two or more records have same value of amount and type C on a given day then only one occurrence should be there.
So the above table after execution of the SQL should look like as below:-
Code: |
Branch No Account No Date TYPE AMOUNT
11111 22222 2010-12-01 C 10
11111 22222 2011-01-01 C 10
11111 22222 2011-02-01 C 10
11111 22222 2011-03-01 C 30
11111 22222 2011-04-01 D 10 |
A point to be noted is that branch + account combination is unique. In other words a same account number could be existing in another branch number. Similarly a single branch would have multiple accounts.
I think we need to use NESTED SELECT inside the WHERE clause of DELETE SQL but am not able to get this out. |
|
Back to top |
|
|
Kjeld
Active User
Joined: 15 Dec 2009 Posts: 365 Location: Denmark
|
|
|
|
Is this question a student assigment?
Mass deletions are not DB2's favorite. Depending on data quantities, I would consider unloading the data, sort the unload in Branch #, Account #, Date, Type, Amount. The sort routine could possibly be configured only to output the first occurrence of the records of type 'C' for a given combination of the first 4 keys. Alternatively, I would code a small batch program to do it.
Then you load the result dataset to the table with replace option. |
|
Back to top |
|
|
himanshupant
New User
Joined: 21 Mar 2007 Posts: 46 Location: India
|
|
|
|
this is not a student assignment..One time data cleanup as described above needs to be done that is why SQL DELETE would be preferred |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
The better solution would to be do as suggested and retain the "footprints" of what was done. A blind delete provides no audit trail. . .
Also, as this has happened, having a solid solution will help if something similar happens one day.
There is no reason to limit solutions to the smallest amount of sql code that can be written. |
|
Back to top |
|
|
himanshupant
New User
Joined: 21 Mar 2007 Posts: 46 Location: India
|
|
|
|
Sure Dick...
I concur with your point that DELETE does leave no audit trail.. I had the routine safety measures in mind like taking a pre DELETE IMAGE COPY , exhaustive checkouts once DELETE is done so ensure that no unwanted data is remaining and no genuine data is lost.
Development effort for batch program would have made sense had this been a regular cleanup ...
However I just thought of a altogether different approach to tweak the data retrieval SELECT queries so that the records which I want to be deleted are not FETCHED in the first place in any application program.
Something like the below
Code: |
SELECT
BRANCH NO,
ACCOUNT NO,
DATE,
TYPE,
MIN(AMOUNT)
FROM TABLE
GROUP BY
BRANCH NO,
ACCOUNT NO,
DATE,
TYPE
HAVING TYPE = 'C' |
|
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
Quote: |
different approach to tweak the data retrieval SELECT queries so that the records which I want to be deleted are not FETCHED in the first place in any application program. |
I don't understand how this deals with the requirement that was being talked about originally
If the table has become corrupted with rows that should not be in the table, how will this SELECT really help? I suggest you correct the data in the table and not make a mess througout the entire application as well as any "other" processes that need to read the table. What about ad-hoc queries/reporting? |
|
Back to top |
|
|
himanshupant
New User
Joined: 21 Mar 2007 Posts: 46 Location: India
|
|
|
|
Hi Dick
I am wanting to tweak the SELECT because very few rows are corrupted and the development effort required to cleanup them up would be difficult to justify in my establishment.
The above SELECT query which I have highlighted is helping me to get the required rows where TYPE is "C"
But the requirement is to get all the possible rows for each branch / account between two dates. The requirement for TYPE = C still stands . So for a branch / account if on a give day between the given date range , if there are multiple 'C' type records we only select single minimum amount from it. |
|
Back to top |
|
|
GuyC
Senior Member
Joined: 11 Aug 2009 Posts: 1281 Location: Belgium
|
|
|
|
you shouldn't use HAVING for fields which are know before grouping.
there is no point in grouping everything and then discarding <> "C"
Code: |
SELECT BRANCH NO, ACCOUNT NO, DATE, TYPE, MIN(AMOUNT)
FROM TABLE1
where TYPE = 'C'
GROUP BY BRANCH NO, ACCOUNT NO, DATE, TYPE |
|
|
Back to top |
|
|
Kjeld
Active User
Joined: 15 Dec 2009 Posts: 365 Location: Denmark
|
|
|
|
The requirement was to DELETE the rows. I am not sure if it is at all possible to delete specific rows from a select with aggregate functions imbedded in it. |
|
Back to top |
|
|
GuyC
Senior Member
Joined: 11 Aug 2009 Posts: 1281 Location: Belgium
|
|
|
|
you can use aggregate functions in a subselect of a delete, but :
I don't think you can Delete 1 of 2 "completely the same rows".
maybe with RID(), but haven't tried that yet. |
|
Back to top |
|
|
himanshupant
New User
Joined: 21 Mar 2007 Posts: 46 Location: India
|
|
|
|
Kjeld wrote: |
The requirement was to DELETE the rows. I am not sure if it is at all possible to delete specific rows from a select with aggregate functions imbedded in it. |
Unfortunately it looks like this only so I have started thinking on changing the data retrieval routines so that unwanted rows are not in the result set.. Maybe a collection of UNION with SELECT along with nested queries... |
|
Back to top |
|
|
GuyC
Senior Member
Joined: 11 Aug 2009 Posts: 1281 Location: Belgium
|
|
|
|
DB2 is mostly limited by the knowledge of its users.
Code: |
select BRANCHNO, ACCNO, DT1, type, AMOUNT , rid(A)
,case when rid(A) is distinct from ( select min(rid(b)) from table1 B
where a.BRANCHNO = b.BRANCHNO
and a.ACCNO = b.accno
and a.DT1 = b.dt1
and a.type = b.type
and a.AMOUNT = (select min(amount) from table1 C
where a.BRANCHNO = C.BRANCHNO
and a.ACCNO = C.accno
and a.DT1 = C.dt1
and a.type = C.type )
)
then '<=Del' else '' end
from table1 A
where A.type = 'C' |
Code: |
delete
from table1 A
where A.type = 'C'
and rid(A) is distinct from (
select min(rid(b)) from table1 B
where a.BRANCHNO = b.BRANCHNO
and a.ACCNO = b.accno
and a.DT1 = b.dt1
and a.type = b.type
and a.AMOUNT = (select min(amount) from table1 C
where a.BRANCHNO = C.BRANCHNO
and a.ACCNO = C.accno
and a.DT1 = C.dt1
and a.type = C.type )
) |
|
|
Back to top |
|
|
Kjeld
Active User
Joined: 15 Dec 2009 Posts: 365 Location: Denmark
|
|
|
|
GuyC wrote: |
DB2 is mostly limited by the knowledge of its users.
|
True, but often knowledge is aquired when it is actively needed. And then your true skills in aquiring knowledge efficiently will make the difference.
Anyway, it looks like a neat solution, Guy |
|
Back to top |
|
|
himanshupant
New User
Joined: 21 Mar 2007 Posts: 46 Location: India
|
|
|
|
Kjeld wrote: |
GuyC wrote: |
DB2 is mostly limited by the knowledge of its users.
|
True, but often knowledge is aquired when it is actively needed. And then your true skills in aquiring knowledge efficiently will make the difference.
Anyway, it looks like a neat solution, Guy |
I do agree it does look very neat.. and I do agree with you that knowledge is aquired when actively needed . One more example of this is various possible usage of IBM DFSORT |
|
Back to top |
|
|
himanshupant
New User
Joined: 21 Mar 2007 Posts: 46 Location: India
|
|
|
|
GuyC wrote: |
DB2 is mostly limited by the knowledge of its users.
Code: |
select BRANCHNO, ACCNO, DT1, type, AMOUNT , rid(A)
,case when rid(A) is distinct from ( select min(rid(b)) from table1 B
where a.BRANCHNO = b.BRANCHNO
and a.ACCNO = b.accno
and a.DT1 = b.dt1
and a.type = b.type
and a.AMOUNT = (select min(amount) from table1 C
where a.BRANCHNO = C.BRANCHNO
and a.ACCNO = C.accno
and a.DT1 = C.dt1
and a.type = C.type )
)
then '<=Del' else '' end
from table1 A
where A.type = 'C' |
Code: |
delete
from table1 A
where A.type = 'C'
and rid(A) is distinct from (
select min(rid(b)) from table1 B
where a.BRANCHNO = b.BRANCHNO
and a.ACCNO = b.accno
and a.DT1 = b.dt1
and a.type = b.type
and a.AMOUNT = (select min(amount) from table1 C
where a.BRANCHNO = C.BRANCHNO
and a.ACCNO = C.accno
and a.DT1 = C.dt1
and a.type = C.type )
) |
|
Correct me if I am wrong but the DELETE query looks like it would delete only one occurence of the minimum C type record. In case there are 3 'C' entries with same value , only one value should remain |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
I'm a bit late to the party, but this:
Quote: |
I am wanting to tweak the SELECT because very few rows are corrupted and the development effort required to cleanup them up would be difficult to justify in my establishment. |
makes no sense. . . The "development effort" should be minutes possibly even an hour or 2. . .
Versus the "effort" of propogating this thru every query that does (or ever will be written) query the table. The small number of rows should not even be considered. It is the code that will take the time and cause confusion.
Had the first reply/suggestion from Kjeld been implemented this would have been complete 2 days ago. . .
Just because data lives in a database is no reason to use only the database to work on the data. Wholesale processes are often better served operating outside the database/sql. |
|
Back to top |
|
|
GuyC
Senior Member
Joined: 11 Aug 2009 Posts: 1281 Location: Belgium
|
|
|
|
DB2 version ?
I think RID() came in DB2 v9 NFM.
If two rows are completely identical(all columns), the only difference is the RID.
pre v9 you have a few possibilities ( some already explained) :
1) do an unload/sort/reload
2) write a small program with a cursor and do a DELETE WHERE CURRENT OF
3) create a new table and do an insert (select group by)
... |
|
Back to top |
|
|
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
|
|
|
|
himanshupant wrote: |
looks like it |
had you tested, and based on the results of the test, reached the same conclusion as your assumption,
i would pay attention to your criticizm of the sql.
you didn't, so I won't. |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Testing - we don' need no steenkin' testing. . .
d |
|
Back to top |
|
|
|