Portal | Manuals | References | Downloads | Info | Programs | JCLs | Master the Mainframes
IBM Mainframe Computers Forums Index
 
Register
 
IBM Mainframe Computers Forums Index Mainframe: Search IBM Mainframe Forum: FAQ Memberlist Usergroups Profile Log in to check your private messages Log in
 

 

Optimization Required for ICETOOL duplicate removal

 
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> DFSORT/ICETOOL
View previous topic :: :: View next topic  
Author Message
VivekKhanna

New User


Joined: 09 Feb 2009
Posts: 57
Location: India

PostPosted: Tue May 08, 2012 11:09 pm    Post subject: Optimization Required for ICETOOL duplicate removal
Reply with quote

Hi Team,

I am performing SORT and Duplicate Removal (using ICETOOL) on a tape file holding 900 million records. Consider an example of file with key information as:

Code:

FIELD             START_POS              LENGTH
EMP_ID              1                       10
EMP_DEPT            15                      10
DEPT_DIV            25                      04
SEQ_NB              11                      04



The requirement says that:

a. Perform SORT based on ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.
b. Use the output from previous step and Perform duplicate removal using ICETOOL using the following control card:

Code:
SELECT FROM(SORTIN) TO(SORTOUT)-     
 ON(15,10,CH)-                         
 ON(25,04,CH)-
 ON(01,10,CH)-                         
 FIRST DISCARD(SORTXSUM) 


I have performed the above in two different steps. Here I am facing performance issues, since the time taken by both the steps is (2.5+3.5) hrs approx. All the intermediate and final files are on tape. DCB used is:

Code:
DCB = (LRECL=242,BLKSIZE=0,RECFM=FB,BUFNO=10)


Changing the BLKSIZE,BUFNO be of any help in this regard?

NOTE: I cannot use SORT using SUM FIELDS=NONE, for duplicate removal.
Back to top
View user's profile Send private message

Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7228

PostPosted: Wed May 09, 2012 12:17 am    Post subject: Reply to: Optimization Required for ICETOOL duplicate remova
Reply with quote

Isn't the selection, without specifying SORT FIELDS=NONE or OPTION COPY in a xxxxCNTL file, going to sort again?

Do you need two separate sorted files, one just sorted and one, on different key, de-duped?

If you can set out exactly why there SORT followed by SELECT(with SORT) are needed, it'll be a help.
Back to top
View user's profile Send private message
Skolusu

Senior Member


Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

PostPosted: Wed May 09, 2012 12:37 am    Post subject: Re: Optimization Required for ICETOOL duplicate removal
Reply with quote

VivekKhanna wrote:
The requirement says that:

a. Perform SORT based on ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.
b. Use the output from previous step and Perform duplicate removal using ICETOOL using the following control card:

Code:
SELECT FROM(SORTIN) TO(SORTOUT)-     
 ON(15,10,CH)-                         
 ON(25,04,CH)-
 ON(01,10,CH)-                         
 FIRST DISCARD(SORTXSUM) 


I have performed the above in two different steps. Here I am facing performance issues, since the time taken by both the steps is (2.5+3.5) hrs approx. All the intermediate and final files are on tape. DCB used is:

Code:
DCB = (LRECL=242,BLKSIZE=0,RECFM=FB,BUFNO=10)


Changing the BLKSIZE,BUFNO be of any help in this regard?

NOTE: I cannot use SORT using SUM FIELDS=NONE, for duplicate removal.


Your requirement does NOT match the Control cards you show on the SELECT . How did you change the order from ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, to EMP_DEPT,DEPT_DIV,EMP_ID ???

Just for the record, you do not need 2 passes of data. Here is a DFSORT JCL which will give you the first record upon sorting the data in ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.


Code:

//STEP0100 EXEC PGM=ICETOOL   
//TOOLMSG  DD SYSOUT=*         
//DFSMSG   DD SYSOUT=*         
//IN       DD DSN=Your Input tape file,DISP=SHR
//OUT      DD SYSOUT=*                   
//TOOLIN   DD *                           
  SELECT FROM(IN) TO(OUT) FIRST USING(CTL1) -
      ON(01,10,CH) ON(15,10,CH) ON(25,4,CH)
//*
//CTL1CNTL DD *
  SORT FIELDS=(01,10,CH,A,    $ EMP-ID
               15,10,CH,A,    $ EMP-DEPT   
               25,04,CH,A,    $ DEPT-DIV
               11,04,CH,D)    $ SEQ_NB
//*
Back to top
View user's profile Send private message
Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7228

PostPosted: Wed May 09, 2012 4:37 am    Post subject: Reply to: Optimization Required for ICETOOL duplicate remova
Reply with quote

What order is your original file in?

Is the order of fields for your SORT correct or does the SORT or SELECT contain a typo?

If your file is in the correct order, except for this SEQ-NB (sequence number?) and the purpose of the sequence number is to ensure that the last record is the duplicate that is retained, then I believe you can do the whole thing without even sorting the file once. To put it another way, is SEQ-NB ascending within the rest of the key, and you want to use it to keep the last record with duplicate key?

Are you that lucky?

SELECT as Kolusu showed, but with LASTDUP, and SORT FIELDS=NONE/OPTION COPY in the CTL1CNTL.

If that's what you've got, I'll PM you two invoices, one for the resource saving and the other for the mindreading.
Back to top
View user's profile Send private message
VivekKhanna

New User


Joined: 09 Feb 2009
Posts: 57
Location: India

PostPosted: Wed May 09, 2012 3:16 pm    Post subject:
Reply with quote

Hi Bill

1. There is no order for original file. Original file is a table unload.

2. Order for SORT and SELECT is different. It is clearly mentioned in the above requirement.

Quote:
a. Perform SORT based on ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.


3. MAX(SEQ_NB) gives me the latest record, for the EMP_ID, EMP_DEPT and DEPT_DIV. I need to pick the record for above key combination with MAX(SEQ_NB).

4. LUCKY?? Didn't get that.

Hope the above replies resolve your queries.
Back to top
View user's profile Send private message
Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7228

PostPosted: Wed May 09, 2012 3:43 pm    Post subject: Reply to: Optimization Required for ICETOOL duplicate remova
Reply with quote

The better you describe your requirement, the better answers you get. If you'd mentioned that it was a database unload...

Lucky, because if it was like that, already in order, you'd have had no sort to do at all.

You mentioned fields in different order in the SORT and in the SELECT. Was that deliberate?

The SELECT, unless told otherwise, does a SORT.

If you only need one order, and the file is currently unordered, then Kolusu's code is going to do exactly what you want, and it is going to save a whole SORT. Which is pretty lucky.

You only need to SORT twice if you need the output in two different orders for some reason. So just one step needed, Kolusu's, unless you need two orders and forgot to mention that as well, and mistyped the keys.
Back to top
View user's profile Send private message
Skolusu

Senior Member


Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

PostPosted: Wed May 09, 2012 9:20 pm    Post subject: Re: Reply to: Optimization Required for ICETOOL duplicate re
Reply with quote

Bill Woodger wrote:

SELECT as Kolusu showed, but with LASTDUP, and SORT FIELDS=NONE/OPTION COPY in the CTL1CNTL.


Bill,

I think you overlooked that I am sorting on the Seq num DESCENDING. BTW you need LAST and not LASTDUP in this case to pick any Unique combo records. As is SELECT with FIRST is much more efficient than LASTDUP or LAST.

VivekKhanna wrote:

3. MAX(SEQ_NB) gives me the latest record, for the EMP_ID, EMP_DEPT and DEPT_DIV. I need to pick the record for above key combination with MAX(SEQ_NB).


If this indeed your requirement then the above Shown JCL by me will give you the desired results. Did you try it?
Back to top
View user's profile Send private message
Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7228

PostPosted: Wed May 09, 2012 10:10 pm    Post subject: Reply to: Optimization Required for ICETOOL duplicate remova
Reply with quote

I had my failed mind-reading hat on. Without the knowledge that it was a database unload, I was thinking it must be some type of ordinary file, with some sequence to it already, and with a sequence number already that must/should/could be ascending on the natural sequence of the file.

Then, if someone wanted to "de-dupe" but keep the last record, not the first, and therefore sorted 980 million records with the sequence number Descending to get the last first, being the only purpose of the sort, then the whole sorting of the file could be avoided with an appropriate SELECT.

However, the mind-reading hat, as mentioned, failed. I should keep it locked away. I've been told hats don't suit me anyway.

Turns out it was a database unload, sequence "whatever", needs sorting anyway, so why not descending on the sequence number, with NOEQUALS if they are unique (and they are not part of the de-dupe key).

I didn't know that FIRST was more efficient, so that is filed away now, thanks.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic    IBMMAINFRAMES.com Support Forums -> DFSORT/ICETOOL All times are GMT + 6 Hours
Page 1 of 1

 

Search our Forum:

Similar Topics
Topic Author Forum Replies Posted
No new posts Limit duplicate records in the SORT pshongal SYNCSORT 6 Mon Nov 21, 2016 12:54 pm
No new posts How to delete second instance from Fl... Gunapala CN DFSORT/ICETOOL 6 Tue Oct 18, 2016 11:42 pm
No new posts ICETOOL/SYNCSORT - SYSPRINT output nartcr DFSORT/ICETOOL 17 Wed Oct 05, 2016 10:46 pm
No new posts SQL Query optimization. arunsoods DB2 12 Sun Sep 04, 2016 6:27 am
No new posts Icetool or Joinkeys example to compar... ishant chauhan DFSORT/ICETOOL 21 Sat Aug 20, 2016 2:40 am


Facebook
Back to Top
 
Mainframe Wiki | Forum Rules | Bookmarks | Subscriptions | FAQ | Tutorials | Contact Us