I am performing SORT and Duplicate Removal (using ICETOOL) on a tape file holding 900 million records. Consider an example of file with key information as:
a. Perform SORT based on ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.
b. Use the output from previous step and Perform duplicate removal using ICETOOL using the following control card:
Code:
SELECT FROM(SORTIN) TO(SORTOUT)-
ON(15,10,CH)-
ON(25,04,CH)-
ON(01,10,CH)-
FIRST DISCARD(SORTXSUM)
I have performed the above in two different steps. Here I am facing performance issues, since the time taken by both the steps is (2.5+3.5) hrs approx. All the intermediate and final files are on tape. DCB used is:
Code:
DCB = (LRECL=242,BLKSIZE=0,RECFM=FB,BUFNO=10)
Changing the BLKSIZE,BUFNO be of any help in this regard?
NOTE: I cannot use SORT using SUM FIELDS=NONE, for duplicate removal.
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
VivekKhanna wrote:
The requirement says that:
a. Perform SORT based on ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.
b. Use the output from previous step and Perform duplicate removal using ICETOOL using the following control card:
Code:
SELECT FROM(SORTIN) TO(SORTOUT)-
ON(15,10,CH)-
ON(25,04,CH)-
ON(01,10,CH)-
FIRST DISCARD(SORTXSUM)
I have performed the above in two different steps. Here I am facing performance issues, since the time taken by both the steps is (2.5+3.5) hrs approx. All the intermediate and final files are on tape. DCB used is:
Code:
DCB = (LRECL=242,BLKSIZE=0,RECFM=FB,BUFNO=10)
Changing the BLKSIZE,BUFNO be of any help in this regard?
NOTE: I cannot use SORT using SUM FIELDS=NONE, for duplicate removal.
Your requirement does NOT match the Control cards you show on the SELECT . How did you change the order from ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, to EMP_DEPT,DEPT_DIV,EMP_ID ???
Just for the record, you do not need 2 passes of data. Here is a DFSORT JCL which will give you the first record upon sorting the data in ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
What order is your original file in?
Is the order of fields for your SORT correct or does the SORT or SELECT contain a typo?
If your file is in the correct order, except for this SEQ-NB (sequence number?) and the purpose of the sequence number is to ensure that the last record is the duplicate that is retained, then I believe you can do the whole thing without even sorting the file once. To put it another way, is SEQ-NB ascending within the rest of the key, and you want to use it to keep the last record with duplicate key?
Are you that lucky?
SELECT as Kolusu showed, but with LASTDUP, and SORT FIELDS=NONE/OPTION COPY in the CTL1CNTL.
If that's what you've got, I'll PM you two invoices, one for the resource saving and the other for the mindreading.
1. There is no order for original file. Original file is a table unload.
2. Order for SORT and SELECT is different. It is clearly mentioned in the above requirement.
Quote:
a. Perform SORT based on ascending order of EMP_ID, EMP_DEPT, DEPT_DIV, and descending order of SEQ_NB.
3. MAX(SEQ_NB) gives me the latest record, for the EMP_ID, EMP_DEPT and DEPT_DIV. I need to pick the record for above key combination with MAX(SEQ_NB).
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
The better you describe your requirement, the better answers you get. If you'd mentioned that it was a database unload...
Lucky, because if it was like that, already in order, you'd have had no sort to do at all.
You mentioned fields in different order in the SORT and in the SELECT. Was that deliberate?
The SELECT, unless told otherwise, does a SORT.
If you only need one order, and the file is currently unordered, then Kolusu's code is going to do exactly what you want, and it is going to save a whole SORT. Which is pretty lucky.
You only need to SORT twice if you need the output in two different orders for some reason. So just one step needed, Kolusu's, unless you need two orders and forgot to mention that as well, and mistyped the keys.
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
Bill Woodger wrote:
SELECT as Kolusu showed, but with LASTDUP, and SORT FIELDS=NONE/OPTION COPY in the CTL1CNTL.
Bill,
I think you overlooked that I am sorting on the Seq num DESCENDING. BTW you need LAST and not LASTDUP in this case to pick any Unique combo records. As is SELECT with FIRST is much more efficient than LASTDUP or LAST.
VivekKhanna wrote:
3. MAX(SEQ_NB) gives me the latest record, for the EMP_ID, EMP_DEPT and DEPT_DIV. I need to pick the record for above key combination with MAX(SEQ_NB).
If this indeed your requirement then the above Shown JCL by me will give you the desired results. Did you try it?
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
I had my failed mind-reading hat on. Without the knowledge that it was a database unload, I was thinking it must be some type of ordinary file, with some sequence to it already, and with a sequence number already that must/should/could be ascending on the natural sequence of the file.
Then, if someone wanted to "de-dupe" but keep the last record, not the first, and therefore sorted 980 million records with the sequence number Descending to get the last first, being the only purpose of the sort, then the whole sorting of the file could be avoided with an appropriate SELECT.
However, the mind-reading hat, as mentioned, failed. I should keep it locked away. I've been told hats don't suit me anyway.
Turns out it was a database unload, sequence "whatever", needs sorting anyway, so why not descending on the sequence number, with NOEQUALS if they are unique (and they are not part of the de-dupe key).
I didn't know that FIRST was more efficient, so that is filed away now, thanks.