Nilanjan Sikdar
New User
Joined: 26 Feb 2016 Posts: 9 Location: India
Hi,
I have a requirement to remove duplicate records subject to a threshold limit. The limit will be parameterized and can be specified in a control card. If the number of duplicates is more than the limit, the job should fail. Is this possible with DFSORT? Please help.
Thanks,
Nilanjan
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
I would think of this as a one-step solution.
1. Add the duplicate count per key at the end of each record using INREC.
2. Using OUTFIL, include only the records whose count from the INREC is greater than the threshold limit (use JP1 for the limit), and use NULLOFL to set the RC.
Nic Clouston
Global Moderator
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
I think that should be BELOW the limit, not GREATER than the limit.
Also, I am not clear on two points:
1 - Are you removing ALL duplicates or just the duplicates over the limit?
2 - Does the limit refer to the total number of duplicates in the data set or to the number of duplicates per record?
Nilanjan Sikdar
New User
Joined: 26 Feb 2016 Posts: 9 Location: India
Hi Rohit,
I don't want a count for each individual key; I want the overall duplicate count.
Hi Nic,
1 - If the duplicates are over the limit, the job should fail (in the sense that the file is not correct).
2 - The limit refers to the total number of duplicates.
For example, say the maximum duplicate limit is set to 3 in the sort card and the records are as follows:
Input:
AAAAA
BBBBB
AAAAA
CCCCC
DDDDD
AAAAA
EEEEE
Output should be:
AAAAA
BBBBB
CCCCC
DDDDD
EEEEE
But if the input is like below:
AAAAA
BBBBB
CCCCC
CCCCC
DDDDD
CCCCC
CCCCC
then the step should fail, because the number of duplicates here is 4, which is more than the threshold limit.
Thanks,
Nilanjan
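To pin down the rule before it is coded in DFSORT, the two examples above can be modelled in a few lines of Python. This is only a sketch of the requirement as I read it, not DFSORT itself: the names `dedupe_with_limit` and `DuplicateLimitExceeded` are made up for illustration, and it assumes that the "number of duplicates" counts every occurrence of a key that appears more than once (so three AAAAA records count as 3).

```python
from collections import Counter


class DuplicateLimitExceeded(Exception):
    """Raised when the duplicate total is above the allowed limit."""


def dedupe_with_limit(records, limit):
    """Return records with duplicates removed, or raise if over the limit.

    Assumption: the duplicate total counts every occurrence of a key
    that appears more than once, as in the examples in the post.
    """
    counts = Counter(records)
    duplicate_total = sum(n for n in counts.values() if n > 1)
    if duplicate_total > limit:
        raise DuplicateLimitExceeded(
            f"{duplicate_total} duplicates exceed the limit of {limit}"
        )
    # dict.fromkeys keeps the first occurrence of each key, in input order.
    return list(dict.fromkeys(records))
```

With a limit of 3, the first input returns the five unique records and the second input raises the exception, which corresponds to the step failing.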
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
Nilanjan,
Try this. The limit is set in //SYMNAMES DD *, so you can change it there without editing the control statements.
| Code: |
//*
//*GET THE TOTAL DUPLICATE COUNT ACROSS KEYS AND UNIQUE RECORDS
//*
//STEP0100 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SORTIN DD *
AAAAA
AAAAA
AAAAA
AAAAA
AAAAA
AAAAA
CCCCC
BBBBB
DDDDD
//GOOD DD SYSOUT=*
//BAD DD DSN=&&S1,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SORTLIST DD SYSOUT=*
//SYSIN DD *
  SORT FIELDS=(1,5,CH,A)
  INREC OVERLAY=(20:C'00000001')
  SUM FIELDS=(20,8,ZD)
* THE TOTAL IS PLACED AT COLUMN 12, WHERE STEP0200 READS IT
  OUTFIL FNAMES=BAD,REMOVECC,NODETAIL,INCLUDE=(20,8,ZD,GE,00000002),
    TRAILER1=(C'TOTAL :',12:TOT=(20,8,ZD,EDIT=(TTTTTTTT)))
  OUTFIL FNAMES=GOOD,BUILD=(1,5)
//*
//*SET RC=04 IF DUPLICATES ARE BEYOND THE THRESHOLD SUPPLIED IN SYM
//*
//STEP0200 EXEC PGM=SORT,PARM='NULLOUT=RC4'
//SYMNAMES DD *
* THRESHOLD AS A SIGNED DECIMAL CONSTANT
DUPLIMIT,+2
//SYSOUT DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//SORTIN DD DSN=&&S1,DISP=(OLD,PASS)
//SORTOUT DD SYSOUT=*
//SORTLIST DD SYSOUT=*
//SYSIN DD *
  OPTION COPY
  INCLUDE COND=(12,8,ZD,LE,DUPLIMIT)
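For anyone who wants to check the logic of the two steps without running the job, here is a rough Python model of what they do. It is a sketch under assumptions, not a DFSORT emulator: the function names `step0100` and `step0200` are only borrowed from the step names for readability, and it assumes the keys collate the same way in Python as on the host (true for the upper-case sample keys).

```python
from collections import Counter


def step0100(records):
    """Model of STEP0100: one record per key (GOOD) and the BAD total."""
    # INREC adds a count of 1; SUM collapses equal keys and adds the counts.
    counts = Counter(records)
    # SORT FIELDS=(1,5,CH,A) leaves one record per key in ascending order.
    good = sorted(counts)
    # OUTFIL BAD keeps keys with a count of 2 or more; TRAILER1 totals them.
    total = sum(n for n in counts.values() if n >= 2)
    return good, total


def step0200(total, dup_limit):
    """Model of STEP0200: return code 0 if the total passes, else 4."""
    # INCLUDE keeps the total record only when total <= limit; an empty
    # SORTOUT combined with NULLOUT=RC4 produces return code 4.
    return 0 if total <= dup_limit else 4
```

With the sample SORTIN from the job (six AAAAA records plus one each of CCCCC, BBBBB and DDDDD) and a limit of 2, this model gives a total of 6 and return code 4. Note that GOOD is written by STEP0100 in either case; only the return code of STEP0200 tells you whether the limit was exceeded.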