IBM Mainframe Forum Index
 
 

Removing duplicate record based on threshold limit


IBM Mainframe Forums -> DFSORT/ICETOOL
Nilanjan Sikdar

New User


Joined: 26 Feb 2016
Posts: 9
Location: India

Posted: Mon Jul 22, 2019 8:29 pm

Hi,

I have a requirement to remove duplicate records subject to a threshold limit. The limit will be parameterized and can be specified in the control card. If the number of duplicates exceeds the limit, the job should fail. Is it possible to do this using DFSORT? Please help.

Thanks,
Nilanjan
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3049
Location: NYC,USA

Posted: Tue Jul 23, 2019 1:05 am

I would approach this as a one-step solution:
1. Add the duplicate count per key at the end of each record using INREC.
2. Using OUTFIL, include only the records whose count from the INREC is greater than the threshold limit (pass the limit in with JP1), and use NULLOFL to set the RC.
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 2455
Location: Hampshire, UK

Posted: Tue Jul 23, 2019 12:45 pm

I think that should be BELOW the limit, not GREATER than the limit.
Also, I am not clear on 2 points:
1 - Are you removing ALL duplicates or just the duplicates over the limit?
2 - Does the limit refer to the total number of duplicates in the data set or to the number of duplicates per record?
Nilanjan Sikdar

New User


Joined: 26 Feb 2016
Posts: 9
Location: India

Posted: Tue Jul 23, 2019 2:06 pm

Hi Rohit,

I don't want to keep a count for each individual key, but rather an overall duplicate count.

Hi Nic,

1 - If the duplicates exceed the limit, the job should fail (meaning that the file is not correct).
2 - The limit refers to the total number of duplicates.

For example, say the maximum duplicate limit is set to 3 in the sort card and the input records are as follows:

AAAAA
BBBBB
AAAAA
CCCCC
DDDDD
AAAAA
EEEEE

Output should be:
AAAAA
BBBBB
CCCCC
DDDDD
EEEEE

But if the input is like below:
AAAAA
BBBBB
CCCCC
CCCCC
DDDDD
CCCCC
CCCCC

then the step should fail, as the number of duplicates here is 4, which is more than the threshold limit.

Thanks,
Nilanjan
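The rule in the two examples above can be modelled outside DFSORT to make the counting explicit. The Python sketch below is illustrative only and is not part of the original posts. The function name `dedupe_with_limit` is hypothetical, and it assumes that "number of duplicates" means every record whose key occurs more than once (so four CCCCC records count as 4), which is how the second example reads.

```python
from collections import Counter


def dedupe_with_limit(records, limit):
    """Return the distinct records in first-seen order.

    Raises ValueError when the overall duplicate count exceeds `limit`.
    Assumption: the duplicate count is the number of records whose key
    appears more than once, summed across all keys.
    """
    counts = Counter(records)
    dup_total = sum(n for n in counts.values() if n > 1)
    if dup_total > limit:
        raise ValueError(
            f"{dup_total} duplicate records exceed the limit of {limit}"
        )
    # dict.fromkeys keeps the first occurrence of each record, in input order
    return list(dict.fromkeys(records))
```

With the first input and a limit of 3, the function returns the five distinct records. With the second input it raises, which stands in for the failing step.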
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3049
Location: NYC,USA

Posted: Wed Jul 24, 2019 1:24 am

Nilanjan,
Try this. The threshold is supplied as the DUPLIMIT symbol in //SYMNAMES DD *, so you can change it without touching the control statements.
Code:
//*                                                 
//*GET THE TOTAL DUPLICATE COUNT ACROSS KEYS AND UNIQUE RECORDS
//*                                                 
//STEP0100 EXEC PGM=SORT                           
//SYSOUT   DD SYSOUT=*                             
//SYSPRINT DD SYSOUT=*                             
//SORTIN   DD *                                     
AAAAA                                               
AAAAA                                               
AAAAA                                               
AAAAA                                               
AAAAA                                               
AAAAA                                               
CCCCC                                               
BBBBB                                               
DDDDD                                               
//GOOD     DD SYSOUT=*                                             
//BAD      DD DSN=&&S1,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)   
//SORTLIST DD SYSOUT=*                                             
//SYSIN    DD *                                                     
  SORT FIELDS=(1,5,CH,A)                                           
  INREC OVERLAY=(20:C'00000001')                                   
  SUM FIELDS=(20,8,ZD)                                             
  OUTFIL FNAMES=BAD,REMOVECC,NODETAIL,INCLUDE=(20,8,ZD,GE,00000002),
  TRAILER1=(C'TOTAL    :',TOT=(20,8,ZD,EDIT=(TTTTTTTT)))           
  OUTFIL FNAMES=GOOD,BUILD=(1,5)                                   
//*                                                                 
//*SET RC=04 IF DUPLICATES ARE BEYOND THE THRESHOLD SUPPLIED IN SYM
//*                                                                 
//STEP0200 EXEC PGM=SORT,PARM='NULLOUT=RC4'                         
//SYMNAMES DD *                                                     
DUPLIMIT,00000002                                                   
//SYSOUT   DD SYSOUT=*                                             
//SYSPRINT DD SYSOUT=*                                             
//SORTIN   DD DSN=&&S1,DISP=(OLD,PASS)   
//SORTOUT  DD SYSOUT=*               
//SORTLIST DD SYSOUT=*               
//SYSIN    DD *                       
  OPTION COPY                         
* TRAILER TOTAL STARTS IN COLUMN 11 (AFTER THE 10-BYTE LITERAL)
  INCLUDE COND=(11,8,ZD,LE,DUPLIMIT)
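As a cross-check on what the two steps compute, here is a small Python model of the job. It is only a sketch of one reading of the control statements, not a replacement for running them. The function names `step0100` and `step0200` are hypothetical labels borrowed from the step names, and the key is assumed to be the first 5 bytes, as in SORT FIELDS=(1,5,CH,A).

```python
from collections import Counter


def step0100(records, key_len=5):
    """Model of STEP0100: SORT + SUM collapse each key to one record with a count."""
    counts = Counter(r[:key_len] for r in records)
    # GOOD: one record per key, in ascending key order (BUILD=(1,5))
    good = sorted(counts)
    # BAD trailer: TOT of the counts for keys that occur at least twice
    total = sum(n for n in counts.values() if n >= 2)
    return good, total


def step0200(total, dup_limit):
    """Model of STEP0200: the trailer survives INCLUDE only if total <= limit.

    With NULLOUT=RC4, an empty SORTOUT yields return code 4.
    """
    return 0 if total <= dup_limit else 4


sortin = ["AAAAA"] * 6 + ["CCCCC", "BBBBB", "DDDDD"]
good, total = step0100(sortin)
rc = step0200(total, dup_limit=2)
```

For the SORTIN data in the job, this model gives four unique keys in GOOD and a total of 6. That total exceeds a DUPLIMIT of 2, so the second step would end with RC=4.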
All times are GMT + 6 Hours