Remove duplicates without sorting the sequence

saithvis2 · Posted: Tue Aug 22, 2006 5:20 pm

Hi all,

I have a file with 80-byte records, of which only the first 10 bytes matter to me. I have to remove the duplicate records from the file. I don't want to change the sequence of the records; I just want to remove the duplicates.

eg:
file a
---------------

abc
xyz
xyz
abd
abd
mno

o/p file
------------------

abc
xyz
abd
mno

How should I go about it? I have written a job, but it does not accept SUM FIELDS=NONE with SORT FIELDS=COPY. Since I only want to remove the duplicates and not sort the records, what modification should I make to my JCL?

my current Jcl:
---------------------

dinguduse · New User Joined: 24 Jun 2005 Posts: 8

Hi,

1) add sequence number using sort
2) remove duplicates
3) resequence based on the sequence number you added.

Rgds,
Aravind
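
The three steps above could be sketched as the ICETOOL job below. This is only a minimal sketch under stated assumptions, not the job anyone in this thread actually posted: it assumes fixed-length 80-byte records with the key in positions 1-10 (as described in the first post), and the dataset names YOUR.INPUT.FILE and YOUR.OUTPUT.FILE are placeholders. A sort on the key is needed because SUM FIELDS=NONE cannot be used with a copy. SELECT with FIRST is used here in place of SUM FIELDS=NONE; as far as I know it keeps the first record, in input order, for each key value.

```jcl
//S1       EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//* YOUR.INPUT.FILE and YOUR.OUTPUT.FILE are placeholder names
//IN       DD DSN=YOUR.INPUT.FILE,DISP=SHR
//T1       DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT      DD DSN=YOUR.OUTPUT.FILE,DISP=OLD
//TOOLIN   DD *
* Steps 1 and 2: add a sequence number, keep the first record per key
  SELECT FROM(IN) TO(T1) ON(1,10,CH) FIRST USING(CTL1)
* Step 3: sort back into the original order, drop the sequence number
  SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
  INREC OVERLAY=(81:SEQNUM,8,ZD)
/*
//CTL2CNTL DD *
  SORT FIELDS=(81,8,ZD,A)
  OUTREC BUILD=(1,80)
/*
```

The sequence number is placed in positions 81-88, past the end of the 80-byte record, so the original data is untouched and the second pass can cut the record back to 80 bytes. For a different record length or key, the positions in ON, OVERLAY, SORT and BUILD would need to be adjusted to match.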

saithvis2 · Posted: Tue Aug 22, 2006 7:22 pm

Hi all,

I have done the same using REXX, but is there a way I can do it using only sort?

job used:

Frank Yaeger · Posted: Tue Aug 22, 2006 7:59 pm

Here's a DFSORT/ICETOOL job that will do what you asked for:

saithvis2 · Posted: Tue Aug 22, 2006 8:42 pm

Hi Frank,

Thanks a lot. ICETOOL is really a powerful tool. I will go through the other DFSORT/ICETOOL tricks you have linked to in other posts in this community.

Have a great day !

Regards
Vishal

vjai6977 · New User Joined: 08 Aug 2008 Posts: 19 Location: Chennai

Hi, Referring to the above job and requirement, I have a similar requirement and modified the sort cards for it, but I don't get the result I need.

Job I used:
-------------

//S1 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD DSN=USR7.TST.INPUT,DISP=SHR
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//OUT DD DSN=USR7.TST.OUTPUT,DISP=OLD
//TOOLIN DD *
SORT FROM(IN) TO(T1) USING(CTL1)
SORT FROM(T1) TO(OUT) USING(CTL2)
/*
//CTL1CNTL DD *
  INREC FIELDS=(1,65,70:SEQNUM,8,ZD)
  SORT FIELDS=(5,25,CH,A,51,3,CH,A,55,3,CH,A,59,3,CH,A,63,3,CH,A)
  SUM FIELDS=NONE
/*
//CTL2CNTL DD *
  SORT FIELDS=(70,8,ZD,A)
  OUTREC FIELDS=(1,65)
/*

Input File:
------------

----+----1----+----2----+----3----+----4----+----5----+----6----+----
TEST.INPUT.FILE1(0) JN1 PN1 PS1 DD1
TEST.INPUT.FILE1(-1) JN1 PN1 PS1 DD1
TEST.INPUT.INPUT1(0) JN1 PN1 PS1 DD2
TEST.INPUT.INPUT2 JN1 PN1 PS1 DD3
TEST.INPUT.INPUT1 JN2 PN2 PS1 DD1
TEST.INPUT.INPUT2 JN2 PN2 PS1 DD2
TEST.INPUT.INPUT2 JN2 PN2 PS1 DD2
TEST.INPUT.INPUT1 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT1 JN4 PN4 PS1 DD1

Output File:
--------------

----+----1----+----2----+----3----+----4----+----5----+----6----+----
TEST.INPUT.FILE1(0) JN1 PN1 PS1 DD1
TEST.INPUT.FILE1(-1) JN1 PN1 PS1 DD1
TEST.INPUT.INPUT1(0) JN1 PN1 PS1 DD2
TEST.INPUT.INPUT2 JN1 PN1 PS1 DD3
TEST.INPUT.INPUT1 JN2 PN2 PS1 DD1
TEST.INPUT.INPUT2 JN2 PN2 PS1 DD2
TEST.INPUT.INPUT1 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT1 JN4 PN4 PS1 DD1

REQUIRED OUTPUT::
------------------------

----+----1----+----2----+----3----+----4----+----5----+----6----+----
TEST.INPUT.FILE1(0) JN1 PN1 PS1 DD1
TEST.INPUT.FILE1(-1) JN1 PN1 PS1 DD1
TEST.INPUT.INPUT1(0) JN1 PN1 PS1 DD2
TEST.INPUT.INPUT2 JN1 PN1 PS1 DD3
TEST.INPUT.INPUT1 JN2 PN2 PS1 DD1
TEST.INPUT.INPUT1 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT2 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT3 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT4 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT5 JN3 PN3 PS1 DD1
TEST.INPUT.INPUT6 JN3 PN3 PS1 DD1

Note:

This input data is a Job Vs Dataset cross reference containing
Dataset Name - Columns 5-25
Jobname - Columns 51-53
Proc name - Columns 55-57
proc Step name - Columns 59-61
proc DD name - Columns 63-65

Input rows 8 thru 13 are datasets belonging to the concatenation for DD DD1.

My requirements for getting the output shown under Required Output:
-------------------------------------------------------------------------------
1. Duplicates should be removed without sorting, that is, without changing the order of the datasets.

2. For concatenated datasets, any duplicates should be removed and the order of the datasets in the input file should be maintained.

3. For direct datasets, the duplicates should be removed.

I welcome a solution on this.

Jai

Frank Yaeger · Posted: Tue Oct 14, 2008 8:48 pm

Your "rules" are not clear.

Which field or fields define a duplicate?

Which fields/values tell us we have a concatenated dataset?

Which fields/values tell us we have a "direct dataset"?

Please do a better job of explaining what you want to do based on the fields in the input records.

vjai6977 · New User Joined: 08 Aug 2008 Posts: 19 Location: Chennai

Hi Frank, I have restructured the input dataset as below.

0______________25___51__ 55___59___63
TEST.INPUT.FILE1(0)***JN1**PN1**PS1**DD1
TEST.INPUT.FILE1(-1)**JN1**PN1**PS1**DD1
TEST.INPUT.INPUT1(0)*JN1**PN1** PS1**DD2
TEST.INPUT.INPUT2****JN1**PN1**PS1**DD3

TEST.INPUT.INPUT1****JN2**PN2**PS1**DD1
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2
TEST.INPUT.INPUT1****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT2****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT3****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT4****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT5****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT6****JN3**PN3**PS1**DD1

TEST.INPUT.INPUT1****JN4**PN4**PS1**DD1

Here '*' is a blank space.

1. The field in positions 0 thru 25 should be considered for eliminating duplicates.

2. If the values in fields 51-53 / 55-57 / 59-61 / 63-65 repeat in more than one row, those rows are concatenated dataset entries. Here, duplicates, if any, should be eliminated and the dataset order should be maintained.

3. For the column positions stated in point 2, if the values in one row are different from those in the next row, then those are direct datasets. For all of these, duplicates should be removed based on positions 0-25.

I need the duplicate datasets removed based on positions 0-25 for all rows, but the dataset order for concatenated entries (meaning the values in columns 51-65 repeat in more than one row) should not get jumbled.

Thank you for taking the time to look into my query.

Jai

Frank Yaeger · Posted: Wed Oct 15, 2008 10:26 pm

I've tried, but I just can't figure out what you're trying to do. I don't know if you're matching up the dsnames and the other four fields or just the other four fields. I don't know if by "duplicates if any should be eliminated" you mean to eliminate all records with a match or just keep the first record with a match.

It would really help if you would show an example of the input records for each case and the expected output record (or none) for that case, particularly the cases where you want to remove one or more records.
For example:

Input
TEST.INPUT.FILE1(0)***JN1**PN1**PS1**DD1
TEST.INPUT.FILE1(-1)**JN1**PN1**PS1**DD1

Output
?

The dsnames don't match, but the four fields match so do you want the first record, both records or neither record?

Input
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2
TEST.INPUT.INPUT2****JN2**PN2**PS1**DD2

Output
?

The dsname and four fields match so do you want the first record, both records or neither record?

Input
TEST.INPUT.INPUT1****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT2****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT3****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT4****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT5****JN3**PN3**PS1**DD1
TEST.INPUT.INPUT6****JN3**PN3**PS1**DD1

Output
?

The dsnames don't match, but the four fields match so do you want the first record, all records or no records?

vjai6977 · New User Joined: 08 Aug 2008 Posts: 19 Location: Chennai

Hi Frank,

My requirement is:

1. Duplicate datasets should be eliminated based on column positions 1-25.

2. At the same time, the order of concatenated datasets should not be changed by the sort.

I should have told you this earlier.

I worked out the below JCL and was able to get the required output.

Frank Yaeger · Posted: Sat Oct 18, 2008 1:41 am