Count number of duplicate records

nayanishpatil · New User Joined: 16 Aug 2007 Posts: 14 Location: INDIA

Hi,

Is there a way to count the number of duplicate records in an input file?

For example, suppose the input file has the following records:
AAAAAAAAA
VVVVVVVVV
GGGGGGGG
AAAAAAAAA
HHHHHHHHH
AAAAAAAAA
GGGGGGGG
VVVVVVVVV

As we can see, the AAAAAAAAA record occurs 3 times,
the VVVVVVVVV record occurs 2 times and the GGGGGGGG record occurs 2 times.

dick scherrer · Posted: Tue Sep 25, 2007 12:00 am

Hello,

Please post what you want your output to look like.

nayanishpatil · New User Joined: 16 Aug 2007 Posts: 14 Location: INDIA

The input record AAAAAAAAA occurs 3 times, so the AAAAAAAAA record should be written to one output file.

The VVVVVVVVV record occurs 2 times and the GGGGGGGG record also occurs 2 times, so the VVVVVVVVV and GGGGGGGG records must be written to another file.

Similarly, the HHHHHHHHH record, which occurs once, must be written to a separate file.

CICS Guy · Posted: Tue Sep 25, 2007 1:13 am

Wow, where you got "Count number of duplicate records" from, I just do not understand....

What happens if XXXXXXXXX occurs 102 times?

Craq Giegerich · Posted: Tue Sep 25, 2007 1:19 am

I suppose you want to do this without sorting the input file.

dick scherrer · Posted: Tue Sep 25, 2007 2:14 am

Hello,

krisprems · Posted: Tue Sep 25, 2007 1:22 pm

nayanishpatil

This DFSORT/ICETOOL job counts the occurrences of each key that has duplicate records.
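
The job itself is not shown in this copy of the thread. As a minimal sketch of this kind of count (not necessarily the job that was posted), ICETOOL's OCCUR operator with ALLDUPS lists only the keys that occur more than once, together with their counts. The ddnames IN and RPT, the data set name, and the 9-byte key in column 1 are assumptions for illustration.

```jcl
//S1      EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//* IN is a placeholder for the input file
//IN      DD DSN=YOUR.INPUT.FILE,DISP=SHR
//* RPT receives the report
//RPT     DD SYSOUT=*
//TOOLIN  DD *
* List each key that occurs more than once, with its count.
* Key position and length (1,9,CH) are assumed.
  OCCUR FROM(IN) LIST(RPT) ON(1,9,CH) ON(VALCNT) ALLDUPS
/*
```

For the sample input in the first post, the report should list AAAAAAAAA with a count of 3, and GGGGGGGG and VVVVVVVVV with a count of 2 each. HHHHHHHHH occurs once and is left out.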

murmohk1 · Posted: Tue Sep 25, 2007 2:34 pm

krisprems,

If I understood the original post properly, Nayanish wants to write the records to different files depending on how many times they occur.

Let me put it this way: the AAAAAAAAA record occurred three times, so any other record (e.g. ZZZZZZZZZ) that is repeated 3 times should go with AAAAAAAAA in one file (say OCCUR3).

The VVVVVVVVV and GGGGGGGG records occurred twice, so he wants these records written to another file (say OCCUR2), and so on.

The record HHHHHHHHH should go to another file (say OCCUR1).

krisprems · Posted: Tue Sep 25, 2007 3:25 pm

This DFSORT/ICETOOL JCL writes the records whose key has one
occurrence to one file, two occurrences to a second file, and three occurrences to a third file.
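
The JCL itself is not shown in this copy of the thread. A rough sketch of the approach, under the assumption of a 9-byte key in column 1, is one ICETOOL SELECT operator per output file; the ddnames OCCUR1, OCCUR2 and OCCUR3 follow the names suggested above, and IN and the data set name are placeholders. Each SELECT reads the input again, so this takes three passes.

```jcl
//S1      EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//* IN is a placeholder for the input file
//IN      DD DSN=YOUR.INPUT.FILE,DISP=SHR
//* SYSOUT is used here for brevity; replace with output data sets
//OCCUR1  DD SYSOUT=*
//OCCUR2  DD SYSOUT=*
//OCCUR3  DD SYSOUT=*
//TOOLIN  DD *
* Records whose key occurs exactly once
  SELECT FROM(IN) TO(OCCUR1) ON(1,9,CH) NODUPS
* Records whose key occurs exactly twice
  SELECT FROM(IN) TO(OCCUR2) ON(1,9,CH) EQUAL(2)
* Records whose key occurs exactly three times
  SELECT FROM(IN) TO(OCCUR3) ON(1,9,CH) EQUAL(3)
/*
```

Note that EQUAL(n) keeps every record whose key occurs n times, so OCCUR3 would hold all three AAAAAAAAA records rather than one. Keys with any other count are not written anywhere.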

murmohk1 · Posted: Tue Sep 25, 2007 6:02 pm

Kris,

dick scherrer · Posted: Tue Sep 25, 2007 6:27 pm

Hello,

Let's say you have understood the requirement. . .

What happens when there is a count other than 1, 2, or 3? As I asked earlier, what happens if there are 900 (ok, that's too many, so let's say 300) different counts? That is too many DD statements for one step.

I'd be interested in how this output would be used and maybe we can offer more alternatives.

Frank Yaeger · Posted: Tue Sep 25, 2007 9:18 pm

You don't need three passes over the input to do this. You can do it in one pass with a DFSORT job like the following:
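
The job that followed is not shown in this copy of the thread. The sketch below is one way a single pass could work, and is not necessarily what was posted: it appends a count of 1 to each record, sorts on the key, sums the counts for equal keys, and routes each summed record by its total. RECFM=FB, LRECL=80, a 9-byte key in column 1, the data set name and the OCCURn ddnames are all assumptions.

```jcl
//S1      EXEC PGM=ICEMAN
//SYSOUT  DD SYSOUT=*
//* SORTIN is a placeholder for the input file (assumed FB/80)
//SORTIN  DD DSN=YOUR.INPUT.FILE,DISP=SHR
//OCCUR1  DD SYSOUT=*
//OCCUR2  DD SYSOUT=*
//OCCUR3  DD SYSOUT=*
//SYSIN   DD *
* Append a count of 1 in positions 81-88 of every record
  INREC OVERLAY=(81:C'00000001')
* Sort on the key and add up the counts of records with equal keys
  SORT FIELDS=(1,9,CH,A)
  SUM FIELDS=(81,8,ZD)
* Route each summed record by its total and drop the count field
  OUTFIL FNAMES=OCCUR1,INCLUDE=(81,8,ZD,EQ,1),BUILD=(1,80)
  OUTFIL FNAMES=OCCUR2,INCLUDE=(81,8,ZD,EQ,2),BUILD=(1,80)
  OUTFIL FNAMES=OCCUR3,INCLUDE=(81,8,ZD,EQ,3),BUILD=(1,80)
/*
```

Because SUM collapses records with equal keys, each output file gets one record per key, which matches the request to write AAAAAAAAA once. Keys with counts other than 1, 2 or 3 are discarded in this sketch.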

spath12 · New User Joined: 22 Jul 2009 Posts: 2 Location: Gurgaon

I have tried the first job to get the duplicate record counts, but the records in the output start in the second column.

=COLS> ----+----1----+----2----+----3----+----4
****** ***************************** Top of Dat
000001 1(1,10,CH) VALUE COUNT
000002 ABCDFFF 000000000000007
000003 KUMAR 000000000000006
000004 AAAAAAA 000000000000007

Please suggest how to get the records to start in the first column, so that the character 1 does not appear before (1,10,CH) in the first line.

dick scherrer · Posted: Thu Nov 19, 2009 9:22 pm

Hello,

You have replied to a topic that has been inactive for over 2 years.

You have also not very clearly described what you want to do. Describe the "rules" for getting from your input to the desired output.

Please post the output you want from the sample input. If a more representative sample is needed, add some more data to show the possible situations. Once the input sample has been built, show the output you want from the sample input.

Also mention the recfm and lrecl of all files.

Frank Yaeger · Posted: Thu Nov 19, 2009 11:08 pm

spath12,

If I understand you correctly, you do not want the ANSI carriage control character that DFSORT's OCCUR usually puts in the output of the report (e.g. '1' in column 1 for page eject).

You can eliminate that character by using the NOCC operand, e.g.

OCCUR NOCC FROM(...

For complete details on the OCCUR operator of DFSORT's ICETOOL, see:

publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICE1CA40/6.10?DT=20090527161936
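
As a fuller sketch of where NOCC goes, here is a complete step. The ON fields are inferred from the report heading shown above; the ddnames IN and RPT and the data set name are assumptions, since the original job is not shown.

```jcl
//S1      EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG  DD SYSOUT=*
//* IN is a placeholder for the input file
//IN      DD DSN=YOUR.INPUT.FILE,DISP=SHR
//RPT     DD SYSOUT=*
//TOOLIN  DD *
* NOCC suppresses the carriage control character in column 1,
* so the report data starts in the first column
  OCCUR FROM(IN) LIST(RPT) NOCC ON(1,10,CH) ON(VALCNT)
/*
```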

spath12 · New User Joined: 22 Jul 2009 Posts: 2 Location: Gurgaon

Hi Frank,

Yes, you are correct, I wanted to remove the ANSI carriage control character. I was not aware of this feature of DFSORT, which is why I was getting the 1 in column 1 for page eject.

After using NOCC I got the desired output.

Thanks a lot.