How to match two files having duplicates


IBM Mainframe Forums -> COBOL Programming
gvel19
Posted: Wed Oct 01, 2008 1:48 pm

I have two input files that I need to match on a key.

File-1: (Sorted on key and no dups)
------
1
2
3
4
5
6

File-2: (sorted on key and have duplicates)
-------
1
2
4
4
4
5
5
6
I need to write the matched records into an output file. I have tried, but I am not able to handle the duplicates. It would be great if someone could give me a hint on how to tackle the dups.

Thanks,
Vel

expat
Posted: Wed Oct 01, 2008 1:53 pm

Have you thought of using one of the sort products to do this for you?

There are many examples of working solutions in the SORT / JCL forums.

karthikr44
Posted: Wed Oct 01, 2008 2:17 pm

Hi,

Please post the sample output for your example. I want to know whether you want the matched records from file1 or from file2.

Regards
R KARTHIK

Escapa
Posted: Wed Oct 01, 2008 2:21 pm

Quote:
I have tried, but I am not able to handle the duplicates. It would be great if someone could give me a hint on how to tackle the dups.

What logic are you using?

gvel19
Posted: Wed Oct 01, 2008 4:23 pm

Hi Karthik,

My output should contain
1
2
4
4
4
5
5
6
My output should contain the matched records of file-1.

roopannamdhari
Posted: Tue Oct 07, 2008 11:00 am

Hi Karthik,

Code:
My output should contain
1
2
4
4
4
5
5
6
My output should contain the matched records of file-1.


Should the output contain records from file-1 or from file-2? The output you show here has the file-2 records.

Escapa
Posted: Tue Oct 07, 2008 4:30 pm

ip1
Code:

1
2
3
5
6

ip2
Code:

1
2
4
4
4
5
5
6

Here I assume that you want every file2 record whose key is also present in file1.
Code:

      * Fragment only: the IDENTIFICATION and ENVIRONMENT divisions
      * (including the SELECT clauses for FILE1 and FILE2) are omitted.
       DATA DIVISION.
       FILE SECTION.
       FD  FILE1.
       01  REC1.
           02  REC1-CMP-KEY  PIC 9(1).
           02  FILLER        PIC X(79).
       FD  FILE2.
       01  REC2.
           02  REC2-CMP-KEY  PIC 9(1).
           02  FILLER        PIC X(79).
       WORKING-STORAGE SECTION.
       77  EOF1  PIC X VALUE 'N'.
       77  EOF2  PIC X VALUE 'N'.
       PROCEDURE DIVISION.
           OPEN INPUT FILE1 FILE2.
      * Prime both files before the matching loop.
           READ FILE1 AT END MOVE 'Y' TO EOF1 END-READ.
           READ FILE2 AT END MOVE 'Y' TO EOF2 END-READ.
           PERFORM READ-BOTH-FILES
               UNTIL EOF1 = 'Y' OR EOF2 = 'Y'.
           CLOSE FILE1 FILE2.
           STOP RUN.
       READ-BOTH-FILES.
           EVALUATE TRUE
               WHEN REC1-CMP-KEY = REC2-CMP-KEY
                   PERFORM MATCH-PARA
               WHEN REC1-CMP-KEY > REC2-CMP-KEY
                   PERFORM READF2
               WHEN REC1-CMP-KEY < REC2-CMP-KEY
                   PERFORM READF1
           END-EVALUATE.
       MATCH-PARA.
      * One line is displayed per matching FILE2 record.
           DISPLAY REC1.
           READ FILE2 AT END MOVE 'Y' TO EOF2 END-READ.
      * Keep the FILE1 record while FILE2 still has the same key.
      * REC2 is not inspected once FILE2 has reached end of file.
           IF EOF2 = 'N'
               IF REC1-CMP-KEY NOT = REC2-CMP-KEY
                   READ FILE1 AT END MOVE 'Y' TO EOF1 END-READ
               END-IF
           END-IF.
       READF1.
           READ FILE1 AT END MOVE 'Y' TO EOF1 END-READ.
       READF2.
           READ FILE2 AT END MOVE 'Y' TO EOF2 END-READ.

Output will be
Code:

1
2
5
5
6
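
For readers who want to trace the logic outside COBOL, the same one-to-many match can be sketched in a few lines of Python. This is only an illustration under the thread's assumptions (both inputs sorted ascending on the key, no duplicates in file 1). The function name `match_sorted` and the use of plain integer keys are made up for the example and do not come from the posted program.

```python
def match_sorted(file1_keys, file2_keys):
    """Return every file-2 key that also occurs in file 1.

    Assumes both inputs are sorted ascending and file 1 has no duplicates.
    """
    it1, it2 = iter(file1_keys), iter(file2_keys)
    # Prime both inputs; None plays the role of the EOF flags.
    k1, k2 = next(it1, None), next(it2, None)
    matched = []
    while k1 is not None and k2 is not None:
        if k1 == k2:
            matched.append(k2)
            # Advance file 2 only, so the file-1 key stays available
            # for any further duplicates of the same key.
            k2 = next(it2, None)
        elif k1 > k2:
            k2 = next(it2, None)   # file-2 key has no partner in file 1
        else:
            k1 = next(it1, None)   # file-1 key has no (more) partners
    return matched


# Same sample data as ip1 and ip2 above.
print(match_sorted([1, 2, 3, 5, 6], [1, 2, 4, 4, 4, 5, 5, 6]))
# prints [1, 2, 5, 5, 6]
```

The key point for the duplicates is that a match advances only the file that may contain repeats; the other file moves on when its key compares low.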

dick scherrer
Posted: Wed Oct 08, 2008 1:11 am

Hello,

The posted code does not work for all cases. . .

Unfortunately, it will work some of the time. Because of insufficient testing and test data, it would fail in production. It would be better if it abended, but it will most likely just give incorrect output now and then, which is very difficult to track down.

Escapa
Posted: Thu Oct 09, 2008 12:27 pm

Quote:
Unfortunately, it will work some of the time. Due to insufficient testing/test data it would fail in production. It would be better if it abended, but it will most likely only give incorrect output sometimes. Very difficult to find sometimes.


Hi Dick,
I am confused by this...

Below are some of the ip1/ip2 combinations I have tested, with the o/p; it is working as expected.
IP1 KEYS     IP2 KEYS           O/P
---------    ---------------    ---------
1,2,3,5,6    1,2,4,4,4,5,5,6    1,2,5,5,6
EMPTY        1,2,4,4,4,5,5,6    EMPTY
1,2,3,4      1,2,4,4,4,5,5,6    1,2,4,4,4
1,2,3,4      EMPTY              EMPTY
1,2,3,4      1,2,3,4            1,2,3,4
EMPTY        EMPTY              EMPTY

star_dhruv2000
Posted: Tue Oct 14, 2008 3:12 pm

It would be good if you could use the SORT JOIN statement. The following example should clear up your issues:


Code:

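//SORTJNF1 DD *
1
2
3
4
//SORTJNF2 DD *
1
2
4
4
//SORTOUT DD SYSOUT=*
//SYSIN DD *
* JOIN UNPAIRED also keeps records that have no match in the
* other file; leave it out to keep matched records only.
 JOINKEYS FILE=F1,FIELDS=(1,1,CH,A)
 JOINKEYS FILE=F2,FIELDS=(1,1,CH,A)
 JOIN UNPAIRED
 SORT FIELDS=COPY
/*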


Hope this resolves your issues.

Happy coding!
Cheers
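
As a rough illustration of what the join step decides (not of how any sort product implements it), the sketch below pairs the keys from the two sample DD inputs in Python. It assumes that a plain join keeps only paired records and that `JOIN UNPAIRED` with no file named also keeps the unmatched records from both sides; check your sort product's manual before relying on that. The function `join_keys` and its `unpaired` flag are hypothetical names for this example.

```python
from collections import Counter


def join_keys(f1_keys, f2_keys, unpaired=False):
    """Return (f1_key, f2_key) pairs; None marks a missing partner."""
    f1_counts, f2_counts = Counter(f1_keys), Counter(f2_keys)
    rows = []
    for key in sorted(set(f1_counts) | set(f2_counts)):
        n1, n2 = f1_counts[key], f2_counts[key]
        if n1 and n2:
            # Every F1 record with this key pairs with every F2 record.
            rows.extend([(key, key)] * (n1 * n2))
        elif unpaired:
            # Keep records that found no partner in the other file.
            rows.extend([(key, None)] * n1)
            rows.extend([(None, key)] * n2)
    return rows


f1 = ["1", "2", "3", "4"]
f2 = ["1", "2", "4", "4"]
print(join_keys(f1, f2))                 # paired records only
print(join_keys(f1, f2, unpaired=True))  # also keeps key 3 from F1
```

With the sample data, only key 3 has no partner, so it appears in the second result and not in the first.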

Escapa
Posted: Tue Oct 14, 2008 3:17 pm

Quote:

It would be good if you could use the SORT JOIN statement.

Maybe. But since the poster asked in the COBOL forum, it seems he wants a COBOL solution.

dick scherrer
Posted: Tue Oct 14, 2008 7:15 pm

Hello,

You can only use JOINKEYS if the sort product on the system is Syncsort. . .

expat
Posted: Tue Oct 14, 2008 7:47 pm

What happens in your program if neither input file is sorted?

To me, if both files need to be in sorted order before processing, why not let the sort product do all of the work in one go, rather than run two sorts to get the input ready and then a COBOL program to do what SORT can do anyway?

file1 =
Code:

3
1
6
5


file 2 =
Code:

6
4
1
4
2
6
4
5
4
5
6
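
The merge-style match only works on sorted input, so one defensive option is to check the sequence while reading and stop as soon as a key goes backwards, rather than quietly producing wrong output. A minimal Python sketch of that idea follows; `checked` and `SequenceError` are hypothetical names, and the data is the unsorted sample above.

```python
class SequenceError(Exception):
    """Raised when an input that must be sorted goes out of order."""


def checked(keys, name):
    """Yield keys unchanged, failing fast on the first descending key."""
    previous = None
    for position, key in enumerate(keys, start=1):
        if previous is not None and key < previous:
            raise SequenceError(
                f"{name}: record {position} (key {key}) follows key {previous}"
            )
        previous = key
        yield key


file1 = [3, 1, 6, 5]
file2 = [6, 4, 1, 4, 2, 6, 4, 5, 4, 5, 6]

for name, keys in (("file1", file1), ("file2", file2)):
    try:
        list(checked(keys, name))
        print(name, "is in sequence")
    except SequenceError as err:
        print("out of sequence:", err)

# Sorting first makes the inputs usable by a merge-style match.
print(list(checked(sorted(file1), "file1")))
```

Equal keys pass the check, so duplicates in file 2 are still accepted; only a key lower than its predecessor stops the run.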

dick scherrer
Posted: Tue Oct 14, 2008 11:33 pm

Hi Expat,

If the only thing the process needed to accomplish is the match, i might agree. What i am seeing more and more of is jobstreams that have many unneeded steps so that things can be done one at a time (using the sort or other utilities), each requiring at least one pass of all the data.

Pretty much every process i've been asked to look at lately because of poor performance has been that way because no one properly defined the process and people kept plugging in "one more" step. Usually a bit of design saves many of these singleton steps, but it does require that some "real" programmer be available.

While on some systems 100k or a million records is considered a large file, most of what i've supported for years has run to hundreds of millions of records and cannot afford multiple passes of the data.

The topic process almost surely needs some additional processing of the data other than just the match. . .
All times are GMT + 6 Hours