Two-file compare with duplicates in both files

Kothai Jayaraj · New User Joined: 08 Aug 2012 Posts: 3 Location: INDIA

Hi,

Below is my requirement:

File 1:
Key Date comment
---- ---------- ----------
101 20-10-12 AAAA
101 21-10-12 BBBB
101 23-11-12 CCCC
102 12-11-12 DDDD
102 12-12-12 AAAV
103 11-11-12 BBBB

File 2:

Key1 Item number DATA
----- --------------- -------
100 1 Data
100 2 Data
100 3 Data
101 1 Data
101 2 Data
101 4 Data
102 1 Data
102 4 Data
103 1 Data
103 9 data

My expected output file should contain every combination of matching records:

101 20-10-12 AAAA 1 DATA
101 20-10-12 AAAA 2 DATA
101 20-10-12 AAAA 4 DATA
101 21-10-12 BBBB 1 DATA
101 21-10-12 BBBB 2 DATA
101 21-10-12 BBBB 4 DATA
102 12-11-12 DDDD 1 DATA
102 12-11-12 DDDD 4 DATA
102 12-12-12 AAAV 1 DATA
102 12-12-12 AAAV 4 DATA
103 11-11-12 BBBB 1 Data
103 11-11-12 BBBB 9 Data

Input file 1:
Has duplicate keys with different dates and comments.

Input file 2:
Has duplicate keys with different item numbers and data.

An output record should be written whenever a key matches between the two files, for every combination of the matching records in input file 1 and input file 2.

My idea is to read input file 1 repeatedly, comparing each key with the previous one until the key changes, and store those records in an array. Then I read the second file and compare its key with the first file's key. If the keys match, I loop through the array and write one record for each array occurrence, then keep reading the second file until its key changes, and repeat the same. I am not sure whether this program logic will work. Please help me with your ideas.

dbzTHEdinosauer · Posted: Wed Aug 08, 2012 10:26 am

before we discuss logic,
explain why
101 23-11-12 CCCC
is not included in the output?

now, logic:
this is a very typical Sort problem. that would be the better solution.

if you insist that it must be cobol,
then post this homework in the student forum

Kothai Jayaraj · New User Joined: 08 Aug 2012 Posts: 3 Location: INDIA

Sorry, I left the 101 23-11-12 CCCC combinations out of the output file by mistake.

dbzTHEdinosauer · Posted: Wed Aug 08, 2012 10:57 am

the solution to your problem if you have DFSORT is
Match, FB, keys in different places, duplicates
which is one of the topics in SORTRCK.PDF (page 27).

obviously you need to modify it so that it looks for the keys in the same place on both files.

Peter cobolskolan · Posted: Wed Aug 08, 2012 11:01 am

Kothai Jayaraj · New User Joined: 08 Aug 2012 Posts: 3 Location: INDIA

Thanks, Dick, for your suggestion. I will try it in Sort. There are some validations on each field, which is why I had chosen COBOL. I think I will get the cartesian combination through Sort and then do the validations in a simple program.

dbzTHEdinosauer · Posted: Wed Aug 08, 2012 11:54 am

depending upon the validation required,
there is a lot that SORT can accomplish.

dick scherrer · Posted: Wed Aug 08, 2012 6:47 pm

Hello,

I believe you have already downloaded the 2-file match/merge sample code from the "Sticky". If you need a COBOL solution, modify that code to handle the situations of your requirement.

It will be far easier to modify code that is already proven (or should be far easier) than starting from scratch.

Recommend you look into sort as suggested, but if it must be code, you already have a start. Indeed, more than a start - it is nearly done. . .

dick scherrer · Posted: Wed Aug 08, 2012 6:50 pm

Follow on:

Handling duplicates in both files will take a bit more thought regardless of whether you use the sort or code. You need to determine which fileA records should match with which fileB records.

GuyC · Posted: Wed Aug 08, 2012 8:56 pm

I would reverse it and store rec2 in the array:
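A minimal sketch of that reversal, written in Python rather than COBOL purely for brevity. It assumes both files are already sorted on the key and that the key is the first field; the function name `join_sorted` and the tuple layout are illustrative only. Holding the current file 2 key group in the table and streaming file 1 writes the output in the order of the expected output above (each file 1 record followed by all of its item numbers), whereas holding file 1 in the table would group the output by file 2 record instead.

```python
from itertools import groupby
from operator import itemgetter


def join_sorted(file1_rows, file2_rows):
    """Yield every file 1 / file 2 combination that shares a key.

    Assumes both inputs are sorted ascending on the key (element 0).
    Only one key group of file 2 is held in the table at a time.
    """
    groups2 = groupby(file2_rows, key=itemgetter(0))
    key2, table = None, []
    for rec1 in file1_rows:
        key1 = rec1[0]
        # Read file 2 forward until its current key group is >= the file 1 key.
        while key2 is None or key2 < key1:
            nxt = next(groups2, None)
            if nxt is None:
                return  # file 2 is exhausted, no further matches are possible
            key2, table = nxt[0], list(nxt[1])
        if key2 == key1:
            # Write one output record per table occurrence.
            for rec2 in table:
                yield rec1 + rec2[1:]


# Sample data from the first post (keys kept as equal-length strings).
FILE1 = [
    ("101", "20-10-12", "AAAA"), ("101", "21-10-12", "BBBB"),
    ("101", "23-11-12", "CCCC"), ("102", "12-11-12", "DDDD"),
    ("102", "12-12-12", "AAAV"), ("103", "11-11-12", "BBBB"),
]
FILE2 = [
    ("100", "1", "Data"), ("100", "2", "Data"), ("100", "3", "Data"),
    ("101", "1", "Data"), ("101", "2", "Data"), ("101", "4", "Data"),
    ("102", "1", "Data"), ("102", "4", "Data"),
    ("103", "1", "Data"), ("103", "9", "data"),
]

rows = list(join_sorted(FILE1, FILE2))
for row in rows:
    print(" ".join(row))
```

With the sample data this produces 15 records: 9 for key 101 (including the CCCC combinations), 4 for key 102 and 2 for key 103. Key 100 is dropped because it has no match in file 1.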

dick scherrer · Posted: Wed Aug 08, 2012 9:08 pm

Hello,

Yup, that could work, depending on which of the 8 duplicates from fileB and the 6 duplicates from fileA should match. . . (these are made up "stats" but the concern is real and often mis-handled).

Same issue with the sort or the sample match/merge code from the sticky. How to deal with unbalanced sets of duplicates. . .

Some kind of rule must be established rather than "just let 'er rip" and using what comes out.

GuyC · Posted: Wed Aug 08, 2012 9:19 pm

@dick : He said he wanted the cartesian product
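To illustrate what that means for unbalanced sets of duplicates: with a cartesian product every record of one group is paired with every record of the other, so a key with m records in one file and n in the other gives m × n output records and no pairing rule is needed. A small sketch using the key 101 group from the first post; the variable names are made up for the example.

```python
from itertools import product

# Key 101 records from each file, with the key itself stripped off.
file_a_101 = [("20-10-12", "AAAA"), ("21-10-12", "BBBB"), ("23-11-12", "CCCC")]
file_b_101 = [("1", "Data"), ("2", "Data"), ("4", "Data")]

# Every fileA record combined with every fileB record for the same key.
pairs = [("101",) + a + b for a, b in product(file_a_101, file_b_101)]
print(len(pairs))  # 3 * 3 = 9

# The made-up "6 duplicates in fileA, 8 in fileB" case gives 6 * 8 pairs.
unbalanced = len(list(product(range(6), range(8))))
print(unbalanced)  # 48
```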

dbzTHEdinosauer · Posted: Wed Aug 08, 2012 9:30 pm

actually,
the TS used the redundant phraseology:
cartesian combination

dick scherrer · Posted: Wed Aug 08, 2012 11:30 pm