Compare one Million Records with 50 Records

vbhat · New User Joined: 29 Apr 2005 Posts: 38

I have two files. One file contains millions of records, and the second contains only 50 records. I want to compare the two files and, where a record matches, write it to a third file. What is the logic in COBOL? Only in COBOL, not in JCL.

kanak · Moderator Joined: 12 Mar 2005 Posts: 252 Location: India

If you want only those records that are present in both files, sort the first file, take the sorted file as input to your COBOL program, and perform a binary search on the file containing millions of records to match each record from the smaller file.
I am suggesting binary search because it will be faster and more efficient in terms of CPU usage. When you find a record that is present in both files, write it to your output file.
If you do the sort externally, I think it will be better still.
If you have a better idea, please let me know.
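
A minimal sketch of this lookup, written in Python only to show the control flow. It assumes the sorted records can be addressed by position (an in-memory table or a relative file), since a plain sequential file cannot be probed at its midpoint. The names binary_search and match_small_against_large and the sample keys are hypothetical.

```python
def binary_search(sorted_keys, target):
    """Return the index of target in sorted_keys, or -1 if it is absent."""
    low, high = 0, len(sorted_keys) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return mid
        if sorted_keys[mid] < target:
            low = mid + 1       # target can only be in the upper half
        else:
            high = mid - 1      # target can only be in the lower half
    return -1


def match_small_against_large(large_sorted, small):
    """Collect every key of the small file that also occurs in the large file."""
    return [key for key in small if binary_search(large_sorted, key) >= 0]


# Hypothetical sample data: the large file is already sorted by key.
large_file = [10, 20, 30, 40, 50, 60, 70]
small_file = [35, 20, 70]
matches = match_small_against_large(large_file, small_file)
print(matches)
```

In COBOL, SEARCH ALL performs a binary search over a table declared with ASCENDING KEY and INDEXED BY, which is the usual route when the data fits in working storage.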

avalanches · New User Joined: 10 May 2005 Posts: 28

I'm not sure whether the following is the 'most' effective way, but depending on the records in the smaller file, it could be effective.

1. Sort the million-record file and the 50-record file in the same order.
2. Read one record from the 50-record file.
3. Read one record from the million-record file and compare the keys. On a match, write the record to the third file. If the key from the million-record file is lower, read the next record from the million-record file. If the key from the 50-record file is lower, go back to step 2. Continue until either all 50 records or all of the million records are exhausted.
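
The steps above can be sketched as a two-file match, shown in Python purely to illustrate the control flow. Two assumptions are not in the post: after a match the large file is advanced (so duplicate keys in the large file are all written), and None stands in for the AT END condition. The function name match_sorted and the sample keys are hypothetical.

```python
def match_sorted(large_records, small_records):
    """Yield keys present in both inputs; both must be sorted ascending."""
    large_iter = iter(large_records)
    small_iter = iter(small_records)
    large = next(large_iter, None)   # None plays the role of AT END
    small = next(small_iter, None)
    while large is not None and small is not None:
        if large == small:
            yield large
            # Assumption: advance the large file after a match.
            large = next(large_iter, None)
        elif large < small:
            large = next(large_iter, None)   # large key is lower: read large file
        else:
            small = next(small_iter, None)   # small key is lower: read small file


# Hypothetical sample data, both sorted on the same key.
result = list(match_sorted([1, 3, 3, 5, 8, 9], [3, 4, 9]))
print(result)
```

Each file is read at most once from start to finish, so the cost is one sequential pass rather than repeated searches.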

Cheers,
avalanches.

mmwife · Super Moderator Joined: 30 May 2003 Posts: 1592

If this is a question resulting from a real-life (business) problem, you have a bigger system design problem, but I suppose it's one of those "one time only" problems that wind up running once a week.

The key is to avoid (or reduce) the reads to the 1MM-record file.

First off, I'd look around to see if the 1MM data exists in a database or VSAM (keyed) file and do 50 direct reads from there.
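
A rough sketch of the direct-read idea, in Python for illustration only. A dictionary stands in for the keyed file or database table; the name direct_reads, the key format, and the sample records are all hypothetical.

```python
def direct_reads(keyed_store, wanted_keys):
    """Look up each wanted key once and return the records that exist."""
    found = []
    for key in wanted_keys:
        record = keyed_store.get(key)   # one keyed read per small-file record
        if record is not None:          # a missing key is simply skipped
            found.append(record)
    return found


# Hypothetical data: key -> full record, standing in for the keyed file.
keyed_store = {
    "A001": "A001 SMITH",
    "B002": "B002 JONES",
    "C003": "C003 PATEL",
}
wanted = ["B002", "Z999", "A001"]
output = direct_reads(keyed_store, wanted)
print(output)
```

In COBOL this corresponds to a random READ against an indexed file, with the INVALID KEY phrase handling a key that is not found. Neither file needs sorting, and the large file is touched only 50 times.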

If you must read through the 1MM file, take advantage of its existing sort sequence (if any) and sort only the 50-record file into the 1MM file's order. Also use the 50-record file as the "controlling" file (i.e. EOF on that file ends the process).

MGIndaco · Posted: Tue May 31, 2005 6:01 pm

I think the better way is a binary search, which on a sorted file of one million records needs 20 or fewer accesses to determine whether a record exists (log2 of 1,000,000 is just under 20), so in your case you will need no more than about 1,000 accesses to satisfy all 50 searches.
The only condition is that both files must be sorted on the same key.
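
The worst-case access count for a binary search can be worked out directly. The sketch below, in Python for illustration, assumes exactly one million sorted records and 50 lookups; max_probes is a hypothetical helper name.

```python
def max_probes(record_count):
    """Worst-case probes for a binary search over record_count sorted records.

    For n > 0 this is floor(log2(n)) + 1, which is the bit length of n.
    """
    return record_count.bit_length()


per_lookup = max_probes(1_000_000)   # probes needed for one search
total = 50 * per_lookup              # probes for all 50 records
print(per_lookup, total)
```

For comparison, 7 probes are enough only for a file of up to 127 records.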