View previous topic :: View next topic
|
Author |
Message |
ashishsr123
New User
Joined: 06 May 2008 Posts: 33 Location: Chennai
|
|
|
|
Hi,
Interview question !
Scenario:
We have 3 files:
File A : contains Merchant ID only..10 in number.
File B: Contains transaction records for various merchants. Here are Merchants including merchants in file A. This file is HUGE ,having millions of records.
Output:
File C: Having transaction details of merchants in file A.
Note: File B has records which has merchant ID's as well. Merchant id is numeric number.
We have to find most efficient way to achieve this. A simple algorithm is required irrespective of any language.
Any ideas:
I suggest we sort the file b and then do binary search on file B against file merchant id is file A.( Now i see there will be lot issues with this as binary search would pic the first matching record and move out.)
We could also do a sequential search on file B and write record to file C when we have a match for merchant ID in file B. |
|
Back to top |
|
|
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8697 Location: Dubuque, Iowa, USA
|
|
|
|
Write a short program which reads file A into a table in memory. Make one pass against file B, checking each merchant id against table in memory; if there is a match output file B record to file C.
Your suggestion requires random access to the file for the binary search; I'm not even sure it would be possible to implement this under VSAM. Binary search also wouldn't necessarily find the first matching record -- depending on how the keys occur, any matching record could be selected. You would then have to read backward and forward in the file B to find the other matches.
As far as your suggestion of a sequential search on B, why make one pass against millions of records for each record of A when a table will reduce it to one pass, total? |
|
Back to top |
|
|
ashishsr123
New User
Joined: 06 May 2008 Posts: 33 Location: Chennai
|
|
|
|
Not sure what you mean by table in memory...you mean using.. occurs ? |
|
Back to top |
|
|
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
|
|
|
|
use sort to match a and b and output to c.
no program, only control cards. |
|
Back to top |
|
|
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8697 Location: Dubuque, Iowa, USA
|
|
|
|
Yes, occurs in COBOL. |
|
Back to top |
|
|
sajjan jindal Warnings : 1 New User
Joined: 09 Sep 2007 Posts: 60 Location: india
|
|
|
|
I assume that the File B will consist a single matching record for the record in File - A (however the below algo wont need much of modification for multiple records into the File B)
1. Start Reading the records from both the files until EOF
2. If File-A-Rec > File-B-Rec
Read next record from the File-B.
3. Else If File-A-Rec < File-B-Rec
Read next record from the File-A.
4. Else if File-A-Rec = File-B-Rec
write the record to File-C-Rec.
Read next record from the File-A.
Read next record from the File-B.
5. go to step 1 |
|
Back to top |
|
|
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
|
|
|
|
Hello,
For this particular question, possibly the most important bit of information wasn't provided/posted. . .
Are file A and file B in merchant id sequence or not?
If they are not in sequence, Rovert's suggestion would be most efficient.
If they are in sequence, DBZ's sort solution would be efficient.
If the rules were that this must be done with a program, code like pseudo code from sajjan jindal could work well. This is known as a "2-file match/merge" or "line balance". There is a "Sticky" near the top of the cobol forum that contains working code that will provide this functionality with very minor modification. |
|
Back to top |
|
|
ashishsr123
New User
Joined: 06 May 2008 Posts: 33 Location: Chennai
|
|
|
|
Hello,
Dick:
Both the files are not sorted
sajjan jindal:
Your solution would not fit here as File B can have many matching records for File A.
Your solution would work in both the Files are sorted. Since it contains millions of records , it is not feasible.
I think Robert solution is simple and best. |
|
Back to top |
|
|
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
|
|
|
|
Quote: |
We have 3 files:
File A : contains Merchant ID only..10 in number. |
ashishsr123,
Do you mean that you'll have 10 records in File-A? If you have only a few records in File A, you can build INCLUDE statements out of this and apply this sort card on your huge file - File B. Kind of similar logic as Robert's, but using a utility.
Remember, this solution can be implemented ONLY if you have a limited number of records in File-A. |
|
Back to top |
|
|
|