IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

File comparison to find most efficient way


IBM Mainframe Forums -> Mainframe Interview Questions
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
ashishsr123

New User


Joined: 06 May 2008
Posts: 33
Location: Chennai

PostPosted: Wed May 27, 2009 6:19 pm
Reply with quote

Hi,

Interview question !

Scenario:

We have 3 files:
File A : contains Merchant ID only..10 in number.
File B: Contains transaction records for various merchants. Here are Merchants including merchants in file A. This file is HUGE ,having millions of records.

Output:
File C: Having transaction details of merchants in file A.
Note: File B has records which has merchant ID's as well. Merchant id is numeric number.

We have to find most efficient way to achieve this. A simple algorithm is required irrespective of any language.

Any ideas:

I suggest we sort the file b and then do binary search on file B against file merchant id is file A.( Now i see there will be lot issues with this as binary search would pic the first matching record and move out.)

We could also do a sequential search on file B and write record to file C when we have a match for merchant ID in file B.
Back to top
View user's profile Send private message
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8697
Location: Dubuque, Iowa, USA

PostPosted: Wed May 27, 2009 6:30 pm
Reply with quote

Write a short program which reads file A into a table in memory. Make one pass against file B, checking each merchant id against table in memory; if there is a match output file B record to file C.

Your suggestion requires random access to the file for the binary search; I'm not even sure it would be possible to implement this under VSAM. Binary search also wouldn't necessarily find the first matching record -- depending on how the keys occur, any matching record could be selected. You would then have to read backward and forward in the file B to find the other matches.

As far as your suggestion of a sequential search on B, why make one pass against millions of records for each record of A when a table will reduce it to one pass, total?
Back to top
View user's profile Send private message
ashishsr123

New User


Joined: 06 May 2008
Posts: 33
Location: Chennai

PostPosted: Wed May 27, 2009 8:39 pm
Reply with quote

Quote:
table in memory


Not sure what you mean by table in memory...you mean using.. occurs ?
Back to top
View user's profile Send private message
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

PostPosted: Wed May 27, 2009 9:03 pm
Reply with quote

use sort to match a and b and output to c.
no program, only control cards.
Back to top
View user's profile Send private message
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8697
Location: Dubuque, Iowa, USA

PostPosted: Wed May 27, 2009 9:31 pm
Reply with quote

Yes, occurs in COBOL.
Back to top
View user's profile Send private message
sajjan jindal
Warnings : 1

New User


Joined: 09 Sep 2007
Posts: 60
Location: india

PostPosted: Mon Jul 06, 2009 11:42 am
Reply with quote

I assume that the File B will consist a single matching record for the record in File - A (however the below algo wont need much of modification for multiple records into the File B)

1. Start Reading the records from both the files until EOF
2. If File-A-Rec > File-B-Rec
Read next record from the File-B.
3. Else If File-A-Rec < File-B-Rec
Read next record from the File-A.
4. Else if File-A-Rec = File-B-Rec
write the record to File-C-Rec.
Read next record from the File-A.
Read next record from the File-B.
5. go to step 1
Back to top
View user's profile Send private message
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Jul 06, 2009 7:16 pm
Reply with quote

Hello,

For this particular question, possibly the most important bit of information wasn't provided/posted. . .

Are file A and file B in merchant id sequence or not?

If they are not in sequence, Rovert's suggestion would be most efficient.

If they are in sequence, DBZ's sort solution would be efficient.

If the rules were that this must be done with a program, code like pseudo code from sajjan jindal could work well. This is known as a "2-file match/merge" or "line balance". There is a "Sticky" near the top of the cobol forum that contains working code that will provide this functionality with very minor modification.
Back to top
View user's profile Send private message
ashishsr123

New User


Joined: 06 May 2008
Posts: 33
Location: Chennai

PostPosted: Sat Jul 11, 2009 5:21 pm
Reply with quote

Hello,

Dick:
Both the files are not sorted

sajjan jindal:
Your solution would not fit here as File B can have many matching records for File A.
Your solution would work in both the Files are sorted. Since it contains millions of records , it is not feasible.

I think Robert solution is simple and best.
Back to top
View user's profile Send private message
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

PostPosted: Sat Jul 11, 2009 7:21 pm
Reply with quote

Quote:
We have 3 files:
File A : contains Merchant ID only..10 in number.
ashishsr123,

Do you mean that you'll have 10 records in File-A? If you have only a few records in File A, you can build INCLUDE statements out of this and apply this sort card on your huge file - File B. Kind of similar logic as Robert's, but using a utility.

Remember, this solution can be implemented ONLY if you have a limited number of records in File-A.
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> Mainframe Interview Questions

 


Similar Topics
Topic Forum Replies
No new posts Compare 2 files and retrive records f... DFSORT/ICETOOL 3
No new posts FTP VB File from Mainframe retaining ... JCL & VSAM 8
No new posts Extract the file name from another fi... DFSORT/ICETOOL 6
No new posts How to split large record length file... DFSORT/ICETOOL 10
No new posts Extracting Variable decimal numbers f... DFSORT/ICETOOL 17
Search our Forums:

Back to Top