anand tr
New User
Joined: 12 Aug 2008 Posts: 41 Location: chennai
Hi,
I have two flat files (file1 and file2) that I use in a program. The number of records in both files is not constant, as they are updated daily.
I need to check whether each record in file1 is also present in file2 and write the matching records to the output.
I am using a simple sequential search (reading one record from file1 and comparing it with all the records in file2), and it is consuming a lot of time because the files are huge.
Each time the end of file2 is reached, I close and reopen it so that the first record is fetched for the next iteration. Is there any way to avoid this closing and reopening?
I guess a COBOL table cannot be used either, since the number of records in both files is not constant.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
Quote: |
I am opting for a simple sequential search(reading 1 record from file1 and comparing with all the records in file2) and its consuming lot of time as the files are huge. |
Possibly the worst possible performance choice, as you have seen.
What you need is a 2-file match/merge that only reads each file one time.
You need to sort both files in the same sequence (whatever you are trying to match on) if they are not already in this sequence.
At the top of this COBOL part of the forum is a "Sticky" that contains working code for what you want to do. Download that source file to your system, review it and modify it to do what you need. Here is the link:
ibmmainframes.com/viewtopic.php?t=22649
If there are questions, post back here.
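If the files are not already in key sequence, one way to sort them from within COBOL is the SORT verb (many shops would use a sort step in the JCL instead). The following is only a minimal sketch: the program name, file names, DD names, the 80-byte record length, and the 10-byte key at the start of the record are all assumptions made for illustration, not details from this topic.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTKEY.
      * Sketch only: sorts one input file into ascending key order.
      * All names and the record layout are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT RAW-FILE    ASSIGN TO RAWIN.
           SELECT SORTED-FILE ASSIGN TO SORTOUT.
           SELECT WORK-FILE   ASSIGN TO SORTWK1.
       DATA DIVISION.
       FILE SECTION.
       FD  RAW-FILE.
       01  RAW-REC            PIC X(80).
       FD  SORTED-FILE.
       01  SORTED-REC         PIC X(80).
      * The SD record describes where the sort key sits.
       SD  WORK-FILE.
       01  WORK-REC.
           05  WORK-KEY       PIC X(10).
           05  FILLER         PIC X(70).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * USING/GIVING lets SORT open, read, write and close the files.
           SORT WORK-FILE
               ON ASCENDING KEY WORK-KEY
               USING RAW-FILE
               GIVING SORTED-FILE.
           STOP RUN.
```

Run it once per input file (or add a second SORT statement) so that both files end up in the same key sequence before the match step.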
anand tr
New User
Joined: 12 Aug 2008 Posts: 41 Location: chennai
Thanks Dick,
But I am not able to view or download the code. It says: "Sorry but you are not authorized to view or download this Attachment".
Escapa
Senior Member
Joined: 16 Feb 2007 Posts: 1399 Location: IL, USA
As Dick said, you need to sort the input files first.
Then you can use the logic below.
Code: |
    READ FILE1 AT END MOVE 'Y' TO EOF1.
    READ FILE2 AT END MOVE 'Y' TO EOF2.
    PERFORM READ-BOTH-FILES UNTIL EOF1 = 'Y' OR EOF2 = 'Y'.
    STOP RUN.
READ-BOTH-FILES.
    EVALUATE TRUE
        WHEN REC1-CMP-KEY = REC2-CMP-KEY
            MOVE REC1 TO REC3
            WRITE REC3
            READ FILE1 AT END MOVE 'Y' TO EOF1 END-READ
            READ FILE2 AT END MOVE 'Y' TO EOF2 END-READ
        WHEN REC1-CMP-KEY > REC2-CMP-KEY
            READ FILE2 AT END MOVE 'Y' TO EOF2 END-READ
        WHEN REC1-CMP-KEY < REC2-CMP-KEY
            READ FILE1 AT END MOVE 'Y' TO EOF1 END-READ
    END-EVALUATE.
anand tr
New User
Joined: 12 Aug 2008 Posts: 41 Location: chennai
Hi Dick,
I tried again and was able to download it successfully. Thanks for that.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
You're welcome.
Also note that the code posted as a reply in this topic will not work correctly. It will only be close. The code you downloaded is complete, though you may need to make minor modifications to meet your requirement.
If there are questions, someone will be here.
anand tr
New User
Joined: 12 Aug 2008 Posts: 41 Location: chennai
Thanks Dick,
I'll surely try it and get back to you. I guess performance will surely improve with this method.
Escapa
Senior Member
Joined: 16 Feb 2007 Posts: 1399 Location: IL, USA
Thanks, Dick, for pointing that out. The code is corrected.
Code: |
    OPEN INPUT FILE1 FILE2.
    OPEN OUTPUT FILE3.
    READ FILE1 AT END MOVE 'Y' TO EOF1.
    READ FILE2 AT END MOVE 'Y' TO EOF2.
    PERFORM READ-BOTH-FILES
        UNTIL EOF1 = 'Y' OR EOF2 = 'Y'.
    CLOSE FILE1 FILE2 FILE3.
    STOP RUN.
READ-BOTH-FILES.
    EVALUATE TRUE
        WHEN REC1-CMP-KEY = REC2-CMP-KEY
            PERFORM MATCH-PARA
        WHEN REC1-CMP-KEY > REC2-CMP-KEY
            PERFORM READF2
        WHEN REC1-CMP-KEY < REC2-CMP-KEY
            PERFORM READF1
    END-EVALUATE.
MATCH-PARA.
    MOVE REC1 TO REC3.
    WRITE REC3.
    READ FILE1 AT END MOVE 'Y' TO EOF1.
    READ FILE2 AT END MOVE 'Y' TO EOF2.
READF1.
    READ FILE1 AT END MOVE 'Y' TO EOF1.
READF2.
    READ FILE2 AT END MOVE 'Y' TO EOF2.
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
If you are going to sort the files anyway, why not employ one of the sort tricks to accomplish this rather easy task with sort?
To write the code to accomplish this, one has to think.
anand tr
New User
Joined: 12 Aug 2008 Posts: 41 Location: chennai
Thanks Dick,
I tried the way you suggested and the code ran successfully in a few seconds (compared to the 1 hour that the older method took).
Brenholtz,
Could you suggest or elaborate a bit on how I can accomplish the same?
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello Sambhaji,
Quote: |
Thanks Dick for pointing that out. Code is corrected |
Sorry, but the code is still not going to work all of the time. . .
Hi Dick,
Quote: |
if you are going to sort the files anyway, why not employ one of the sort tricks to accomplish this rather easy task - with sort. |
If the only requirement is the "compare", yup, the sort would be the way to go. Nearly every requirement that my teams have involves more than just the compare: business processing is also involved (things like getting info from some database table(s) or VSAM file(s)). A few places I've been have gone overboard at not writing code (the ever popular "have to do this with JCL"). I've been asked to look at processes that ran "too long", and the reason was multiple passes of several hundred million records, because they did things "one at a time" using the sort and other utilities rather than writing a couple of far easier to maintain and far better performing coded modules.
Hi Anand,
Quote: |
I tried the way you suggested and successfully ran the code in few seconds |
Good to hear it is working. Thank you for the feedback.
Escapa
Senior Member
Joined: 16 Feb 2007 Posts: 1399 Location: IL, USA
Quote: |
Sorry, but the code is still not going to work all of the time. . .
|
Hi Dick, can you tell me a case where it won't work?
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hi Sambhaji,
The first thing I noticed is that the program will not work when either of the files has more records than the other.
I suggest you download the code from the "Sticky" and compare that to your example code. Keep in mind that the code in the "Sticky" works in over 100 different production systems. I put that "model" together before VSAM was available and when processing sequential files was part of every application. You will probably need to right-click / save-as to get the file.
If you find anything in the "Sticky" code that is not clear, post any questions back here.
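For readers who cannot reach the attachment, here is a rough sketch of one common way to keep a two-file match running until both files are exhausted: each file's compare key is held in working storage and set to HIGH-VALUES at end of file. This is not the "Sticky" code. The program name, DD names, 80-byte records, 10-byte keys, and WS- fields are assumptions for illustration, and it assumes both inputs are sorted ascending on the key and that no real key is HIGH-VALUES.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MATCH2.
      * Sketch only: two-file match on inputs sorted by key.
      * All names and the record layout are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT FILE1 ASSIGN TO INFILE1.
           SELECT FILE2 ASSIGN TO INFILE2.
           SELECT FILE3 ASSIGN TO OUTFILE.
       DATA DIVISION.
       FILE SECTION.
       FD  FILE1.
       01  REC1.
           05  REC1-KEY       PIC X(10).
           05  FILLER         PIC X(70).
       FD  FILE2.
       01  REC2.
           05  REC2-KEY       PIC X(10).
           05  FILLER         PIC X(70).
       FD  FILE3.
       01  REC3               PIC X(80).
       WORKING-STORAGE SECTION.
      * Compare keys are kept here so they can be set to HIGH-VALUES
      * once the corresponding file reaches end of file.
       01  WS-KEY1            PIC X(10).
       01  WS-KEY2            PIC X(10).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT FILE1 FILE2
           OPEN OUTPUT FILE3
           PERFORM READ-FILE1
           PERFORM READ-FILE2
           PERFORM MATCH-KEYS
               UNTIL WS-KEY1 = HIGH-VALUES AND WS-KEY2 = HIGH-VALUES
           CLOSE FILE1 FILE2 FILE3
           STOP RUN.
       MATCH-KEYS.
           EVALUATE TRUE
               WHEN WS-KEY1 = WS-KEY2
      * Key is in both files: write it, then advance FILE1 only so
      * that duplicate FILE1 keys still match the same FILE2 record.
                   WRITE REC3 FROM REC1
                   PERFORM READ-FILE1
               WHEN WS-KEY1 < WS-KEY2
      * FILE1 record has no partner in FILE2.
                   PERFORM READ-FILE1
               WHEN OTHER
      * FILE2 record has no partner in FILE1.
                   PERFORM READ-FILE2
           END-EVALUATE.
       READ-FILE1.
           READ FILE1
               AT END     MOVE HIGH-VALUES TO WS-KEY1
               NOT AT END MOVE REC1-KEY    TO WS-KEY1
           END-READ.
       READ-FILE2.
           READ FILE2
               AT END     MOVE HIGH-VALUES TO WS-KEY2
               NOT AT END MOVE REC2-KEY    TO WS-KEY2
           END-READ.
```

Because a finished file's key compares higher than any real key, the other file simply keeps advancing until it ends too, so the two "no partner" branches are the natural place for any processing of unmatched records.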
Naveen_Babu
New User
Joined: 18 Jan 2006 Posts: 1
Hi,
You can use a COBOL table to store the file contents and use SEARCH ALL; it will improve performance further. You mentioned that the record count is not fixed and so a table cannot be used, but a table can be used in this case too. How?
Table Declaration -
01  WT-TSET-TABLE.
    03  WT-TEST OCCURS 0 TO 500000 TIMES
                DEPENDING ON WT-P-MAX
                ASCENDING KEY IS WT-P-Field1
                INDEXED BY WT-P-IDX.
        05  WT-P-Field1 PIC 9(9) COMP-3.
        05  WT-P-Field2 PIC X(5).
        05  WT-P-Field3 PIC X(1).
1) Give OCCURS the maximum number of records the file can have.
2) While storing the file contents in the table, increment WT-P-MAX for every record stored, so that WT-P-MAX contains the file's record count.
3) Because of the DEPENDING ON clause, the number of table entries in use is automatically set by the value of the WT-P-MAX field.
Hope this helps...
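The three steps above might look roughly like the sketch below. The program name, the file name FILE2, its DD name, the WS- fields, and the sample lookup key are assumptions for illustration; only the table layout comes from the declaration above. SEARCH ALL is a binary search, so the sketch also assumes the file is already sorted ascending on the key field.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TBLSRCH.
      * Sketch only: load a sorted file into an ODO table, then
      * look up one key with SEARCH ALL. Names are hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT FILE2 ASSIGN TO INFILE2.
       DATA DIVISION.
       FILE SECTION.
       FD  FILE2.
       01  REC2.
           05  REC2-FIELD1    PIC 9(9) COMP-3.
           05  REC2-FIELD2    PIC X(5).
           05  REC2-FIELD3    PIC X(1).
       WORKING-STORAGE SECTION.
       01  WT-P-MAX           PIC 9(6) COMP VALUE 0.
       01  WS-EOF2            PIC X VALUE 'N'.
       01  WS-LOOKUP-KEY      PIC 9(9) COMP-3 VALUE 12345.
       01  WT-TSET-TABLE.
           03  WT-TEST OCCURS 0 TO 500000 TIMES
                       DEPENDING ON WT-P-MAX
                       ASCENDING KEY IS WT-P-FIELD1
                       INDEXED BY WT-P-IDX.
               05  WT-P-FIELD1 PIC 9(9) COMP-3.
               05  WT-P-FIELD2 PIC X(5).
               05  WT-P-FIELD3 PIC X(1).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT FILE2
           PERFORM LOAD-TABLE UNTIL WS-EOF2 = 'Y'
           CLOSE FILE2
      * Binary search over the WT-P-MAX entries that were loaded.
           SEARCH ALL WT-TEST
               AT END
                   DISPLAY 'KEY NOT FOUND'
               WHEN WT-P-FIELD1 (WT-P-IDX) = WS-LOOKUP-KEY
                   DISPLAY 'KEY FOUND: ' WT-P-FIELD2 (WT-P-IDX)
           END-SEARCH
           STOP RUN.
       LOAD-TABLE.
           READ FILE2
               AT END
                   MOVE 'Y' TO WS-EOF2
               NOT AT END
      * Guard against more records than the OCCURS maximum.
                   IF WT-P-MAX < 500000
                       ADD 1 TO WT-P-MAX
                       MOVE REC2 TO WT-TEST (WT-P-MAX)
                   ELSE
                       DISPLAY 'TABLE FULL, STOPPING LOAD'
                       MOVE 'Y' TO WS-EOF2
                   END-IF
           END-READ.
```

In a real match, the single lookup would sit inside a loop that reads the other file and searches the table once per record.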
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello Naveen and welcome to the forums,
Quote: |
You can use a cobol table to store the file content and use 'Search All' it will increase the performance further. |
Sorry, but this is incorrect. Doing this will use more CPU time than properly reading the 2 files and matching the "keys". While SEARCH ALL is faster than a serial/sequential SEARCH, it still uses far more CPU than a proper 2-file match.
There are times when dynamically building a table is proper, but matching files is not one of them.
It is typically done because the developer does not know how to properly code for the requirement (or is just somewhat lazy. . .).