Vijay Subramaniyan
New User
Joined: 06 Jul 2011 Posts: 14 Location: india
Hi,
I have two PS files in VB format. Each file has a record length of 27500. I want to compare the two files record by record. I need the matched records in one dataset and the unmatched records in another. I tried SuperCE for this. The job ran for more than an hour and a half and still did not complete. Is there any way to accomplish this in sort? I think the maximum number of bytes that we can specify in the control fields is 4082 odd.
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Can you paste from the screen/batch job all the options/control cards that you are using for SuperCE, please?
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
other than record count,
blocking factor of files
and bufno parm of dd statements.
Vijay Subramaniyan
New User
Joined: 06 Jul 2011 Posts: 14 Location: india
This is the code used.
The block size of the file is 27504.
//SUPERC EXEC PGM=ISRSUPC,
// PARM=(CHNGL,LINECMP,
// '',
// '')
//NEWDD DD DSN=DATX00D.PB.DXP2.HRDCPY.SCL.NEW1,
// DISP=SHR
//OLDDD DD DSN=DATX00D.GB.DXP2.HRDCPY.SCL.OUT,
// DISP=SHR
//OUTDD DD DSN=DATX00D.COVER.PAGE.TR.WORK1,DISP=(NEW,CATLG,DELETE),
// DATACLAS=JUMBO,DCB=(DATX00D.GB.DXP2.HRDCPY.SCL.OUTFILE)
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10886 Location: italy
superce is not for application file compare ...
it is for source code compare,
the joinkey process assumes sorted data,
superce does not make any assumptions,
joinkey is record oriented
superce is block oriented
( the update option will generate the update cards to get from source1 to source2)
the conclusion ... You are using the wrong tool
in this case it is faster to use the two file compare COBOL program that You can find here
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
Quote: |
in this case faster to use the two file compare COBOL program that You can find here |
To add to what Enrico mentioned:
There is a "Sticky" at the top of the COBOL part of the forum that is a working sample of a 2-file match/merge process. Download and modify it for your needs.
Vijay Subramaniyan
New User
Joined: 06 Jul 2011 Posts: 14 Location: india
Thanks Dick.
But your code seems to work only if the files are in sequence.
The two input files that I have are of record length 27500.
To put these files in sequence, I cannot use a sort card like the one below, can I?
SORT FIELDS=(5,27296,CH,A)
What would you advise to achieve this (putting both files in sequence)?
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
If your files are not in the same sequence as each other, then you'll have a problem using anything to compare them.
Is there anything which makes the records unique? You could sort on that, without having to sort on the whole thing. If you get "nearly unique" you might have some amount fields you can include in the sort.
Vijay Subramaniyan
New User
Joined: 06 Jul 2011 Posts: 14 Location: india
Bill,
I don't see anything unique. Both the files are AFP files. Is there no other option for achieving this? Can we achieve this with a COBOL sort?
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
why don't you explain how you are generating the two files,
and why they would have to be sorted for comparison?
you were not sorting them for the superc,
why now???
this has indeed been a thread meandering everywhere,
because no explanation or direction was given.
a software engineer started this silliness and the train has been derailed.
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
You are trying to compare reports?
Are the input data the same?
Are the reports supposed to be different, or the same?
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
Quote: |
Both the files are AFP files |
Quote: |
You are trying to compare reports? |
pretty broad report
does your printer do photos?
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Well, for the picky...
You are trying to compare "documents"?
Are the input data the same?
Are the "documents" supposed to be different, or the same?
Reason being, if they (whatever they are as represented by your records) are supposed to be the same, then you do a one-to-one match on a "record number".
If the inputs are "different but equivalent" and the document definitions are unchanged, compare the inputs.
If everything is changed and the outputs are supposed to be different, then go whistle.
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
Quote: |
Are the "documents" supposed to be different, or the same? |
TS doesn't know, he is only following the requirement.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
Yes, matching 2 files requires they be in sequence.
If all you want to know is whether there is a difference, change the sample code to just read a record from each file and compare them. If they are equal, read the next pair. If they are not equal, show the difference and stop, because there would be no way for this code to determine which file was "different". Show the record number to make it easier to look at the files and see why they differ.
The program would run to end of job if all the records match, and terminate when an unequal pair is found.
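The read-one-from-each-and-compare loop described above could look roughly like the sketch below. It is a Python illustration only, not the COBOL sample from the Sticky; the name `first_mismatch` and the in-memory sample records are made up for the example, and real input would be read record by record from the two datasets.

```python
from itertools import zip_longest

_MISSING = object()  # marks "this file ran out of records first"


def first_mismatch(old_records, new_records):
    """Return (record_number, old, new) for the first unequal pair, else None.

    Record numbers are 1-based. A side that has already ended is reported
    as None, so a length difference also counts as a mismatch.
    """
    pairs = zip_longest(old_records, new_records, fillvalue=_MISSING)
    for recno, (old, new) in enumerate(pairs, start=1):
        if old is _MISSING or new is _MISSING or old != new:
            return (recno,
                    None if old is _MISSING else old,
                    None if new is _MISSING else new)
    return None  # reached end of both files with every pair equal


# Hypothetical sample data standing in for the two files.
old_file = [b"PAGE 1", b"PAGE 2", b"PAGE 3"]
new_file = [b"PAGE 1", b"PAGE X", b"PAGE 3"]
print(first_mismatch(old_file, new_file))  # (2, b'PAGE 2', b'PAGE X')
```

Because the loop stops at the first difference, it only answers "are the files the same, and if not, where do they first diverge"; it cannot say which records were inserted.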
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
The attempt to match with SuperCE might(!) imply that the files do not match very well.
You really need to tell us what you are trying to match. We can assume "documents" as you mentioned AFP. Should they be the same? Identical, or logical, for instance if there is a "time" anywhere in the document pages? Does AFP put in control information of some sort, specific to the job?
I'm suspecting you're going to have to rethink it. However, try to answer everything and we'll get a clearer picture of what you have.
If they are "documents" there seems little point in sorting the records.
If not documents you might test sorting seven times on chunks of 4000, with EQUALS. However, you seem to have variable-length records which would throw additional spanners.
Do you have the same, exactly, number of records on each? If not....
If so, you could extract the first 4000 bytes, including the RDW, sort on the whole thing and compare that using JOINKEYS.
I think you'll end up rethinking, no matter what the requirement says...
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10886 Location: italy
Quote: |
Both the files are AFP files . |
as in Advanced Function Printing print files ?
in this case it is a pretty silly requirement and approach
the ratio data to control is pretty unfavorable
better to compare the originating data
and extracting random data from AFP streams will just result in useless junk
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10886 Location: italy
and in general to simply check if two files match the most effective approach is to use
BYTE compare
FMSTOP ==> stop at first mismatch
Vijay Subramaniyan
New User
Joined: 06 Jul 2011 Posts: 14 Location: india
Bill,
I will answer your question with the following case.
An input file with 1000 records is processed/formatted by a COBOL program, giving an output file that has the same number of records.
The same input file is then processed/formatted by another COBOL program, giving an output file that has 1010 records (i.e. 10 records have been inserted somewhere; the formatting of the remaining 1000 records is the same as in the first COBOL program's output).
Now I want to compare both output files, and the result I expect is the 10 records which were inserted.
Please tell me if I am not clear.
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
let me see if i have it correctly:
file-A goes into pgm-1 creating file-B
file-A goes into pgm-2 creating file-B with additional records.
unless the goal is
to prove that pgm-2 creates the same stuff as pgm1 with the addition of new records
so that pgm-1 can be removed from the system,
why have both pgm-1 and pgm-2?
why not test both old-file-B and new-file-B
by inputting them to pgm-3 and ensuring that the results are correct?
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Where does AFP come into this?
You do a "sideways" match.
Your "driver" is the first output.
Read driver until end of file.
With each record from driver:
Read record on subsidiary (the second output).
If records are equal, all is well, continue from Read driver...
If unequal, write to output file.
Continue from Read record on subsidiary...
When end of file on driver, write any remaining records on subsidiary to the output file.
At the end of this you should have your 10 extra records. You could include "record numbers" on the output if you wanted to know where they came from :-)
Note: this is a description of the process, not an indication as to how to structure your program :-)
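The "sideways" match described above can be sketched as follows. This is a Python illustration of the process, not a suggested program structure for COBOL; the function name `extra_records` and the sample records are invented for the example. One assumption goes beyond the description: if the subsidiary runs out before a driver record has found its partner, the sketch raises an error, since the description does not say what should happen in that case.

```python
def extra_records(driver, subsidiary):
    """Yield (record_number, record) for subsidiary records with no
    partner on the driver. Record numbers are 1-based positions on the
    subsidiary. Both inputs are consumed as streams, one record at a time.
    """
    sub_iter = enumerate(subsidiary, start=1)
    for drv in driver:
        for recno, sub in sub_iter:
            if sub == drv:
                break              # equal: all is well, read next driver record
            yield recno, sub       # unequal: write it out, read subsidiary again
        else:
            # Assumption: treat a driver record with no partner as an error.
            raise ValueError("subsidiary ended before driver record matched")
    # End of file on driver: everything left on the subsidiary is extra.
    yield from sub_iter


# Hypothetical sample data: the second output has two inserted records.
first_output = [b"A", b"B", b"C"]
second_output = [b"A", b"X", b"B", b"C", b"Y"]
print(list(extra_records(first_output, second_output)))  # [(2, b'X'), (5, b'Y')]
```

Nothing is held in memory beyond the current record from each file, so the record count of the files does not matter. The approach does rely on the two outputs being identical apart from the inserted records; an inserted record that happens to equal the next driver record would be paired with it instead.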
Vijay Subramaniyan
New User
Joined: 06 Jul 2011 Posts: 14 Location: india
Quote: |
The attempt to match with SuperCE might(!) imply that the files do not match very well.
You really need to tell us what you are trying to match. We can assume "documents" as you mentioned AFP. Should they be the same? Identical, or logical, for instance if there is a "time" anywhere in the document pages? Does AFP put in control information of some sort, specific to the job?
I'm suspecting you're going to have to rethink it. However, try to answer everything and we'll get a clearer picture of what you have.
If they are "documents" there seems little point in sorting the records.
If not documents you might test sorting seven times on chunks of 4000, with EQUALS. However, you seem to have variable-length records which would throw additional spanners.
Do you have the same, exactly, number of records on each? If not....
If so, you could extract the first 4000 bytes, including the RDW, sort on the whole thing and compare that using JOINKEYS.
I think you'll end up rethinking, no matter what the requirement says... |
Sorry for all the confusion. Yes, the two files are documents in AFP format. This requirement arose because a defective program, which accepts an AFP file as input and inserts some pages (a number of records), was running in production. Since it is defective, we have rectified the program, and it now produces an output that should be more or less the same as the output of the defective one, but with the correct number of pages (a number of records) inserted.
Now I want to compare both these files and send a compare report to my senior officials. The output file has more than 10 million records (I said 1000 as an example), so storing the records in an array may not be feasible. I know I have confused you people a lot. Sorry again.
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
OK, if the output is supposed to be otherwise identical other than the "extras" then you can proceed as I suggested.
However, things like dates/times of production can mess you up. AFP control information...
Code it up. Test it with your test data (into bad program and good program) and see if you have any problems with information outside of your control varying between the runs.
If you have a problem with that, establish whether it can reasonably be "masked" in its native format.
If it can't be masked natively, how about taking the "spool" files and masking?
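One way to picture the masking idea: blank out the byte ranges known to vary between runs (a run date, a job-specific control field) in both records before comparing them. The sketch below is a Python illustration under invented assumptions; the helper name `mask` and the offsets are hypothetical, and where such fields actually sit in an AFP record is not stated in this thread.

```python
def mask(record: bytes, ranges, fill: bytes = b"\x00") -> bytes:
    """Return a copy of record with each (start, length) range overwritten.

    Offsets are 0-based. Ranges that run past the end of a short
    variable-length record are clipped rather than padding the record.
    """
    out = bytearray(record)
    for start, length in ranges:
        end = min(start + length, len(out))
        if start < end:
            out[start:end] = fill * (end - start)
    return bytes(out)


# Hypothetical layout: a 10-byte run date at offset 4.
VARYING = [(4, 10)]
rec_bad_run = b"RUN 2011-07-06 OK"
rec_good_run = b"RUN 2011-07-07 OK"
print(mask(rec_bad_run, VARYING) == mask(rec_good_run, VARYING))  # True
```

The masked copies are used only for the equality test; the original records would still be the ones written to the compare output.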