andrea
New User
Joined: 08 Jun 2012 Posts: 9 Location: Italia
Hi all,
first of all, well met, everybody.
Here is my need (I have already searched for a solution, but I wasn't able to find one).
I have two files, produced by two different procedures (an unknown* old one and a brand new one), that should match, but they are often slightly different.
*unknown = without source and without any documentation, developed many years ago by people who have since retired.
The files are sequential flat files, without any key or anything similar.
I need to compare them and extract the lines that are equal, putting them in a new file in the same order as in the original files.
It would be useful (but it's not a must) to also extract the lines that are only in the first file and the lines that are only in the second one.
I posted my question here because the only suggestions I found require DFSORT or ICETOOL, which, when I tried them, changed the order of the extracted lines (I tried comparing a COBOL program with a slightly modified copy of it, and the result file begins with the blank lines).
It's important to say that I cannot know whether either file contains records that appear as duplicates, even far apart.
krishna_ragav
New User
Joined: 29 Oct 2010 Posts: 10 Location: Chennai
Hi,
Need more clarity on your requirement. There should be some fields which must be present in both the files. Please look for common fields and try your options. |
andrea
New User
Joined: 08 Jun 2012 Posts: 9 Location: Italia
krishna_ragav wrote: |
Hi,
Need more clarity on your requirement. There should be some fields which must be present in both the files. Please look for common fields and try your options. |
I don't know the real content of the files; I only know that the one produced by the new procedure should match the old one.
Extracting the differences will help the developers tune the new application (or will let them be confident that the differences are caused by old/dirty data).
Given that, I can only see the files as strings of data, not as sets of fields.
Nic Clouston
Global Moderator
Joined: 10 May 2007 Posts: 2455 Location: Hampshire, UK
You can get sort to add a sequence number as it READS the files, then do your analysis and then sort the output files on the sequence number that was added on input and build your output without the keys. Samples abound. But I suspect you need keys unless you use the whole record as the key. |
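The tag, sort, match and re-sort idea described above can be sketched outside the mainframe. What follows is a minimal Python illustration of the logic only, not a DFSORT job: it assumes the whole record is the key, and the function name `match_in_original_order` is hypothetical.

```python
def match_in_original_order(file1, file2):
    """Return the File1 records that also occur in File2, in File1 order."""
    # Step 1: tag every record with the sequence number it had on input.
    tagged1 = [(rec, seq) for seq, rec in enumerate(file1, start=1)]
    tagged2 = [(rec, seq) for seq, rec in enumerate(file2, start=1)]

    # Step 2: sort both on the whole record (sequence number breaks ties).
    tagged1.sort()
    tagged2.sort()

    # Step 3: merge the two sorted lists, pairing equal records one-to-one.
    matched = []
    i = j = 0
    while i < len(tagged1) and j < len(tagged2):
        rec1, seq1 = tagged1[i]
        rec2, _ = tagged2[j]
        if rec1 == rec2:
            matched.append((seq1, rec1))
            i += 1
            j += 1
        elif rec1 < rec2:
            i += 1
        else:
            j += 1

    # Step 4: sort back on the sequence number, then drop it again.
    matched.sort()
    return [rec for _, rec in matched]
```

Because the sequence number is carried through the sort, the final output comes back in the arrival order of the first file, which is the part the plain sort-based attempts lose.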
andrea
New User
Joined: 08 Jun 2012 Posts: 9 Location: Italia
I saw some solutions using those utilities in the DFSORT/ICETOOL section.
But all of the ones I tried modify the original order of my files.
It's important that I can say, for example, that the n-th record of the first file doesn't exist in the second, or vice versa.
Imagine we have:
File1
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE
File2:
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE
In this case, records 1, 2, 3 and 6 of File1 are contained in File2; records 4 and 5 are not.
Records 1, 2, 4 and 6 of File2 are contained in File1; records 3 and 5 are not.
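The expected result for this sample can be checked with a short sketch. It is written in Python only to illustrate the logic (a COBOL program could keep the counts in a table in much the same way). It assumes the whole record is the key and that each record in one file can be paired with at most one equal record in the other. The name `split_matches` is hypothetical.

```python
from collections import Counter


def split_matches(records, other):
    """Split `records` into those found in `other` and those not found.

    Each record of `other` can be used for one match only, so a record
    repeated more often in `records` than in `other` is reported as
    unmatched from its surplus occurrence onward. Original order is kept.
    """
    available = Counter(other)  # how many copies are still unpaired
    matched, unmatched = [], []
    for line_no, rec in enumerate(records, start=1):
        if available[rec] > 0:
            available[rec] -= 1
            matched.append((line_no, rec))
        else:
            unmatched.append((line_no, rec))
    return matched, unmatched


file1 = ["AAAAA", "BBBBB", "CCCCC", "DDDDD", "AAAAA", "EEEEE"]
file2 = ["AAAAA", "BBBBB", "FFFFF", "CCCCC", "CCCCC", "EEEEE"]

in_both_1, only_in_1 = split_matches(file1, file2)
in_both_2, only_in_2 = split_matches(file2, file1)

print([n for n, _ in in_both_1], [n for n, _ in only_in_1])
print([n for n, _ in in_both_2], [n for n, _ in only_in_2])
```

For the sample files this reports lines 1, 2, 3, 6 of File1 as matched and lines 4, 5 as unmatched, and lines 1, 2, 4, 6 of File2 as matched and lines 3, 5 as unmatched. The second AAAAA in File1 is unmatched because File2 holds only one copy of it.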
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Don't you have a file comparison product available? |
andrea
New User
Joined: 08 Jun 2012 Posts: 9 Location: Italia
Bill Woodger wrote: |
Don't you have a file comparison product available? |
I thought of using ISRSUPC, but I couldn't find any useful parameter.
We are in a particular situation in which we do not have access to a list of the installed products, so it's advisable to use utilities included in a standard environment, or to write specific programs in COBOL.
Anuj Dhawan
Superior Member
Joined: 22 Apr 2006 Posts: 6250 Location: Mumbai, India
For a start - let's consider the entire records as the key. But we still need to know, what is LRECL of the inputs, how many records in both the files, what are your options to choose from -- e.g.: SORT (which one - DFSORT or SyncSort), COBOL any other language. |
andrea
New User
Joined: 08 Jun 2012 Posts: 9 Location: Italia
Anuj Dhawan wrote: |
For a start - let's consider the entire records as the key. But we still need to know, what is LRECL of the inputs, how many records in both the files, what are your options to choose from -- e.g.: SORT (which one - DFSORT or SyncSort), COBOL any other language. |
LRECL (I'm somewhere else this week) should be 560; RECFM=FB (I'm sure about that).
The number of records changes from day to day (it's a daily job, actually); so far we estimate about 50,000-60,000 records each day.
The two files should be identical, so the number of records is about the same.
The sort option is DFSORT, I'm sure.
As for languages, the people involved in this task know only COBOL.
Pandora-Box
Global Moderator
Joined: 07 Sep 2006 Posts: 1592 Location: Andromeda Galaxy
For this input
File1
Code: |
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE |
File2:
Code: |
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE |
You want to find the matched and unmatched records with their line numbers in the file?
Please confirm.
andrea
New User
Joined: 08 Jun 2012 Posts: 9 Location: Italia
Pandora-Box wrote: |
For this input
File1
Code: |
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE |
File2:
Code: |
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE |
you wanted to find the matched and unmatched records with its line number in the file ?
Please confirm |
In order of importance:
1) I need to extract a file containing only the matching records, in the same order as File1.
2) Produce two more files containing the records of File1 not present in File2, and vice versa.
3) It would be useful to report the line numbers of, at least, the non-matching records.
However, the files produced must look identical to the original ones: the only difference will be the number of records.
When the new procedure is identical to the old one, File1 and File2 will match, the file of matching records will match them both, and the two files containing unmatched records will be empty.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19244 Location: Inside the Matrix
Hello,
If there is no "key" and the records have no "sequence" other than arrival order, how is a duplicate or a difference identified?
It may help if you post some "real" data (not 500+ bytes, but only enough to demonstrate the actual data). If the data is sensitive, change the values consistently between the 2 sample input files.
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
this is the kind of thing that superc does very well.
taking advantage of the C.4.1 Update control file (LINE Compare Type) generated,
one could parse this and use as input to sort to generate the
files desired (matches, inserts, deletes).
you want the files compared line by line.
sort does not do this well when there is no key involved.
what you want is a utility that will and that would be a 3.12.
because of superc output methodology, you would only have the sequence of the record in one-or-the-other-file, which could then be used as a key
(records > 133 would not display the complete record)
to obtain the complete record for sort to generate the actual files. |
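A line-by-line compare of this kind can be illustrated with Python's standard `difflib` module. This is only an analogy for the approach, not SuperC itself and not its update control file format. The sketch classifies record numbers as matches, deletes and inserts, which could then serve as keys for extracting the complete records. The function name `classify_lines` is hypothetical.

```python
from difflib import SequenceMatcher


def classify_lines(old, new):
    """Return 1-based record numbers of matches, deletes and inserts."""
    matches, deletes, inserts = [], [], []
    # autojunk=False keeps frequently repeated records in the comparison.
    sm = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            # Pair each old record number with its new record number.
            matches.extend(zip(range(i1 + 1, i2 + 1), range(j1 + 1, j2 + 1)))
        else:
            # "delete" and "replace" drop old records; "insert" and
            # "replace" add new ones (empty ranges contribute nothing).
            deletes.extend(range(i1 + 1, i2 + 1))
            inserts.extend(range(j1 + 1, j2 + 1))
    return matches, deletes, inserts
```

Unlike a match on the whole record as a key, a sequence compare pairs records only when they appear in the same relative order in both files, so a record that has merely moved shows up as one delete plus one insert. For the six-record sample posted earlier in the thread, this sketch pairs File1 records 1, 2, 3, 6 with File2 records 1, 2, 4, 6, and reports File1 records 4, 5 as deletes and File2 records 3, 5 as inserts.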