IBM Mainframe Forum Index
 
Comparing 2 unsorted files


IBM Mainframe Forums -> JCL & VSAM
andrea

New User


Joined: 08 Jun 2012
Posts: 9
Location: Italia

Posted: Mon Jun 11, 2012 3:05 pm

Hi all,
first of all, greetings to everybody.

Here is what I need (I have already searched for a solution, but I wasn't able to find one).

I have two files, produced by two different procedures (an unknown* old one and a brand new one), that should be identical but often differ slightly.

*unknown = without source code and without any documentation, developed many years ago by people who have since retired.

The files are sequential flat files, without any key or anything similar.

I need to compare them and extract the lines that are equal, writing them to a new file in the same order in which they appear in the original files.

It would be useful (but it's not a must) to also extract the lines that are only in the first file and the lines that are only in the second one.

I posted my question here because the only suggestions I found require DFSORT or ICETOOL which, when I tried them, changed the order of the extracted lines (I tried comparing a COBOL program with a slightly modified copy of it, and the result file began with the blank lines :D ).

It's important to say that I cannot know whether either file contains records that appear as duplicates, possibly far apart from each other.
krishna_ragav

New User


Joined: 29 Oct 2010
Posts: 10
Location: Chennai

Posted: Mon Jun 11, 2012 3:54 pm

Hi,

Need more clarity on your requirement. There should be some fields which must be present in both the files. Please look for common fields and try your options.
andrea

New User


Joined: 08 Jun 2012
Posts: 9
Location: Italia

Posted: Mon Jun 11, 2012 4:37 pm

krishna_ragav wrote:
Hi,

Need more clarity on your requirement. There should be some fields which must be present in both the files. Please look for common fields and try your options.


I don't know the real content of the files; I only know that the one produced by the new procedure should match the old one.

Extracting the differences will help the developers tune the new application (or will let them be confident that the differences are caused by old/dirty data).

Given that, I can only see the files as strings of data, not as sets of fields.
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 2455
Location: Hampshire, UK

Posted: Mon Jun 11, 2012 5:01 pm

You can get sort to add a sequence number as it READS the files, then do your analysis and then sort the output files on the sequence number that was added on input and build your output without the keys. Samples abound. But I suspect you need keys unless you use the whole record as the key.
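
The sequence-number idea can be illustrated outside the mainframe. What follows is a minimal Python sketch of the logic only, not DFSORT control statements: tag each record with its arrival number, sort on the whole record, match the two sorted files in one sequential pass, then sort each output back on the sequence number and drop it. The function name `split_by_sorting` and the in-memory lists are assumptions made for illustration.

```python
def split_by_sorting(file1, file2):
    """Classify records using only sorts and one sequential match pass."""
    # Steps 1 and 2: tag each record with its 1-based arrival number,
    # then sort on the whole record (ties broken by the sequence number).
    s1 = sorted((rec, seq) for seq, rec in enumerate(file1, start=1))
    s2 = sorted((rec, seq) for seq, rec in enumerate(file2, start=1))

    matched, only1, only2 = [], [], []
    i = j = 0
    # Step 3: classic two-file match on the whole-record key.
    while i < len(s1) and j < len(s2):
        rec1, seq1 = s1[i]
        rec2, seq2 = s2[j]
        if rec1 == rec2:
            matched.append((seq1, rec1))  # keep File1's sequence number
            i += 1
            j += 1
        elif rec1 < rec2:
            only1.append((seq1, rec1))
            i += 1
        else:
            only2.append((seq2, rec2))
            j += 1
    # Whatever is left on either side has no partner.
    only1.extend((seq, rec) for rec, seq in s1[i:])
    only2.extend((seq, rec) for rec, seq in s2[j:])

    # Step 4: sort back on the sequence number, then drop it.
    def strip(tagged):
        return [rec for _, rec in sorted(tagged)]

    return strip(matched), strip(only1), strip(only2)
```

Because ties are broken by the sequence number, the k-th occurrence of a duplicated record in one file is paired with the k-th occurrence in the other, and any surplus copies end up in the "only" outputs.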
andrea

New User


Joined: 08 Jun 2012
Posts: 9
Location: Italia

Posted: Mon Jun 11, 2012 6:07 pm

I saw some solutions using those utilities in the DFSORT/ICETOOL section.
But every one I tried modifies the original order of my files.

It's important that I can say, for example, that the n-th record of the first file doesn't exist in the second, or vice versa.

Imagine we have:
File1:
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE

File2:
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE

In this case, records 1, 2, 3 and 6 of File1 are contained in File2; records 4 and 5 are not.
Records 1, 2, 4 and 6 of File2 are contained in File1; records 3 and 5 are not.
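
The matching rule implied by this example (each record in one file can be "used up" by only one identical record in the other) can be sketched by counting occurrences. This is an illustrative Python sketch, not something from the thread; `match_positions` is a hypothetical name, and the whole record is treated as the key.

```python
from collections import Counter

def match_positions(records, other):
    """Return (matched, unmatched) 1-based positions of `records`.

    The k-th occurrence of a record is matched only if `other`
    contains at least k copies of that same record.
    """
    available = Counter(other)  # how many copies the other file holds
    seen = Counter()            # how many copies we have met so far
    matched, unmatched = [], []
    for pos, rec in enumerate(records, start=1):
        seen[rec] += 1
        if seen[rec] <= available[rec]:
            matched.append(pos)
        else:
            unmatched.append(pos)
    return matched, unmatched

file1 = ["AAAAA", "BBBBB", "CCCCC", "DDDDD", "AAAAA", "EEEEE"]
file2 = ["AAAAA", "BBBBB", "FFFFF", "CCCCC", "CCCCC", "EEEEE"]

print(match_positions(file1, file2))  # ([1, 2, 3, 6], [4, 5])
print(match_positions(file2, file1))  # ([1, 2, 4, 6], [3, 5])
```

The second AAAAA in File1 and the second CCCCC in File2 are reported as unmatched because the other file holds only one copy of each.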
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Mon Jun 11, 2012 6:13 pm

Don't you have a file comparison product available?
andrea

New User


Joined: 08 Jun 2012
Posts: 9
Location: Italia

Posted: Mon Jun 11, 2012 6:35 pm

Bill Woodger wrote:
Don't you have a file comparison product available?


I thought of using ISRSUPC, but I couldn't find any useful parameter.
It's an unusual situation in which we do not have access to a list of the installed products, so it's advisable to use utilities included in a standard environment, or to write specific programs in COBOL.
Anuj Dhawan

Superior Member


Joined: 22 Apr 2006
Posts: 6250
Location: Mumbai, India

Posted: Mon Jun 11, 2012 7:04 pm

For a start, let's consider the entire record as the key. But we still need to know: what is the LRECL of the inputs, how many records are in each file, and what options do you have to choose from, e.g. SORT (which one, DFSORT or SyncSort), COBOL, or any other language?
andrea

New User


Joined: 08 Jun 2012
Posts: 9
Location: Italia

Posted: Mon Jun 11, 2012 7:25 pm

Anuj Dhawan wrote:
For a start, let's consider the entire record as the key. But we still need to know: what is the LRECL of the inputs, how many records are in each file, and what options do you have to choose from, e.g. SORT (which one, DFSORT or SyncSort), COBOL, or any other language?


LRECL should be 560 (I'm at another site this week, so I can't check); RECFM=FB (I'm sure about that).
The number of records changes day by day (it is a daily job); so far we estimate about 50,000-60,000 records each day.
Both files should be identical, so the number of records is about the same in each.

The sort product is DFSORT, I'm sure.
As for languages, the people involved in this task know only COBOL.
Pandora-Box

Global Moderator


Joined: 07 Sep 2006
Posts: 1592
Location: Andromeda Galaxy

Posted: Mon Jun 11, 2012 7:57 pm

For this input

File1
Code:
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE

File2:
Code:
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE


Do you want to find the matched and unmatched records, along with their line numbers in the file?

Please confirm
andrea

New User


Joined: 08 Jun 2012
Posts: 9
Location: Italia

Posted: Mon Jun 11, 2012 8:31 pm

Pandora-Box wrote:
For this input

File1
Code:
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE

File2:
Code:
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE


Do you want to find the matched and unmatched records, along with their line numbers in the file?

Please confirm


In order of importance:
1) I need to extract a file containing only the matching records, in the same order as File1.
2) Produce two more files containing the records of File1 not present in File2, and vice versa.
3) It would be useful to report the line numbers of, at least, the non-matching records.

However, the files produced must look identical to the original ones: the only difference will be the number of records.
When the new procedure behaves identically to the old one, File1 and File2 will match, the file with the matching records will match both of them, and the two files containing the non-matching records will be empty.
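
To make these three requirements concrete, here is a hedged Python sketch that treats each input as a stream of fixed-length records (LRECL 560, RECFM=FB, as stated earlier in the thread), writes the three output files with the records byte-for-byte unchanged, and lists the line numbers of the non-matching records in a separate report. All names (`read_fb`, `split_records`, `compare_files`, the output file names) are hypothetical, and the sketch assumes the files are available as plain binary files that fit in memory.

```python
from collections import Counter
from pathlib import Path

LRECL = 560  # record length mentioned in the thread; adjust as needed

def read_fb(path, lrecl=LRECL):
    """Split a fixed-length (FB-style) binary file into records."""
    data = Path(path).read_bytes()
    if len(data) % lrecl:
        raise ValueError(f"{path}: size is not a multiple of {lrecl}")
    return [data[i:i + lrecl] for i in range(0, len(data), lrecl)]

def split_records(records, other):
    """Return (matched, unmatched) lists of (line_no, record)."""
    available = Counter(other)  # copies present in the other file
    seen = Counter()
    matched, unmatched = [], []
    for line_no, rec in enumerate(records, start=1):
        seen[rec] += 1
        target = matched if seen[rec] <= available[rec] else unmatched
        target.append((line_no, rec))
    return matched, unmatched

def compare_files(path1, path2, out_dir, lrecl=LRECL):
    recs1, recs2 = read_fb(path1, lrecl), read_fb(path2, lrecl)
    matched, only1 = split_records(recs1, recs2)  # File1 order is kept
    _, only2 = split_records(recs2, recs1)
    out = Path(out_dir)
    outputs = (("matched", matched), ("only1", only1), ("only2", only2))
    for name, tagged in outputs:
        # Records are written unchanged; line numbers go to the report only.
        (out / name).write_bytes(b"".join(rec for _, rec in tagged))
    report = [f"File1 line {n} not in File2" for n, _ in only1]
    report += [f"File2 line {n} not in File1" for n, _ in only2]
    (out / "report.txt").write_text("\n".join(report))
    return len(matched), len(only1), len(only2)
```

When both inputs are identical, `only1` and `only2` come out empty and `matched` is a byte-for-byte copy of File1, which is the end state described above.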
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

Posted: Mon Jun 11, 2012 10:14 pm

Hello,

If there is no "key" and the records have no "sequence" other than arrival order, how is a duplicate or a difference identified?

It may help if you post some "real" data (not 500+ bytes, but only enough to demonstrate the actual data). If the data is sensitive, change the values consistently between the 2 sample input files.
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

Posted: Mon Jun 11, 2012 11:41 pm

this is the kind of thing that superc does very well.

taking advantage of the C.4.1 Update control file (LINE Compare Type) generated,
one could parse this and use as input to sort to generate the
files desired (matches, inserts, deletes).

you want the files compared line by line.
sort does not do this well when there is no key involved.

what you want is a utility that will, and that would be ISPF option 3.12 (superc).

because of the superc output methodology, you would only have the sequence number of the record in one-or-the-other file
(records > 133 bytes would not be displayed completely).
that sequence number could then be used as a key to obtain the complete record, so that sort can generate the actual files.
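
For readers without access to a mainframe, the idea of an order-aware line compare that yields matches, inserts and deletes keyed by record number can be sketched with Python's standard `difflib` module. This is only a stand-in for the concept: it is not SuperC, it does not reproduce SuperC's output format, and `line_compare` is a hypothetical name.

```python
from difflib import SequenceMatcher

def line_compare(old, new):
    """Order-aware compare of two record lists.

    Returns three lists with 1-based record numbers:
      matches: (old_no, new_no, record)
      deletes: (old_no, record)   present only in `old`
      inserts: (new_no, record)   present only in `new`
    """
    # autojunk=False stops difflib from ignoring very frequent records
    # (its default heuristic for sequences of 200 or more items).
    sm = SequenceMatcher(None, old, new, autojunk=False)
    matches, deletes, inserts = [], [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            for i, j in zip(range(i1, i2), range(j1, j2)):
                matches.append((i + 1, j + 1, old[i]))
        else:  # "replace", "delete" or "insert"
            deletes.extend((i + 1, old[i]) for i in range(i1, i2))
            inserts.extend((j + 1, new[j]) for j in range(j1, j2))
    return matches, deletes, inserts
```

Unlike matching on the whole record as a key, a line compare of this kind respects arrival order, so a record that merely moved to a different position is reported as one delete plus one insert. The record numbers it returns can then serve as the key for pulling the complete records from the original files.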
All times are GMT + 6 Hours