Comparing 2 unsorted files

 
andrea

New User


Joined: 08 Jun 2012
Posts: 6
Location: Italia

Posted: Mon Jun 11, 2012 3:05 pm    Post subject: Comparing 2 unsorted files

Hi all,
first of all, greetings to everybody.

Here is my need (I have already searched for a solution, but I wasn't able to find one).

I have two files, produced by two different procedures (an unknown* old one and a brand-new one), that should match, but they often differ slightly.

*unknown = no source code and no documentation; it was developed many years ago by people who have since retired.

The files are sequential flat files, without any key or anything similar.

I need to compare them and extract the lines that are equal, putting them in a new file in the same order in which they appear in the original files.

It would also be useful (but it's not a must) to extract the lines that are only in the first file, and the lines that are only in the second one.

I posted my question here because the only suggestions I found require DFSORT or ICETOOL, which, when I tried them, changed the order of the extracted lines (I tried comparing a COBOL program with a slightly modified copy of itself, and the result file began with the blank lines :D ).

It's important to say that I cannot know whether either file contains records that appear as duplicates, possibly far apart from each other.

krishna_ragav

New User


Joined: 29 Oct 2010
Posts: 10
Location: Chennai

Posted: Mon Jun 11, 2012 3:54 pm

Hi,

Need more clarity on your requirement. There should be some fields which must be present in both the files. Please look for common fields and try your options.
andrea

New User


Joined: 08 Jun 2012
Posts: 6
Location: Italia

Posted: Mon Jun 11, 2012 4:37 pm

krishna_ragav wrote:
Hi,

Need more clarity on your requirement. There should be some fields which must be present in both the files. Please look for common fields and try your options.


I don't know the real content of the files; I only know that the one produced by the new procedure should match the old one.

Extracting the differences will help the developers tune the new application (or will let them be confident that the differences are caused by old/dirty data).

Because of that, I can only see the files as strings of data, not as sets of fields.
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 1712
Location: UK

Posted: Mon Jun 11, 2012 5:01 pm

You can get sort to add a sequence number as it READS the files, then do your analysis and then sort the output files on the sequence number that was added on input and build your output without the keys. Samples abound. But I suspect you need keys unless you use the whole record as the key.
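
The sequence-number idea above can be sketched outside of DFSORT. The Python below is only an illustration of the logic, not DFSORT control statements: it assumes the whole record is the key, and the function name `matched_in_original_order` is made up for this example.

```python
def matched_in_original_order(file1, file2):
    """Return the records of file1 that pair up with a record of file2,
    in file1's original order. The whole record is used as the key."""
    # Step 1: tag every record with its arrival sequence number (1-based),
    # then sort on the record so equal records become adjacent.
    tagged1 = sorted((rec, seq) for seq, rec in enumerate(file1, start=1))
    tagged2 = sorted((rec, seq) for seq, rec in enumerate(file2, start=1))

    # Step 2: classic two-file match/merge on the sorted key.
    hits = []
    i = j = 0
    while i < len(tagged1) and j < len(tagged2):
        rec1, seq1 = tagged1[i]
        rec2, _ = tagged2[j]
        if rec1 == rec2:
            hits.append((seq1, rec1))  # remember where it came from in file1
            i += 1
            j += 1
        elif rec1 < rec2:
            i += 1                     # record exists only in file1
        else:
            j += 1                     # record exists only in file2

    # Step 3: sort back on the sequence number and drop it again.
    hits.sort()
    return [rec for _, rec in hits]


print(matched_in_original_order(["B", "A", "C", "A"], ["A", "C", "D"]))
# prints ['A', 'C']
```

Each duplicate is paired at most once here, so a record that occurs twice in the first file but once in the second is reported as matched only once. Whether that is the right rule for duplicates is an assumption of this sketch.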
andrea

New User


Joined: 08 Jun 2012
Posts: 6
Location: Italia

Posted: Mon Jun 11, 2012 6:07 pm

In the DFSORT/ICETOOL section I saw some solutions using those utilities.
But every one of them that I tried changes the original order of my files.

It's important that I can say, for example, that the n-th record of the first file doesn't exist in the second, or vice versa.

Imagine having:
File1:
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE

File2:
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE

In this case, records 1, 2, 3 and 6 of File1 are contained in File2; records 4 and 5 are not.
Records 1, 2, 4 and 6 of File2 are contained in File1; records 3 and 5 are not.
Bill Woodger

DFSORT Moderator


Joined: 09 Mar 2011
Posts: 7224

Posted: Mon Jun 11, 2012 6:13 pm    Post subject: Reply to: Comparing 2 unsorted files

Don't you have a file comparison product available?
andrea

New User


Joined: 08 Jun 2012
Posts: 6
Location: Italia

Posted: Mon Jun 11, 2012 6:35 pm    Post subject: Re: Reply to: Comparing 2 unsorted files

Bill Woodger wrote:
Don't you have a file comparison product available?


I thought of using ISRSUPC, but I couldn't find any useful parameter.
It's a particular situation in which we don't have access to a list of the installed products, so it's advisable to use utilities included in a standard environment, or to write specific programs in COBOL.
Anuj Dhawan

Senior Member


Joined: 22 Apr 2006
Posts: 6258
Location: Mumbai, India

Posted: Mon Jun 11, 2012 7:04 pm

For a start - let's consider the entire records as the key. But we still need to know, what is LRECL of the inputs, how many records in both the files, what are your options to choose from -- e.g.: SORT (which one - DFSORT or SyncSort), COBOL any other language.
andrea

New User


Joined: 08 Jun 2012
Posts: 6
Location: Italia

Posted: Mon Jun 11, 2012 7:25 pm

Anuj Dhawan wrote:
For a start - let's consider the entire records as the key. But we still need to know, what is LRECL of the inputs, how many records in both the files, what are your options to choose from -- e.g.: SORT (which one - DFSORT or SyncSort), COBOL any other language.


The LRECL should be 560 (I'm at another site this week, so I can't check), RECFM=FB (I'm sure about that).
The number of records changes day by day (it is actually a daily job); so far we estimate about 50,000-60,000 records each day.
Both files should be identical, so the number of records is about the same.

The sort product is DFSORT, I'm sure.
As for languages, the people involved in this task know only COBOL.
Pandora-Box

Moderator


Joined: 07 Sep 2006
Posts: 1529
Location: Andromeda Galaxy

Posted: Mon Jun 11, 2012 7:57 pm

For this input

File1
Code:
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE

File2:
Code:
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE


Do you want to find the matched and unmatched records, with their line numbers in the file?

Please confirm
andrea

New User


Joined: 08 Jun 2012
Posts: 6
Location: Italia

Posted: Mon Jun 11, 2012 8:31 pm

Pandora-Box wrote:
For this input

File1
Code:
AAAAA
BBBBB
CCCCC
DDDDD
AAAAA
EEEEE

File2:
Code:
AAAAA
BBBBB
FFFFF
CCCCC
CCCCC
EEEEE


Do you want to find the matched and unmatched records, with their line numbers in the file?

Please confirm


In order of importance:
1) I need to extract a file containing only the matching records, in the same order as File1.
2) Produce two more files containing the records of File1 not present in File2, and vice versa.
3) It would be useful to report the line numbers of, at least, the non-matching records.

However, the files produced must look identical to the original ones: the only difference will be the number of records.
Once the new procedure behaves identically to the old one, File1 and File2 will match, the file of matching records will match both of them, and the two files of unmatched records will be empty.
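
The three outputs listed above can be sketched as follows. This is an illustrative Python sketch of the logic only (the actual job would be COBOL or DFSORT); the name `split_by_match` is hypothetical, and it assumes that each record can pair with at most one identical record in the other file.

```python
from collections import Counter


def split_by_match(file1, file2):
    """Return (matched, only1, only2).

    matched: records of file1 that have a partner in file2, in file1 order.
    only1 / only2: (record number, record) pairs with no partner, 1-based.
    """
    # How many copies of each record are still unpaired in file2.
    remaining2 = Counter(file2)
    matched, only1 = [], []
    for num, rec in enumerate(file1, start=1):
        if remaining2[rec] > 0:
            remaining2[rec] -= 1        # consume one partner
            matched.append(rec)
        else:
            only1.append((num, rec))

    # Same walk in the other direction, to find what is only in file2.
    remaining1 = Counter(file1)
    only2 = []
    for num, rec in enumerate(file2, start=1):
        if remaining1[rec] > 0:
            remaining1[rec] -= 1
        else:
            only2.append((num, rec))
    return matched, only1, only2


file1 = ["AAAAA", "BBBBB", "CCCCC", "DDDDD", "AAAAA", "EEEEE"]
file2 = ["AAAAA", "BBBBB", "FFFFF", "CCCCC", "CCCCC", "EEEEE"]
matched, only1, only2 = split_by_match(file1, file2)
print(matched)  # ['AAAAA', 'BBBBB', 'CCCCC', 'EEEEE']
print(only1)    # [(4, 'DDDDD'), (5, 'AAAAA')]
print(only2)    # [(3, 'FFFFF'), (5, 'CCCCC')]
```

The records are written out unchanged, so the output files keep the layout of the originals; the record numbers are carried separately and could go to a report instead of the data files.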
dick scherrer

Site Director


Joined: 23 Nov 2006
Posts: 19270
Location: Inside the Matrix

Posted: Mon Jun 11, 2012 10:14 pm

Hello,

If there is no "key" and the records have no "sequence" other than arrival order, how is a duplicate or a difference identified?

It may help if you post some "real" data (not 500+ bytes, but only enough to demonstrate the actual data). If the data is sensitive, change the values consistently between the 2 sample input files.
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

Posted: Mon Jun 11, 2012 11:41 pm

this is the kind of thing that superc does very well.

taking advantage of the C.4.1 Update control file (LINE Compare Type) generated,
one could parse this and use as input to sort to generate the
files desired (matches, inserts, deletes).

you want the files compared line by line.
sort does not do this well when there is no key involved.

what you want is a utility that will and that would be a 3.12.

because of superc output methodology, you would only have the sequence of the record in one-or-the-other-file, which could then be used as a key
(records > 133 would not display the complete record)
to obtain the complete record for sort to generate the actual files.
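
To show what a sequential line-by-line compare yields (matches, inserts, deletes), here is a small Python sketch that uses the standard `difflib` module as a stand-in. It is not SuperC, and it makes no claim about SuperC's algorithm or listing format; the function name `line_compare` is made up for this example.

```python
import difflib


def line_compare(old, new):
    """Classify records by position, the way a sequential compare does.

    Returns 1-based record numbers:
    matches (in old), deletes (in old only), inserts (in new only).
    """
    matches, deletes, inserts = [], [], []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matches.extend(range(i1 + 1, i2 + 1))
        else:
            # 'delete', 'insert' or 'replace': old[i1:i2] is dropped,
            # new[j1:j2] is added (either range may be empty).
            deletes.extend(range(i1 + 1, i2 + 1))
            inserts.extend(range(j1 + 1, j2 + 1))
    return matches, deletes, inserts


old = ["AAAAA", "BBBBB", "CCCCC", "DDDDD", "AAAAA", "EEEEE"]
new = ["AAAAA", "BBBBB", "FFFFF", "CCCCC", "CCCCC", "EEEEE"]
print(line_compare(old, new))
```

The record numbers returned this way can serve as the key for pulling the complete records out of the original files. Note that a sequential compare typically reports a record that merely moved to a different position as one delete plus one insert, which a whole-record key match would not.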
All times are GMT + 6 Hours