Comparing two files using REXX

amargallani · New User Joined: 04 May 2010 Posts: 5 Location: ballarpur

Hi, can anyone help me with the requirement below?

I have to compare two files. The files should be identical; if they are not, we have to provide the reason. The reasons can be:
1. Number of records are not equal
2. File attributes not matching
3. An extra record in a file
4. Records are equal but in jumbled order (header is placed at the bottom of a new file)

This can be done manually if there are only a few files, but we are expecting 400+ files and are thinking of automating the process, or at least reducing the effort by 50-60%. It is difficult to achieve. I have proposed a solution below; can you suggest whether there is anything else with which it can be achieved?

This can be done using REXX:

1. Invoke SuperC via Rexx / Batch to compare New_File and Old_File

If the files match (apples-to-apples match), exit the process.

If there is a mismatch, go ahead with step 2.

2. Index the files (add a sequence number to both files).

3. Sort the files in ascending order. The sort is done on the data, leaving out the index.

4. Now compare the sorted files. If they match, the issue is that the records are not in the same order.

5. If there is still a mismatch, the file can be processed manually.

We cannot fully automate the process, but the work can be reduced by 50%.
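The decision logic in the steps above can be sketched outside the mainframe. This is only an illustration in Python, not REXX and not SuperC; the function name `classify` and the reason strings are made up for the example, and it assumes each file has already been read into a list of records.

```python
def classify(old_records, new_records):
    """Return a short reason describing how two record lists differ.

    Hypothetical helper: mirrors the proposed steps, not SuperC itself.
    """
    # Step 1: straight compare, record by record and in order.
    if old_records == new_records:
        return "identical"

    # Different record counts mean at least one extra/missing record.
    if len(old_records) != len(new_records):
        return "record counts differ: old=%d new=%d" % (
            len(old_records), len(new_records))

    # Steps 3-4: sorting both sides removes the effect of record order.
    if sorted(old_records) == sorted(new_records):
        return "same records, different order"

    # Step 5: anything else is left for manual review.
    return "content differs, needs manual review"


if __name__ == "__main__":
    print(classify(["HDR", "A", "B"], ["A", "B", "HDR"]))
```

File attributes (RECFM, LRECL and so on) are not covered by this sketch; they would have to be checked separately before the records are compared.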

Can anybody help me with this?

Thanks in advance

expat · Posted: Mon May 10, 2010 2:25 pm

It looks as though you have the best part of the logic worked out, so what do you need help with?

Escapa · Posted: Mon May 10, 2010 2:39 pm

amargallani · New User Joined: 04 May 2010 Posts: 5 Location: ballarpur

Actually, we should not sort the given files themselves, as the data in one of the files can be in jumbled order. Here the reason we get is whether the new file is sorted or not sorted, and that is the discrepancy.

To compare these files easily, we are adding the sequence number.

This indexing can also be used if the 1st record of the new file is placed at the bottom of the old file. We will add SEQNUM and sort the files, and the sorting will be done on the data, not considering the sequence number. After the compare, we can use the sequence number to tell the user where the discrepancy is.

Pedro · Posted: Mon May 10, 2010 8:15 pm

Your comments about sorting and jumbled order are not clear to me.

I think you should first create test files and have one that is in 'jumbled order'. And then see if SUPERC will find it.

I agree with Escapa about processing the SUPERC output file.

amargallani · New User Joined: 04 May 2010 Posts: 5 Location: ballarpur

I agree, we can use SuperC to compare two files.

If the files match, the result of SuperC is X rows matching, 0 paired and 0 unpaired.

If a few records do not match, the result is, for example, 1 paired and 2 unpaired, and it is difficult to analyse this and report to the user why the file is not matching.

For example:

NEW_FILE:
AAA011
BBB021
CCC031
FFFF041

OLD_FILE:
XXX011
BBB021
CCC031
FFFF041
AAA011

After the compare, we have to report to the user:

The new file is not matching with the old file.
Reason: The last record in the old file is placed at the top of the new file, and the header (XXX011) is missing in the new file.
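One way to arrive at a report like this is the sequence-number idea: tag each record with its original position, sort on the data only, then walk both sorted lists together. The sketch below is in Python purely to show the logic (it is not REXX, SuperC or DFSORT); `compare_indexed` is a hypothetical name, and the data is the NEW_FILE/OLD_FILE example above.

```python
def compare_indexed(old_records, new_records):
    """Pair records by content and report moved and unmatched ones.

    Returns (moved, only_old, only_new) where
      moved    = [(record, old_seq, new_seq), ...]
      only_old = [(record, old_seq), ...]
      only_new = [(record, new_seq), ...]
    """
    # Add a 1-based sequence number, then sort on the data first.
    old = sorted((rec, seq) for seq, rec in enumerate(old_records, 1))
    new = sorted((rec, seq) for seq, rec in enumerate(new_records, 1))

    moved, only_old, only_new = [], [], []
    i = j = 0
    # Classic merge walk over two sorted lists.
    while i < len(old) and j < len(new):
        (orec, oseq), (nrec, nseq) = old[i], new[j]
        if orec == nrec:
            if oseq != nseq:
                moved.append((orec, oseq, nseq))
            i += 1
            j += 1
        elif orec < nrec:
            only_old.append((orec, oseq))
            i += 1
        else:
            only_new.append((nrec, nseq))
            j += 1
    only_old.extend(old[i:])   # leftovers exist only in the old file
    only_new.extend(new[j:])   # leftovers exist only in the new file
    return moved, only_old, only_new


if __name__ == "__main__":
    new_file = ["AAA011", "BBB021", "CCC031", "FFFF041"]
    old_file = ["XXX011", "BBB021", "CCC031", "FFFF041", "AAA011"]
    moved, only_old, only_new = compare_indexed(old_file, new_file)
    print("moved:", moved)          # [('AAA011', 5, 1)]
    print("only in old:", only_old) # [('XXX011', 1)]
    print("only in new:", only_new) # []
```

A plain position comparison like this will flag every record as moved if a single record is inserted near the top of one file, so the "moved" list is best read together with the "only in" lists.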

dick scherrer · Posted: Tue May 11, 2010 9:32 am

Hello,

Possibly (probably), the summary at the end of the SUPERC run is enough information to report to the "user". . . .

If more detailed analysis is needed for one file or another, then do some more detailed work to provide some additional info.

This assumes that most of the files will completely match most of the time or that they are supposed to match most/all of the time. If the files are supposed to match, and they do not, I suspect that the problem process needs to be corrected and this should not require much depth. . .

Pedro · Posted: Tue May 11, 2010 10:37 am

amargallani · New User Joined: 04 May 2010 Posts: 5 Location: ballarpur

Hi Pedro,

The example here is small. If we have a file with a record length of 500, the discrepancy is at the 250th column, and the same issue occurs in 50 rows, it is difficult to analyse the results of SuperC.

It would be difficult to inform the user what data is missing.

Is SuperC output restricted to a record length of 133? (I wanted to know.)
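For long records, it may help to report only the column ranges that differ once two records have been paired. A minimal sketch of that idea in Python (not REXX; `diff_columns` is a hypothetical helper, and it assumes a shorter record can be treated as blank-padded to the length of the longer one):

```python
def diff_columns(old_rec, new_rec):
    """Return 1-based (start, end) column ranges where two records differ."""
    width = max(len(old_rec), len(new_rec))
    # Assumption: pad the shorter record with blanks, as for fixed-length data.
    a = old_rec.ljust(width)
    b = new_rec.ljust(width)

    ranges = []
    start = None                      # start column of the current mismatch run
    for col in range(width):
        if a[col] != b[col]:
            if start is None:
                start = col + 1       # convert to 1-based column number
        elif start is not None:
            ranges.append((start, col))  # run ended at the previous column
            start = None
    if start is not None:
        ranges.append((start, width))    # run extends to the end of the record
    return ranges


if __name__ == "__main__":
    old_rec = "A" * 500
    new_rec = old_rec[:249] + "X" + old_rec[250:]
    print(diff_columns(old_rec, new_rec))   # [(250, 250)]
```

The ranges could then be summarised per file, for instance "50 records differ in column 250", instead of listing every full 500-byte record.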

Escapa · Posted: Tue May 11, 2010 1:45 pm