Reconcile two identical tables

tomehta · New User Joined: 18 Aug 2008 Posts: 98 Location: India

Hi
How can we compare two identical tables in DB2? Say, in two test regions, I want to compare the same table:

Db1.Employee and Db2.Employee.

I don't want to extract the tables to files and then compare them to generate the reconciliation report.

Is there some way we can do it within DB2?

thanks
Rohit

Terry Heinze · Posted: Mon May 18, 2009 9:19 pm

Why don't you want to "extract the tables on files and then compare" the unloaded data sets? That would seem to be the quickest method. Utilities can be used for both steps.

agkshirsagar · Posted: Tue May 19, 2009 5:01 pm

Rohit,
Terry has raised a very valid point. Do think about that approach.

You may want to try something like this:

-- Rows whose EMPID exists in only one of the two tables
SELECT * FROM DB1.EMPLOYEE WHERE EMPID NOT IN (
SELECT EMPID FROM DB2.EMPLOYEE )
UNION
SELECT * FROM DB2.EMPLOYEE WHERE EMPID NOT IN (
SELECT EMPID FROM DB1.EMPLOYEE ) ;

dick scherrer · Posted: Tue May 19, 2009 9:13 pm

Hello,

That query could identify key "mis-match"es, but would not compare the contents of the tables. . .
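
To illustrate the difference between a key check and a content check, here is a minimal sketch. It runs against an in-memory SQLite database purely as a stand-in for the two DB2 regions; the table names DB1_EMPLOYEE and DB2_EMPLOYEE and the NAME and SALARY columns are made up for the example. A set difference (EXCEPT) over whole rows flags a row whose key matches but whose contents differ. Whether EXCEPT is available depends on your DB2 version, so treat this as an idea to verify, not a tested DB2 query.

```python
import sqlite3

# In-memory SQLite stands in for the two DB2 regions; the EMPLOYEE
# layout (EMPID, NAME, SALARY) is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DB1_EMPLOYEE (EMPID INTEGER PRIMARY KEY, NAME TEXT, SALARY INTEGER);
    CREATE TABLE DB2_EMPLOYEE (EMPID INTEGER PRIMARY KEY, NAME TEXT, SALARY INTEGER);
    INSERT INTO DB1_EMPLOYEE VALUES (1, 'ANN', 100), (2, 'BOB', 200);
    INSERT INTO DB2_EMPLOYEE VALUES (1, 'ANN', 100), (2, 'BOB', 250);
""")

def only_in(first, second):
    """Return whole rows present in table `first` but not in table `second`."""
    sql = f"SELECT * FROM {first} EXCEPT SELECT * FROM {second}"
    return conn.execute(sql).fetchall()

# Both tables hold the same keys, so a key-only check would find nothing,
# yet the row for EMPID 2 differs in SALARY and shows up on both sides.
only_in_db1 = only_in("DB1_EMPLOYEE", "DB2_EMPLOYEE")
only_in_db2 = only_in("DB2_EMPLOYEE", "DB1_EMPLOYEE")
print(only_in_db1)
print(only_in_db2)
```

Running the difference in both directions matters: one direction alone would miss rows that exist only in the second table.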

agkshirsagar · Posted: Tue May 19, 2009 11:49 pm

You are right Dick.
But my guess is that the OP wanted to know only the key mismatches.
Let's wait and watch.

tomehta

Sorry guys for the late reply.

Let me elaborate further. I have two databases with the same tables (approx. 150) and have to compare their contents.
About the compare: I need to compare the contents of all columns of each table except the primary key. A bit cryptic, but that's the way it is. The primary key is actually generated from a timestamp, so there is no point in comparing it.

Let me know your thoughts.

dbzTHEdinosauer · Posted: Thu Jun 11, 2009 8:17 pm

tomehta

We have created a new application, along with a replica of the current database. To make sure the new application works correctly during the parallel run, the new database, which is a replica of the existing one, needs to be compared with the existing database. This will confirm that the new application works as well as the existing one.

Both the current and the new application will be running in parallel. The problem at hand is how to compare the contents of the two databases. The rows have to be sorted, as the primary keys won't be the same. Depending on some timestamp selection criteria, the sorted rows need to be compared. We are talking about 150 tables here, holding huge amounts of data.

I hope I have been able to formulate the problem.

dick scherrer · Posted: Sat Jun 13, 2009 9:57 am

Hello,

Is the data in all of the tables "driven" by this "timestamp selection criteria"?

As the 2 systems run in parallel, will all rows in all tables be inserted in the same sequence of processes? Is this timestamp part of the primary key?

If some arbitrary value is the uniqueness key and these are inserted ascending, i believe you could unload 2 of the "same" tables and (excluding the uniqueness key) compare the remainder of each row against the "same" row from the other table.

As i re-read this i may have added confusion. . .

Hopefully, this will make what i want to say more clear.

Let's pick on tableA on both sides - the original and the replica. Through the real process and the parallel process, will the 100th row added be the same row in both tableAs? The timestamp identifier would be different, but would all of the rows be added serially?

If this is true, suggest you try an experiment: unload tableA from both environments, copy the unloaded files stripping off the unequal timestamp info, and then compare the entire remainder of each record using SuperC. This would not be the final solution, but rather a proof of concept that you could identify any differences (again, other than the timestamp) in the 2 tables.
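
The experiment above can be sketched off the mainframe as well. This is only an illustration of the idea, not a replacement for SuperC: it assumes each unloaded record is a fixed-width line that begins with a 26-character timestamp key, and the record layout and sample data are invented.

```python
import difflib

KEY_WIDTH = 26  # assumed: a 26-character timestamp key leads each unloaded record

def strip_key(records, key_width=KEY_WIDTH):
    """Drop the leading key field from each unloaded record."""
    return [rec[key_width:] for rec in records]

def compare_unloads(old_records, new_records):
    """Return unified-diff lines for the two unloads, ignoring the key field."""
    return list(difflib.unified_diff(
        strip_key(old_records), strip_key(new_records),
        fromfile="original", tofile="replica", lineterm=""))

# Hypothetical unloads, both already in timestamp sequence.
old = ["2009-06-13-09.57.00.000001ANN  00100",
       "2009-06-13-09.57.00.000002BOB  00200"]
new = ["2009-06-13-10.02.11.000005ANN  00100",
       "2009-06-13-10.02.11.000009BOB  00250"]

diff = compare_unloads(old, new)
print("\n".join(diff))
```

An empty result means the two unloads agree on everything except the stripped key. The comparison is positional, so it only works if both sides really did insert their rows in the same order.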

I understand that you'd prefer to avoid the unloads, but if this works, it would only be set up once and all of the processes would be clones, not taking much "real" work.

Please post the outcome of a test of this or clarify why this would not accomplish what you need. Good luck

tomehta

Thanks Dick for the solution. I also want a similar kind of solution, where the coding effort is minimal.

Yes, this timestamp is more or less the primary key in the tables.

I want to confirm my understanding with you. We have to unload the tables excluding the primary key, ordered ASC (or DESC) on some column, and then run SuperC. Is that correct?

The serial part should be fine, I guess, as both batch applications will be working on the same set of input files. I am not really confident that I have understood 100% of what you suggested.
But I will definitely go to DB2 support and ask them why we can't implement this.

Thanks, Dick.

dick scherrer · Posted: Sat Jun 27, 2009 12:55 am

Hello,

You want to unload both in timestamp sequence - you just don't want to include the timestamp in the compare for differences. . .

If i understand correctly, the timestamps will be different, but all of the other columns should be the same.

tomehta

Hi Dick,
The two jobs will be processing the same set of files, doing the same kind of processing.
From the job timings, I can give a BETWEEN timestamp criterion for the unload of the rows (I hope so) from the two tables.

But now the question is: will DB2 give us the rows back serially if I don't give any ASC/DESC criteria in the unload? It's possible that the value of the column on which we are doing ASC/DESC is populated wrongly by the new set of programs.
The whole aim is to reconcile and prove that the new application is working the same as the old one.
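
On the ordering worry: a plain SELECT without ORDER BY carries no guarantee about row order, so relying on "serial" retrieval is risky. One way around both that and a possibly wrong sort column is to compare the two tables as multisets of rows with the key removed, so order does not matter at all. A minimal sketch follows, assuming the rows have already been fetched or unloaded into tuples with the key in the first position. The sample rows are invented, and for 150 huge tables this in-memory approach may well not scale.

```python
from collections import Counter

def row_multiset(rows, key_index=0):
    """Count each row's non-key columns, so row order does not matter."""
    return Counter(tuple(v for i, v in enumerate(row) if i != key_index)
                   for row in rows)

def reconcile(old_rows, new_rows, key_index=0):
    """Return (missing_in_new, extra_in_new) as Counters of non-key tuples."""
    old_counts = row_multiset(old_rows, key_index)
    new_counts = row_multiset(new_rows, key_index)
    return old_counts - new_counts, new_counts - old_counts

# Hypothetical rows: (timestamp key, name, salary); keys and order differ.
old_rows = [("ts1", "ANN", 100), ("ts2", "BOB", 200), ("ts3", "BOB", 200)]
new_rows = [("ts9", "BOB", 200), ("ts8", "ANN", 100), ("ts7", "BOB", 250)]

missing, extra = reconcile(old_rows, new_rows)
print(dict(missing))
print(dict(extra))
```

Counting duplicates, not just collecting a set, is deliberate: if the old application wrote the same non-key values twice and the new one wrote them once, that difference still shows up.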

dick scherrer · Posted: Sat Jul 11, 2009 2:13 am

Hello,

sushanth bobby · Posted: Mon Jul 13, 2009 11:28 am

Hello,