we have had a deadlock among several new batch BMP jobs in development and are looking for a solution in the design to minimise this. these are new jobs being built as one-off to be run on implementation day only to add around 3 million new records to a primary IMS DB and less or almost same number to other two IMS DB's.
to be specific the three databases are
DB1 - with 10 physical key-ranged partitions
DB2 - with 10 physical key-ranged partitions
DB3 - with only 1 physical partition
these are 10 BMP jobs built to run in parallel with input specific to each DB1 keyed partitions. the jobs will only add new records and will not update any existing records(segments) to any of the DB's. now for each input, a new record will be added to DB1 and then DB2 will be checked if a corresponding pointer exists for the DB1 record. if not, a record will be added to DB2 and similarly to DB3. The checkpoint frequency is set to 30000. the deadlock is between DB2 and DB3.
on a discussion it has been advised to use ENQ/DEQ (not DLI Q and DEQ) on DB2 and DB3 resources from application program to minimise the deadlock which i do not think is the optimal approach. our shop uses IRLM and from the Deadlock analysis report, the LOCK request type was GRIDX and state 06-P.
my idea is to reduce the checkpoint freq to around 3000 and experiment. i am yet to get the input from a different system to experiment and meanwhile looking for some suggestions here. i understand the best thing to do is to run the jobs one after other but just worried that it will increase the application downtime window which is not preferred.
my understanding is that IMS IRLM uses ENQ/DEQ for locking at different level based on PSB PROCOPT. now if we use ENQ/DEQ on top of it from application prog's, what is the impact? i do not think our DEQ request will release the lock placed by IMS. Would someone help me?
Alternately if we use DLI calls Q/DEQ, my understanding is that the DEQ will release the IMS lock if and only if it's not updated between syncpoints. i am not quite sure. some experts opinions are welcome here to correct my understanding
Your keys for DB2 and DB3 are not split along the same lines as DB1, so your partition A job is using partition B in DB2 and DB3. This is going to get locked out no matter how small you make your checkpoint.
If you have the time to do this, remove the inserts to DB2 and DB3 and instead, write the updates to flat files. Yes, to ten flat files. That means 100 total flat files from the DB1 load program for DB2, and another 100 for DB3.
Then take the DB2 updates and split/sort them to ten input files so they are one partition each. Do the same for DB3 updates.
Then you can run DB2/DB3 jobs all at once.
To make it easier, you use specific generations of a GDG for the DB1 job. For example, job A can have
Which will read in all 10 generations in one gulp.
Frankly, if you have the time, I would replace that first job with a few extracts and match/merge jobs so that you don't need to read DB2 and DB3 in the first place.
As a middle ground, you could eliminate the read of DB2 and DB3 and just write the KEYS out in one flat file, then change the DB2 / DB3 inserts to be flexible enough to know if they need to do the inserts.
One thing you have going for you is they are HIDAM, so a normal sort by the primary key will get you in the correct sequence.
well my attempt to keep this short didn't really help as i didn't do a good job in that. although the jobs are new, the process which it runs is existing, online and quite complex. the existing online process is being used in BMP as one-off to migrate some records (actually customers) from a different platform. it does a lot of validation b/w DB's, build pointers, compliance, reporting, publish data to various other systems etc etc. all we are doing is feeding data in batch files to this process. so there is no option to change things to write into files etc with this process unless we build the same duplicate which is obviously a no go. to keep it short we are using online process in batch.
Having said that, i do not think we will write them to any files for DB2/DB3. i am guessing running each job one after other would have almost the same run time??
i am quite interested to understand how using assembler services ENQ/DEQ macros will have an affect. also does DLI Q/DEQ help? at least for a good argument in the shop.
If you start locking databases, you will not fix anything. IMS is already doing that for you and guess what...IT'S getting locked out.
Since you gave a little more detail, let me ask this. Does the on-line version run in ten parts? If not, why did someone decide to run it in ten parts in batch? I've seen this happen here quite a bit: "Just split it!" and all of our problems are solved.
It sounds like you're stuck with it though, so you want to start figuring out how long it will run as one big job.
Excellent! Thanks for the confirmation Ed. Online version does not run in 10 parts. There is an API which identifies the physical DB based on key range table. The reason to run them as 10 jobs is purely to reduce the migration window. Well, based on some estimation, running everything in one job is expected to run for more than 5 hours. We are yet to perform volume testing and have kept the option to run everything in one job open if it runs fast. Else we are also thinking of turning off DB3 update in batch and run an existing integrity job later. The DB3 is just a cross reference and is not a significant update for this migration.