We run a BMP job against an online DBCTL environment. It updates about 450,000 records and takes a checkpoint after every 500 updates. While it runs, the performance of our online environment gets worse. On the other hand, we sometimes see locking on shared resources (for example, a bank account). So there seems to be a trade-off between performance and locking. I have three questions:
1) Does checkpoint processing really have such a large overhead on the online system, and should we decrease the frequency?
2) If we decrease the checkpoint frequency, how can we overcome the locking problem? Can application programmers commit themselves instead of using BMP checkpoints?
3) Can we improve checkpoint performance through IMS sysadmin tuning (parameters, ...)? Or do you know any best practices for BMPs?
Joined: 17 Dec 2007 Posts: 59 Location: Victoria, BC, Canada
First, a short overview of locking and two phase commit.
Each time your BMP job updates an IMS database, uses SQL to update rows in a DB2 table, or calls MQ Series, each of these subsystems
creates locks. This prevents other applications from using the database records, table rows or MQ messages until the BMP commits the updates.
A BMP would normally commit at a boundary where a complete business transaction has finished.
IMS may also hold a lock implicitly on a segment that is read by your BMP, until the next DLI call using the same database PCB.
This is dependent upon the PROCOPT specified on the database PCB and/or segment.
Any online MPP transaction that attempts to read or update a database segment or DB2 table/row that is subject to a lock will wait.
During this wait, the IMS message processing region is occupied by the waiting transaction. IMS Sysprogs don't like to see this.
When the BMP takes a checkpoint, IMS, as the sync point coordinator, polls MQ and DB2 to see if they are ready to commit.
This is Phase 1 of Two Phase Commit.
When MQ and DB2 respond to IMS that they are ready to commit, IMS will signal MQ and DB2 to commit their updates.
IMS will also commit the DLI updates and release the locks. This is Phase 2 of Two Phase Commit.
At this point, any online MPP transactions that were waiting for locks held by the BMP job will be able to proceed to completion.
If these transactions had to wait too long, for example longer than the timeout value specified for IMS Connect-submitted transactions,
you will see timeout messages in your IMS Connect regions. This is not good.
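The two phases described above can be sketched as a toy model. This is only an illustration of the protocol, not IMS code: the `Participant` class and `two_phase_commit` function are hypothetical names standing in for resource managers such as DB2 or MQ and for the coordinator.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Participant:
    """Hypothetical stand-in for a resource manager such as DB2 or MQ."""
    name: str
    can_commit: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: vote on whether the pending updates can be committed.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self) -> None:
        # Phase 2: make the updates permanent; locks would be released here.
        self.state = "committed"

    def rollback(self) -> None:
        # Any "no" vote means every participant backs out its updates.
        self.state = "rolled_back"


def two_phase_commit(participants: Sequence[Participant]) -> bool:
    """Return True if all participants committed, False if all rolled back."""
    # Phase 1: the coordinator polls every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: everyone voted yes, so tell everyone to commit.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

The point of the sketch is that waiting transactions can only proceed after Phase 2, so the time between checkpoints is roughly the time locks stay held.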
Now on to your questions.
>1) Does checkpoint processing really have such a large overhead on the online system, and should we decrease the frequency?
1. The question here should be, "How long can an important online transaction wait for a locked record?"
If your BMP makes 500 updates between checkpoints, and the duration of that activity is 10 minutes between checkpoints,
then you should adjust your checkpoint logic to take checkpoints at a frequency that respects the needs of online transaction
response and IMS Connect timeouts, if applicable.
The overhead of taking frequent checkpoints (for example, 1 per second) is low.
On the other hand, if your BMP takes checkpoints 20 or more times per second, the overhead is high and you will cause other problems.
Do this and you should expect a visit from your un-friendly IMS Sysprog.
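To put numbers on this, here is a small back-of-the-envelope sketch. The function names are made up for illustration; the inputs are the figures from the original post (450,000 updates, a checkpoint every 500 updates) plus whatever interval you actually observe in your own logs.

```python
import math


def checkpoints_needed(total_updates, updates_per_checkpoint):
    """Number of checkpoints the whole job will take."""
    return math.ceil(total_updates / updates_per_checkpoint)


def updates_per_checkpoint_for(target_seconds, observed_updates, observed_seconds):
    """Update count that would give roughly target_seconds between checkpoints,
    based on an observed throughput (observed_updates in observed_seconds)."""
    rate = observed_updates / observed_seconds  # updates per second
    return max(1, int(rate * target_seconds))


# 450,000 updates with a checkpoint every 500 updates.
print(checkpoints_needed(450_000, 500))            # 900

# If 500 updates currently take 60 seconds and the target is 2 seconds:
print(updates_per_checkpoint_for(2, 500, 60))      # 16
```

A fixed count of 500 only gives a short lock hold time while the job runs fast. Once throughput drops, the same count stretches the interval, which is why a time-based target is usually easier to reason about.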
>2) If we decrease the checkpoint frequency, how can we overcome the locking problem? Can application programmers commit themselves instead of using BMP checkpoints?
2. My rule of thumb is to have a BMP job take a checkpoint every 1 to 20 seconds. The sweet spot is 1-4 seconds.
You should follow your own shop standards for checkpointing, rather than have application programmers invent something new.
That something new might not correctly invoke two phase commit. And this could break things like your disaster recovery plan.
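As a rough sketch of time-based checkpointing, the loop below checks elapsed time at each unit-of-work boundary instead of counting updates. Everything here is hypothetical: `CheckpointPacer`, `run_batch`, `apply_update` and `take_checkpoint` are placeholder names, and `take_checkpoint` stands for whatever checkpoint routine your shop standard provides.

```python
import time


class CheckpointPacer:
    """Tracks elapsed time since the last checkpoint (hypothetical helper)."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock
        self.last = clock()

    def due(self):
        return self.clock() - self.last >= self.interval

    def mark(self):
        self.last = self.clock()


def run_batch(units_of_work, apply_update, take_checkpoint, pacer):
    """Process each unit of work, checkpointing only on unit boundaries."""
    checkpoints = 0
    pending = False
    for unit in units_of_work:
        apply_update(unit)       # one complete business transaction
        pending = True
        if pacer.due():          # only checked after a unit is complete
            take_checkpoint()
            pacer.mark()
            checkpoints += 1
            pending = False
    if pending:
        take_checkpoint()        # commit whatever is left at end of job
        checkpoints += 1
    return checkpoints
```

With this shape, a slowdown in the BMP does not lengthen the time locks are held, because the decision is driven by the clock rather than by an update count.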
>3) Can we improve checkpoint performance through IMS sysadmin tuning (parameters, ...)? Or do you know any best practices for BMPs?
3. There is an option in your BMP program PSB source to specify LOCKMAX. This option will prevent your BMP from exceeding a specified
number of DLI database locks per checkpoint. The BMP jobstep will abend if this number is exceeded.
Your IMS Sysprog may also impose a system wide LOCKMAX limit.
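To illustrate the idea behind such a limit, here is a toy model. It is not how IMS implements LOCKMAX, and there is nothing for the application to code: IMS enforces the limit itself. `LockBudget` and `LockLimitExceeded` are invented names used only to show why a cap on locks per checkpoint catches a BMP that checkpoints too rarely.

```python
class LockLimitExceeded(RuntimeError):
    """Raised by the toy model where the real job step would abend."""


class LockBudget:
    """Toy model: counts locks held since the last checkpoint."""

    def __init__(self, max_locks):
        self.max_locks = max_locks
        self.held = 0

    def acquire(self, count=1):
        if self.held + count > self.max_locks:
            raise LockLimitExceeded(
                f"{self.held + count} locks would exceed limit {self.max_locks}"
            )
        self.held += count

    def checkpoint(self):
        # A checkpoint commits the updates and releases the locks.
        released = self.held
        self.held = 0
        return released
```

In this model, a job that calls `checkpoint()` often never gets near the limit, while one that keeps updating without a checkpoint fails fast instead of holding thousands of locks against the online system.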
Best practice: take a BMP checkpoint every 1 to 4 seconds, if it makes sense to do this within the confines of the application's logical unit of work.
Ask your IMS Sysprog for a copy of the IMS Control region console log and the reports from IMS log archive jobs.
These will show you how often your BMP is checkpointing, so that you can fine tune your BMP based upon actual results.
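Once you have the checkpoint times, a few lines of code can show where the gaps open up. This sketch assumes you have already extracted the checkpoint timestamps into `datetime` values; the log layout is site specific, so the extraction step is deliberately left out, and the function names are made up for illustration.

```python
from datetime import datetime


def checkpoint_gaps(timestamps):
    """Seconds between consecutive checkpoints, in time order."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def slow_gaps(timestamps, limit_seconds):
    """Gaps longer than the interval you are prepared to tolerate."""
    return [g for g in checkpoint_gaps(timestamps) if g > limit_seconds]


# Made-up sample times, only to show the call shape.
sample = [
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 0, 2),
    datetime(2024, 1, 1, 10, 0, 5),
    datetime(2024, 1, 1, 10, 1, 5),
]
print(checkpoint_gaps(sample))   # [2.0, 3.0, 60.0]
print(slow_gaps(sample, 4))      # [60.0]
```

Comparing the slow gaps against the times when online response degraded gives you something concrete to take to your Sysprog.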
First of all, our BMP job only updates IMS (DBCTL), and our online transactions run on CICS. So in our scenario, BMP checkpoints only commit the BMP's own updates. Secondly, I checked the BMP checkpoint frequency. It starts with one checkpoint every 1-3 seconds, but after one hour it drops to one checkpoint per minute (our online transaction rate had increased by then). So the BMP runs more and more slowly, and our online transaction response time gets worse too.
I have two more questions for clarification:
1) Since our BMP doesn't need 2PC, is there any difference between CKPT and SYNC? Does CKPT have more overhead than SYNC?
2) We use an IRLM TIMEOUT of 10 seconds. When it expires, we see message DXR162I and the top blocker transaction is purged automatically. During two hours of BMP run time we saw only 3 such messages. Hence I concluded that when the time between two BMP checkpoints grew from 1-3 seconds to about one minute, and we saw no DXR162I, there should not have been a locking issue. Are there any IMS resources shared between the BMP and the online transaction environment that we can monitor or tune?
Joined: 17 Dec 2007 Posts: 59 Location: Victoria, BC, Canada
1. IMS handles 2 phase commit if there are DB2 and/or MQ subsystems involved. No need to change to SYNC.
2. If you run a deadlock report against the SLDS log tapes that were created during your BMP execution, you can observe whether deadlocks occurred and if your BMP was in conflict with online transactions.
I suggest you also talk to your performance admin about where your BMP jobs fall within the WLM (Workload Manager) policy. Your BMPs should be running just below the IMS DLISAS region so that if the system becomes busy, they will continue to be given "cycles". If they don't get cycles, they take fewer checkpoints in a given time range, and this could adversely affect online transactions.
Thank you Gary Jacek for your reply.
We are trying to reproduce the problem (I will report back as soon as we reach a conclusion). We did not see a large number of locks while the BMP was running. However, the BMP gradually slowed down and, at the same time, online transaction response time increased. After we canceled the BMP, everything returned to normal. If nothing in the IMS system (buffers, ...) explains this, we should most likely focus on WLM.