Tivoli job scheduler issue

John Poulakos · Posted: Sat Mar 23, 2019 12:32 am

Here's one everyone is scratching their heads over, even IBM:

1. We changed a Proc in our Proclib
2. The old version of the proc was moved to a backup library, just in case.
3. We added the changed proc to a job that has been scheduled for several months.
4. The job ran from the schedule but picked up the old version of the proc.

The jcl expansion clearly shows the proc was picked up from the prod proclib where it was supposed to be.

The job clearly shows it executing the proc, so it had to pick up the changed job JCL.

The job has never specified this proc name before.

So, how did it use the old proc?

Old JCL (excluding comments, etc.):

//FS267D JOB (WG1158,FIS),ADM,CLASS=C,MSGCLASS=M
//FS253D01 EXEC FS267

New JCL:

//FS267D JOB (WG1158,FIS),ADM,CLASS=C,MSGCLASS=M
//FS253D01 EXEC FS253

Proc FS253 is the proc that was changed.

expat · Posted: Mon Mar 25, 2019 12:37 pm

Was this a one-off baffler, or does it happen on every run?

Maybe the timing was just unfortunate, with OPC having already prepped the job, ready to run, before the PROCLIB change was made. Just a suggestion.

David Robinson

I've seen this before, but never got to the bottom of it. In our case the attempted exec of the proc was immediately after the change, so we assumed it must have been something in storage somewhere that had not been refreshed.

Would be interesting to hear what the cause was if you find it.

John Poulakos · Posted: Tue Mar 26, 2019 9:42 pm

The problem appears to be that JES2 is holding an old version of PROCLIB. TWS may be holding an enqueue, but no "smoking gun" has been found. IBM was called in, and several attempts have been made to fix the issue by recycling various components and reinstalling TWS. This has caused other problems: we are now getting IEF344I and IGD17358I errors on GDGs that are used only in one job, a job that has never failed before and for which no reclaim should ever be needed.

We rewrote the JCL to eliminate the failing PROC expansion, so I don't know if the original problem has been fixed.

steve-myers · Posted: Tue Mar 26, 2019 10:27 pm

Mr. Poulakos's analysis has the right general idea, but is possibly not correct.

When a job is submitted to JES2, it is "converted" (usually) almost immediately. "Conversion" analyzes the JCL. Any procedures in the job are read and analyzed during "conversion." Any proc changes made between "conversion" and actual execution are not reflected in the execution environment as the JCL has already been processed.
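The timing effect described above can be sketched as a toy model. This is only an illustration in Python, not how JES2 is implemented: the `Proclib`, `convert` and `execute` names are hypothetical, and the proc bodies are placeholder strings. The point it shows is that whatever text is captured at "conversion" is what runs, even if the library member changes afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Proclib:
    """Hypothetical stand-in for a procedure library: member name -> proc text."""
    members: dict = field(default_factory=dict)


@dataclass
class ConvertedJob:
    """A job after 'conversion': proc text has already been copied in."""
    name: str
    expanded_steps: list


def convert(job_name, proc_names, proclib):
    """Expand every proc reference now, copying the text out of the library."""
    return ConvertedJob(job_name, [proclib.members[p] for p in proc_names])


def execute(job):
    """Execution uses only what conversion captured; the library is not reread."""
    return list(job.expanded_steps)


proclib = Proclib({"FS253": "old FS253 body"})

# Job is submitted and converted while the old member is still in place.
job = convert("FS267D", ["FS253"], proclib)

# The proc is changed between conversion and execution.
proclib.members["FS253"] = "new FS253 body"

result = execute(job)                                    # still the old text
later = execute(convert("FS267D", ["FS253"], proclib))   # a later submission
```

In this model `result` holds the old body and `later` holds the new one, which is the once-only behaviour you would expect if conversion timing were the whole story.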

Another possible problem is in multi-access SPOOL if it is being used. The job may be "converted" on a different physical system than the execution system, so a different physical proclib may be used for conversion than on the execution system, and this can result in problems if the production control analysts are not wary.

John Poulakos · Posted: Tue Mar 26, 2019 10:48 pm

The current, and highly speculative, belief is that the person responsible for space management was working on expanding PROCLIB when the failing PROC update was made. Exactly what was being done to PROCLIB is unclear.

The IEF344I and IGD17358I errors appear to be the result of somebody manually submitting a job that is normally controlled by TWS. TWS had been taken out of service at the time. I can see the GDG was correct and OK in the previous run of the job and bad in the following run of the job. But the JES log is missing for the job that was submitted manually and caused the GDG error.

I think this is all I will ever know about this event. I doubt anyone will be forthcoming with better information.

expat · Posted: Wed Mar 27, 2019 12:34 pm

I always believed that if a physical change (e.g. deleted and redefined, or moved to a different volume) was made to a PROCLIB library, JES had to be bounced.

I recall many many years ago when my colleague moved a PROCLIB and all hell broke loose until JES was bounced.

John Poulakos · Posted: Thu Mar 28, 2019 7:01 pm

I think that is exactly what happened, but I will never be able to prove it. My job is "fixer". I get all significant problems, identify the cause and do whatever is needed to prevent them from occurring again. It requires co-operation from all areas of IT, which isn't always forthcoming; especially when the problem is human error by tech support. So, sometimes I get the oats after they have been through the horse.

expat · Posted: Fri Mar 29, 2019 12:34 pm

Just as another possible add-on... senior moments seem more frequent these days.

You were saying about increasing the space for the PROCLIB, so it could also be possible that he has run a silly little IEFBR14 against the PROCLIB to allow it secondary allocations. Those new extents may be used by jobs that update the PROCLIB, but they will not be recognised by JES until it gets bounced.

I think that if you really did fancy trolling through SMF that you could prove it

It's amazing the little snippets of old that traverse the grey matter sitting in the pub.

John Poulakos · Posted: Sat Apr 06, 2019 1:45 am

Bingo. That turned out to be exactly what happened. I had completely forgotten about allowing secondary extents. But, I have an excuse... I'm even older than you, I'll bet. I'm 75.

steve-myers · Posted: Sat Apr 06, 2019 9:28 am

If the new member is in the new extent, the converter will barf trying to read the member. But the TS claimed it got the old member, so something else happened.

There are ways to reopen the proclib concatenation if you have this extent problem. The easiest is to have a handy-dandy job that specifies /*JOBPARM PROC=xx, where xx is an identifier for a non-standard PROC in the JES2 procedure. The job doesn't even have to use a proc in the library! The next job will reopen PROC00.
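The extent problem and the effect of a reopen can be pictured with a small model. This is a sketch in Python under loud assumptions: `Library` and `OpenHandle` are hypothetical classes, and the idea that a reader remembers the extent count from open time is a simplification for illustration, not a description of JES2 internals. It does not explain how an old member could be picked up; it only shows a read failing through a stale open and succeeding after a reopen.

```python
class Library:
    """Hypothetical partitioned library: members live in numbered extents."""

    def __init__(self):
        self.extent_count = 1
        self.members = {}  # member name -> (extent index, text)

    def add_extent(self):
        """Model the library taking a secondary extent."""
        self.extent_count += 1

    def store(self, name, text, extent):
        if extent >= self.extent_count:
            raise ValueError("extent not allocated")
        self.members[name] = (extent, text)


class OpenHandle:
    """A long-running reader that remembers the extent count seen at open."""

    def __init__(self, library):
        self.library = library
        self.known_extents = library.extent_count

    def read(self, name):
        extent, text = self.library.members[name]
        if extent >= self.known_extents:
            # The member sits in an extent this open never learned about.
            raise OSError(f"{name} is in extent {extent}, unknown to this open")
        return text


lib = Library()
lib.store("FS267", "FS267 body", extent=0)

reader = OpenHandle(lib)          # library opened before it was expanded

lib.add_extent()                  # library grows into a new extent
lib.store("FS253", "new FS253 body", extent=1)

try:
    reader.read("FS253")
    stale_read_failed = False
except OSError:
    stale_read_failed = True      # the stale open cannot reach the new extent

reader = OpenHandle(lib)          # reopening picks up the current extents
reopened_text = reader.read("FS253")
```

Members that were already in the first extent stay readable through the stale handle in this model, which is why such a problem can go unnoticed until a new or rewritten member lands in the added space.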

steve-myers · Posted: Sat Apr 06, 2019 10:54 am

John Poulakos · Posted: Fri Apr 12, 2019 7:56 pm

The underlying problem WAS the PROCLIB expansion. But it messed up TWS pretty bad. I did not get all of the details, but IBM had to provide a resolution and it took two days to get the scheduled jobs running cleanly again.