I work in a shop which has several gdg bases that many jobs write to on a daily basis. A nightly batch job copies the gdg base, processes the copy, and deletes the generations at the end of the job.
This works fine except for the following scenario, which results in lost transactions.
1. The nightly batch job, JOBB (which copies the gdg base, processes the copy, and deletes the gdg base at the end of the job), abends after the copy step.
2. While JOBB is down, JOBX runs and creates a new generation that JOBB should process.
3. The abend for JOBB is resolved and JOBB is restarted from the step that abended. In other words, the restart doesn't process the new generation JOBX created while JOBB was down.
4. JOBB runs to EOJ and thus deletes the unprocessed generation in its delete step at the end of the job.
So far it's up to us to watch the abends and catch these situations and reprocess the missing transactions.
But I think there's got to be a better way. My idea is:
1. Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
2. Code DISP=(OLD,DELETE,KEEP) on the copy step.
I think this should work: if JOBB abends and JOBX makes a new generation while JOBB is down, the new generation won't be processed by the restart of JOBB, but it won't be deleted either. That way the next run of JOBB will pick up the generation JOBX made while JOBB was down.
And if JOBB is rerun from the top instead of restarted, the new generation will be picked up by that rerun of JOBB.
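For illustration, here's a rough sketch of the copy step I have in mind. The dataset and step names are made up, and COPYPGM is just a stand-in for whatever copy utility/program we end up using (ideally one that abends rather than just sets a return code when something goes wrong):
Code:
//* names below are examples only; COPYPGM = placeholder copy program
//COPYGDG  EXEC PGM=COPYPGM
//* all generations are read; they are deleted only if the step ends
//* normally and kept if it abends
//INPUT    DD  DSN=PROD.TRANS.GDG,DISP=(OLD,DELETE,KEEP)
//OUTPUT   DD  DSN=PROD.TRANS.COPY,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
So an abend leaves the generations cataloged for the next run, and a clean completion gets rid of them without a separate delete step.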
This seems like such an obvious solution to me that I'm guessing there must be something I'm missing, or they would have coded it this way to begin with. So I'm asking here to see what you guys think.
I did search for this before posting. But sometimes my search skills aren't the best. If this is covered somewhere else, kindly point me in the right direction. Better yet, post where you searched and the keywords you used.
Joined: 14 Aug 2006 Posts: 20 Location: Pune,India
Hi,
How is the GDG base specified in the job JOBB? Is it just the current generation created by JOBX, or the entire GDG? Please provide more details...
Joined: 18 Nov 2006 Posts: 3156 Location: Tucson AZ
Quote:
The nightly batch job, JOBB (which copies the gdg base, processes the copy, and deletes the gdg base at the end of the job), abends after the copy step.
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
Quote:
Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
Set the cond on the process to fail if IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if cond = zero.
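For example, something along these lines, where COPYSTEP is whatever your copy step is named and FORCEABN is only a placeholder for whatever program your site uses to deliberately abend:
Code:
//* FORCEABN = placeholder for a site program that abends when run
//CHKCOPY  IF (COPYSTEP.RC GT 0) THEN
//ABENDIT  EXEC PGM=FORCEABN
//CHKEND   ENDIF
If you'd rather stay with COND, coding COND=(0,EQ,COPYSTEP) on the abend step does the same thing - the step is bypassed when the copy step's return code is zero.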
How is the GDG base specified in the job JOBB? Is it just the current generation created by JOBX, or the entire GDG? Please provide more details...
The gdg base in JOBB is just that. All generations.
The nightly batch job, JOBB (which copies the gdg base, processes the copy, and deletes the gdg base at the end of the job), abends after the copy step.
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
I figured I'd get this response. We talked about it, but as long as it's in two different steps, the same issue exists, albeit to a lesser degree. That's the reason for my suggesting disp=(old,delete,keep). That way they aren't 'treated as one'. They actually are one.
Quote:
Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
Set the cond on the process to fail if IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if cond = zero.
If we do that, the delete is still in a different step and the issue still exists as noted above.
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation.
The nightly process job was started and was able to run independently of the sales/distribution runs that created new generations, as the gdg was never mentioned in the process job. We needed this because there were something like 422 sales/distribution centers and we could not control when they would complete their daily processing and upload the sales and shipping info.
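A bare-bones sketch of that kind of pre-processor (the dataset names, space and DCB values here are invented - the real job used shop-specific values and utilities):
Code:
//* 1. copy every cataloged generation to the processing dataset
//COPYALL  EXEC PGM=IEBGENER
//SYSPRINT DD  SYSOUT=*
//SYSUT1   DD  DSN=SALES.DAILY.GDG,DISP=(OLD,KEEP)
//SYSUT2   DD  DSN=SALES.DAILY.PROCESS,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD  DUMMY
//* 2. delete the generations only if the copy ended with RC 0
//DELGENS  EXEC PGM=IEFBR14,COND=(0,NE,COPYALL)
//DELGDG   DD  DSN=SALES.DAILY.GDG,DISP=(OLD,DELETE,KEEP)
//* 3. catalog the new "empty" starter generation
//NEWGEN   EXEC PGM=IEFBR14,COND=(0,NE,COPYALL)
//EMPTY    DD  DSN=SALES.DAILY.GDG(+1),DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(TRK,(1,1)),
//          DCB=(RECFM=FB,LRECL=80,BLKSIZE=8000)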
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
One of the goals is to have it so the job can restart in the same step it abended in. That isn't a hard requirement, though, so this is a possible solution. But so far nobody has addressed whether disp=(old,delete,keep) works here.
In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation.
But the same question applies, just for the job that does the copy, or no?
Errr, I forgot an important point.
There are hundreds of jobs that write to the gdg bases in question. And, while one goal is to prevent generations from being overwritten as I've stated, the main goal is to avoid coding conflicts in CA-7. The reason is that we create and delete jobs rapidly, and updating the scheduler with the conflicts is too burdensome. So, I guess the question of 'Is this a scheduling issue?' has merit.
Since throughput manager handles the dataset conflicts, I figured that coding disp=old,delete,keep would handle the restart situation and we'd not have to code the conflicts.
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".
What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg.
If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".
What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg.
Currently, the scheduling system is just told not to let any of the +1 jobs run while the copy job is running. That's what we want to avoid: defining that to the scheduler. It's tedious, and we have lots of jobs and there is high 'job turnover'. In this case, the jobs that create the +1's aren't the same; they're totally different. And there isn't an issue if there are no abends with the copy job. The issue comes in if there is an abend after the copy step but before the delete step finishes: if a new generation is created during the downtime and the job is restarted from the point of abend, it doesn't pick up the new generation in the copy but still deletes it.
I proposed the solution given here of moving the delete step to the one right after the copy step. The response I got was that we couldn't do that, because if the delete step abended the same situation exists, just less likely to happen. So even if we had the delete step right after the copy step, we'd still need to define the job conflicts to the scheduler.
It's slightly more complicated since there are about a dozen of these gdg bases that get copied in this job and the abend might happen on the 3rd one. And part of the goal is to make this bullet proof in the event of the job being restarted incorrectly.
That's why I figured we could put in a sort step that copies the input to both the temp file to be processed and the backup, and code disp=(old,delete,keep) on the input. If the sort completes successfully, we have the backup to revert to in case the job is restarted improperly, and at the same time we don't have to worry about missing a new generation created by a +1 job if the copy job abends.
So, is there an issue with coding disp=(old,delete,keep)? I mean, why have the delete in a separate step in the first place? That's what I'm trying to get at: what's the rationale behind a separate delete step versus (old,delete,keep)?
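For what it's worth, here's roughly what I'm picturing for that sort step (dataset names are made up; the control cards are DFSORT/SYNCSORT style):
Code:
//* names below are examples only
//SORTCOPY EXEC PGM=SORT
//SYSOUT   DD  SYSOUT=*
//* generations are deleted at normal step end, kept on an abend
//SORTIN   DD  DSN=PROD.TRANS.GDG,DISP=(OLD,DELETE,KEEP)
//* temp copy the rest of the job processes
//TEMP     DD  DSN=&&PROCESS,DISP=(NEW,PASS),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//* permanent backup to fall back on if the job is restarted badly
//BACKUP   DD  DSN=PROD.TRANS.BACKUP(+1),DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD  *
  OPTION COPY
  OUTFIL FNAMES=(TEMP,BACKUP)
/*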
Joined: 31 Aug 2007 Posts: 5 Location: Milwaukee WI
jasorn;
Try the "switch dataset" technique. Here's how it works.
In your JOBB, which currently has three steps (copy, process, and delete the GDG files), add a step ahead of those that deletes a dummy file. Call it file A. After the copy, process, and delete steps, add a new last step that catalogs that dummy file, file A. So JOBB will now have five steps.
Now add a new first step to JOBX that uses file A. The way this works: when JOBB starts, the first thing that happens is file A is deleted. If a JOBX starts while JOBB is in flight, it will try to use file A and immediately fail with a dataset-not-found error. JOBX will only run after the fifth step in JOBB re-creates file A, so you can take all the time in the world to resolve problems with JOBB. All JOBX failures can easily be restarted from the top after JOBB finishes.
To get this all working, you will, one time, have to create a file A. You can catalog the dummy file A with IEBGENER with SYSUT1 dummied. For your delete step, I would recommend using IDCAMS with a DELETE A sysin card.
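In JCL terms it would look something like this (the switch dataset name is just an example):
Code:
//* JOBB - new first step: drop the switch file (file A); the SET
//* MAXCC=0 keeps the step clean if the file is already gone
//DELSW    EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  DELETE PROD.JOBB.SWITCH
  SET MAXCC = 0
/*
//* ... the existing copy, process and delete steps go here ...
//*
//* JOBB - new last step: re-create the switch file
//CATSW    EXEC PGM=IEBGENER
//SYSPRINT DD  SYSOUT=*
//SYSUT1   DD  DUMMY,DCB=(RECFM=FB,LRECL=80,BLKSIZE=80)
//SYSUT2   DD  DSN=PROD.JOBB.SWITCH,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(TRK,(1,1)),
//          DCB=(RECFM=FB,LRECL=80,BLKSIZE=80)
//SYSIN    DD  DUMMY
//*
//* JOBX - new first step: fails with a dataset-not-found JCL error
//* while the switch file is missing, i.e. while JOBB is still in flight
//CHKSW    EXEC PGM=IEFBR14
//FILEA    DD  DSN=PROD.JOBB.SWITCH,DISP=OLD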
I'm not sure how to read the fact that nobody addressed whether there is an issue with using the disp=old,delete,keep in the copy step and forgoing the delete step altogether. Not sure if that means everyone thinks it's problematic or if nobody's considered it?
All of the suggestions given here are good, appreciated, and ones we're considering, but they're more 'involved' than changing the disp on the copy step and eliminating the delete step. This seems to work in the testing I've done so far, but I wanted to see if anyone here knew of issues with that approach.
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
It might help if you post a "mini" set of the jcl - wouldn't need all of the jcl for every step, just the step exec and the "problem" dd(s).
The reason I didn't mention the copy step versus the delete step was that I was focused on the process rather than that specific detail.
I'm not a fan of IDCAMS for this type of work.
Without seeing your "mini" set of jcl, I believe the old,delete,keep in the copy step will be basically the same as an old,delete,keep in a following step.
I'll be checking over the weekend, so if you post again, you'll hopefully get replies before next week.
Joined: 28 Jul 2006 Posts: 1702 Location: Australia
Hi Jasonr,
Personally, I think you should handle this through your scheduling tool.
Having said that, I'm against using DISP=(OLD,DELETE,KEEP) in the COPY step, especially with IEBGENER. This is very dangerous: an incorrect or missing DDNAME such as SYSPRINT will result in an RC of 12, thus blowing away all the datasets without copying them.
Separate steps will always be prone to loss of data.
I have a suggestion which may be a bit ugly:
Code:
//RENAME EXEC PGM=IDCAMS
//DD1 DD DSN=CSCSGLC.FILE01,DISP=OLD ** LOCKS OUT OTHER JOBS **
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
ALTER CSCSGLC.FILE01.G0001V00 -
NEWNAME(CSCSGLC.FILE02.G0001V00)
ALTER CSCSGLC.FILE01.G0002V00 -
NEWNAME(CSCSGLC.FILE02.G0002V00)
ALTER CSCSGLC.FILE01.G0003V00 -
NEWNAME(CSCSGLC.FILE02.G0003V00)
etc
The job that reads and deletes the GDG base will need to run against the renamed files.
The solution I posted was to address the many jobs that write to the same gdg base, while this thread is to address the job that processes the gdg base and deletes the generations. We still haven't decided how to handle this case, and I'll check your suggestion out.
Thanks.
Moderators: You can remove my post dated "Fri May 16, 2008 9:35 am" if you please. (Done)
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
Jason,
You will always have the problem: if the copy job goes bust and abends, leaving the GDGs unprotected (NOT exclusively locked), the +1 jobs are going to pee on your parade.
Your reluctance to 'program' the scheduler - job dependencies - leaves you in a no-win situation. Schedulers were created to solve this (and many other) production processing problems. Using job dependencies (relying on the EOJ return code of a previous job) provides you with a map of how things are interrelated.
Most shops I have been in are of the 'as few steps per job as possible' mentality, which means you have a gazillion jobs, but the scheduler does not care. If you fully use all the options of the scheduler, you can really lock down your shop and prevent this ripple-type event from occurring. Even if you had 1000 jobs, two people could schedule them in a couple of days. How long have you been playing around trying to solve this with disp parms/intermediate dependent jobs?
Plus, once you have it done, adding or deleting one new job is not that resource intensive.
You will always have the problem: if the copy job goes bust and abends, leaving the GDGs unprotected (NOT exclusively locked), the +1 jobs are going to pee on your parade.
Yes, the copy job has this risk. But since we might have one copy job abend in a year, resolving a copy job abend is fast, the many jobs which write to the gdg base don't typically run during the time the copy step runs, and we have a process to detect when generations were dropped, it seems worthwhile versus trying to manage 300-500 jobs turning over constantly in a scheduler.
Quote:
your reluctance to 'program' the scheduler - job dependencies - leaves you in a no-win situation. Schedulers were created to solve this (and many other) production processing problems. Using job dependencies (relying on the EOJ return code of a previous job) provides you with a map of how things are interrelated.
There is no reluctance. The 300-500 jobs that write to the gdg base can't be coded as conflicts with the copy job because the limit of conflicts is 255.
And as far as the 300-500 being part of some schedule where they're dependent upon one another, that isn't an option because they absolutely are not dependent upon one another. They have totally independent schedules and are run based on when outside firms send us files.
But this isn't an issue, as modding the +1 generation and then checking to see if it's empty works perfectly, is contained within the job, and isn't dependent upon external forces. The copy job that runs once a day is the exception, but that's not an issue as discussed above.
Quote:
Even if you had 1000 jobs, two people could schedule them in a couple of days. How long have you been playing around trying to solve this with disp parms/intermediate dependent jobs?
I've posted a bit and asked around, but I've spent about a day working out a solution once I finally concluded that none of the solutions anyone offered solved the problem.
Quote:
plus, once you have it done, adding or deleting one new job is not that resource intensive.
That's not true at our shop. Even if there is a way to put these into the scheduler and that would prevent the generations from being overwritten, it takes us half a day to modify one job in the schedule. Given our JOB turnover, that's too expensive.
But all of the solutions using a scheduler were based on holding up the other jobs while the one that abended was down. Not only do we not want to hold the other jobs up, we can't, as they're independent jobs and have data from other firms that needs to be processed.
Joined: 17 Aug 2007 Posts: 562 Location: Iowa, USA
Use DISP=(OLD,DELETE,KEEP) in the copy and get rid of the separate DELETE step. Just be sure the job will actually ABEND on an error. For example, if you use SORT to do the copy/delete, it might just pass back CC=16 on an error and the files could still be deleted! Two copy steps would be even better: OLD,KEEP on the first and OLD,DELETE,KEEP on the next.
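As a rough sketch of the two-step flavour (COPYPGM is a placeholder for whatever copy program is used - per the warning above, ideally one that abends instead of just setting a return code; names are made up):
Code:
//* first pass - generations are kept no matter what
//COPY1    EXEC PGM=COPYPGM
//INPUT    DD  DSN=PROD.TRANS.GDG,DISP=(OLD,KEEP)
//OUTPUT   DD  DSN=PROD.TRANS.COPY,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//* second pass - runs only if the first copy ended with RC 0; the
//* generations are deleted only if this step ends normally
//COPY2    EXEC PGM=COPYPGM,COND=(0,NE,COPY1)
//INPUT    DD  DSN=PROD.TRANS.GDG,DISP=(OLD,DELETE,KEEP)
//OUTPUT   DD  DSN=PROD.TRANS.BACKUP(+1),DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
That way, even if the second step goes wrong in a way that still deletes the input, the first copy is already cataloged, so nothing is lost.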
Use DISP=(OLD,DELETE,KEEP) in the copy and get rid of the separate DELETE step. Just be sure the job will actually ABEND on an error. For example, if you use SORT to do the copy/delete, it might just pass back CC=16 on an error and the files could still be deleted! Two copy steps would be even better: OLD,KEEP on the first and OLD,DELETE,KEEP on the next.
Yes, this is the original solution I proposed for the copy job. Not clear to me if we're going to do this or not.