I work in a shop which has several gdg bases that many jobs write to on a daily basis. A nightly batch job copies the gdg base, processes the copy, and deletes the generations at the end of the job.
This works fine except for the following scenario, which results in lost transactions.
1. The nightly batch job, JOBB (which copies the gdg base, processes the copy, and deletes the gdg base at the end of the job), abends after the copy step.
2. While JOBB is down, JOBX runs and creates a new generation that JOBB should process.
3. The abend for JOBB is resolved and JOBB is restarted from the step that abended. In other words, the restart doesn't process the new generation JOBX created while JOBB was down.
4. JOBB runs to EOJ and thus deletes the unprocessed generation in its delete step at the end of the job.
So far it's up to us to watch the abends and catch these situations and reprocess the missing transactions.
But I think there's got to be a better way. My idea is:
1. Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
2. Code DISP=(OLD,DELETE,KEEP) on the copy step.
I think this should work: if JOBB abends and JOBX makes a new generation while JOBB is down, the new generation won't be processed by the restart of JOBB, but it won't be deleted either. That way the next run of JOBB will pick up the generation JOBX made while JOBB was down.
And if JOBB is rerun from the top instead of restarted, the new generation will be picked up by that rerun of JOBB.
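For illustration, here's a rough sketch of the copy step I have in mind. The dataset and step names are made up, and COPYPGM is just a stand-in for whatever copy utility/program we end up using (ideally one that abends rather than just sets a return code when something goes wrong):
Code:
//* names below are examples only; COPYPGM = placeholder copy program
//COPYGDG  EXEC PGM=COPYPGM
//* all generations are read; they are deleted only if the step ends
//* normally and kept if it abends
//INPUT    DD  DSN=PROD.TRANS.GDG,DISP=(OLD,DELETE,KEEP)
//OUTPUT   DD  DSN=PROD.TRANS.COPY,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
So an abend leaves the generations cataloged for the next run, and a clean completion gets rid of them without a separate delete step.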
This seems like such an obvious solution to me that I'm guessing there must be something I'm missing, or they would have coded it this way to begin with. So I'm asking here to see what you guys think.
I did search for this before posting. But sometimes my search skills aren't the best. If this is covered somewhere else, kindly point me in the right direction. Better yet, post where you searched and the keywords you used.
Joined: 14 Aug 2006 Posts: 20 Location: Pune,India
Hi,
How is the GDG base specified in the job JOBB? Is it just the current generation created by JOBX, or the entire GDG? Please provide more details...
Joined: 18 Nov 2006 Posts: 3156 Location: Tucson AZ
Quote:
The nightly batch job, JOBB (which copies the gdg base, processes the copy, and deletes the gdg base at the end of the job), abends after the copy step.
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
Quote:
Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
Set the cond on the process to fail if IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if cond = zero.
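For example, something along these lines, where COPYSTEP is whatever your copy step is named and FORCEABN is only a placeholder for whatever program your site uses to deliberately abend:
Code:
//* FORCEABN = placeholder for a site program that abends when run
//CHKCOPY  IF (COPYSTEP.RC GT 0) THEN
//ABENDIT  EXEC PGM=FORCEABN
//CHKEND   ENDIF
If you'd rather stay with COND, coding COND=(0,EQ,COPYSTEP) on the abend step does the same thing - the step is bypassed when the copy step's return code is zero.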
How is the GDG base specified in the job JOBB? Is it just the current generation created by JOBX, or the entire GDG? Please provide more details...
The gdg base in JOBB is just that. All generations.
The nightly batch job, JOBB (which copies the gdg base, processes the copy, and deletes the gdg base at the end of the job), abends after the copy step.
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
I figured I'd get this response. We talked about it, but as long as it's in two different steps, the same issue exists, albeit to a lesser degree. That's the reason for my suggesting disp=(old,delete,keep). That way they aren't 'treated as one'. They actually are one.
Quote:
Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
Set the cond on the process to fail if IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if cond = zero.
If we do that, the delete is still in a different step and the issue still exists as noted above.
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation.
The nightly process job was started and was able to run independently of the sales/distribution runs that created new generations, as the gdg was never mentioned in the process job. We needed this because there were something like 422 sales/distribution centers and we could not control when they would complete their daily processing and upload the sales and shipping info.
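A bare-bones sketch of that kind of pre-processor (the dataset names, space and DCB values here are invented - the real job used shop-specific values and utilities):
Code:
//* 1. copy every cataloged generation to the processing dataset
//COPYALL  EXEC PGM=IEBGENER
//SYSPRINT DD  SYSOUT=*
//SYSUT1   DD  DSN=SALES.DAILY.GDG,DISP=(OLD,KEEP)
//SYSUT2   DD  DSN=SALES.DAILY.PROCESS,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD  DUMMY
//* 2. delete the generations only if the copy ended with RC 0
//DELGENS  EXEC PGM=IEFBR14,COND=(0,NE,COPYALL)
//DELGDG   DD  DSN=SALES.DAILY.GDG,DISP=(OLD,DELETE,KEEP)
//* 3. catalog the new "empty" starter generation
//NEWGEN   EXEC PGM=IEFBR14,COND=(0,NE,COPYALL)
//EMPTY    DD  DSN=SALES.DAILY.GDG(+1),DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(TRK,(1,1)),
//          DCB=(RECFM=FB,LRECL=80,BLKSIZE=8000)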
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
One of the goals is to have it so the job can restart in the same step it abended in. That isn't a hard requirement, though, so this is a possible solution. But so far nobody has addressed whether disp=(old,delete,keep) works here.
In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation.
But the same question applies, just for the job that does the copy, or no?
Errr, I forgot an important point.
There are hundreds of jobs that write to the gdg bases in question. And, while one goal is to prevent generations from being overwritten as I've stated, the main goal is to avoid coding conflicts in CA-7. The reason is that we create and delete jobs rapidly, and updating the scheduler with the conflicts is too burdensome. So, I guess the question of 'Is this a scheduling issue?' has merit.
Since throughput manager handles the dataset conflicts, I figured that coding disp=old,delete,keep would handle the restart situation and we'd not have to code the conflicts.
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".
What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg.
If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".
What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg.
Currently, the scheduling system is just told not to let any of the +1 jobs run while the copy job is running. That's what we want to avoid: defining that to the scheduler. It's tedious, and we have lots of jobs and there is high 'job turnover'. In this case, the jobs that create the +1's aren't the same; they're totally different. And there isn't an issue if there are no abends with the copy job. The issue comes in if there is an abend after the copy step but before the delete step finishes: if a new generation is created during the downtime and the job is restarted from the point of abend, it doesn't pick up the new generation in the copy but still deletes it.
I proposed the solution given here of moving the delete step to the one right after the copy step. The response I got was that we couldn't do that, because if the delete step abended the same situation exists, just less likely to happen. So even if we had the delete step right after the copy step, we'd still need to define the job conflicts to the scheduler.
It's slightly more complicated since there are about a dozen of these gdg bases that get copied in this job and the abend might happen on the 3rd one. And part of the goal is to make this bullet proof in the event of the job being restarted incorrectly.
That's why I figured we could put in a sort step that copies the input to both the temp file to be processed and the backup, and code disp=(old,delete,keep) on the input. If the sort completes successfully, we have the backup to revert to in case the job is restarted improperly, and at the same time we don't have to worry about missing a new generation created by a +1 job if the copy job abends.
So, is there an issue with coding disp=(old,delete,keep)? I mean, why have the delete in a separate step in the first place? That's what I'm trying to get at: what's the rationale behind a separate delete step versus (old,delete,keep)?
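For what it's worth, here's roughly what I'm picturing for that sort step (dataset names are made up; the control cards are DFSORT/SYNCSORT style):
Code:
//* names below are examples only
//SORTCOPY EXEC PGM=SORT
//SYSOUT   DD  SYSOUT=*
//* generations are deleted at normal step end, kept on an abend
//SORTIN   DD  DSN=PROD.TRANS.GDG,DISP=(OLD,DELETE,KEEP)
//* temp copy the rest of the job processes
//TEMP     DD  DSN=&&PROCESS,DISP=(NEW,PASS),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//* permanent backup to fall back on if the job is restarted badly
//BACKUP   DD  DSN=PROD.TRANS.BACKUP(+1),DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD  *
  OPTION COPY
  OUTFIL FNAMES=(TEMP,BACKUP)
/*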
Joined: 31 Aug 2007 Posts: 5 Location: Milwaukee WI
jasorn;
Try the "switch dataset" technique. Here's how it works.
In your JOBB, which currently has three steps (copy, process, and delete the GDG files), add a step ahead of those that deletes a dummy file. Call it file A. After the copy, process, and delete steps, add a new last step that catalogs that dummy file, file A. So JOBB will now have five steps.
Now add a new first step to JOBX that uses file A. The way this works: when JOBB starts, the first thing that happens is file A is deleted. If a JOBX starts while JOBB is in flight, it will try to use file A and immediately fail with a dataset-not-found error. JOBX will only run after the fifth step in JOBB re-creates file A, so you can take all the time in the world to resolve problems with JOBB. All JOBX failures can easily be restarted from the top after JOBB finishes.
To get this all working, you will, one time, have to create a file A. You can catalog the dummy file A with IEBGENER with SYSUT1 dummied. For your delete step, I would recommend using IDCAMS with a DELETE A sysin card.
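In JCL terms it would look something like this (the switch dataset name is just an example):
Code:
//* JOBB - new first step: drop the switch file (file A); the SET
//* MAXCC=0 keeps the step clean if the file is already gone
//DELSW    EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=*
//SYSIN    DD  *
  DELETE PROD.JOBB.SWITCH
  SET MAXCC = 0
/*
//* ... the existing copy, process and delete steps go here ...
//*
//* JOBB - new last step: re-create the switch file
//CATSW    EXEC PGM=IEBGENER
//SYSPRINT DD  SYSOUT=*
//SYSUT1   DD  DUMMY,DCB=(RECFM=FB,LRECL=80,BLKSIZE=80)
//SYSUT2   DD  DSN=PROD.JOBB.SWITCH,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(TRK,(1,1)),
//          DCB=(RECFM=FB,LRECL=80,BLKSIZE=80)
//SYSIN    DD  DUMMY
//*
//* JOBX - new first step: fails with a dataset-not-found JCL error
//* while the switch file is missing, i.e. while JOBB is still in flight
//CHKSW    EXEC PGM=IEFBR14
//FILEA    DD  DSN=PROD.JOBB.SWITCH,DISP=OLD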
I'm not sure how to read the fact that nobody addressed whether there is an issue with using the disp=old,delete,keep in the copy step and forgoing the delete step altogether. Not sure if that means everyone thinks it's problematic or if nobody's considered it?
All of the suggestions given here are good, appreciated, and ones we're considering, but they're more 'involved' than changing the disp on the copy step and eliminating the delete step. This seems to work in the testing I've done so far, but I wanted to see if anyone here knew of issues with that approach.
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
It might help if you post a "mini" set of the jcl - wouldn't need all of the jcl for every step, just the step exec and the "problem" dd(s).
The reason I didn't mention the copy step versus the delete step was that I was focused on the process rather than that specific detail.
I'm not a fan of IDCAMS for this type of work.
Without seeing your "mini" set of jcl, I believe the old,delete,keep in the copy step will be basically the same as an old,delete,keep in a following step.
I'll be checking over the weekend, so if you post again, you'll hopefully get replies before next week.
Joined: 28 Jul 2006 Posts: 1702 Location: Australia
Hi Jasonr,
Personally, I think you should handle this through your scheduling tool.
Having said that, I'm against using DISP=(OLD,DELETE,KEEP) in the COPY step, especially with IEBGENER. This is very dangerous: an incorrect or missing DDNAME such as SYSPRINT will result in an RC of 12, thus blowing away all the datasets without copying them.
Separate steps will always be prone to loss of data.
I have a suggestion which may be a bit ugly:
Code:
//RENAME EXEC PGM=IDCAMS
//DD1 DD DSN=CSCSGLC.FILE01,DISP=OLD ** LOCKS OUT OTHER JOBS **
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
ALTER CSCSGLC.FILE01.G0001V00 -
NEWNAME(CSCSGLC.FILE02.G0001V00)
ALTER CSCSGLC.FILE01.G0002V00 -
NEWNAME(CSCSGLC.FILE02.G0002V00)
ALTER CSCSGLC.FILE01.G0003V00 -
NEWNAME(CSCSGLC.FILE02.G0003V00)
etc
The job that reads and deletes the GDG base will need to run against the renamed files.
The solution I posted was to address the many jobs that write to the same gdg base, while this thread is to address the job that processes the gdg base and deletes the generations. We still haven't decided how to handle this case, and I'll check your suggestion out.
Thanks.
Moderators: You can remove my post dated "Fri May 16, 2008 9:35 am" if you please. (Done)
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
Jason,
You will always have the problem: if the copy job goes bust and abends, leaving the GDGs unprotected (NOT exclusively locked), the +1 jobs are going to pee on your parade.
Your reluctance to 'program' the scheduler - job dependencies - leaves you in a no-win situation. Schedulers were created to solve this (and many other) production processing problems. Using job dependencies (relying on the EOJ return code of a previous job) provides you with a map of how things are interrelated.
Most shops I have been in are of the 'as few steps per job as possible' mentality, which means you have a gazillion jobs, but the scheduler does not care. If you fully use all the options of the scheduler, you can really lock down your shop and prevent this ripple-type event from occurring. Even if you had 1000 jobs, two people could schedule them in a couple of days. How long have you been playing around trying to solve this with disp parms/intermediate dependent jobs?
Plus, once you have it done, adding or deleting one new job is not that resource intensive.
You will always have the problem: if the copy job goes bust and abends, leaving the GDGs unprotected (NOT exclusively locked), the +1 jobs are going to pee on your parade.
Yes, the copy job has this risk. But since we might have one copy job abend in a year, resolving a copy job abend is fast, the many jobs which write to the gdg base don't typically run during the time the copy step runs, and we have a process to detect when generations were dropped, it seems worthwhile versus trying to manage 300-500 jobs turning over constantly in a scheduler.
Quote:
your reluctance to 'program' the scheduler - job dependencies - leaves you in a no-win situation. Schedulers were created to solve this (and many other) production processing problems. Using job dependencies (relying on the EOJ return code of a previous job) provides you with a map of how things are interrelated.
There is no reluctance. The 300-500 jobs that write to the gdg base can't be coded as conflicts with the copy job because the limit of conflicts is 255.
And as far as the 300-500 being part of some schedule where they're dependent upon one another, that isn't an option because they absolutely are not dependent upon one another. They have totally independent schedules and are run based on when outside firms send us files.
But this isn't an issue, as modding the +1 generation and then checking to see if it's empty works perfectly, is contained within the job, and isn't dependent upon external forces. The copy job that runs once a day is the exception, but that's not an issue as discussed above.
Quote:
Even if you had 1000 jobs, two people could schedule them in a couple of days. How long have you been playing around trying to solve this with disp parms/intermediate dependent jobs?
I've posted a bit and asked around, but I've spent about a day working out a solution once I finally concluded that none of the solutions anyone offered solved the problem.
Quote:
plus, once you have it done, adding or deleting one new job is not that resource intensive.
That's not true at our shop. Even if there is a way to put these into the scheduler and that would prevent the generations from being overwritten, it takes us half a day to modify one job in the schedule. Given our JOB turnover, that's too expensive.
But all of the solutions using a scheduler were based on holding up the other jobs while the one that abended was down. Not only do we not want to hold the other jobs up, we can't, as they're independent jobs and have data from other firms that needs to be processed.
Joined: 17 Aug 2007 Posts: 562 Location: Iowa, USA
Use DISP=(OLD,DELETE,KEEP) in the copy and get rid of the separate DELETE step. Just be sure the job will actually ABEND on an error. For example, if you use SORT to do the copy/delete, it might just pass back CC=16 on an error and the files could still be deleted! Two copy steps would be even better: OLD,KEEP on the first and OLD,DELETE,KEEP on the next.
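As a rough sketch of the two-step flavour (COPYPGM is a placeholder for whatever copy program is used - per the warning above, ideally one that abends instead of just setting a return code; names are made up):
Code:
//* first pass - generations are kept no matter what
//COPY1    EXEC PGM=COPYPGM
//INPUT    DD  DSN=PROD.TRANS.GDG,DISP=(OLD,KEEP)
//OUTPUT   DD  DSN=PROD.TRANS.COPY,DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//* second pass - runs only if the first copy ended with RC 0; the
//* generations are deleted only if this step ends normally
//COPY2    EXEC PGM=COPYPGM,COND=(0,NE,COPY1)
//INPUT    DD  DSN=PROD.TRANS.GDG,DISP=(OLD,DELETE,KEEP)
//OUTPUT   DD  DSN=PROD.TRANS.BACKUP(+1),DISP=(NEW,CATLG,DELETE),
//          UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
That way, even if the second step goes wrong in a way that still deletes the input, the first copy is already cataloged, so nothing is lost.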
Use DISP=(OLD,DELETE,KEEP) in the copy and get rid of the separate DELETE step. Just be sure the job will actually ABEND on an error. For example, if you use SORT to do the copy/delete, it might just pass back CC=16 on an error and the files could still be deleted! Two copy steps would be even better: OLD,KEEP on the first and OLD,DELETE,KEEP on the next.
Yes, this is the original solution I proposed for the copy job. Not clear to me if we're going to do this or not.