Multiple jobs writing to same gdg. Best practice?


jasorn
Posted: Thu Aug 23, 2007 5:06 pm

I work in a shop which has several gdg bases that many jobs write to on a daily basis. A nightly batch job copies the gdg base, processes the copy, and deletes the generations at the end of the job.

This works fine except for the following scenario, which results in lost transactions.

1. Nightly batch job JOBB (which copies the gdg base, processes the copy, and deletes the generations at the end of the job) abends after the copy step.

2. While JOBB is down, JOBX, which creates a new generation that JOBB should process, runs and creates a new generation.

3. The abend for JOBB is resolved and JOBB is restarted from the step that abended. In other words, it doesn't process the new generation JOBX created while JOBB was down.

4. JOBB runs to EOJ and thus deletes the unprocessed generation in its delete step at the end of the job.

So far it's up to us to watch the abends and catch these situations and reprocess the missing transactions.

But I think there's got to be a better way. My idea is:

1. Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.

2. Code DISP=(OLD,DELETE,KEEP) on the copy step.

I think this should work: if JOBB abends and JOBX makes a new generation while JOBB is down, the new generation won't be processed by the restart of JOBB, but it won't be deleted either. That way the next run of JOBB will pick up the generation JOBX made while JOBB was down.

And if JOBB is rerun from the top instead of restarted, the new generation will be picked up by that rerun.

This seems like such an obvious solution to me that I'm guessing there must be something I'm missing, or they would have coded it this way to begin with. So I'm asking here to see what you guys think.

What's the right way to handle this situation?
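
For illustration, a minimal sketch of the proposed copy step, with hypothetical dataset names. The GDG base is read with DISP=(OLD,DELETE,KEEP), so the generations are deleted only if the step ends normally and are kept if it abends. IEBGENER stands in here for whatever copy utility is chosen; per point 1 above, it would need to abend on failure rather than just set a return code.

Code:
//COPY     EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//*  Entire GDG base (all generations): deleted on normal step end,
//*  kept if the step abends
//SYSUT1   DD DSN=PROD.DAILY.TRANS,DISP=(OLD,DELETE,KEEP)
//SYSUT2   DD DSN=&&COPY,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(CYL,(50,50),RLSE)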

jasorn
Posted: Thu Aug 23, 2007 5:08 pm

I did search for this before posting. But sometimes my search skills aren't the best. If this is covered somewhere else, kindly point me in the right direction. Better yet, post where you searched and the keywords you used.

bijumon
Posted: Thu Aug 23, 2007 5:38 pm

Hi,

How is the GDG specified in the job JOBB: is it the current generation of the GDG created by JOBX, or the entire GDG base? Please provide more details...

Thanks & Regards,
----------------------
Biju

IQofaGerbil
Posted: Thu Aug 23, 2007 6:51 pm

Isn't this a scheduling problem?

William Thompson
Posted: Thu Aug 23, 2007 7:28 pm

Quote:
Nightly batch job JOBB (which copies the gdg base, processes the copy, and deletes the generations at the end of the job) abends after the copy step.
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it...

Quote:
Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
Set the cond on the process step to fail if IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if the condition code is zero.
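
A sketch of that last idea using JCL IF/THEN; FORCEABN is a hypothetical stand-in for whatever abend-forcing utility the shop has, as no standard IBM utility abends on demand:

Code:
//*  COPY is the IDCAMS copy step; run the abend step only when
//*  it ended with a return code greater than zero
//         IF (COPY.RC GT 0) THEN
//FORCEAB  EXEC PGM=FORCEABN          hypothetical abend utility
//         ENDIF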

jasorn
Posted: Fri Aug 24, 2007 7:08 am

bijumon wrote:
Hi,

How is the GDG specified in the job JOBB: is it the current generation of the GDG created by JOBX, or the entire GDG base? Please provide more details...

Thanks & Regards,
----------------------
Biju

The gdg base in JOBB is just that. All generations.

jasorn
Posted: Fri Aug 24, 2007 7:11 am

IQofaGerbil wrote:
Isn't this a scheduling problem?

I don't think so. There is no reason that the jobs that create new generations should run while the job that processes them is down.

jasorn
Posted: Fri Aug 24, 2007 7:19 am

William Thompson wrote:
Quote:
Nightly batch job JOBB (which copies the gdg base, processes the copy, and deletes the generations at the end of the job) abends after the copy step.
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it...

I figured I'd get this response. We talked about it, but as long as it's in two different steps, the same issue exists, albeit to a lesser degree. That's the reason for my suggesting disp=(old,delete,keep). That way they aren't just 'treated as one'; they actually are one.

Quote:
Change the copy step to a utility that will cause an abend if something goes wrong, unlike IDCAMS, which will just give a bad return code.
Set the cond on the process step to fail if IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if the condition code is zero.

If we do that, the delete is still in a separate step and the issue still exists, as noted above.

dick scherrer
Posted: Fri Aug 24, 2007 7:34 am

Hello,

In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation.

The nightly process job was started and was able to run independently of the sales/distribution runs that created new generations, as the gdg was never mentioned in the process job. We needed this because there were something like 422 sales/distribution centers and we could not control when they would complete their daily processing and upload the sales and shipping info.
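
A minimal sketch of that pre-processor idea, with hypothetical dataset names: copy every generation, delete the cataloged generations, then catalog a fresh empty starter generation.

Code:
//COPYALL  EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD DSN=PROD.SALES.TRANS,DISP=OLD           all generations
//SYSUT2   DD DSN=PROD.SALES.PROCESS,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,100),RLSE)
//*  Delete all cataloged generations, only if the copy worked
//DELALL   EXEC PGM=IEFBR14,COND=(0,NE)
//DELDD    DD DSN=PROD.SALES.TRANS,DISP=(OLD,DELETE,KEEP)
//*  Catalog a new empty starter generation
//NEWGEN   EXEC PGM=IEFBR14,COND=(0,NE)
//NEWDD    DD DSN=PROD.SALES.TRANS(+1),DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(1,1)),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=8000)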

Please let me know if i've not been clear.

jasorn
Posted: Fri Aug 24, 2007 7:38 am

Quote:

Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity: either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it...

One of the goals is to have it so the job can be restarted in the same step it abended in. That isn't a hard requirement, though, so this is a possible solution. But so far nobody has addressed whether disp=(old,delete,keep) works here.

jasorn
Posted: Fri Aug 24, 2007 7:50 am

[quote="dick scherrer"]Hello,

In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation.
[quote]
But the same question applies, just for the job that does the copy, or no?

Errr, I forgot an important point.

There are hundreds of jobs that write to the gdg bases in question. And while one goal is to prevent generations from being overwritten, as I've stated, the main goal is to avoid coding conflicts in CA-7. The reason is that we create and delete jobs rapidly, and updating the scheduler with the conflicts is too burdensome. So I guess the question of 'Is this a scheduling issue?' has merit.

Since throughput manager handles the dataset conflicts, I figured that coding disp=old,delete,keep would handle the restart situation and we'd not have to code the conflicts.

dick scherrer
Posted: Fri Aug 24, 2007 8:20 am

Hello,

If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".

What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg.

jasorn
Posted: Fri Aug 24, 2007 3:46 pm

dick scherrer wrote:
Hello,

If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".

What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg.

Currently, the scheduling system is just told not to let any of the +1 jobs run while the copy job is running. That's what we want to avoid: defining that to the scheduler. It's tedious, we have lots of jobs, and there is high 'job turnover'. In this case, the jobs that create the +1's aren't the same; they're totally different. And there isn't an issue if there are no abends in the copy job. The issue comes in if there is an abend after the copy step but before the delete step finishes: if a new generation is created during the downtime and the job is restarted from the point of abend, the copy doesn't pick up the new generation but the delete still removes it.

I proposed the solution given here of moving the delete step to right after the copy step. The response I got was that we couldn't do that, because if the delete step abended the same situation exists, just less likely to happen. So even if we had the delete step right after the copy step, we'd still need to define the job conflicts to the scheduler.

It's slightly more complicated since there are about a dozen of these gdg bases that get copied in this job, and the abend might happen on the 3rd one. And part of the goal is to make this bulletproof in the event of the job being restarted incorrectly.

That's why I figured we could put in a sort step that copies the input to both the temp file to be processed and the backup, and code disp=(old,delete,keep) on the input, as in the sketch below. So if the sort completes successfully, we have the backup to revert to in case the job is restarted improperly, and at the same time we don't have to worry about missing a new generation created by a +1 job if the copy job abends.
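
A sketch of that sort step, with hypothetical dataset names, assuming the sort product abends on failure rather than just setting a return code (Bill Dennis raises that caveat later in the thread). One pass writes both the process file and the backup; the input generations are deleted only when the step ends normally.

Code:
//SORTCOPY EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//*  Input generations deleted on normal step end, kept after an abend
//SORTIN   DD DSN=PROD.DAILY.TRANS,DISP=(OLD,DELETE,KEEP)
//PROCESS  DD DSN=&&PROCESS,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(CYL,(100,100),RLSE)
//BACKUP   DD DSN=PROD.DAILY.TRANS.BKUP(+1),DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,100),RLSE)
//SYSIN    DD *
  SORT FIELDS=COPY
  OUTFIL FNAMES=(PROCESS,BACKUP)
/*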

So, is there an issue with coding disp=(old,delete,keep)? I mean, why have the delete in a separate step in the first place? That's what I'm trying to get at: what's the rationale behind a separate delete step versus old,delete,keep?

MtClimber
Posted: Sat Sep 01, 2007 12:21 am

jasorn;
Try the "switch dataset" technique. Here's how it works.
In your JOBB, which currently has 3 steps (copy, process, and delete the GDG files), add a step ahead of those that deletes a dummy file; call it file A. After the copy, process, and delete steps, add a new last step to catalog that dummy file A. So JOBB will now have 5 steps.

Now add a new first step to JOBX that uses file A. The way this works is that when JOBB starts, the first thing that happens is file A is deleted. If a JOBX starts, it will try to use file A and immediately fail with a 'dataset not found' JCL error. JOBX will only run after the 5th step in JOBB runs to re-create file A. You can take all the time in the world to resolve problems with JOBB, because JOBX will not run until that 5th step runs to re-create file A. All failed JOBXs can easily be restarted from the top after JOBB finishes.

To get this all working, you will, one time, have to create a file A. You can catalog the dummy file A with IEBGENER with SYSUT1 dummied. For your delete step, I would recommend using IDCAMS with a DELETE A sysin card.
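
A sketch of the pieces involved; the dataset name for file A is hypothetical:

Code:
//*  JOBB, new first step: delete file A
//DELA     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE PROD.JOBB.SWITCH
  SET MAXCC = 0     /* tolerate "not found" on the first run */
/*
//*  ... existing copy, process, and delete steps here ...
//*  JOBB, new last step: re-catalog file A
//MAKEA    EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD DUMMY,DCB=(RECFM=FB,LRECL=80,BLKSIZE=80)
//SYSUT2   DD DSN=PROD.JOBB.SWITCH,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(1,1)),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=80)

And the new first step of each JOBX, which fails at allocation if file A is not cataloged:

Code:
//CHECKA   EXEC PGM=IEFBR14
//FILEA    DD DSN=PROD.JOBB.SWITCH,DISP=SHR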

I hope this helps.
MtClimber

jasorn
Posted: Sat Sep 01, 2007 4:43 am

I'm not sure how to read the fact that nobody has addressed whether there is an issue with using disp=(old,delete,keep) in the copy step and forgoing the delete step altogether. Does that mean everyone thinks it's problematic, or that nobody's considered it?

All of the suggestions given here are good, appreciated, and ones we're considering, but they are more 'involved' than changing the disp on the copy step and eliminating the delete step. That approach seems to work in the testing I've done so far, but I wanted to see if anyone here knew of issues with it.

dick scherrer
Posted: Sat Sep 01, 2007 7:46 am

Hello,

It might help if you post a "mini" set of the jcl - wouldn't need all of the jcl for every step, just the step exec and the "problem" dd(s).

The reason i didn't mention the copy step versus the delete step was that i was focused on the process rather than that specific detail.

I'm not a fan of IDCAMS for this type of work.

Without seeing your "mini" set of jcl, i believe the old,delete,keep in the copy step will be basically the same as an old,delete,keep in a following step.

I'll be checking over the weekend, so if you post again, you'll hopefully get replies before next week.

jasorn
Posted: Sat Sep 01, 2007 11:02 am

Sure, I'll post jcl in the morning.

gcicchet
Posted: Fri May 16, 2008 12:48 pm

Hi jasorn,
personally I think you should handle this thru your scheduling tool.

Having said that, I'm against using DISP=(OLD,DELETE,KEEP) in the COPY step, especially with IEBGENER. This is very dangerous: an incorrect or missing DDNAME such as SYSPRINT will result in an RC of 12, thus blowing away all the datasets without copying them.

Separate steps will always be prone to loss of data.

I have a suggestion which may be a bit ugly.

Code:
//RENAME   EXEC PGM=IDCAMS                                           
//DD1      DD DSN=CSCSGLC.FILE01,DISP=OLD ** LOCKS OUT OTHER JOBS ** 
//SYSPRINT DD SYSOUT=*                                               
//SYSIN    DD *                                                       
   ALTER   CSCSGLC.FILE01.G0001V00                       -           
   NEWNAME(CSCSGLC.FILE02.G0001V00)                                   
   ALTER   CSCSGLC.FILE01.G0002V00                       -           
   NEWNAME(CSCSGLC.FILE02.G0002V00)                                   
   ALTER   CSCSGLC.FILE01.G0003V00                       -           
   NEWNAME(CSCSGLC.FILE02.G0003V00)                                   
   etc                                                       



The job that reads and deletes the GDG base will need to run against the renamed files.


Gerry

jasorn
Posted: Fri May 16, 2008 4:13 pm

gcicchet,
Actually, I mixed up two of my threads that are related to this problem. I should have posted that to this other thread instead of here:
ibmmainframes.com/viewtopic.php?t=29062&highlight=

The solution I posted was to address the many jobs that write to the same gdg base, while this thread is to address the job that processes the gdg base and deletes the generations. We still haven't decided how to handle this case, and I'll check your suggestion out.

Thanks.

Moderators: You can remove my post dated "Fri May 16, 2008 9:35 am" if you please. (Done)

gcicchet
Posted: Fri May 16, 2008 6:10 pm

Hi jasorn,

another way, before running JOBB, is to run another job to LISTC the gdg's and build 2 members.
Member1 will look something along these lines:

Code:
//SYSUT1   DD DSN=File1.G0001V00,DISP=SHR
//         DD DSN=File1.G0002V00,DISP=SHR
//         DD DSN=File1.G0003V00,DISP=SHR
  etc



Member2 will delete the files
Code:
//KILLFILE DD DSN=File1.G0001V00,DISP=(OLD,DELETE,DELETE)
//         DD DSN=File1.G0002V00,DISP=(OLD,DELETE,DELETE)
//         DD DSN=File1.G0003V00,DISP=(OLD,DELETE,DELETE)
  etc



JOBB will use the INCLUDE statement for the above members.

The first INCLUDE pulls in Member1 to copy the files, and the second INCLUDE pulls in Member2 to delete the files, as in the sketch below.
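
For illustration, JOBB might then look something like this (the JCLLIB library and output dataset names are hypothetical):

Code:
//         JCLLIB ORDER=(PROD.INCLUDE.JCL)
//COPY     EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//         INCLUDE MEMBER=MEMBER1        expands to the SYSUT1 DDs
//SYSUT2   DD DSN=PROD.DAILY.PROCESS,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,100),RLSE)
//*  Delete only the generations that were actually copied
//DELSTEP  EXEC PGM=IEFBR14,COND=(0,NE)
//         INCLUDE MEMBER=MEMBER2        expands to the KILLFILE DDs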

If JOBB now fails it can be rerun without the worry of new files having been created.

Gerry

dbzTHEdinosauer
Posted: Fri May 16, 2008 8:19 pm

Jason,

you will always have the problem: if the copy job goes bust and abends, leaving the GDGs unprotected (NOT exclusively locked), the +1 jobs are going to pee on your parade.

your reluctance to 'program' the scheduler - job dependencies - leaves you in a no-win situation. Schedulers were created to solve this (and many other) production processing problems. Using job dependencies (relying on the EOJ return code of a previous job) provides you with a map of how things are interrelated.
Most shops i have been in are of the 'as few steps per job as possible' mentality, which means you have a gazillion jobs, but the scheduler does not care; if you fully use all the options of the scheduler, you can really lock down your shop and prevent this ripple-type event from occurring. Even if you had 1000 jobs, two people could schedule them in a couple of days. How long have you been playing around trying to solve this with disp parms/intermediate dependent jobs?

plus, once you have it done, adding or deleting one new job is not that resource intensive.

jasorn
Posted: Sat May 17, 2008 4:07 am

dbzTHEdinosauer wrote:
Jason,

you will always have the problem: if the copy job goes bust and abends, leaving the GDGs unprotected (NOT exclusively locked), the +1 jobs are going to pee on your parade.

Yes, the copy job has this risk. But since we might have one copy job abend in a year, resolving a copy job abend is fast, the many jobs which write to the gdg base don't typically run during the time the copy step runs, and we have a process to detect when generations were dropped, it seems worthwhile versus trying to manage 300-500 jobs turning over constantly in a scheduler.
Quote:

your reluctance to 'program' the scheduler - job dependencies - leaves you in a no-win situation. Schedulers were created to solve this (and many other) production processing problems. Using job dependencies (relying on the EOJ return code of a previous job) provides you with a map of how things are interrelated.

There is no reluctance. The 300-500 jobs that write to the gdg base can't be coded as conflicts with the copy job because the limit of conflicts is 255.

And as far as the 300-500 being part of some schedule where they're dependent upon one another, that isn't an option because they absolutely are not dependent upon one another. They have totally independent schedules and run based on when outside firms send us files.

But this isn't an issue, as modding the +1 generation and then checking to see if it's empty works perfectly, is contained within the job, and isn't dependent upon external forces. The copy job that runs once a day is the exception, but that's not an issue, as discussed above.
Quote:

Even if you had a 1000 jobs, two people could schedule them in a couple of days. how long have you been playing around trying to solve this with disp parms/intermediate dependent jobs ?

I've posted a bit and asked around but I've spent about a day working out a solution once I finally concluded none of the solutions anyone offered solved the problem.

Quote:

plus, once you have it done, adding or deleting one new job is not that resource intensive.

That's not true at our shop. Even if there is a way to put these into the scheduler and that would prevent the generations from being overwritten, it takes us half a day to modify one job in the schedule. Given our JOB turnover, that's too expensive.

But all of the solutions using a scheduler were based on holding up the other jobs while the one that abended was down. Not only do we not want to hold the other jobs up, we can't, as they're independent jobs and have data from other firms that needs to be processed.

Bill Dennis
Posted: Mon May 19, 2008 6:41 pm

Use DISP=(OLD,DELETE,KEEP) in the copy and get rid of the separate DELETE step. Just be sure the job will actually ABEND on an error. For example, if you use SORT to do the copy/delete, it might just pass CC=16 on an error and the files could be deleted! Two copy steps would be even better: OLD,KEEP on the first and OLD,DELETE,KEEP on the next.
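
A sketch of the two-step variant, with hypothetical dataset names. The first pass takes a backup and keeps the input; only the second pass deletes it, and then only on normal step termination. As gcicchet noted above, the delete disposition still fires on a normal step end even with a bad return code, so the copy utility (IEBGENER here stands in for whatever is chosen) needs to abend on failure.

Code:
//*  Pass 1: backup copy, input kept no matter what
//COPY1    EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD DSN=PROD.DAILY.TRANS,DISP=(OLD,KEEP)
//SYSUT2   DD DSN=PROD.DAILY.TRANS.BKUP(+1),DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,100),RLSE)
//*  Pass 2: process copy, input deleted only on normal step end
//COPY2    EXEC PGM=IEBGENER,COND=(0,NE)
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD DSN=PROD.DAILY.TRANS,DISP=(OLD,DELETE,KEEP)
//SYSUT2   DD DSN=PROD.DAILY.PROCESS,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,100),RLSE)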

jasorn
Posted: Tue May 20, 2008 4:15 am

Bill Dennis wrote:
Use DISP=(OLD,DELETE,KEEP) in the copy and get rid of the separate DELETE step. Just be sure the job will actually ABEND on an error. For example, if you use SORT to do the copy/delete, it might just pass CC=16 on an error and the files could be deleted! Two copy steps would be even better: OLD,KEEP on the first and OLD,DELETE,KEEP on the next.

Yes, this is the original solution I proposed for the copy job. It's not clear to me whether we're going to do this or not.