Reduce CPU Time in a job


IBM Mainframe Forums -> JCL & VSAM
sivasaras

New User


Joined: 29 Sep 2007
Posts: 93
Location: Chennai

PostPosted: Wed Nov 02, 2011 10:26 pm

Hi,

One of my weekly jobs is consuming high CPU time and takes nearly 12 hrs to complete.

The job collects all of the 5 days' reject files and produces a report.

I checked the query and the program with the DBA; they work fine.

Please advise me whether to use checkpoints on the files to reduce the time.
Akatsukami

Global Moderator


Joined: 03 Oct 2009
Posts: 1788
Location: Bloomington, IL

PostPosted: Wed Nov 02, 2011 10:28 pm

Why do you think that checkpointing the job would make it more performant?
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Wed Nov 02, 2011 10:36 pm

How many rejects do you get weekly that it takes 12 hours to report on them?

You are taking files/datasets; what are you then doing with the DB?

Unless you have an enormous number of rejects, then despite being "fine" there is something wrong with the program or the way it is accessing the DB.

Can you try to fill in some of the gaps in our knowledge of this?
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

PostPosted: Wed Nov 02, 2011 10:56 pm

If you are doing fetches/selects against a DB without commits,
you are lucky that anything runs.
Always issue commits in batch, even if it is read-only selects/fetches.

If you are only reading/writing files (QSAM or VSAM),
it sounds as if a little tuning may be in order.

but, as usual, all you have said is that your job runs a long time.

too bad!
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Thu Nov 03, 2011 10:08 am

Hello,

How is this a JCL question?

Suggest you do more investigation than the DBA has done. Just because a query gives the correct result does NOT mean that the query "works fine".

How many rows have to be touched to resolve the query? If a row is touched more than once, it counts as multiple rows. I have seen some "good queries" that read the same several rows hundreds of millions of times in a process.

There is often (among less proficient developers/DBAs) a desire to get everything into one query. For extremely high volumes of repetitive work, this usually hurts performance considerably - indeed, it may make the process impossible to execute. Several of my clients have asked me to look into a main daily batch process that took more than 24 wall-clock hours to run. One had a data conversion process that, as built, would have run for about 9 days. . .

To repeat what DBZ posted - it runs a long time - too bad.

If you bother to provide some particulars, someone may have a suggestion to improve the performance.
Anuj Dhawan

Superior Member


Joined: 22 Apr 2006
Posts: 6250
Location: Mumbai, India

PostPosted: Thu Nov 03, 2011 3:23 pm

sivasaras wrote:
one of my Weekly job is consuming high CPU time and it takes nearly 12 hrs to complete.
"12 Hrs to complete" is elapsed time not the CPU time. So, when you say high CPU time, how much CPU time you are talking about?

Have you EXPLAINed your DB2 SQL queries in question?
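For reference, one common way to EXPLAIN a query in a batch job is the DSNTEP2 sample program under IKJEFT01. A minimal sketch only - the subsystem name, load library, plan name and the query itself are placeholders, and it assumes a PLAN_TABLE exists under your auth ID:

Code:
//EXPLAIN  EXEC PGM=IKJEFT01
//STEPLIB  DD  DISP=SHR,DSN=DSN.SDSNLOAD        your DB2 load library
//SYSTSPRT DD  SYSOUT=*
//SYSPRINT DD  SYSOUT=*
//SYSTSIN  DD  *
  DSN SYSTEM(DSN1)
  RUN PROGRAM(DSNTEP2) PLAN(DSNTEP2)
  END
//SYSIN    DD  *
  -- the SELECT under investigation goes after the FOR keyword
  EXPLAIN PLAN SET QUERYNO = 1 FOR
    SELECT ... ;
  SELECT QUERYNO, METHOD, ACCESSTYPE, MATCHCOLS, INDEXONLY
    FROM PLAN_TABLE
    WHERE QUERYNO = 1 ;
/*

For static SQL bound into the program's plan/package, the same information comes from binding with EXPLAIN(YES) and then querying PLAN_TABLE.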
sivasaras

New User


Joined: 29 Sep 2007
Posts: 93
Location: Chennai

PostPosted: Mon Nov 07, 2011 10:07 pm

Hi,
These are the 2 ways I have identified to reduce the CPU time:

1. Change the program to use a binary SEARCH.

2. Delete all the old data in the file and make the job read only the last 5 days' file.

Please advise me of any other ways to reduce it.

Thanks
Siva
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19244
Location: Inside the Matrix

PostPosted: Mon Nov 07, 2011 10:39 pm

Hello,

Quote:
Change the program to use a binary SEARCH.
You could also consider changing the program to bypass the lookup/search if the value about to be searched is the same as the previous value. If it is, searching again is wasted resources.

Quote:
Delete all the old data in the file and make the job read only the last 5 days' file.
This is not clear to me. If I understand the initial post, the process already reads only the last 5 days.
expat

Global Moderator


Joined: 14 Mar 2007
Posts: 8797
Location: Welsh Wales

PostPosted: Mon Nov 07, 2011 10:46 pm

Dick, maybe the daily files are just that, daily files where new data is appended to any existing data.

Obviously, without confirmation from the OP that is merely a guess, but it wouldn't surprise me in the least.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Nov 07, 2011 11:47 pm

sivasaras wrote:
Hi,
These are the 2 ways I have identified to reduce the CPU time:

1. Change the program to use a binary SEARCH.

2. Delete all the old data in the file and make the job read only the last 5 days' file.

Please advise me of any other ways to reduce it.

Thanks
Siva


Your job is elapsing for 12 hours. Assuming your DBA does not live on Cloud 9, the CPU used is nowhere near this, and reducing the CPU won't do much for the run-time of the job. CPU-hoggers don't tend to run for 12 hours: they run, the lights go dim for a while, then they finish, at which point other jobs get a look-in.

I/O is the more likely culprit for 12 hours.

We still have no good idea what your job is doing.

What is the source of the 5 days of data that you are talking about?

How many records is that?

What is this "old data" that you are talking about?

Where does the database fit into the whole thing?

What, as far as file/database access, does your program do?
sivasaras

New User


Joined: 29 Sep 2007
Posts: 93
Location: Chennai

PostPosted: Tue Nov 22, 2011 8:42 pm

Hi,

I have processed only the last 30 days' records by using an INCLUDE statement in the job, but that is only a temporary fix. I want to process all 2 million records, and I need to handle that in the program.
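For reference, the temporary filter described above might look something like this as a stand-alone SORT step - the dataset names and the position/format of the date field are assumptions, not taken from the actual job:

Code:
//FILTER   EXEC PGM=SORT
//SYSOUT   DD  SYSOUT=*
//SORTIN   DD  DISP=SHR,DSN=YOUR.REJECT.FILE
//SORTOUT  DD  DSN=YOUR.REJECT.RECENT,DISP=(NEW,CATLG,DELETE),
//             SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD  *
* keep only the 2011-11 records, assuming a CCYYMM value in columns 11-16
  OPTION COPY
  INCLUDE COND=(11,6,CH,EQ,C'201111')
/*

For a rolling "last N days" rather than a fixed month, a full CCYYMMDD field could instead be compared against a DFSORT date constant such as DATE1-30.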

Kindly advise me.

Thank you
siva
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Tue Nov 22, 2011 9:00 pm

How can we provide advice without a much more detailed description of what you are doing? How does 30 days of data relate to the original five days, for instance?

If you can explain the situation well, then I'm sure there'll be some ideas.
sivasaras

New User


Joined: 29 Sep 2007
Posts: 93
Location: Chennai

PostPosted: Tue Nov 22, 2011 9:10 pm

Hi,

Daily we receive reject records from the daily job, and those 5 days' rejected records are used as input to the weekly job, which runs on Saturday.

1. The reject file has 2 million records, with data from the years 2002, 2010, 2003 and 2011.

2. The customer told us to process only 1 month's data, say 2011-11, so the job picks up only the records for that month, which I have coded in the INCLUDE statement.

3. Now they are saying they want to process all 2 million records, with all years of data. I processed all 2 million records; the job runs nearly 12 hours and takes 212 mins of CPU time.

Thank you
Siva
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Tue Nov 22, 2011 9:32 pm

Ok, we can get a handle on how the figures come together from that.

What about the processing?

What file structures/DBs are used?

If you have concerns about the run-time when there are 2m records, you are doing something with them each week? What, and why?

What sort of processing is done with the data? Are the 2m going to reduce over time, or generally increase?

When you say they are "rejects", what does this mean?

Anything else you can think of which might reduce the to-and-fro of question-and-answer?
sivasaras

New User


Joined: 29 Sep 2007
Posts: 93
Location: Chennai

PostPosted: Tue Nov 22, 2011 9:51 pm

Quote:
What about the processing?

Process flow:

The program processes the data that it receives from the bank.

Any record received from the bank that cannot be placed in the database gets put into the reject file.

Reject reasons:
Acct number incorrect
Acct number not in the Master Bank database
Transaction is incorrect

Those rejected records should be processed again weekly to see if any of the reasons that the records were rejected have been resolved.

While processing the reject file, if a reject record is updated in the database, it should then be deleted from the reject file.

The only records that should remain in the reject file are those that cannot be resolved.

Quote:
What file structures/DBs are used?

GSAM file structures and 2 tables are used.

The 2 million will increase further.

Quote:
When you say they are "rejects", what does this mean?

If you have concerns about the run-time when there are 2m records, you are doing something with them each week? What, and why?

What sort of processing is done with the data? Are the 2m going to reduce over time, or generally increase?

For all of these, please see the answers under question no. 1.

Quote:
Anything else you can think of which might reduce the to-and-fro of question-and-answer?

I can use an array, SEARCH conditions or a SORT.
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

PostPosted: Tue Nov 22, 2011 10:04 pm

Often these huge merges of account activity triggers and static data from the account
can be accomplished much more easily with a DB2 table unload:
the QSAM unloaded table rows and the account activity triggers (rejects in this case)
can be JOINKEYed or SPLICEd together as input to a report generator.
And if it is an easy (or rational) report, SORT can generate it as well,
but the report logic could easily be in a COBOL program.

Buffer up the QSAM file - it can be processed much faster than DB2 access.
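Purely as an illustration of that kind of buffering (the DD names, dataset names and buffer counts are invented; the right numbers depend on blocksize and region size):

Code:
//* QSAM: raise the buffer count from the default of 5
//REJECTS  DD  DISP=SHR,DSN=YOUR.REJECT.FILE,DCB=BUFNO=50
//* VSAM equivalent: data and index buffers via AMP
//MASTER   DD  DISP=SHR,DSN=YOUR.VSAM.KSDS,AMP=('BUFND=60,BUFNI=30')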

You have a date/time when you unload the DB2 table,
so the report is accurate (account balances, etc.) as of that date/time.

The unload would take 5-10 minutes for 2m rows,
the sort for 4 (or even 10) million records 1/2 hour? An hour???
The COBOL program for the report - 1/2 hour at most.

Now you have reduced your time frame from 12 hours to less than 2.

Added since sivasaras's last post:

Even if the DB2 table row is 1000 bytes long
(all necessary info to make the determination),
it would be faster to dump/JOINKEY
than to read each DB2 row individually.
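A bare-bones sketch of the kind of JOINKEYS step being described here - the key positions, record length and dataset names are all invented for illustration and would come from the real reject-file and unload layouts:

Code:
//MATCH    EXEC PGM=SORT
//SYSOUT   DD  SYSOUT=*
//SORTJNF1 DD  DISP=SHR,DSN=YOUR.REJECT.FILE          weekly rejects
//SORTJNF2 DD  DISP=SHR,DSN=YOUR.DB2.UNLOAD           unloaded table rows
//RESOLVED DD  DSN=YOUR.REJECTS.RESOLVED,DISP=(NEW,CATLG,DELETE),
//             SPACE=(CYL,(100,50),RLSE)
//STILLREJ DD  DSN=YOUR.REJECTS.REMAINING,DISP=(NEW,CATLG,DELETE),
//             SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD  *
* join rejects (F1) to unloaded account rows (F2) on the account key,
* assumed here to be in columns 1-10 of both files
  JOINKEYS FILE=F1,FIELDS=(1,10,A)
  JOINKEYS FILE=F2,FIELDS=(1,10,A)
  JOIN UNPAIRED,F1
* carry the assumed 200-byte reject record plus a match indicator (B=matched)
  REFORMAT FIELDS=(F1:1,200,?)
  OPTION COPY
* accounts now present in the unload - candidates to re-apply to the DB
  OUTFIL FNAMES=RESOLVED,INCLUDE=(201,1,CH,EQ,C'B'),BUILD=(1,200)
* everything else stays on the reject file for next week
  OUTFIL FNAMES=STILLREJ,SAVE,BUILD=(1,200)
/*

The unload in front of it could come from DSNTIAUL or the UNLOAD utility; the point is that one sequential pass plus a join is usually far cheaper than probing DB2 once per reject record.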
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Tue Nov 22, 2011 11:44 pm

If there are 2m records, something to do with a bank, then at an estimate of only five dollars a pop that's up to 10 million dollars that has been sitting around, lost, totally erroneous, etc.

Anyway, I suppose that is not your problem at the moment.

Go along dbz's lines and you'll get something manageable. Design it, make it tight.

Along the way, when someone starts worrying about all these "rejects", you need processing which deals with potential updates in a more timely and direct way. Running 2m records past your DB every week just on the off-chance of getting a hit (I can't work out if you have rejects that are only two years old, or up to 10 years old) is a bit... unusual, let's say.
kratos86

Active User


Joined: 17 Mar 2008
Posts: 148
Location: Anna NGR

PostPosted: Wed Nov 23, 2011 9:35 am

As Bill said... it's very unusual to keep checking rejects that are more than a year old to see whether they have been resolved. Check with your client group and come up with a time period after which these rejects can be scrubbed from the reject file. Otherwise, day after day the rejects are going to keep coming and will increase the processing even if you change the code.

You said your job is weekly... so create a log of all the activity on the database for the entire week and match it with the rejects accordingly. This way, I feel you can reduce the workload in data handling.