High CPU consumption Job using IAM files as input


IBM Mainframe Forums -> JCL & VSAM
aswinir

New User


Joined: 01 Dec 2016
Posts: 6
Location: India

PostPosted: Thu Dec 01, 2016 8:28 pm

I have a job with high CPU consumption that uses IAM files as input. When I profiled the job using Strobe, it reports that 'Innovation Access Method (IAMBPC01)' takes 33.36% of CPU solo time and VSAM Record Management takes 27.28% of CPU solo time in the most extensively used procedures.
The program is driven by an IAM file of 10 million records. It reads two other IAM files in random mode: one has close to 70K records and the other has nearly 90 million records. From the 90 million records we fetch only about half (i.e., 45 million), selected by record type. Since the records we need are not contiguous, sequential or skip-sequential reads would not help much, as they would take more CPU.
I would like to understand what the IAMBPC01 section is and why it accounts for nearly 33% of the CPU.
Please find the Strobe report attached for reference.



The listcat report is attached for reference. I would also like to know what 'STORAGE REQUIRED FOR COMPRESSED PRIME INDEX' and 'NUMBER OF IAM DATA BLOCKS' mean.
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

PostPosted: Thu Dec 01, 2016 9:30 pm

Have you talked to Innovation Data Processing yet? If not, then you should -- the vendor should be your FIRST point of contact when you have questions about products they provide and support.

I found a 2013 SHARE presentation on IAM: Improving Transaction and Batch VSAM Applications. Have you looked at the IAMINFO output (suggested in the SHARE presentation)?
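If it helps, IAM produces its run-time report when an IAMINFO DD is allocated in the job step. A minimal sketch -- the program name and dataset names here are placeholders:

Code:
//STEPX    EXEC PGM=YOURPGM                       your application program
//IAMINFO  DD SYSOUT=*                            IAM writes a report per IAM file opened
//DRIVER   DD DISP=SHR,DSN=HLQ.DRIVER.IAM10M      placeholder dataset names
//DETAIL   DD DISP=SHR,DSN=HLQ.DETAIL.IAM90M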

It is entirely possible that you won't be able to improve CPU consumption for this application -- but the vendor would be the one to tell you if any improvement is possible.
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

PostPosted: Thu Dec 01, 2016 10:11 pm

One additional comment: asking ANY vendor what a specific CSECT does is, most likely, a waste of time. Vendors generally consider such information to be intellectual property, so they are not likely to share any details with you. At most they will give you a very high-level view of the module. Asking such questions on a forum like this has pretty much a ZERO percent chance of uncovering the information.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Thu Dec 01, 2016 10:32 pm

Well, getting 45m records from a 90m record data set by random access is a very poor design (not blaming you for the design, until you admit to it).

How many hits are there on the 70K-record file? Consider loading the data you need into a COBOL table and accessing it from there.
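A minimal sketch of that idea, assuming a 10-byte key and a 50-byte payload (WS-SEARCH-KEY, WS-RESULT and the lengths are placeholders -- adjust to the real layout). The table must be loaded in key order for SEARCH ALL to work:

Code:
       01  WS-LOOKUP-COUNT       PIC 9(9) COMP.
       01  WS-LOOKUP-TABLE.
           05  LT-ENTRY OCCURS 1 TO 70000 TIMES
                        DEPENDING ON WS-LOOKUP-COUNT
                        ASCENDING KEY IS LT-KEY
                        INDEXED BY LT-IDX.
               10  LT-KEY        PIC X(10).
               10  LT-DATA       PIC X(50).
      * load the 70K file once, sequentially, at start-up; then
      * for each lookup a binary search replaces a random read:
           SEARCH ALL LT-ENTRY
               AT END
                   MOVE SPACES TO WS-RESULT
               WHEN LT-KEY (LT-IDX) = WS-SEARCH-KEY
                   MOVE LT-DATA (LT-IDX) TO WS-RESULT
           END-SEARCH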

You'd have to explain how the 10m file drives the reads against the 90m file, and the relationship of the keys between the two.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Thu Dec 01, 2016 10:43 pm

The listcat output you provided is not much use. Which volumes things reside on is irrelevant to any diagnosis. Information about inserts, updates, and deletes, including detail on the index, would be useful (but is not included).

What is the "pattern" for updates to the 90m file, the actual one you are showing? Are you sure you need 10% CA freespace and 10% CI fresspace across the entire file? From the creation date, it is "recent" - do you delete/define each run? If your random access is widely distributed, there's a lot of overhead with your large blocks. What buffers have you used for index and data (and what is recommended for this type of access in your IAM documentation)?
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1742
Location: Tirupur, India

PostPosted: Fri Dec 02, 2016 12:12 am

When we were using IAM, IAMINFO was the starting point. It holds a lot of useful information about the reads/writes; one easy thing to check is the number of buffers IAM decided to use for that particular file.
IAM also writes its buffer recommendations for the file to the IAMINFO dataset.

There might be other causes, but this is one quick thing to check.
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

PostPosted: Fri Dec 02, 2016 5:04 am

Code:
DATA COMPRESS  =  SOFTWARE

'nuff said...
aswinir

New User


Joined: 01 Dec 2016
Posts: 6
Location: India

PostPosted: Fri Dec 02, 2016 2:13 pm

Thanks for your quick response. Here are the details.

The 10m file is the parent file, keyed on ID. Each ID record in the parent file has 6 sub-records in the 90m file, of which the program picks only the 3 that are needed; these are not contiguous.

Also, this program does not do any updates, inserts, or deletes to the IAM files. No delete/define happens in this job.
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

PostPosted: Fri Dec 02, 2016 3:30 pm

aswinir wrote:
Each ID record in the parent file has 6 sub-records in the 90m file, of which the program picks only the 3 that are needed; these are not contiguous.
(Emphasis added)

So the
Code:
DATA COMPRESS  =  SOFTWARE

has no way of caching decompressed data, and may have to start decompressing well ahead of the next record to be picked.

'nuff said, again!

Use a fecking real database!
aswinir

New User


Joined: 01 Dec 2016
Posts: 6
Location: India

PostPosted: Mon Dec 12, 2016 11:49 am

Thanks all.

This was an existing program. I have sorted the input file to filter out just the records that are needed, and I now read the file sequentially.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Dec 12, 2016 2:12 pm

That's the way to do it. I'd expect around 90% reduction in resources if you are able to apply the whole thing as a two-file match.
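For illustration, the heart of a two-file (balanced-line) match in COBOL, assuming both files are sorted on the same key and the read paragraphs move HIGH-VALUES to the key at end-of-file (all names are placeholders):

Code:
           PERFORM READ-DRIVER
           PERFORM READ-DETAIL
           PERFORM UNTIL DRV-KEY = HIGH-VALUES
                     AND DTL-KEY = HIGH-VALUES
               EVALUATE TRUE
                   WHEN DRV-KEY < DTL-KEY
      *                no detail for this driver record
                       PERFORM READ-DRIVER
                   WHEN DRV-KEY > DTL-KEY
      *                detail with no driver record
                       PERFORM READ-DETAIL
                   WHEN OTHER
                       PERFORM PROCESS-MATCHED-PAIR
      *                one driver key can match several details
                       PERFORM READ-DETAIL
               END-EVALUATE
           END-PERFORM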
aswinir

New User


Joined: 01 Dec 2016
Posts: 6
Location: India

PostPosted: Mon Dec 12, 2016 2:16 pm

Yes, that is what I am doing: joining on the keys that I have to process and including only the record types that are needed.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Dec 12, 2016 3:18 pm

To a KSDS and leaving the other program unchanged? Or you've changed the other program as well? What reductions did you get in CPU, IO, and elapsed time?
aswinir

New User


Joined: 01 Dec 2016
Posts: 6
Location: India

PostPosted: Mon Dec 12, 2016 3:39 pm

It's only one program.

The 10m file is the driver file, keyed on ID. Each ID record in the driver file has 6 sub-records in the 90m file, of which the program picks only the 3 that are needed; these are not contiguous. The program also reads the 70K-record file to pick up a few fields.

I joined the 70K file and the 10M file on their keys and removed the duplicates from the 70K file (call this file A).

I joined the 10M file and the 90M file on their keys, including only the records that are needed (call this file B).

I changed the program to be driven by the new file A, reading the 10M file sequentially and then reading file B sequentially based on the records in the 10M file.

Note: I have not tested with the full set of records. For POC purposes, only 8M of the 90M records were used for testing -- the same 8 million before the program changes and with the new code. This gave more than a 40% improvement in CPU.
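For reference, the first extract (file A) as a DFSORT JOINKEYS step might look like the sketch below. Key positions/lengths, record length, and dataset names are assumptions; the second extract (file B) is the same pattern with an INCLUDE for the needed record types:

Code:
//JOINA    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DISP=SHR,DSN=HLQ.FILE.K70K          placeholder dataset names
//SORTJNF2 DD DISP=SHR,DSN=HLQ.FILE.K10M
//SORTOUT  DD DSN=HLQ.FILE.A,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD *
* keys assumed at position 1, length 10 -- adjust to the real layout
  JOINKEYS FILE=F1,FIELDS=(1,10,A)
  JOINKEYS FILE=F2,FIELDS=(1,10,A)
* the default join keeps paired (matched) records only
  REFORMAT FIELDS=(F1:1,80)
  SORT FIELDS=COPY
/*

Duplicates on the 70K side can be dropped beforehand in a separate SORT step with SUM FIELDS=NONE.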
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

PostPosted: Mon Dec 12, 2016 3:56 pm

That sounds about right, without knowing fuller details. Expect larger savings on full-sized files. Huge reduction in IO.

Looking back through the topic, I see the post I thought I'd made isn't there, so you weren't even taking my advice :-) Good work.

I'd also consider whether keeping the 70K file in a COBOL table might save, but it depends entirely on your situation.
aswinir

New User


Joined: 01 Dec 2016
Posts: 6
Location: India

PostPosted: Mon Dec 12, 2016 4:03 pm

I am sorry about that. The existing program logic was driven by the 10M-record file, reading the 70K and 90M files by key in random mode.

Since I joined the 10M file with the 70K file (also removing duplicates), I get fewer than 3K records (out of the 70K). I changed the program logic to read the 3K file and, based on its keys, to read the 10M file and the 90M file sequentially; hence I did not load a COBOL table.

Thanks