Hi,
I have a requirement to find the delta records between two files of around 20M records.
The files have an LRECL of 1700 with 25 distinct fields.
I am using JOINKEYS to match on all the fields and writing the unmatched records to a delta file.
The sort works fine, but it takes a lot of CPU time to process: around 3 minutes.
Is there any alternative to JOINKEYS that could reduce CPU consumption?
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Unless you develop a method of getting the results you want without using JOINKEYS, it is unlikely you will be able to reduce CPU much, if at all. SORT is already a highly optimized product, so there is rarely anything an applications programmer can do to reduce its resource usage. Usually the only way to improve performance is to move to a newer machine (assuming your site is not already running a z13/z13s).
Furthermore, 3 minutes of CPU time for 20 million records works out to roughly 110,000 records per second of CPU time, which is pretty good in itself. Why do you think 3 minutes of CPU time is excessive for what you are doing?
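The throughput figure above is a single division. The small Python sketch below reproduces it from the thread's rounded numbers; like the post, it counts the 20 million records once and uses 3 minutes as the CPU time, so treat it as an order-of-magnitude check rather than a measurement.

```python
RECORDS = 20_000_000   # approximate record count from the question
CPU_SECONDS = 3 * 60   # approximate CPU time from the question, in seconds


def records_per_cpu_second(records: int, cpu_seconds: float) -> float:
    """Throughput expressed as records processed per second of CPU time."""
    return records / cpu_seconds


rate = records_per_cpu_second(RECORDS, CPU_SECONDS)
print(f"{rate:,.0f} records per CPU second")  # 111,111 records per CPU second
```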
Sorry I could not reply earlier; I had access issues on the mainframe.
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
Both files are table unloads, so they are sorted in the default order.
Below is the SORT step that compares the two files and writes the delta:
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
By default, JOINKEYS sorts both data sets (they are not "files") unless told not to by the SORTED keyword, which I do not see in your control statements.
1. Would it be possible to unload only the delta with an SQL query, using batch SPUFI or BMC unload?
2. Try COMPAREX.
3. As suggested, add the SORTED keyword to the JOINKEYS statements.
1. There are only two gaps in the whole record that are not used as join keys: bytes 100 to 101 (2 bytes) and bytes 1485 to 1510 (26 bytes).
It might give a minor performance improvement if all adjacent join keys were combined into three groups, each treated as one long join key:
Code:
FIELDS=(1,99,A,
102,1383,A,
1511,113,A)
2. A more significant performance improvement can be expected only if at least one (better, both) of the input datasets is already sorted before this join. Then the SORTED keyword needs to be specified on the JOINKEYS statement for each pre-sorted dataset.
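The grouping in point 1 is simple arithmetic on byte positions. As a sketch, the hypothetical Python helpers below (`key_ranges` and `fields_operand` are illustrative names, not part of any sort product) derive the contiguous (start, length) groups from the two gaps and render them as a FIELDS operand string. They assume 1-based inclusive positions and that the last join-key byte is 1623, where the third group in the post ends; with those inputs the middle group runs from byte 102 to 1484.

```python
from typing import List, Tuple


def key_ranges(first_byte: int, last_byte: int,
               gaps: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Split bytes first_byte..last_byte (1-based, inclusive) into contiguous
    (start, length) groups, skipping each gap given as an inclusive (start, end) pair."""
    groups = []
    pos = first_byte
    for gap_start, gap_end in sorted(gaps):
        if gap_start > pos:
            # Bytes pos .. gap_start-1 form one contiguous key group.
            groups.append((pos, gap_start - pos))
        pos = gap_end + 1
    if pos <= last_byte:
        # Whatever remains after the last gap is the final group.
        groups.append((pos, last_byte - pos + 1))
    return groups


def fields_operand(groups: List[Tuple[int, int]]) -> str:
    """Render the groups as a FIELDS=(start,length,A,...) operand string."""
    return "FIELDS=(" + ",".join(f"{start},{length},A" for start, length in groups) + ")"


# Gaps from the post: bytes 100-101 and 1485-1510; last key byte assumed to be 1623.
groups = key_ranges(1, 1623, [(100, 101), (1485, 1510)])
print(groups)                  # [(1, 99), (102, 1383), (1511, 113)]
print(fields_operand(groups))  # FIELDS=(1,99,A,102,1383,A,1511,113,A)
```

For a match/no-match test, equality of the concatenated groups is the same as equality of every individual field, so combining adjacent keys should not change which records pair up; it only reduces the number of key fields the sort has to handle.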
Joined: 28 Aug 2007 Posts: 1745 Location: Tirupur, India
Quote:
1.Can't it be possible to try unload delta by some sql query using batch spufi or BMC unload?
2. Try COMPAREX
On what basis are you suggesting that option 1 or 2 is more efficient than JOINKEYS?
Also, BMC unload and COMPAREX are licensed products and are not available at all shops.
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Quote:
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
If you're looking at a job using 3 minutes of CPU because it is "one of more expensive jobs" (sic) at your site, then you're pretty much wasting your time. If it were using 60 or 90 minutes of CPU time, you'd be justified in looking at the job; for 3 minutes of CPU time, why bother? Even if you cut the CPU time in half (which is most likely impossible based on what you've posted so far) and the job runs daily, you have saved 90 seconds of CPU time, or about 0.1% of the CPU available for the day (if your site has more than one CP in your CEC, the percentage goes down). That is hardly worth spending much time on! And if your site bought the machine, you're not saving any money until a new machine is purchased (and not a lot then).
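The 0.1% figure follows from dividing the saving by the CPU seconds in a day. A minimal Python sketch of that arithmetic, assuming one daily run, 86,400 CPU seconds per processor per day, and a hypothetical 4-CP machine purely to show how the share shrinks as processors are added:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 CPU seconds per processor per day


def share_of_daily_cpu(saved_cpu_seconds: float, processors: int = 1) -> float:
    """Fraction of one day's CPU capacity represented by saved_cpu_seconds."""
    return saved_cpu_seconds / (SECONDS_PER_DAY * processors)


saving = 90  # half of 3 minutes, saved once per daily run
print(f"{share_of_daily_cpu(saving):.3%}")     # 0.104%
print(f"{share_of_daily_cpu(saving, 4):.3%}")  # 0.026% (4 CPs is an arbitrary example)
```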
Quote:
On what basis are you suggesting that option 1 or 2 is efficient than JOINKEYS?
Also BMC unload and comparex are licensed products and are not available on all shops.
I would leave it to the TS to try them and let us know. We use them at my shop; if the TS doesn't have them, then those options are out.
Quote:
By default JOINKEYS sorts both data sets (they are not "files") unless told not to by the SORTED keyword - which I do not see in your control statements.
...
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
Both the files are unloads for tables so they are sorted with a default order.
....
Adding my two pennies' worth!
a. Kindly explain what 'expensive' means as per your site standards. What I mean is that 3 minutes of CPU time may not really be very expensive when we are talking about a large amount of data running into millions or billions of records. For fewer records, yes, that could be called expensive.
b. If you really want to dig deeper, look at the SORTMSG output for SYNCSORT; the WER messages will be segregated into 3 parts:
i. Processing for the first JOINKEYS statement
ii. Processing for the second JOINKEYS statement
iii. Processing for the SORT statements (COPY, OUTFIL, etc.)
Looking through them, you should be able to make out how much resource was used in each leg of *SORT processing. I do not have a SYNCSORT manual at hand, but I am pretty sure there is a keyword that generates additional diagnostic information. And if you're feeling adventurous, dig through the SMF records; they will give you even more information on the processing (SMF logging for *SORT must be active for this data to be written). Skimming through them should give you a clear idea of which leg of *SORT is consuming more, and whether it is just the data or something in the code.
If you're not happy with *SORT JOINKEYS AND the data is already sorted, go ahead and write a COBOL file-balancing program. In either case (JOINKEYS or COBOL), you read both datasets top-down; the only difference is that with COBOL you expect the data to be SORTED already, whereas JOINKEYS does the sorting for you.
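The core of a file-balancing program is the classic balance-line (match/merge) loop. Below is a minimal Python sketch of that logic; it is not COBOL and not how SYNCSORT implements JOINKEYS. It assumes both inputs are already in ascending order under plain byte comparison (no alternate collating sequence), compares whole records, and pairs duplicate records one-to-one, which may differ from how JOINKEYS treats duplicate keys. `balance_line` is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple

_END = object()  # marks an exhausted input


def balance_line(f1: Iterable[bytes], f2: Iterable[bytes]) -> Iterator[Tuple[str, bytes]]:
    """Compare two inputs that are already in ascending byte order.

    Yields ("F1", rec) for records found only in f1 and ("F2", rec) for records
    found only in f2. Equal records count as a match and are skipped; duplicates
    are paired one-to-one.
    """
    it1, it2 = iter(f1), iter(f2)
    r1, r2 = next(it1, _END), next(it2, _END)
    while r1 is not _END or r2 is not _END:
        if r2 is _END or (r1 is not _END and r1 < r2):
            yield ("F1", r1)  # only in the first input
            r1 = next(it1, _END)
        elif r1 is _END or r2 < r1:
            yield ("F2", r2)  # only in the second input
            r2 = next(it2, _END)
        else:
            r1, r2 = next(it1, _END), next(it2, _END)  # match: advance both


old = [b"A001", b"A002", b"A004"]
new = [b"A001", b"A003", b"A004"]
print(list(balance_line(old, new)))  # [('F1', b'A002'), ('F2', b'A003')]
```

Each input is read exactly once, top-down, which is the point made above: the work is a single pass over both datasets once the sorting is already done.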
Btw, looking at the way the JOINKEYS statements have been set up, I have a strong feeling that your data is not SORTED.
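One way to test that suspicion before adding SORTED (or writing a balance-line program) is to scan each unload once and check that the keys never go down. A minimal Python sketch, assuming plain byte-wise ordering; `first_out_of_order` and its `key` parameter are hypothetical, and for the real datasets `key` would extract the join-key bytes from each record.

```python
from typing import Callable, Iterable, Optional


def first_out_of_order(records: Iterable[bytes],
                       key: Callable[[bytes], bytes] = lambda rec: rec) -> Optional[int]:
    """Return the 1-based position of the first record whose key is lower than
    the previous record's key, or None if the input is in ascending order."""
    previous = None
    for position, record in enumerate(records, start=1):
        current = key(record)
        if previous is not None and current < previous:
            return position
        previous = current
    return None


print(first_out_of_order([b"A1", b"A2", b"A0"]))  # 3
print(first_out_of_order([b"A1", b"A1", b"A2"]))  # None
```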
If none of it works out (and you still think your SORT is costly), write to Alissa (SYNCSORT/MFX development team). She will surely be able to guide you.