santoshn
New User
Joined: 01 Jul 2010 Posts: 5 Location: india
|
|
|
|
Hi,
I have a requirement to find the delta records between two files holding around 20 million records.
The files have an LRECL of 1700 with 25 distinct fields.
I am using JOINKEYS to match on all the fields and writing the unmatched records to a delta file.
The sort works fine, but it uses a lot of CPU time: around 3 minutes.
Is there any alternative to JOINKEYS that could be implemented to reduce CPU consumption?
Thanks. |
|
|
 |
Robert Sample
Global Moderator

Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
|
|
|
|
Unless you develop a method of getting the results you want without using JOINKEYS, it is unlikely you are going to be able to reduce CPU much -- if any. SORT is already a highly optimized product and hence there is rarely anything an applications programmer can do to reduce its resource usage. Usually the only way to improve performance is to move to a newer machine (assuming your site is not running a z13/z13s already).
Furthermore, 3 minutes of CPU time for 20 million records works out to something like 110,000 records per second of CPU time -- which is pretty good in itself. Why do you think 3 minutes of CPU time is excessive for what you are doing? |
|
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
| What are your expectations here? How much time do you think it should take? Can you show us the SORT card? Are the data sets already sorted? |
|
|
 |
santoshn
New User
Joined: 01 Jul 2010 Posts: 5 Location: india
|
|
|
|
Sorry I could not reply earlier. I had access issues on the mainframe.
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
Both the files are unloads for tables so they are sorted with a default order.
Below is the SORT step that compares the files and writes the delta:
| Code: |
 SORT FIELDS=COPY
 JOINKEYS FILES=F1,FIELDS=(01,99,A,102,105,A,207,139,A,346,240,A,
                           586,47,A,633,254,A,887,254,A,1141,83,A,
                           1224,240,A,1464,21,A,1511,93,A,1604,3,A,
                           1607,17,A)
 JOINKEYS FILES=F2,FIELDS=(01,99,A,102,105,A,207,139,A,346,240,A,
                           586,47,A,633,254,A,887,254,A,1141,83,A,
                           1224,240,A,1464,21,A,1511,93,A,1604,3,A,
                           1607,17,A)
 JOIN UNPAIRED,F2,ONLY
 REFORMAT FIELDS=(F2:01,1623)
 OUTFIL FNAMES=SORTOUT,
        BUILD=(1:1,1623) |
Coded for you
Do it yourself next time |
|
|
 |
Nic Clouston
Global Moderator
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
|
|
|
|
| By default JOINKEYS sorts both data sets (they are not "files") unless told not to by the SORTED keyword - which I do not see in your control statements. |
|
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
1. Would it be possible to unload the delta directly with an SQL query, using batch SPUFI or BMC Unload?
2. Try COMPAREX.
3. As suggested, add the SORTED keyword to the JOINKEYS statements. |
|
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2264 Location: USA
|
|
|
|
1. Only two gaps in the whole record are not used as join keys: bytes 100 to 101 (2 bytes) and bytes 1485 to 1510 (26 bytes).
Combining the adjacent join keys into three groups, each treated as one long join key, might give a minor performance improvement:
| Code: |
FIELDS=(1,99,A,
102,1383,A,
1511,113,A) |
2. A more significant improvement in performance can be expected only if at least one (preferably both) of the input data sets is already sorted on the join keys before this join. In that case, specify the extra keyword SORTED on the corresponding JOINKEYS statement:
| Code: |
| FIELDS=(...........),SORTED |
|
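Putting both points together, the two JOINKEYS statements might look like the sketch below. It assumes both inputs really are in ascending sequence on all three key groups; with SORTED the inputs are not re-sorted, and a record found out of sequence normally ends the step with an error. The middle group is taken as bytes 102 through 1484 (length 1383), which is what the two gaps listed in point 1 imply. The remaining statements (SORT FIELDS=COPY, JOIN, REFORMAT) would stay as posted.

```
 JOINKEYS FILES=F1,FIELDS=(1,99,A,102,1383,A,1511,113,A),SORTED
 JOINKEYS FILES=F2,FIELDS=(1,99,A,102,1383,A,1511,113,A),SORTED
```

If only one of the inputs is known to be in sequence, SORTED would go on that statement alone and the other input would still be sorted by JOINKEYS.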
|
|
 |
vasanthz
Global Moderator

Joined: 28 Aug 2007 Posts: 1750 Location: Tirupur, India
|
|
|
|
| Quote: |
1. Would it be possible to unload the delta directly with an SQL query, using batch SPUFI or BMC Unload?
2. Try COMPAREX. |
On what basis are you suggesting that option 1 or 2 is more efficient than JOINKEYS?
Also, BMC Unload and COMPAREX are licensed products and are not available at all shops. |
|
|
 |
Robert Sample
Global Moderator

Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
|
|
|
|
| Quote: |
| This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times. |
If you're looking at a job using 3 minutes of CPU because it is "one of more expensive jobs" (sic) at your site, then you're pretty much wasting your time. If it was using 60 or 90 minutes of CPU time, then you'd be justified in looking at the job; for 3 minutes of CPU time, why bother? Even if you cut CPU time in half (which is most likely impossible based on what you've posted so far), and the job runs daily, you have saved 90 seconds of CPU time or 0.1% of the CPU available for the day (if your site has more than one CP processor in your CEC, the percentage goes down) -- hardly worth spending much time on! And if your site bought the machine, then you're not saving any money until a new machine is purchased (and not a lot then). |
|
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
| Quote: |
On what basis are you suggesting that option 1 or 2 is more efficient than JOINKEYS?
Also, BMC Unload and COMPAREX are licensed products and are not available at all shops. |
I would leave it to the TS (topic starter) to try them and report back. We use them at my shop; if the TS doesn't have them, then he is out of options. |
|
|
 |
santoshn
New User
Joined: 01 Jul 2010 Posts: 5 Location: india
|
|
|
|
| Nic Clouston wrote: |
| By default JOINKEYS sorts both data sets (they are not "files") unless told not to by the SORTED keyword - which I do not see in your control statements. |
Thanks, let me try using the SORTED keyword. |
|
|
 |
Abid Hasan
New User
Joined: 25 Mar 2013 Posts: 88 Location: India
|
|
|
|
Hello,
| santoshn wrote: |
...
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
Both the files are unloads for tables so they are sorted with a default order.
.... |
Adding my two pennies worth!
a. Kindly explain what 'expensive' means by your site's standards. What I mean is that 3 minutes of CPU time may not really be very expensive for a large amount of data running into millions or billions of records. For fewer records, yes, it could be called expensive.
b. If you really want to dig deeper, look at the SORTMSG output for SYNCSORT; the WER messages will be separated into 3 parts:
i. Processing for first JOINKEYS statement
ii. Processing for second JOINKEYS statement
iii. Processing for SORT statements (COPY, OUTFIL etc.)
On looking through them you should be able to make out 'how much' resources were used at each leg of *SORT processing. I do not have a SYNCSORT manual at hand currently, but I am pretty sure there is a keyword that can help generate additional diagnostic information; and if you're feeling chivalrous, dig through the SMF records, they will give you even more information on processing data (SMF logging for *SORT should be active for this data to be written). Skimming through it should give you a clear idea on which strip of *SORT is consuming 'more', is it just the data or is it something in the code.
If you're not happy with *SORT JOINKEYS, AND the data is already sorted, go ahead and write a COBOL file-balancing (two-file match) program. Either way, JOINKEYS or COBOL, you're reading both data sets top-down; the only difference is that the COBOL program expects the data to be sorted already, whereas JOINKEYS does the sorting for you.
Btw, looking at the way the JOINKEYS statements have been set up, I have a strong feeling that your data is not sorted.
If none of it works out (and you still think your SORT is costly), write to Alissa (SYNCSORT/MFX development team). She will surely be able to guide you.
Edited: Remove reference to Dfsort team as it is a competing product. |
|
|
 |
magesh23586
Active User

Joined: 06 Jul 2009 Posts: 213 Location: Chennai
|
|
|
|
Remove the following statements.
| Code: |
OUTFIL FNAMES=SORTOUT,
BUILD=(1:1,1623)
|
I don't think the data is in sorted order. If it is, specify SORTED,NOSEQCK on the JOINKEYS statements. |
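As a sketch of that suggestion, the complete set of control statements without the OUTFIL might look like this. It reuses the field lists from the original post unchanged and assumes both inputs are already in sequence on every key, which nobody in the thread has confirmed. NOSEQCK asks the sort product to skip its own sequence check, so if that assumption is wrong the join can produce incorrect results instead of failing; whether your SyncSort level accepts NOSEQCK is also worth checking in the manual.

```
 SORT FIELDS=COPY
 JOINKEYS FILES=F1,FIELDS=(01,99,A,102,105,A,207,139,A,346,240,A,
                           586,47,A,633,254,A,887,254,A,1141,83,A,
                           1224,240,A,1464,21,A,1511,93,A,1604,3,A,
                           1607,17,A),SORTED,NOSEQCK
 JOINKEYS FILES=F2,FIELDS=(01,99,A,102,105,A,207,139,A,346,240,A,
                           586,47,A,633,254,A,887,254,A,1141,83,A,
                           1224,240,A,1464,21,A,1511,93,A,1604,3,A,
                           1607,17,A),SORTED,NOSEQCK
 JOIN UNPAIRED,F2,ONLY
 REFORMAT FIELDS=(F2:01,1623)
```

With the OUTFIL gone, the main task copies the REFORMAT record straight to SORTOUT, so the BUILD that rebuilt the same 1623 bytes is no longer needed. A safer first run would use SORTED without NOSEQCK, so that an out-of-sequence record is reported.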
|
|
 |