Reduce CPU Times for Join Sort


IBM Mainframe Forums -> SYNCSORT
santoshn

New User


Joined: 01 Jul 2010
Posts: 5
Location: india

Posted: Sat Jun 10, 2017 1:40 pm

Hi,
I have a requirement to find the delta records between two files of around 20 million records.
The files have an LRECL of 1700 with 25 distinct fields.
I am using JOINKEYS to match on all the fields and writing the unmatched records to a delta file.
The sort works fine, but it takes a lot of CPU time: around 3 minutes.
Is there any alternative to JOINKEYS that would reduce CPU consumption?

Thanks.
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8700
Location: Dubuque, Iowa, USA

Posted: Sat Jun 10, 2017 7:11 pm

Unless you develop a method of getting the results you want without using JOINKEYS, it is unlikely you are going to be able to reduce CPU much -- if any. SORT is already a highly optimized product and hence there is rarely anything an applications programmer can do to reduce its resource usage. Usually the only way to improve performance is to move to a newer machine (assuming your site is not running a z13/z13s already).

Furthermore, 3 minutes of CPU time for 20 million records works out to something like 110,000 records per second of CPU time -- which is pretty good in itself. Why do you think 3 minutes of CPU time is excessive for what you are doing?
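
As a quick check of the throughput figure above, here is the arithmetic as a small Python sketch. The variable names are illustrative; the inputs are the 20 million records and 3 CPU minutes quoted in this thread.

```python
# Throughput implied by the figures quoted in the thread.
records = 20_000_000   # records processed (from the original post)
cpu_seconds = 3 * 60   # 3 minutes of CPU time

records_per_cpu_second = records / cpu_seconds
print(f"{records_per_cpu_second:,.0f} records per CPU second")  # prints 111,111
```

That is where the "something like 110,000" figure comes from.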
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3076
Location: NYC,USA

Posted: Sun Jun 11, 2017 11:22 am

What are your expectations here? How much time do you think it should take? Can you show us the SORT control cards? Are the data sets already sorted?
santoshn

New User


Joined: 01 Jul 2010
Posts: 5
Location: india

Posted: Wed Jun 14, 2017 7:18 pm

Sorry I could not reply earlier. I had access issues on mainframes.
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
Both the files are unloads for tables so they are sorted with a default order.
Below are the SORT control statements that compare the files and write the delta:
Code:
SORT FIELDS=COPY                                                   
JOINKEYS FILES=F1,FIELDS=(01,99,A,102,105,A,207,139,A,346,240,A,   
                          586,47,A,633,254,A,887,254,A,1141,83,A, 
                          1224,240,A,1464,21,A,1511,93,A,1604,3,A,
                          1607,17,A)                               
                                                                   
JOINKEYS FILES=F2,FIELDS=(01,99,A,102,105,A,207,139,A,346,240,A,   
                          586,47,A,633,254,A,887,254,A,1141,83,A, 
                          1224,240,A,1464,21,A,1511,93,A,1604,3,A,
                          1607,17,A)                               
                                                                   
JOIN UNPAIRED,F2,ONLY                                             
REFORMAT FIELDS=(F2:01,1623)                                       
OUTFIL FNAMES=SORTOUT,                                             
BUILD=(1:1,1623)


Coded for you
Do it yourself next time
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 2454
Location: Hampshire, UK

Posted: Wed Jun 14, 2017 7:49 pm

By default JOINKEYS sorts both data sets (they are not "files") unless told not to by the SORTED keyword - which I do not see in your control statements.
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3076
Location: NYC,USA

Posted: Wed Jun 14, 2017 8:00 pm

1. Would it be possible to unload just the delta with an SQL query, using batch SPUFI or BMC Unload?
2. Try COMPAREX.
3. As suggested, add the SORTED keyword to the JOINKEYS statements.
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2147
Location: USA

Posted: Wed Jun 14, 2017 9:48 pm

1. Only two gaps in the whole record are not used as join keys: bytes 100 to 101 (2 bytes) and bytes 1485 to 1510 (26 bytes).
Combining all adjacent join keys into three groups, each treated as one long join key, might give a minor performance improvement:
Code:
FIELDS=(1,99,A,
        102,1383,A,
        1511,113,A)


2. A more significant performance improvement can be expected only if at least one (preferably both) of the input data sets is already sorted on the join keys before this join. In that case, specify the extra keyword SORTED after the key fields:
Code:
FIELDS=(...........),SORTED
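
The grouping in point 1 can be checked mechanically. The following Python sketch (the function name `coalesce` is made up for illustration) takes the (start, length) key fields from the JOINKEYS statements posted earlier in the thread, merges any that sit back to back in the record, and prints the resulting groups. It assumes every field is ascending and compared as plain bytes, so that one long key collates the same way as its parts.

```python
def coalesce(fields):
    """Merge (start, length) key fields that are contiguous in the record."""
    groups = []
    for start, length in fields:
        if groups and groups[-1][0] + groups[-1][1] == start:
            # This field begins where the previous group ends: extend it.
            prev_start, prev_length = groups[-1]
            groups[-1] = (prev_start, prev_length + length)
        else:
            groups.append((start, length))
    return groups


# Key fields copied from the posted JOINKEYS statements.
posted_fields = [
    (1, 99), (102, 105), (207, 139), (346, 240), (586, 47),
    (633, 254), (887, 254), (1141, 83), (1224, 240), (1464, 21),
    (1511, 93), (1604, 3), (1607, 17),
]

groups = coalesce(posted_fields)
for start, length in groups:
    print(f"{start},{length},A")
# prints:
# 1,99,A
# 102,1383,A
# 1511,113,A
```

The middle group runs from byte 102 through byte 1484, and the last group ends at byte 1623, which matches the two gaps listed in point 1.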
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1745
Location: Tirupur, India

Posted: Wed Jun 14, 2017 10:58 pm

Quote:
1. Would it be possible to unload just the delta with an SQL query, using batch SPUFI or BMC Unload?
2. Try COMPAREX.

On what basis are you suggesting that option 1 or 2 is more efficient than JOINKEYS?
Also, BMC Unload and COMPAREX are licensed products and are not available at all shops.
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8700
Location: Dubuque, Iowa, USA

Posted: Thu Jun 15, 2017 6:21 pm

Quote:
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
If you're looking at a job using 3 minutes of CPU because it is "one of more expensive jobs" (sic) at your site, then you're pretty much wasting your time. If it was using 60 or 90 minutes of CPU time, then you'd be justified in looking at the job; for 3 minutes of CPU time, why bother? Even if you cut CPU time in half (which is most likely impossible based on what you've posted so far), and the job runs daily, you have saved 90 seconds of CPU time or 0.1% of the CPU available for the day (if your site has more than one CP processor in your CEC, the percentage goes down) -- hardly worth spending much time on! And if your site bought the machine, then you're not saving any money until a new machine is purchased (and not a lot then).
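
The 0.1% figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes the job runs once a day and that a single CP is available around the clock, and the variable names are illustrative.

```python
# CPU saved per day if the job's 3 CPU minutes were cut in half.
saved_seconds = (3 * 60) / 2         # 90 seconds
seconds_per_day = 24 * 60 * 60       # one CP, available all day

share_percent = saved_seconds / seconds_per_day * 100
print(f"{share_percent:.2f}% of one CP per day")  # prints 0.10%
```

With more than one CP the denominator grows, so the percentage shrinks further.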
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3076
Location: NYC,USA

Posted: Thu Jun 15, 2017 9:30 pm

Quote:
On what basis are you suggesting that option 1 or 2 is more efficient than JOINKEYS?
Also, BMC Unload and COMPAREX are licensed products and are not available at all shops.
I would leave it to the TS (topic starter) to try them and report back. We use them at my shop; if the TS doesn't have them, then those options are not open to him.
santoshn

New User


Joined: 01 Jul 2010
Posts: 5
Location: india

Posted: Fri Jun 16, 2017 11:09 am

Nic Clouston wrote:
By default JOINKEYS sorts both data sets (they are not "files") unless told not to by the SORTED keyword - which I do not see in your control statements.


Thanks, let me try using the SORTED keyword.
Abid Hasan

New User


Joined: 25 Mar 2013
Posts: 88
Location: India

Posted: Fri Jun 16, 2017 12:12 pm

Hello,

santoshn wrote:
...
This job is one of more expensive jobs so we are checking if any alternative can be implemented to reduce CPU times.
Both the files are unloads for tables so they are sorted with a default order.
....


Adding my two pennies worth!

a. Kindly explain what 'expensive' means by your site's standards. What I mean is that 3 minutes of CPU time may not really be very expensive when we are talking about a large amount of data running into millions or billions of records. For fewer records, yes, that could be called expensive.

b. If you really want to dig deeper, look at the SORTMSG output for SYNCSORT; the WER messages will be split into three parts:
i. Processing for first JOINKEYS statement
ii. Processing for second JOINKEYS statement
iii. Processing for SORT statements (COPY, OUTFIL etc.)

On looking through them you should be able to make out 'how much' resources were used at each leg of *SORT processing. I do not have a SYNCSORT manual at hand currently, but I am pretty sure there is a keyword that can help generate additional diagnostic information; and if you're feeling chivalrous, dig through the SMF records, they will give you even more information on processing data (SMF logging for *SORT should be active for this data to be written). Skimming through it should give you a clear idea on which strip of *SORT is consuming 'more', is it just the data or is it something in the code.

If you're not happy with *SORT JOINKEYS, and the data is already sorted, go ahead and write a COBOL file-balancing program. In either case, JOINKEYS or COBOL, you're reading both data sets top-down; the only difference is that with COBOL you expect the data to be sorted already, whereas JOINKEYS does the sorting for you.
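
For readers unfamiliar with file balancing (a match-merge over two inputs that are already in key order), here is a minimal sketch of the logic, written in Python rather than COBOL purely for brevity. The function name `unpaired_f2` and the sample data are hypothetical, and the whole record is used as the key for simplicity; it mimics the effect of JOIN UNPAIRED,F2,ONLY by yielding the F2 records that have no matching key in F1.

```python
def unpaired_f2(f1, f2, key=lambda rec: rec):
    """Yield records of f2 whose key does not occur in f1.

    Both inputs must already be in ascending key order; each is read
    once, top-down, and nothing is sorted here.
    """
    end = object()                  # marks end of f1
    it1 = iter(f1)
    cur1 = next(it1, end)
    for rec2 in f2:
        k2 = key(rec2)
        # Skip f1 records whose keys are lower than the current f2 key.
        while cur1 is not end and key(cur1) < k2:
            cur1 = next(it1, end)
        if cur1 is end or key(cur1) != k2:
            yield rec2              # no partner in f1: a delta record


# Hypothetical, pre-sorted sample data.
old_unload = ["A", "B", "D", "D", "F"]
new_unload = ["B", "C", "D", "E", "E", "G"]

delta = list(unpaired_f2(old_unload, new_unload))
print(delta)  # prints ['C', 'E', 'E', 'G']
```

If either input is out of sequence, this logic silently produces wrong results, which is why a real program would normally include a sequence check.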

Btw, looking at the way the JOINKEYS statement has been set-up - I have a strong feeling that your data is not SORTED.


If none of it works out (and you still think your SORT is costly), write to Alissa (SYNCSORT/MFX development team). She will surely be able to guide you.

Edited: Remove reference to Dfsort team as it is a competing product.
magesh23586

Active User


Joined: 06 Jul 2009
Posts: 213
Location: Chennai

Posted: Sat Jun 17, 2017 11:10 pm

Remove the following statements.
Code:

OUTFIL FNAMES=SORTOUT,                                             
BUILD=(1:1,1623)


I don't think the data is in sorted order. If it is, specify SORTED,NOSEQCK in the JOINKEYS statements.
All times are GMT + 6 Hours