In my organization, the same file is sorted in different orders in different steps to create multiple output files, which are then used for further processing by COBOL programs. The input file contains millions of records. We use DFSORT to do this sorting.
My question is: would it be better, in terms of cost and performance, to use ICETOOL with multiple control statements to sort the same file in a single step and create the multiple files?
I read in this forum that using a single step is not advisable because, if the SORT fails on the second control statement, restarting will perform the first control statement again.
Joined: 22 Apr 2006 Posts: 6250 Location: Mumbai, India
The short answer is: it depends. Actually, it depends on what the sort steps are doing with the data. I'm not sure where you read this:
Quote:
I read in this forum that using a single step is not advisable because, if the SORT fails on the second control statement, restarting will perform the first control statement again.
- multiple passes over the same data are usually not advised, but it all depends on whether you need the pass or not.
Showing some sample records and the sort statements in question might help you get a better answer.
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Indrajit,
There is usually no significant difference in performance between using multiple DFSORT steps and using one ICETOOL step with multiple operators, provided they do the same thing.
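For illustration, a single ICETOOL step with multiple SORT operators might look like the following UNTESTED sketch. The dataset names, field positions, and formats are invented for the example; each operator still makes its own pass over the IN dataset:

```
//TOOLRUN  EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=YOUR.INPUT.FILE,DISP=SHR
//OUT1     DD DSN=YOUR.OUTPUT.FILE1,DISP=(NEW,CATLG,DELETE)
//OUT2     DD DSN=YOUR.OUTPUT.FILE2,DISP=(NEW,CATLG,DELETE)
//TOOLIN   DD *
  SORT FROM(IN) TO(OUT1) USING(CTL1)
  SORT FROM(IN) TO(OUT2) USING(CTL2)
/*
//CTL1CNTL DD *
  SORT FIELDS=(1,10,CH,A)
/*
//CTL2CNTL DD *
  SORT FIELDS=(21,5,ZD,D)
/*
```

Note that this does not save any reads of the data compared with separate steps; it only packages the sorts into one step.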
Joined: 28 Jul 2006 Posts: 1702 Location: Australia
Hi,
go for separate steps; in the event of a failure, you only need to rerun the failed step. Also, I assume that you would need five times the amount of work space in a single step, so there is more likelihood of a failure.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Quote:
the same file is sorted in different order in different steps to create multiple output files which are used for further processing by COBOL programs.
Are you saying you have one job which sorts the data several times, finishes, and then the jobs executing the Cobol programs kick in?
If yes, you could split the sort steps into separate jobs. You have scheduler software, I assume?
Run the first sort job. On completion, release the "Cobol job" which uses the output of the sort, and the next sort job. And so on until the end.
You could ask your storage people what they think about running more than one sort from the same input at the same time (bits might start falling off the drive if you try them all at once, I don't know, but they should).
Are the existing sort steps doing anything except sorting (i.e. OMIT/INCLUDE, any reformatting of the data, anything except a straightforward sort)?
If they are just sorts, once the first is complete you have two datasets, so you can kick off two sorts; then you'll have four datasets, etc. Soon knock 'em away.
If you judiciously choose datasets which are "closest" to the order required for the next sort, you'll possibly-maybe-perhaps get a quicker sort anyway.
If you let us know a few more details, we might be able to provide clearer advice.
The sort steps are simple sorts with INCLUDE/OMIT conditions, wherein we sort the file based on different fields. In some cases an OUTREC statement is also used.
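For context, a typical step of that kind might look like the sketch below. The field positions, comparison value, and record length are invented for illustration only:

```
//SORT01   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.FILE,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.FILE,DISP=(NEW,CATLG,DELETE)
//SYSIN    DD *
  INCLUDE COND=(11,2,CH,EQ,C'01')
  SORT FIELDS=(1,10,CH,A)
  OUTREC FIELDS=(1,80)
/*
```

Here INCLUDE filters the records, SORT orders them on the key, and OUTREC reformats each output record.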
There are instances wherein the same file is sorted as many as 20 times in different order in different sort steps.
So what I was thinking is: once the main file is created, have a single job to sort the file and split it into the different sorted files.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Well, there doesn't seem to be any/much benefit in terms of performance from jamming them all into one step. If you have 20 sorts, sort will read the data 20 times. If you put it into one step with ICETOOL, sort (which is what ICETOOL will invoke) will read the data 20 times.
If you are sorting a file 20 times, you have to read all the data 20 times - subject to an analysis of the keys/data used which could yield some savings.
If something in the step crashes, you would either have to amend the cards before re-running, or run everything that has already finished again.
So, I think the separate steps might suit better. Are these "steps" all in the same job? If yes, break them into separate jobs so that the successor jobs can get going as soon as the data is ready. If it is already like that, describe what problem you feel there is with the existing approach.
Indrajit_57,
It will be beneficial for all of us if you provide your current sort cards and what you are trying to achieve. Without looking at the sort cards, it's a total guess and you are not moving forward.
If you do decide to post sort cards, please provide the RECFM/LRECL for all the file(s), and the corresponding field positions and lengths referenced by the sort cards.
Here are the details. Note that the same set of files are getting used in PS116 and PS130.
The following output files are used by different COBOL programs for further processing: -
&SYSUID..FFB.TEST.FILE.X16 (Created in PS122)
&SYSUID..FFB.TEST.FILE.X01 (Created in PS124)
&SYSUID..FFB.TEST.FILE.X12.MG (Created in PS130)
&SYSUID.FILE.FINAL (Created in PS139)
Note this entire job takes around 1 hour to complete for 5257784 records. Is there any way to tune this by using ICETOOL or any other mechanism? Any direction in this regard will be useful.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Did you read the earlier responses?
It is unlikely that you will see much difference between SORT doing the job and ICETOOL doing the job when the job is the same thing. Frank Yaeger says so, so you can rely on that.
If you rationalise the Job, you will be able to start your Cobol programs earlier.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
It would seem that you could at least split the job in two at PS130. I'm definitely not going to check that. That at least could get your Cobol programs going about half an hour earlier.
Can any of the sort cards for one step be included in another? Is that what you really want to know? You have to look at the most likely candidates, and then find a way to present the question so that someone might want to look into it, not as a 400-line lump.
However, as sqlcode1 requested, I have provided what the code is currently doing.
Note to self: beware of what you ask for.
See if below UNTESTED works...
1) You still haven't provided the LRECL, but I am going to assume the input is FB/500.
2) I changed/merged/removed some of the steps, with some comments.
3) Why are you creating the FFB.TEST.FILE.X12.MG file? I don't see it being used in this job. Is it used somewhere else? In one or two places I see some temporary intermediate files; if you don't use them in other jobs, it is better to combine or remove them.
4) For efficiency, you may want to run this job by your space management team for suggestions on creating the output file(s).
Thanks sqlcode1. I used the above job and the time is reduced considerably. It now takes around 25 minutes to complete the job.
The only issue that I face now is with X12.MG. In this file, the header record is coming near the end of the file instead of at the start.
000000X120
000000X129
000000Y120
000000Y129
000000Z120
000000Z129
0000000000 -> This should be at the start of the file
001001X121DETAIL RECORDS
9999999999
The expected output is
0000000000
000000X120
000000X129
000000Y120
000000Y129
000000Z120
000000Z129
001001X121DETAIL RECORDS
9999999999
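One way to handle this, if your DFSORT level supports it, is ICETOOL's DATASORT operator, which sorts only the data records while keeping the first (header) and last (trailer) records in place. Below is an UNTESTED sketch; the dataset names and the sort key position are assumptions based on the sample records shown:

```
//TOOLRUN  EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=YOUR.INPUT.FILE,DISP=SHR
//OUT      DD DSN=YOUR.SORTED.FILE,DISP=(NEW,CATLG,DELETE)
//TOOLIN   DD *
  DATASORT FROM(IN) TO(OUT) HEADER TRAILER USING(CTL1)
/*
//CTL1CNTL DD *
  SORT FIELDS=(1,10,CH,A)
/*
```

With HEADER and TRAILER specified, only the records between the first and last record are sorted, so the 0000000000 header record should stay at the start of the output file and 9999999999 at the end.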