I have a very large input file which is already sorted on a given key. I'd like to use DFSORT with OPTION COPY to copy (and modify) some of those records, and write them to another dataset. I gather that I can stop reading the input file after a given number of records, but is there a way that I can stop reading the input file based on a condition? E.g., if the file is known to be presorted alphabetically on the first byte of each record, and I know that I'm not interested in any record where the first byte is > C'D', is there a way I can tell DFSORT to stop reading when it sees a key >= C'E' ? (If it matters, these are variable-length records.)
My goal, of course, is to speed up the process by not bothering to read records that I know will be discarded anyway.
Yes, I can certainly use OMIT to get the output file I want; what I'm wondering is whether I can speed the process up, by not reading unwanted input records in the first place. Suppose my input file contains 500 million records. If I know that the file is presorted, and if I could tell DFSORT to stop the COPY operation when it sees a particular key, then I might be able to get away with reading only 1 million records. So what I'm wondering is whether I can do something like that.
Joined: 03 Oct 2009 Posts: 1788 Location: Bloomington, IL
Ah; well, I think you'll need to write an E15 exit routine, but I think that Sri Kolusu or Mr. Woodger will be much better resources than I for such an endeavor.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Mmmm... tricky. I don't know of a way to stop on a key, except by using an EXIT that itself inserts all the records for sorting/copying (i.e., with no SORTIN specified). In that case, for a COPY operation, why bother with an EXIT rather than just a program?
If you set the RETURN-CODE to 8 from an input EXIT, the EXIT won't be called again, but the rest of the file will be processed by DFSORT.
If you set the RETURN-CODE to 16, DFSORT will end with an RC of 16. This, or another way of getting RC=16 from a MERGE (deliberately putting the key after your last wanted one out of sequence), would be a messy way to do it, and you'd need to confirm that any output files are valid (I've never checked).
As far as I'm aware, if you give DFSORT a SORTIN, you don't have STOPAFT, and you don't limit the number of records on all OUTFILs, then DFSORT will read the entire SORTIN.
Joined: 07 Dec 2007 Posts: 2205 Location: San Jose
David Sde wrote:
Yes, I can certainly use OMIT to get the output file I want; what I'm wondering is whether I can speed the process up, by not reading unwanted input records in the first place. Suppose my input file contains 500 million records. If I know that the file is presorted, and if I could tell DFSORT to stop the COPY operation when it sees a particular key, then I might be able to get away with reading only 1 million records. So what I'm wondering is whether I can do something like that.
David
David,
There is no way to stop reading the remaining records based on a condition alone. However, you can stop reading after a certain number of records have been selected by the condition.
ex:
Code:
//STEP0100 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD *
A
B
D
A
E
A
A
H
//SORTOUT DD SYSOUT=*
//SYSIN DD *
  INCLUDE COND=(1,1,CH,EQ,C'A')
  OPTION COPY,STOPAFT=2
//*
DFSORT reads only 4 records here: the INCLUDE condition is applied first, and STOPAFT=2 stops the read as soon as 2 records have been selected.
If you don't specify STOPAFT, you read the entire file, because DFSORT doesn't know which records match the condition until it has read the whole file.
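To make the record count concrete, here is a small Python model of the behaviour described above. It is only a sketch of the stated rule (filter first, stop once the requested number of records has been selected), not DFSORT itself; the function name `copy_with_stopaft` is made up for illustration.

```python
def copy_with_stopaft(records, include, stopaft):
    """Model of the rule described in the post: apply the include test
    to each record as it is read, and stop reading once `stopaft`
    records have been selected. Hypothetical helper, not a DFSORT API."""
    read = 0
    selected = []
    for rec in records:
        read += 1
        if include(rec):
            selected.append(rec)
            if len(selected) == stopaft:
                break  # nothing after this point is read
    return selected, read


# Same data as the SORTIN in the example above.
records = ["A", "B", "D", "A", "E", "A", "A", "H"]
selected, read = copy_with_stopaft(records, lambda r: r[0] == "A", 2)
print(selected, read)  # ['A', 'A'] 4
```

The second 'A' is the fourth record in the input, so the model stops after 4 reads, which matches the count given above.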
ex:
Code:
//STEP0150 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD *
A
A
A
A
A
D
E
H
//SORTOUT DD SYSOUT=*
//SYSIN DD *
  INCLUDE COND=(1,1,CH,EQ,C'A')
  OPTION COPY
//*
Thank you, Kolusu... I see the issue. To do what I'm looking for, I would actually need two things: 1) the ability to specify a stop condition, and 2) a way to tell DFSORT that SORTIN is presorted, so that it would know it didn't have to read the entire file. Looks like I'll have to waste some cycles!
Thank you for your responses, everyone; I appreciate it.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Whether SORTIN is in sequence or not is irrelevant to a COPY operation. It is relevant to your task, of course.
Generally, if a file is already "in sequence", you'd not SORT it, so DFSORT never needs to know, for a SORT, that the data is already in sequence.
For MERGE, the "in sequence" is a necessity, so DFSORT tells you in no uncertain terms when it is not.
You can't do what you want with DFSORT unless you go for the insecure clunkiness of getting an RC=16 and DFSORT stopping dead.
So, write a program.
You'll lose out on DFSORT's superior IO performance, but gain by being able to elegantly stop where you want.
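As a rough idea of what such a program looks like, here is a minimal Python sketch. It assumes the records arrive as text lines, one per record, already in ascending order on the first character; real variable-length records on z/OS would need proper record handling, and the comparison here uses Python's character ordering rather than EBCDIC. The names `copy_until_key` and `transform` are made up for illustration.

```python
import io


def copy_until_key(src, dst, stop_key, transform=lambda rec: rec):
    """Copy records from src to dst until the first record whose first
    character is >= stop_key, then stop reading. Assumes src is
    presorted on that character. Returns the number of records copied."""
    copied = 0
    for rec in src:
        if rec[:1] >= stop_key:
            break  # presorted input: nothing of interest follows
        dst.write(transform(rec))
        copied += 1
    return copied


# In-memory stand-ins for the input and output datasets.
src = io.StringIO("A1\nB2\nD3\nE4\nH5\n")
dst = io.StringIO()
copied = copy_until_key(src, dst, "E")
print(copied)            # 3
print(repr(src.read()))  # 'H5\n' -- never looked at by the copy loop
```

Note that the first record at or past the stop key still has to be read in order to see its key; everything after it is left untouched.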
If you have 500,000,000 records, I believe you will have considerable resource savings with a random distribution of "stop keys". If all your "stop keys" are in the last 100,000,000 records, you'd have to do some comparisons of the approaches to determine the most effective.
Is the file on DASD? You could consider "splitting" it into multiples, perhaps, and have knowledge of the key ranges. When the entire file is needed, concatenate all.
The JCL/control cards to accomplish a particular extract could be generated, and either sent to the INTRDR or submitted separately for execution.
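A sketch of how that generation step might look, in Python. Everything here is an assumption for illustration: the dataset names are invented, the key ranges are single characters, and the helper names `select_pieces` and `sortin_dd` are hypothetical. It picks the pieces whose key range overlaps the range you want and emits a concatenated SORTIN DD for them.

```python
def select_pieces(pieces, low, high):
    """Return the dataset names whose [lo, hi] key range overlaps the
    wanted [low, high] range. `pieces` is a list of (dsn, lo, hi)."""
    return [dsn for dsn, lo, hi in pieces if lo <= high and hi >= low]


def sortin_dd(dsns):
    """Build a concatenated SORTIN DD: the first statement carries the
    DD name, the rest have a blank name field."""
    lines = []
    for i, dsn in enumerate(dsns):
        name = "SORTIN" if i == 0 else ""
        lines.append(f"//{name:<8} DD DISP=SHR,DSN={dsn}")
    return lines


# Hypothetical split of the big file by first-byte key range.
pieces = [
    ("HYPO.INPUT.KEYAD", "A", "D"),
    ("HYPO.INPUT.KEYEM", "E", "M"),
    ("HYPO.INPUT.KEYNZ", "N", "Z"),
]

wanted = select_pieces(pieces, "A", "D")
for line in sortin_dd(wanted):
    print(line)
```

For the original question (keys up to C'D'), only the first piece is selected, so the extract never touches the other two datasets. Passing all three names to `sortin_dd` gives the full concatenation for the runs that need the whole file.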