Stop reading input file based on a condition?

David Sde · New User Joined: 24 Apr 2011 Posts: 23 Location: USA

I hope someone can help...

I have a very large input file which is already sorted on a given key. I'd like to use DFSORT with OPTION COPY to copy (and modify) some of those records, and write them to another dataset. I gather that I can stop reading the input file after a given number of records, but is there a way that I can stop reading the input file based on a condition? E.g., if the file is known to be presorted alphabetically on the first byte of each record, and I know that I'm not interested in any record where the first byte is > C'D', is there a way I can tell DFSORT to stop reading when it sees a key >= C'E' ? (If it matters, these are variable-length records.)

My goal, of course, is to speed up the process by not bothering to read records that I know will be discarded anyway.

Thanks so much,

David

Akatsukami · Posted: Fri Feb 22, 2013 1:33 am

You should be able to use INCLUDE or OMIT to select records to be output (or not).

Binop B · Posted: Fri Feb 22, 2013 1:33 am

Hi David,

If, as you say, the file is sorted... why don't you just use an OMIT statement, with a condition such as: OMIT if key > '<last value of concern>'?

David Sde · New User Joined: 24 Apr 2011 Posts: 23 Location: USA

Yes, I can certainly use OMIT to get the output file I want; what I'm wondering is whether I can speed the process up, by not reading unwanted input records in the first place. Suppose my input file contains 500 million records. If I know that the file is presorted, and if I could tell DFSORT to stop the COPY operation when it sees a particular key, then I might be able to get away with reading only 1 million records. So what I'm wondering is whether I can do something like that.

David

Akatsukami · Posted: Fri Feb 22, 2013 3:33 am

Ah; well, I think you'll need to write an E15 exit routine, but I think that Sri Kolusu or Mr. Woodger will be much better resources than I for such an endeavor.

Bill Woodger · Posted: Fri Feb 22, 2013 5:33 am

Mmmm.... tricky. I don't know of a way to stop on a key, except by using an EXIT which supplies all the records to be sorted or copied (i.e., no SORTIN specified). In that case, for a COPY operation, why bother with an EXIT rather than just a program?

If you set the RETURN-CODE to 8 from an input EXIT, the EXIT won't be called again, but the rest of the file will be processed by DFSORT.

If you set the RETURN-CODE to 16, DFSORT will end, with an RC of 16. This, or another way to get RC=16 from a MERGE (destroy the key of the record after the one you want as your last, so the input is out of sequence), would be messy ways to do it, and you'd need confirmation that any output files are valid (I've never checked).

As far as I'm aware, if you give DFSORT a SORTIN, and you don't have STOPAFT, and you don't limit the number of records on all OUTFILs then DFSORT will read the entire SORTIN.

One for Kolusu :-)

Skolusu · Posted: Fri Feb 22, 2013 6:22 am

David Sde · New User Joined: 24 Apr 2011 Posts: 23 Location: USA

Bill Woodger · Posted: Fri Feb 22, 2013 7:45 am

Whether SORTIN is in sequence or not is irrelevant to a COPY operation. It is relevant to your task, of course.

Generally, if a file is already "in sequence", you'd not SORT it, so DFSORT never needs to know, for a SORT, that the data is already in sequence.

For MERGE, the "in sequence" is a necessity, so DFSORT tells you in no uncertain terms when it is not.

You can't do what you want with DFSORT unless you go for the insecure clunkiness of getting an RC=16 and DFSORT stopping dead.

So, write a program.

You'll lose out on DFSORT's superior IO performance, but gain by being able to elegantly stop where you want.
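A minimal sketch of the "write a program" route, in Python purely for illustration (the thread does not name a language). It assumes the input is a byte stream in which each variable-length record is preceded by a 4-byte RDW whose first two bytes hold the record length including the RDW itself. The names `read_rdw_records` and `copy_until` are hypothetical. Keys are compared as raw bytes, so `last_key` must be given in the same encoding as the data.

```python
import io
import struct
from typing import BinaryIO, Iterator


def read_rdw_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the data portion of each RDW-prefixed record, one at a time."""
    while True:
        rdw = stream.read(4)
        if len(rdw) < 4:
            return  # end of input
        (length,) = struct.unpack(">H", rdw[:2])  # length includes the 4-byte RDW
        yield stream.read(length - 4)


def copy_until(stream: BinaryIO, out: BinaryIO, last_key: bytes) -> int:
    """Copy records whose first byte is <= last_key, then stop reading.

    Returns the number of records read, including the one that ended the copy.
    """
    read_count = 0
    for data in read_rdw_records(stream):
        read_count += 1
        if data[:1] > last_key:
            break  # input is presorted, so no later record can qualify
        out.write(struct.pack(">HH", len(data) + 4, 0) + data)
    return read_count
```

Because the reader is a generator, nothing past the first out-of-range record is ever fetched; that is the saving being asked about, at the cost of giving up DFSORT's I/O handling.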

If you have 500,000,000 records, I believe you will have considerable resource savings with a random distribution of "stop keys". If all your "stop keys" are in the last 100,000,000 records, you'd have to do some comparisons of the approaches to determine the most effective.
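The two cases above can be put into rough numbers. This is only a back-of-the-envelope sketch: it assumes the stop position is uniformly distributed over the stated range, and it counts records read only, ignoring any difference in I/O speed between DFSORT and a hand-written program. The function names are hypothetical.

```python
TOTAL_RECORDS = 500_000_000  # figure quoted in the thread


def expected_reads(lowest_stop: int, highest_stop: int) -> float:
    """Mean records read if the stop position is uniform on [lowest_stop, highest_stop]."""
    return (lowest_stop + highest_stop) / 2


def saving(lowest_stop: int, highest_stop: int, total: int = TOTAL_RECORDS) -> float:
    """Fraction of the file that is not read, on average."""
    return 1 - expected_reads(lowest_stop, highest_stop) / total


anywhere = saving(0, TOTAL_RECORDS)                 # stop key could fall anywhere
last_fifth = saving(400_000_000, TOTAL_RECORDS)     # stop key in the last 100,000,000
```

Under those assumptions, a stop key that can fall anywhere skips about half the file on average, while one confined to the last 100,000,000 records skips only about a tenth, which is why the second case needs measuring rather than guessing.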

Is the file on DASD? You could consider "splitting" it into multiples, perhaps, and have knowledge of the key ranges. When the entire file is needed, concatenate all.
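The splitting idea could look something like the following sketch. The dataset names and key ranges are invented for illustration, and `pieces_needed` is a hypothetical helper; it assumes the pieces are listed in key order and that their ranges do not overlap.

```python
from typing import List, NamedTuple


class Piece(NamedTuple):
    dsname: str  # hypothetical dataset name
    low: str     # lowest key held in this piece
    high: str    # highest key held in this piece


# Hypothetical catalogue of the split file, in key order.
PIECES = [
    Piece("HLQ.MASTER.PART1", "A", "D"),
    Piece("HLQ.MASTER.PART2", "E", "M"),
    Piece("HLQ.MASTER.PART3", "N", "Z"),
]


def pieces_needed(pieces: List[Piece], last_key: str) -> List[Piece]:
    """Return the pieces that can contain a key <= last_key."""
    return [p for p in pieces if p.low <= last_key]
```

With this layout, an extract of keys up to `"D"` needs only the first piece, and the full file is still available by concatenating all three.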

The JCL/control cards to accomplish a particular extract could be generated, and either sent to the INTRDR or submitted separately for execution.
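Generating the cards might be sketched as below. Everything specific here is an assumption: the dataset names passed in are hypothetical, and the key position of 5 assumes the key is the first data byte after the 4-byte RDW of a variable-length record. The sketch only builds the text; submitting it is left out.

```python
from typing import List


def sortin_dd(dsnames: List[str]) -> List[str]:
    """Build a SORTIN DD concatenation, one DD statement per dataset."""
    lines = []
    for i, dsn in enumerate(dsnames):
        name = "SORTIN" if i == 0 else ""  # only the first DD carries the name
        lines.append(f"//{name:<8} DD DISP=SHR,DSN={dsn}")
    return lines


def control_cards(last_key: str, key_pos: int = 5) -> List[str]:
    """Build DFSORT control statements that copy records with key <= last_key."""
    return [
        "  OPTION COPY",
        f"  INCLUDE COND=({key_pos},1,CH,LE,C'{last_key}')",
    ]
```

DFSORT still reads every dataset in the concatenation; the saving comes from listing only the pieces whose key range is needed, with the INCLUDE trimming the tail of the last one.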

Alain Benveniste · New User Joined: 14 Feb 2005 Posts: 88

This was an enhancement I discussed with Frank many years ago...

Alain