I need to sort a file with a record length of 800. My conditions are:
1. Include the records meeting the condition INCLUDE COND=(10,3,CH,EQ,C'025'), sort them on SORT FIELDS=(1,10,CH,A,20,5,CH,A) and eliminate the duplicate records.
2. Include the records meeting the condition INCLUDE COND=(310,3,CH,EQ,C'025'), sort them on SORT FIELDS=(100,10,CH,A,200,5,CH,A,251,3,CH,A) and eliminate the duplicate records.
I want to achieve both of the above in a single sort card. Please help if it's achievable.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
You can't do a SORT and INCLUDE in a single "sort card", let alone two pairs of them. Plus you want to SUM. You want three different operations, twice each, on one "card". Nope.
You want to do two different sorts in the same step? Use the TOOL (ICETOOL, or the equivalent for your sort product). You'll still read the data twice, so there's not a great deal of point, but there you go.
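To make the requirement concrete, here is a rough Python model of doing it as two separate passes, which is what the TOOL route amounts to. It is not DFSORT. Its assumptions:
- Positions are 1-based, as in the control cards.
- "Eliminate the duplicate records" is taken to mean SUM FIELDS=NONE on the sort key, keeping the first record of each key as with EQUALS.
- CH keys are compared in EBCDIC through Python's cp037 codec.
- The helper names are made up for the illustration.

```python
RECLEN = 800


def field(rec: str, p: int, m: int) -> str:
    """Bytes p..p+m-1 of a record, using 1-based positions like p,m in DFSORT."""
    return rec[p - 1 : p - 1 + m]


def make(pairs) -> str:
    """Hypothetical test helper: a blank 800-byte record with values placed at positions."""
    rec = [" "] * RECLEN
    for p, value in pairs:
        rec[p - 1 : p - 1 + len(value)] = value
    return "".join(rec)


def one_selection(records, include, keys):
    """INCLUDE, SORT on the (p, m) keys, then keep the first record of each key."""
    selected = [r for r in records if include(r)]

    def sort_key(r):
        # CH fields collate in EBCDIC on z/OS; cp037 is one common EBCDIC code page
        return tuple(field(r, p, m).encode("cp037") for p, m in keys)

    kept, last = [], None
    for r in sorted(selected, key=sort_key):  # sorted() is stable, like EQUALS
        if sort_key(r) != last:  # like SUM FIELDS=NONE: drop later equal keys
            kept.append(r)
            last = sort_key(r)
    return kept


def two_passes(records):
    # Each call reads the whole input again: that is the second pass of the data
    first = one_selection(records, lambda r: field(r, 10, 3) == "025",
                          [(1, 10), (20, 5)])
    second = one_selection(records, lambda r: field(r, 310, 3) == "025",
                           [(100, 10), (200, 5), (251, 3)])
    return first, second
```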
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
OK, having received a swift kick and a hint the size of a small range of mountains, there is a way. Serves me right for being cheeky.
Are all your records the same format? If so, you can make one INCLUDE with an OR, and that will give you all the records you want.
If the records are of different types, then you'd need to include a check on the type, so that each include test is only applied to the record type it is valid for. Basically the same thing would work.
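In DFSORT terms the single selection would be INCLUDE COND=(10,3,CH,EQ,C'025',OR,310,3,CH,EQ,C'025'). The Python sketch below models the same logic; it is not the sort product. It uses a made-up record-type byte at position 1 to stand in for however the real record types are identified.

```python
def field(rec: str, p: int, m: int) -> str:
    """1-based p,m field of a record."""
    return rec[p - 1 : p - 1 + m]


def include_same_format(rec: str) -> bool:
    # One INCLUDE with an OR: a record is kept if it meets either criterion
    return field(rec, 10, 3) == "025" or field(rec, 310, 3) == "025"


def include_by_type(rec: str) -> bool:
    # Hypothetical layout: type 'A' records carry the first criterion's field,
    # type 'B' records the second's; each test only applies to its own type
    rec_type = field(rec, 1, 1)
    return ((rec_type == "A" and field(rec, 10, 3) == "025")
            or (rec_type == "B" and field(rec, 310, 3) == "025"))
```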
The INCLUDE/OMIT is before the INREC stage, which is where we go next.
You have keys in two different places, so you need to get the key into one place. You can use INREC with IFTHENs, using the same selection details as in the INCLUDE, to build a sort key with OVERLAY. The key is in the same place for both selections. It also contains, in a fixed position, an indicator of which criterion the record matched, which prevents records from the two selections being treated as "duplicates" in error. The SORT key can just be a chunk of bytes that happens to have one set of values for one selection and another set of values for the other.
SORT on the new sort key.
SUM on the new sort key.
Distribute the records to the appropriate separate files without the extra key (or do whatever you want to do with them; you didn't say).
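The Python model below sketches the steps above for the TS's fixed-length 800-byte records:
1. Build a key area (indicator plus the selection's key fields) at the back of each selected record.
2. Sort on it and keep the first record per key.
3. Split on the indicator and cut the records back to 800.

It assumes no record meets both criteria; if one does, the first criterion wins here. The 19-byte key area and the '1'/'2' indicator values are choices made for the illustration.

```python
RECLEN = 800
KEYLEN = 1 + 10 + 5 + 3  # indicator byte + the longer of the two keys


def field(rec: str, p: int, m: int) -> str:
    return rec[p - 1 : p - 1 + m]


def make(pairs) -> str:
    """Hypothetical test helper: a blank 800-byte record with values placed at positions."""
    rec = [" "] * RECLEN
    for p, value in pairs:
        rec[p - 1 : p - 1 + len(value)] = value
    return "".join(rec)


def key_area(rec: str):
    """Like INREC IFTHEN: indicator plus key fields for the matching criterion."""
    if field(rec, 10, 3) == "025":
        return "1" + field(rec, 1, 10) + field(rec, 20, 5)
    if field(rec, 310, 3) == "025":
        return "2" + field(rec, 100, 10) + field(rec, 200, 5) + field(rec, 251, 3)
    return None  # not selected by the INCLUDE


def one_pass(records):
    # INCLUDE + INREC: extend each selected record with the key area at the back
    extended = []
    for rec in records:
        key = key_area(rec)
        if key is not None:
            extended.append(rec + key.ljust(KEYLEN))

    # SORT on positions 801-819 (EBCDIC order), then SUM FIELDS=NONE with EQUALS
    def sort_key(r):
        return r[RECLEN:].encode("cp037")

    kept, last = [], None
    for r in sorted(extended, key=sort_key):
        if sort_key(r) != last:
            kept.append(r)
            last = sort_key(r)

    # Distribute on the indicator byte (position 801), dropping the key area
    out = {"1": [], "2": []}
    for r in kept:
        out[r[RECLEN]].append(r[:RECLEN])
    return out["1"], out["2"]
```

The indicator is the first byte of the key, so a record from one selection can never be dropped as a duplicate of a record from the other, even if their key bytes happen to be equal.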
Where you create the key depends on whether the selected records are all the same length. If they are, put it at the back of the record. If not, put it at the front (shoving everything else along).
There is one slightly tricky thing, which can be dealt with: one input record may match both include criteria. In that case the second IFTHEN, with HIT=NEXT, would have to write an additional record for its criteria.
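One way to picture the "additional record" is the Python sketch below. The names are hypothetical, and it models the logic only, not how the sort product would actually write the extra record. Each criterion a record meets produces its own keyed copy, so a record meeting both is sorted once in each set.

```python
RECLEN = 800
KEYLEN = 1 + 10 + 5 + 3  # indicator byte + the longer of the two keys


def field(rec: str, p: int, m: int) -> str:
    return rec[p - 1 : p - 1 + m]


def keyed_copies(rec: str) -> list:
    """One copy of the record, extended with indicator + key, per criterion it meets."""
    copies = []
    if field(rec, 10, 3) == "025":
        key = "1" + field(rec, 1, 10) + field(rec, 20, 5)
        copies.append(rec + key.ljust(KEYLEN))
    # Tested whether or not the first criterion hit, in the spirit of HIT=NEXT
    if field(rec, 310, 3) == "025":
        key = "2" + field(rec, 100, 10) + field(rec, 200, 5) + field(rec, 251, 3)
        copies.append(rec + key.ljust(KEYLEN))
    return copies
```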
All this is extra processing, but it saves passing the file twice, which, unless the amount of data is trivial, would likely cost more.
You have to do it well and not mess anything up; bullet-proof it even if you think that's unnecessary.
For further assistance we'd need the usual details (input and output RECFM/LRECL) plus answers to fill in the gaps above. Oh, and which SORT product do you use?
I hope I haven't fumbled the hint :-) Probably have.
I spit out my coffee reading this one! What this does is still only one sort. Following up on the keep-it-simple comment, I'd not want to work in a shop that had many sort cards like this.
Bill Woodger wrote:
OK, having received a swift kick and a hint the size of a small range of mountains, there is a way. Serves me right for being cheeky.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Sorry about the coffee... hope your keyboard still works... and that you're not stuck with a dry-cleaning bill for yourself plus even-more-unfortunate colleagues :-)
You say it is "still" only one sort, but the requirement from the TS is that it be done in one sort (if possible).
For an unexceptional amount of data, I'd agree that an element of complexity has been introduced, although not as much as might be thought.
Say you have 100 million records or so; that may be different. Personally I'd do the INCLUDEs in one step, writing to two separate files, and carry on from there. But what if 95% of the data is to be extracted (or whatever percentage anyone feels makes reading it twice uncomfortable)?
I'm a big fan, myself, of KISS, but I also like to bear in mind NISS (Not if Solution Sucks) for the actual situation to hand.
Point taken. I'm curious whether, at a size where reading the data twice is an issue, the resources needed to do the sort would make it a wash. I think I'll do some tests.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Doesn't look too bad? This assumes no "conflicts" in the records (no record meeting both selections), and it doesn't split the output into two files, if that is what is wanted (there was no mention of that from the TS; I made that bit up myself).
Obviously the symnames should be more meaningful, but I don't know what the data might be.
Code:
//SRTINONE EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD *
123 RECORD 0 LONELY, BUT INCLUDE
123 RECORD 1 DUP TO STAY
123 RECORD 1 DUP TO GO
EEE RECORD 8
DDD RECORD 6
REC # 3 456 LONELY, BUT INCLUDE
REC # 2 456 DUP TO STAY
REC # 2 456 DUP TO GO
REC # 1 456 LONELY, BUT INCLUDE
REC # 6 789
REC # 4 123
//SORTOUT DD SYSOUT=*
//SYMNOUT DD SYSOUT=*
//SYMNAMES DD *
SELECTION-ONE-FIELD,1,3,CH
SELECTION-ONE-VALUE,C'123'
SELECTION-ONE-2NDKEY,12,1,CH
SELECTION-TWO-FIELD,10,2,CH
SELECTION-TWO-VALUE,C'456'
SELECTION-TWO-2NDKEY,1,9,CH
OVERLAID-KEY,50,12,CH
//SYSIN DD *
 INCLUDE COND=(SELECTION-ONE-FIELD,EQ,SELECTION-ONE-VALUE,
               OR,
               SELECTION-TWO-FIELD,EQ,SELECTION-TWO-VALUE)
Output:
123 RECORD 0 LONELY, BUT INCLUDE 1230
123 RECORD 1 DUP TO STAY 1231
REC # 1 456 LONELY, BUT INCLUDE 45REC # 1
REC # 2 456 DUP TO STAY 45REC # 2
REC # 3 456 LONELY, BUT INCLUDE 45REC # 3
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
I've extended this, only slightly, so that non-mutually exclusive selections can be coped with. In this example, if both selection criteria are met, the sort key for the first selection is used. Also fully "symbolled it up".
I've put some comments in. I'll show first without the comments:
Code:
//SRTINONE EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD *
123 RECOR456 LONELY, BUT INCLUDE IN 123 GROUP
123 RECORD 1 DUP TO STAY
123 RECORD 1 DUP TO GO
EEE RECORD 8
DDD RECORD 6
REC # 3 456 LONELY, BUT INCLUDE
REC # 2 456 DUP TO STAY
REC # 2 456 DUP TO GO
REC # 1 456 LONELY, BUT INCLUDE
REC # 6 789
REC # 4 123
//SORTOUT DD SYSOUT=*
//SYMNOUT DD SYSOUT=*
//SYMNAMES DD *
SELECTION-ONE-FIELD,1,3,CH
SELECTION-ONE-VALUE,C'123'
SELECTION-ONE-2NDKEY,12,1,CH
SELECTION-TWO-FIELD,10,2,CH
SELECTION-TWO-VALUE,C'456'
SELECTION-TWO-2NDKEY,1,9,CH
OVERLAID-KEY,52,13,CH
BOTH-SELECTION-VALUES,50,2,CH
OVERLAY-ONE-START,50
OVERLAY-TWO-START,51
OVERLAY-ONE-AND-TWO-START,52
INDICATE-OVERLAY-ONE,C'1'
INDICATE-OVERLAY-TWO,C'2'
OVERLAY-ONE-AND-TWO,C'12'
//SYSIN DD *
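 INCLUDE COND=(SELECTION-ONE-FIELD,
               EQ,
               SELECTION-ONE-VALUE,
               OR,
               SELECTION-TWO-FIELD,
               EQ,
               SELECTION-TWO-VALUE)
 INREC IFTHEN=(WHEN=(SELECTION-ONE-FIELD,
                     EQ,
                     SELECTION-ONE-VALUE),
               OVERLAY=(OVERLAY-ONE-START:INDICATE-OVERLAY-ONE,X,
                        SELECTION-ONE-FIELD,
                        SELECTION-ONE-2NDKEY),
               HIT=NEXT),
       IFTHEN=(WHEN=(SELECTION-TWO-FIELD,
                     EQ,
                     SELECTION-TWO-VALUE),
               OVERLAY=(OVERLAY-TWO-START:INDICATE-OVERLAY-TWO,
                        SELECTION-TWO-FIELD,
                        SELECTION-TWO-2NDKEY),
               HIT=NEXT),
       IFTHEN=(WHEN=(BOTH-SELECTION-VALUES,EQ,OVERLAY-ONE-AND-TWO),
               OVERLAY=(OVERLAY-ONE-AND-TWO-START:SELECTION-ONE-FIELD,
                        SELECTION-ONE-2NDKEY,
                        8X))
 SORT FIELDS=(OVERLAID-KEY,A),EQUALS
 SUM FIELDS=NONE
//*
And now with the comments:
Code:
//SRTINONE EXEC PGM=SORT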
//SYSOUT DD SYSOUT=*
//SORTIN DD *
123 RECOR456 LONELY, BUT INCLUDE IN 123 GROUP
123 RECORD 1 DUP TO STAY
123 RECORD 1 DUP TO GO
EEE RECORD 8
DDD RECORD 6
REC # 3 456 LONELY, BUT INCLUDE
REC # 2 456 DUP TO STAY
REC # 2 456 DUP TO GO
REC # 1 456 LONELY, BUT INCLUDE
REC # 6 789
REC # 4 123
//SORTOUT DD SYSOUT=*
//SYMNOUT DD SYSOUT=*
//SYMNAMES DD *
* SYMNAMES USED TO DEFINE TWO DIFFERENT SORT KEYS, FOR SORTING
* AT THE SAME TIME.
* THE DEFINITION OF THE FIRST FIELD REQUIRED - CHANGE 1,3,CH AS
* NEEDED
*
SELECTION-ONE-FIELD,1,3,CH
* THE VALUE TO BE USED FOR THE FIRST SELECTION FROM INPUT - CHANGE
* C'123' AS NEEDED
*
SELECTION-ONE-VALUE,C'123'
* THE SECOND ELEMENT OF THE FIRST SELECTION KEY, IF REQUIRED - CHANGE
* 12,1,CH AS NEEDED, INCLUDE ADDITIONAL KEY PART DEFINITIONS AS NEEDED
*
SELECTION-ONE-2NDKEY,12,1,CH
* FOR THE SECOND SELECTION, READ FOR THE FIRST ABOVE AND APPLY TO YOUR
* SITUATION
*
SELECTION-TWO-FIELD,10,2,CH
SELECTION-TWO-VALUE,C'456'
SELECTION-TWO-2NDKEY,1,9,CH
* THIS IS THE KEY WHICH IS CREATED, IN THIS CASE WITH OVERLAY, AT THIS
* POSITION, WITH THIS LENGTH AND OF THIS DATA TYPE - CHANGE AS NEEDED
*
OVERLAID-KEY,52,13,CH
* IN THIS EXAMPLE TWO DIFFERENT SETS OF DATA ARE SELECTED, AND THIS
* DEFINES AN AREA WHICH CONTAINS '1 ', ' 2' OR '12' AT THE TIME THIS
* FIELD IS USED - CAN BE REMOVED (ALONG WITH CODE BELOW) IF NOT RELEVANT
*
BOTH-SELECTION-VALUES,50,2,CH
* THREE DIFFERENT OVERLAYS ARE DONE: FOR THE FIRST AND SECOND SETS OF
* DATA AND FOR THE SITUATION WHERE THE KEY NEEDS TO BE SUBSTITUTED WHEN
* THE SELECTIONS ARE NOT MUTUALLY EXCLUSIVE
*
OVERLAY-ONE-START,50
OVERLAY-TWO-START,51
OVERLAY-ONE-AND-TWO-START,52
* THE VALUES USED TO INDICATE THE FIRST OR SECOND SET OF DATA
*
INDICATE-OVERLAY-ONE,C'1'
INDICATE-OVERLAY-TWO,C'2'
* THE VALUE USED TO TEST IF THE RECORD MATCHES BOTH SELECTION CRITERIA
*
OVERLAY-ONE-AND-TWO,C'12'
//SYSIN DD *
* INCLUDE THE RECORDS WANTED, NO NEED TO RESTRICT TO ONLY ONE PER
* SET OF DATA, OR TWO SETS OF DATA - AS MANY OF EITHER, WITHIN REASON
*
 INCLUDE COND=(SELECTION-ONE-FIELD,
               EQ,
               SELECTION-ONE-VALUE,
               OR,
               SELECTION-TWO-FIELD,
               EQ,
               SELECTION-TWO-VALUE)
* AT THIS POINT WE HAVE OUR UNSORTED SELECTION OF DATA, SO NOW TAKE
* THE TWO DIFFERENT KEYS AND PUT THEM IN THE SAME PLACE, TO ALLOW BOTH
* TO BE SORTED
* THE FIRST IFTHEN DEALS WITH THE FIRST SET OF DATA, THE SECOND, WELL,
* YOU GET THE PICTURE
* THE THIRD IFTHEN IS ONLY NEEDED WHEN THE SELECTIONS ARE NOT MUTUALLY
* EXCLUSIVE. IF NOT NEEDED, THE HIT=NEXTS ARE NOT NEEDED EITHER
*
 INREC IFTHEN=(WHEN=(SELECTION-ONE-FIELD,
                     EQ,
                     SELECTION-ONE-VALUE),
* INCLUDE INDICATOR FOR WHICH SELECTION MATCHED, A PAD FOR THE SECOND
* SELECTION (THESE ARE ONLY NEEDED FOR NON-MUTUALLY EXCLUSIVE SETS)
               OVERLAY=(OVERLAY-ONE-START:INDICATE-OVERLAY-ONE,X,
                        SELECTION-ONE-FIELD,
                        SELECTION-ONE-2NDKEY),
               HIT=NEXT),
       IFTHEN=(WHEN=(SELECTION-TWO-FIELD,
                     EQ,
                     SELECTION-TWO-VALUE),
* START THE OVERLAY ONE BYTE LATER, IN CASE SET ABOVE
               OVERLAY=(OVERLAY-TWO-START:INDICATE-OVERLAY-TWO,
                        SELECTION-TWO-FIELD,
                        SELECTION-TWO-2NDKEY),
               HIT=NEXT),
       IFTHEN=(WHEN=(BOTH-SELECTION-VALUES,EQ,OVERLAY-ONE-AND-TWO),
* START THE OVERLAY TWO BYTES LATER TO PRESERVE THE INDICATORS (ALTHOUGH
* NOT NEEDED IN THIS EXAMPLE)
               OVERLAY=(OVERLAY-ONE-AND-TWO-START:SELECTION-ONE-FIELD,
                        SELECTION-ONE-2NDKEY,
                        8X))
* SORT ON THE KEY THAT HAS BEEN CREATED
*
 SORT FIELDS=(OVERLAID-KEY,A),EQUALS
* DROP THE DUPLICATES
*
 SUM FIELDS=NONE
//*
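For anyone who wants to check the byte-shuffling, here is a Python model of what the three IFTHENs do to an 80-byte in-stream record. The positions are taken from the generated statements, and this is a sketch, not DFSORT.

One assumption needs flagging. SELECTION-TWO-FIELD is 10,2 while SELECTION-TWO-VALUE is C'456', and the model compares only the first two bytes, '45'. That relies on the longer constant being truncated to the field length, which the posted output suggests but which is not verified here.

```python
LRECL = 80  # in-stream (DD *) data is 80 bytes


def field(rec: str, p: int, m: int) -> str:
    return rec[p - 1 : p - 1 + m]


def overlay(rec: str, p: int, data: str) -> str:
    """OVERLAY=(p:data): replace len(data) bytes starting at 1-based position p."""
    return rec[: p - 1] + data + rec[p - 1 + len(data):]


def matches_one(rec: str) -> bool:
    return field(rec, 1, 3) == "123"


def matches_two(rec: str) -> bool:
    return field(rec, 10, 2) == "45"  # assumes C'456' is truncated to 2 bytes


def inrec(rec: str) -> str:
    # IFTHEN 1 (HIT=NEXT): 50:C'1',X,1,3,12,1
    if matches_one(rec):
        rec = overlay(rec, 50, "1" + " " + field(rec, 1, 3) + field(rec, 12, 1))
    # IFTHEN 2 (HIT=NEXT): 51:C'2',10,2,1,9 starts one byte later, keeping any '1'
    if matches_two(rec):
        rec = overlay(rec, 51, "2" + field(rec, 10, 2) + field(rec, 1, 9))
    # IFTHEN 3: both indicators present, so rebuild the key at 52 from selection one
    if field(rec, 50, 2) == "12":
        rec = overlay(rec, 52, field(rec, 1, 3) + field(rec, 12, 1) + " " * 8)
    return rec


def run(records):
    recs = [r.ljust(LRECL) for r in records]
    selected = [r for r in recs if matches_one(r) or matches_two(r)]  # INCLUDE
    built = [inrec(r) for r in selected]

    def sort_key(r):
        return field(r, 52, 13).encode("cp037")  # SORT FIELDS=(52,13,CH,A)

    kept, last = [], None
    for r in sorted(built, key=sort_key):  # stable, like EQUALS
        if sort_key(r) != last:  # SUM FIELDS=NONE
            kept.append(r)
            last = sort_key(r)
    return kept
```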
For anyone who can't live without the normal-looking control cards, they are generated after the symbol substitution:
Code:
INCLUDE COND=(1,3,CH,EQ,C'123',OR,10,2,CH,EQ,C'456')
INREC IFTHEN=(WHEN=(1,3,CH,EQ,C'123'),OVERLAY=(50:C'1',X,1,3,12,1),HIT*
=NEXT),IFTHEN=(WHEN=(10,2,CH,EQ,C'456'),OVERLAY=(51:C'2'*
,10,2,1,9),HIT=NEXT),IFTHEN=(WHEN=(50,2,CH,EQ,C'12'),OVE*
RLAY=(52:1,3,12,1,8X))
SORT FIELDS=(52,13,CH,A),EQUALS
SUM FIELDS=NONE
As well as self-documentation and extended readability, the symnames add flexibility. Entirely changing the position, length and type of all the keys only requires changes to the symbols. Where the same "thing" is used in more than one place (like 1,3,CH), the actual location is only specified in the symbol, so there is only one place to change (that cuts down on the dumb errors).
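The substitution itself can be pictured with a few lines of Python. This is a deliberately simplified model, not DFSORT's actual symbol processing. It swaps each whole symbol name for its value, which is enough to reproduce the generated INCLUDE above. It does not drop the format when a p,m,f symbol is used where only p,m belongs; compare SELECTION-ONE-FIELD becoming 1,3 inside the OVERLAY of the generated INREC.

```python
import re

SYMNAMES = """\
* SYMBOL,VALUE - COMMENT LINES START WITH AN ASTERISK
SELECTION-ONE-FIELD,1,3,CH
SELECTION-ONE-VALUE,C'123'
SELECTION-TWO-FIELD,10,2,CH
SELECTION-TWO-VALUE,C'456'
"""


def load_symbols(text: str) -> dict:
    """Map each symbol name to everything after the first comma on its line."""
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        name, value = line.split(",", 1)
        symbols[name] = value
    return symbols


def substitute(statement: str, symbols: dict) -> str:
    """Replace whole name-like tokens that are symbols; keywords pass through."""
    return re.sub(r"[A-Za-z][A-Za-z0-9-]*",
                  lambda m: symbols.get(m.group(0), m.group(0)),
                  statement)


print(substitute("INCLUDE COND=(SELECTION-ONE-FIELD,EQ,SELECTION-ONE-VALUE,"
                 "OR,SELECTION-TWO-FIELD,EQ,SELECTION-TWO-VALUE)",
                 load_symbols(SYMNAMES)))
# prints INCLUDE COND=(1,3,CH,EQ,C'123',OR,10,2,CH,EQ,C'456')
```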
The symnames symbols can be named identically to, for instance, a Cobol copybook, again reducing dumb errors.
I was making up a reason for the requirement last night, a big dataset. However, if you happen to want different sorted sets for any purpose, why not like this, rather than extract/sort twice, then concatenate? I had to win myself over to this, but I'd be happy to use this if/when needed.
All we need now is for the TS to tell us whether this is useful for what is wanted.
I haven't included code to actually extend the record. For fixed-length records, extend at the end of the record and cut it back later. Variable-length records, because of the selection, mean that the selected records have varying lengths. For those, extend at the beginning of the record and drop that extension later, which saves having to worry about extending all the selected records to the same length.
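A small Python sketch of the difference, using made-up records and an illustrative 4-byte key. Appended at the back, the key's position moves with the record length, so it only works as a sort key when all the selected records are the same length. Put at the front, it always sits at the same offset and is simply dropped afterwards. In DFSORT, variable-length records start with a 4-byte RDW, so "the front" there means straight after the RDW; the sketch leaves the RDW out.

```python
KEY = "K001"  # illustrative key value, assumed not to occur in the records


def add_key_back(rec: str) -> str:
    # Fixed-length: extend at the end of the record
    return rec + KEY


def add_key_front(rec: str) -> str:
    # Variable-length: extend at the beginning of the record
    return KEY + rec


def cut_back(extended: str, reclen: int) -> str:
    # Fixed-length: drop the key by truncating back to the original length
    return extended[:reclen]


def drop_front(extended: str) -> str:
    # Variable-length: drop the leading key, whatever the record's length
    return extended[len(KEY):]


fixed = ["AAAA", "BBBB"]            # same length: key lands at the same offset
varying = ["SHORT", "MUCH LONGER"]  # different lengths

back_offsets = [add_key_back(r).rfind(KEY) for r in varying]   # moves with length
front_offsets = [add_key_front(r).find(KEY) for r in varying]  # always 0
```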