I assume Data Set2 (the range dataset) is not huge, and that you know how to generate the SYMNAMES dataset using a simple BUILD. The key is 6 bytes long, but in your sample data it is shown as only 5.
Quote:
2) Who, and where is going to create/generate the SYMNAMES list?
If you had paid attention, you would have seen that I said the TS should use BUILD beforehand to generate the SYMNAMES before using this solution. If the TS does not know how to do that, they can come back and I will help.
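Just to illustrate what I mean by that, here is an untested sketch. The dataset names, the 4-digit sequence number, and the RANGEnnnn_LO/RANGEnnnn_HI symbol names are my own assumptions, and I am assuming the range file is FB 80 with FROM in columns 1-6 and TO in columns 7-12; SEQNUM and the slash (/) multi-record BUILD are DFSORT-style features, so the syntax may need adjusting for your sort product and level.
Code:
//GENSYM   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=your.range.file,DISP=SHR       FROM(1-6)/TO(7-12)
//SORTOUT  DD DSN=your.symnames.file,DISP=OLD
//SYSIN    DD *
  OPTION COPY
  INREC OVERLAY=(81:SEQNUM,4,ZD)     number each range record
  OUTFIL BUILD=(C'RANGE',81,4,C'_LO,C''',1,6,C'''',80:X,/,
                C'RANGE',81,4,C'_HI,C''',7,6,C'''',80:X)
END
Each range record becomes two SYMNAMES lines, e.g. RANGE0001_LO,C'AAAAAA' and RANGE0001_HI,C'BBBBBB', which can then be referenced in the INCLUDE COND.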
Quote:
Does it make any difference in approach?
Yes. I would prefer to have very few records here, which keeps the INCLUDE simpler, and my earlier note was a correction to the TS to reference the right keys in the sample data provided.
Quote:
What if the number of ranges is varying, let's say from 1 to 1000?
Right now it's not. Even if it were, simply split RANGE1 into two records in SYMNAMES and modify the INCLUDE COND accordingly.
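For instance, with two made-up ranges (the symbol names, values, and dataset names below are only examples of the layout, not taken from the TS's data):
Code:
//PICK     EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SYMNAMES DD *
* KEY FIELD IN THE INPUT, AND ONE LO/HI PAIR PER RANGE
KEY,1,6,CH
RANGE1_LO,C'AAAAAA'
RANGE1_HI,C'BBBBBB'
RANGE2_LO,C'PPPPPP'
RANGE2_HI,C'QQQQQQ'
//SORTIN   DD DSN=your.input.file,DISP=SHR
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  OPTION COPY
  INCLUDE COND=((KEY,GE,RANGE1_LO,AND,KEY,LE,RANGE1_HI),OR,
                (KEY,GE,RANGE2_LO,AND,KEY,LE,RANGE2_HI))
END
A varying number of ranges only means a longer (generated) SYMNAMES list and one more OR'ed pair in the INCLUDE COND per range.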
Quote:
Yes. I would prefer to have very few records here, which keeps the INCLUDE simpler, and my earlier note was a correction to the TS to reference the right keys in the sample data provided.
If the TS is not able to adapt the general approach to his own needs and specification, that means he is at the level of a copy-and-paste, ready-to-use solution.
Discussion at this level must continue at the Beginners Forum.
Quote:
2) Who, and where is going to create/generate the SYMNAMES list?
If you had paid attention, you would have seen that I said the TS should use BUILD beforehand to generate the SYMNAMES before using this solution. If the TS does not know how to do that, they can come back and I will help.
The TS has clearly explained that an existing file with the required ranges must be used. Who is going to convert it to SYMNAMES, and for what reason???
What if the keys are long (100 bytes?), and/or non-character, etc.???
The specific feature of the original topic (which is critical compared to other sort tasks) is that it requires some form of many-to-many record comparison, one by one, with a varying number of records on each side, and (desirably) an easy way to change the key type, key size, and key position within each dataset.
The solution involving SYMNAMES is no different from a simple sequential comparison of ONE dataset against a LIMITED and FIXED number of keys, whether those keys are coded as part of the SORT control statements or "hidden" inside the SYMNAMES list.
Using SYMNAMES brings no improvement beyond slightly shortening the SORT control statements themselves; it does not help achieve the original task of many-to-many comparison.
You are making your own assumptions, and I cannot code for the whole world.
What I have suggested will work for what the TS wants at this point in time, not for what you assume. If the TS wants something else, then let it come from the TS, not from you. I would rather not create a Cartesian product here without any reason; it will drive up the EXCP count when you hit millions of records.
There is no need to make your own assumptions and drag out a meaningless conversation. If you keep replying back and forth (off-topic talk, since the solutions are already posted), someone will lock this post soon, the TS will have no chance to communicate further, and that would be unfair.
Quote:
I would rather not create a Cartesian product here without any reason; it will drive up the EXCP count when you hit millions of records.
In order to get the desired result, one must perform as many pairwise comparisons as
(size of input) * (number of ranges)
whether those keys are taken straight from the original datasets or "converted" via SYMNAMES into a fixed list of logical comparisons in the SORT control statements.
Whatever solution is used, exactly the same "millions of records" must be handled!
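For example, with the ballpark figures used below (300,000 input records and 1,000 ranges), that is 300,000 * 1,000 = 300,000,000 key comparisons, no matter whether the keys sit in the range file, in the control statements, or in a generated SYMNAMES list.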
The straightforward one-step solution given in the previous example works fine for a reasonably sized input dataset (say, up to a few hundred thousand records, 100,000-300,000). The total number of ranges is also not expected to be very large; up to 1,000 ranges can be considered an acceptable amount.
That method really does produce a Cartesian product as an intermediate result, but that does not mean we must avoid it as if it were the devil's creation. It works fine as long as the amount of data is reasonable; so why not use the simple method when it really works?
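As a rough outline (not necessarily identical to the earlier post), that kind of one-step method can be sketched with a JOINKEYS cartesian pairing on a constant join key, followed by an INCLUDE that keeps only the pairs whose key falls inside the range. The dataset names and field positions below (key in 1-6 of the data, FROM/TO in 1-6/7-12 of the range file) are assumptions:
Code:
//PAIRUP   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DSN=your.input.file,DISP=SHR       data records, key in 1-6
//SORTJNF2 DD DSN=your.range.file,DISP=SHR       FROM in 1-6, TO in 7-12
//SORTOUT  DD SYSOUT=*
//JNF1CNTL DD *
  INREC OVERLAY=(81:C'1')            constant join key on every data record
//JNF2CNTL DD *
  INREC OVERLAY=(13:C'1')            constant join key on every range record
//SYSIN    DD *
  JOINKEYS FILE=F1,FIELDS=(81,1,A)
  JOINKEYS FILE=F2,FIELDS=(13,1,A)
  REFORMAT FIELDS=(F1:1,80,F2:1,12)  data record plus FROM/TO at 81-92
  OPTION COPY
  INCLUDE COND=(1,6,CH,GE,81,6,CH,AND,1,6,CH,LE,87,6,CH)
END
Every data record gets paired with every range (the Cartesian product), and only the pairs whose key lies between FROM and TO survive the INCLUDE.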
Of course, there are other cases where a huge number of records must be handled: say, 100M or more input records, and/or 100K or more valid ranges. In such a case the straightforward Cartesian-product method would very likely fail, and another method is needed, one that processes the huge input dataset only once. This can be done; the only requirement is that the input must be presorted by the key field.
The range file also needs to be sorted, but that can be done as part of the processing, since this method requires one extra manipulation of the range file anyway. Two JCL steps are needed, and as far as I know that cannot be avoided.
Code:
//**********************************************************************
//* SELECT BETWEEN - FROM A HUGE FILE
//**********************************************************************
//*
//*=====================================================================
//* SORT AND SPLIT RANGE LIST INTO TWO SEPARATE LISTS: FROM/TO VALUES
//*=====================================================================
//*
//RANGES EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//SORTIN DD *
999999777777
222222444444
DDDDDDDDDDDD
AAAAAABBBBBB
PPPPPPQQQQQQ
$$$$$$$$$$$$
XXXXXXYYYYYY
KKKKKKLLLLLL
//*
//FROMLIST DD DISP=(NEW,PASS),
// UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),
// DSN=&&FROMLIST
//TOLIST DD DISP=(NEW,PASS),
// UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),
// DSN=&&TOLIST
//*
//SYSIN DD *
*
INCLUDE COND=(1,6,CH,LE,7,6,CH) eliminate bad ranges (FROM greater than TO)
*
SORT FIELDS=(1,6,CH,A) sort left range ascending
*
OUTFIL FNAMES=(FROMLIST), create left range list
BUILD=(1,6,74X'00') provide merging before any real data
*
OUTFIL FNAMES=(TOLIST), create right range list
BUILD=(7,6,74X'FF') provide merging after any real data
*
END
//*
//*=====================================================================
//* MERGE HUGE INPUT, AND TWO RANGE FILES; ALL MUST BE SORTED
//*=====================================================================
//BETWEEN EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//SORTIN01 DD DISP=(OLD,DELETE),DSN=&&FROMLIST left ranges
//SORTIN02 DD DISP=(OLD,DELETE),DSN=&&TOLIST right ranges
//SORTIN03 DD * input data (huge)
$$$$$$123456
######234567
@@@@@@345678
AAAAAA123456
BBBBBB234567
CCCCCC345678
DDDDDD456789
EEEEEE123456
FFFFFF234567
GGGGGG345678
HHHHHH456789
IIIIII123456
JJJJJJ234567
KKKKKK345678
LLLLLL456789
MMMMMM123456
NNNNNN234567
OOOOOO345678
PPPPPP456789
QQQQQQ123456
RRRRRR234567
SSSSSS345678
TTTTTT456789
UUUUUU123456
VVVVVV234567
WWWWWW345678
XXXXXX456789
YYYYYY123456
ZZZZZZ234567
000000456789
111111345678
222222234567
333333123456
444444456789
555555345678
666666234567
777777123456
888888456789
999999345678
//*
//SELECTED DD SYSOUT=*
//*
//SYSIN DD *
*
MERGE FIELDS=(1,6,CH,A, merge using record keys
7,1,CH,A) place range records before/after data groups
*
OUTREC IFTHEN=(WHEN=GROUP, detect groups of allowed ranges
BEGIN=(7,1,CH,EQ,X'00'), marker of left range record
END=(7,1,CH,EQ,X'FF'), marker of right range record
PUSH=(81:7,1)) mark group records as valid
*
OUTFIL FNAMES=(SELECTED),
INCLUDE=(7,1,CH,NE,X'00', exclude left markers
AND,7,1,CH,NE,X'FF', exclude right markers
AND,81,1,CH,EQ,X'00') include only valid records
*
END
//*
"Two added to one--if that could but be done,"
It said, "with one's fingers and thumbs!"
Recollecting with tears how, in earlier years,
It had taken no pains with its sums.
"The thing can be done," said the Butcher, "I think.
The thing must be done, I am sure.
The thing shall be done! Bring me paper and ink,
The best there is time to procure."