|
View previous topic :: View next topic
|
| Author |
Message |
vamsimul
New User
Joined: 07 Oct 2005 Posts: 6
|
|
|
|
Hi,
I've 2 files where 1 st file has data. 2nd file has range of data.
Based on the 2nd file range, I've to extract the data from the first file.
I/p file 1: First 6 bytes is Key
AAAAA123456
BBBBB234567
CCCCC345678
DDDDD456789
I/p File2: First 6 bytes is start of the range, 2nd 6 bytes is end of the range.
AAAAABBBBBB
DDDDDDDDDD
O/p:
AAAAA123456
BBBBB234567
DDDDD456789
Request your help on the same. |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
If none of those datasets is expected to be huge, then a straightforward solution can be used, though it is quite stupid, but simple and short.
| Code: |
//*========================================
//* SELECT BETWEEN
//*========================================
//BETWEEN EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//INPUT DD *
AAAAA123456
BBBBB234567
CCCCC345678
DDDDD456789
//*
//SELECTOR DD *
AAAAABBBBB
DDDDDDDDDD
//*
//SORTOUT DD SYSOUT=*
//*
//SYSIN DD *
JOINKEYS F1=INPUT,
FIELDS=(20,1,A)
JOINKEYS F2=SELECTOR,
FIELDS=(20,1,A)
REFORMAT FIELDS=(F1:1,80,
F2:1,10)
INCLUDE COND=(1,5,CH,GE,81,5,CH,
AND,1,5,CH,LE,86,5,CH)
SORT FIELDS=COPY
OUTREC BUILD=(1,80)
END
//* |
| Code: |
AAAAA123456
BBBBB234567
DDDDD456789 |
For huge data some better solution needs to be found. |
|
| Back to top |
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
I assume Data Set2 (Range DS) should not be huge and you know how to get SYMNAMES Dataset using simple BUILD. The Key is 6 bytes long but in your sample data it is shorted to 5 only.
| Code: |
//S1 EXEC PGM=SORT
//SYMNAMES DD *
RANGE1,'AAAAAABBBBBB'
RANGE2,'DDDDDDDDDDDD'
//SORTLIST DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SORTIN DD *
AAAAAA123456
BBBBBB234567
CCCCCC345678
DDDDDD456789
//SYSIN DD *
OPTION COPY
INCLUDE COND=(1,6,SS,EQ,RANGE1,OR,1,6,SS,EQ,RANGE2) |
Output:
| Code: |
AAAAAA123456
BBBBBB234567
DDDDDD456789 |
|
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
| Rohit Umarjikar wrote: |
| I assume Data Set2 (Range DS) should not be huge and you know how to get SYMNAMES Dataset using simple BUILD. |
1) Given SYMNAMES syntax sample is wrong
2) Who, and where is going to create/generate the SYMNAMES list?
| Rohit Umarjikar wrote: |
The Key is 6 bytes long but in your sample data it is shorted to 5 only.
|
Does it make any difference in approach?
| Rohit Umarjikar wrote: |
| Code: |
INCLUDE COND=(1,6,SS,EQ,RANGE1,OR,1,6,SS,EQ,RANGE2) |
|
What if the number of ranges is varying, let's say from 1 to 1000? |
|
| Back to top |
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
| Quote: |
| 2) Who, and where is going to create/generate the SYMNAMES list? |
If you would have paid attention then I said TS should use BUILD in prior to generate the SYMNAMES before using this solution. If don't know then TS can come back with it and I will help.
| Quote: |
| Does it make any difference in approach? |
Yes, I wouldn't prefer to have very few records here which makes INCLUDE simpler and it was a correction to TS to refer the right keys with the sample data provided.
| Quote: |
| What if the number of ranges is varying, let's say from 1 to 1000? |
Right now its not. Even if it is then simply divide RANGE1 to two records in SYMNAMES and modify the INCLUDE COND accordingly.
| Code: |
//S1 EXEC PGM=SORT
//SYMNAMES DD *
RANGE1,'AAAAAA'
RANGE2,'BBBBBB'
RANGE3,'CCCCCC'
RANGE4,'DDDDDD'
//SORTLIST DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SORTIN DD *
AAAAAA123456
BBBBBB234567
LLLLLL345678
DDDDDD456789
//SYSIN DD *
OPTION COPY
INCLUDE COND=((1,6,CH,GE,RANGE1,AND,1,6,CH,LE,RANGE2),OR,
(1,6,CH,GE,RANGE3,AND,1,6,CH,LE,RANGE4)) |
Output:
| Code: |
AAAAAA123456
BBBBBB234567
DDDDDD456789 |
|
|
| Back to top |
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
The Complete JCL with SYMNAMES creation.
| Code: |
//S1 EXEC PGM=SYNCTOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//IN DD *
AAAAAABBBBBB
CCCCCCDDDDDD
//OUT DD DISP=(,PASS),DSN=&&SYM,LRECL=80,RECFM=FB,
// UNIT=SYSDA,SPACE=(1,(1),RLSE)
//TOOLIN DD *
RESIZE FROM(IN) TO(OUT) TOLEN(6) USING(CTL1)
//CTL1CNTL DD *
OUTFILE BUILD=(C'RANGE',SEQNUM,4,ZD,C',',C'''',1,6,C'''',62X),
OMIT=(1,6,CH,EQ,C' ')
//S2 EXEC PGM=SORT
//SYMNAMES DD DSN=&&SYM,DISP=SHR
//SORTLIST DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SORTIN DD *
AAAAAA123456
BBBBBB234567
LLLLLL345678
DDDDDD456789
//SYSIN DD *
OPTION COPY
INCLUDE COND=((1,6,CH,GE,RANGE0001,AND,1,6,CH,LE,RANGE0002),OR,
(1,6,CH,GE,RANGE0003,AND,1,6,CH,LE,RANGE0004))
|
Output:
| Code: |
AAAAAA123456
BBBBBB234567
DDDDDD456789 |
|
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
| Code: |
//*============================
//* SELECT BETWEEN
//*============================
//BETWEEN EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//INPUT DD *
AAAAAA123456
BBBBBB234567
CCCCCC345678
DDDDDD456789
EEEEEE123456
FFFFFF234567
GGGGGG345678
HHHHHH456789
IIIIII123456
JJJJJJ234567
KKKKKK345678
LLLLLL456789
MMMMMM123456
NNNNNN234567
OOOOOO345678
PPPPPP456789
QQQQQQ123456
RRRRRR234567
SSSSSS345678
TTTTTT456789
UUUUUU123456
VVVVVV234567
WWWWWW345678
XXXXXX456789
YYYYYY123456
ZZZZZZ234567
999999345678
888888456789
777777123456
666666234567
555555345678
444444456789
333333123456
222222234567
111111345678
000000456789
$$$$$$123456
######234567
@@@@@@345678
//*
//SELECTOR DD *
AAAAAABBBBBB
DDDDDDDDDDDD
PPPPPPQQQQQQ
$$$$$$$$$$$$
222222333333
XXXXXXYYYYYY
KKKKKKLLLLLL
888888999999
//*
//SORTOUT DD SYSOUT=*
//*
//SYSIN DD *
JOINKEYS F1=INPUT,
FIELDS=(20,1,A)
JOINKEYS F2=SELECTOR,
FIELDS=(20,1,A)
REFORMAT FIELDS=(F1:1,80,
F2:1,12)
INCLUDE COND=(1,6,CH,GE,81,6,CH,
AND,1,6,CH,LE,87,6,CH)
SORT FIELDS=COPY
OUTREC BUILD=(1,80)
END
//* |
| Code: |
AAAAAA123456
BBBBBB234567
DDDDDD456789
KKKKKK345678
LLLLLL456789
PPPPPP456789
QQQQQQ123456
XXXXXX456789
YYYYYY123456
999999345678
888888456789
333333123456
222222234567
$$$$$$123456 |
|
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
| Rohit Umarjikar wrote: |
| Quote: |
| Does it make any difference in approach? |
Yes, I wouldn't prefer to have very few records here which makes INCLUDE simpler and it was a correction to TS to refer the right keys with the sample data provided. |
If the TS is not able to adjust the general approach to his own needs and specification, that means he's at the level of copy-and-paste-ready-to-use-solution.
Discussion at this level must continue at the Beginners Forum. |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
| Rohit Umarjikar wrote: |
| Quote: |
| 2) Who, and where is going to create/generate the SYMNAMES list? |
If you would have paid attention then I said TS should use BUILD in prior to generate the SYMNAMES before using this solution. If don't know then TS can come back with it and I will help. |
TS has clearly explained that an existing file with required ranges must be used. Who is going to convert it to SYMNAMES for no reason???
What if the keys are long (100 bytes?), and/or non-character, etc.???
What if multiple ranges (1000?) need to be used? |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
The specific of the original topic (which is critical compared to other sort tasks): it requires some implementation of many-to-many record comparison, one by one, with varying number of records in each part, and (desirable): easy to change key type, and key size, and key position within each dataset.
The solution involving SYMNAMES doesn't make any difference from simple sequential comparison of ONE dataset to the LIMITED and FIXED number of keys, either those keys are coded as part of SORT control statements, or "hidden" inside the SYMNAMES list.
Using SYMNAMES doesn't make any difference or improvement except shortening a bit the SORT control statements themselves; it doesn't help to achieve the original specific task of many-to-many comparison. |
|
| Back to top |
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
You are making your own assumption and I can not code for the whole world.
What I have suggested will work with that TS wants at this point of time and not what you assume. If TS wants something else then let it come from TS not from you. I don't prefer to have cartesian product created without any reason here and it will have more EXCP when you hit millions of records.
No need to make own assumption and have a no meaning conversation further. If you continue to replay back and forth (off topic talks since already solutions are posted) someone will lock this post soon and TS have no chance to communicate further and that unfair. |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
| Rohit Umarjikar wrote: |
| I don't prefer to have cartesian product created without any reason here and it will have more EXCP when you hit millions of records. |
In order to get the desired result one mandatory needs to perform pairs of comparisons as many as
(size of input) * (number of ranges)
either they are coded inside the original datasets, or "converted" via SYMNAMES into sequential fixed logical comparisons listed as SORT control statements.
Whatever solution is used, exactly the same "millions of records" must be handled! |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
The straightforward one-step solution given in previous example works fine for reasonable size of input dataset (let's say, up to several hundreds of thousand records, 100,000-300,000). The total number of ranges is also not expected to be very huge, but up to 1,000 ranges can be considered as acceptable amount.
That method does really produce the cartesian product in between, but it does not mean that we must avoid this thing as if it was a devil's creation. This way works fine assuming we are within a reasonable amount of data to handle; so, why not to use the simple method when it really works?
Definitely, there are other cases in this world when huge amount of records must be handled. Let's say, starting from 100M and more input records, and/or starting from 100K of valid ranges. In that case straightforward method with cartesian product highly likely would fail. Another method is needed, to process huge input dataset only once. This can be done; the only requirement: the input must be presorted by the key field.
The range file also needs to be sorted, but it can be done as part of processing, since one extra manipulation with range file is needed by this method. Two JCL steps needed, and most likely it cannot be avoided AFAIK.
| Code: |
//**********************************************************************
//* SELECT BETWEEN - FROM A HUGE FILE
//**********************************************************************
//*
//*=====================================================================
//* SORT AND SPLIT RANGE LIST INTO TWO SEPARATE LISTS: FROM/TO VALUES
//*=====================================================================
//*
//RANGES EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//SORTIN DD *
999999777777
222222444444
DDDDDDDDDDDD
AAAAAABBBBBB
PPPPPPQQQQQQ
$$$$$$$$$$$$
XXXXXXYYYYYY
KKKKKKLLLLLL
//*
//FROMLIST DD DISP=(NEW,PASS),
// UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),
// DSN=&&FROMLIST
//TOLIST DD DISP=(NEW,PASS),
// UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),
// DSN=&&TOLIST
//*
//SYSIN DD *
*
INCLUDE COND=(1,6,CH,LE,7,12,CH) eliminate bad ranges
*
SORT FIELDS=(1,6,CH,A) sort left range ascending
*
OUTFIL FNAMES=(FROMLIST), create left range list
BUILD=(1,6,74X'00') provide merging before any real data
*
OUTFIL FNAMES=(TOLIST), create right range list
BUILD=(7,6,74X'FF') provide merging after any real data
*
END
//*
//*=====================================================================
//* MERGE HUGE INPUT, AND TWO RANGE FILES; ALL MUST BE SORTED
//*=====================================================================
//BETWEEN EXEC PGM=SYNCSORT
//*
//SYSOUT DD SYSOUT=*
//*
//SORTIN01 DD DISP=(OLD,DELETE),DSN=&&FROMLIST left ranges
//SORTIN02 DD DISP=(OLD,DELETE),DSN=&&TOLIST right ranges
//SORTIN03 DD * input data (huge)
$$$$$$123456
######234567
@@@@@@345678
AAAAAA123456
BBBBBB234567
CCCCCC345678
DDDDDD456789
EEEEEE123456
FFFFFF234567
GGGGGG345678
HHHHHH456789
IIIIII123456
JJJJJJ234567
KKKKKK345678
LLLLLL456789
MMMMMM123456
NNNNNN234567
OOOOOO345678
PPPPPP456789
QQQQQQ123456
RRRRRR234567
SSSSSS345678
TTTTTT456789
UUUUUU123456
VVVVVV234567
WWWWWW345678
XXXXXX456789
YYYYYY123456
ZZZZZZ234567
000000456789
111111345678
222222234567
333333123456
444444456789
555555345678
666666234567
777777123456
888888456789
999999345678
//*
//SELECTED DD SYSOUT=*
//*
//SYSIN DD *
*
MERGE FIELDS=(1,6,CH,A, merge using record keys
7,1,CH,A) place range records before/after data groups
*
OUTREC IFTHEN=(WHEN=GROUP, detect groups of allowed ranges
BEGIN=(7,1,CH,EQ,X'00'), marker of left range record
END=(7,1,CH,EQ,X'FF'), marker of right range record
PUSH=(81:7,1)) mark group records as valid
*
OUTFIL FNAMES=(SELECTED),
INCLUDE=(7,1,CH,NE,X'00', exclude left markers
AND,7,1,CH,NE,X'FF', exclude right markers
AND,81,1,CH,EQ,X'00') include only valid records
*
END
//* |
| Code: |
$$$$$$123456
AAAAAA123456
BBBBBB234567
DDDDDD456789
KKKKKK345678
LLLLLL456789
PPPPPP456789
QQQQQQ123456
XXXXXX456789
YYYYYY123456
222222234567
333333123456
444444456789 |
|
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2263 Location: USA
|
|
|
|
| Lewis Carroll wrote: |
"Two added to one--if that could but be done,"
It said, "with one's fingers and thumbs!"
Recollecting with tears how, in earlier years,
It had taken no pains with its sums.
"The thing can be done," said the Butcher, "I think.
The thing must be done, I am sure.
The thing shall be done! Bring me paper and ink,
The best there is time to procure."
|
|
|
| Back to top |
|
 |
|
|
 |
All times are GMT + 6 Hours |
|