IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Select based on a range from a differnet file


IBM Mainframe Forums -> SYNCSORT
Post new topic   This topic is locked: you cannot edit posts or make replies.
View previous topic :: View next topic  
Author Message
vamsimul

New User


Joined: 07 Oct 2005
Posts: 6

PostPosted: Tue Aug 13, 2019 12:36 pm
Reply with quote

Hi,

I've 2 files where 1 st file has data. 2nd file has range of data.
Based on the 2nd file range, I've to extract the data from the first file.

I/p file 1: First 6 bytes is Key

AAAAA123456
BBBBB234567
CCCCC345678
DDDDD456789

I/p File2: First 6 bytes is start of the range, 2nd 6 bytes is end of the range.
AAAAABBBBBB
DDDDDDDDDD

O/p:

AAAAA123456
BBBBB234567
DDDDD456789

Request your help on the same.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Tue Aug 13, 2019 6:55 pm
Reply with quote

If none of those datasets is expected to be huge, then a straightforward solution can be used, though it is quite stupid, but simple and short.
Code:
//*========================================
//* SELECT BETWEEN                         
//*========================================
//BETWEEN  EXEC PGM=SYNCSORT               
//*                                       
//SYSOUT   DD  SYSOUT=*                   
//*                                       
//INPUT    DD  *                           
AAAAA123456                               
BBBBB234567                               
CCCCC345678                               
DDDDD456789                               
//*                                       
//SELECTOR DD  *                           
AAAAABBBBB                                 
DDDDDDDDDD                                 
//*                                       
//SORTOUT  DD  SYSOUT=*                   
//*                                       
//SYSIN    DD  *                           
 JOINKEYS F1=INPUT,                       
          FIELDS=(20,1,A)                 
 JOINKEYS F2=SELECTOR,                     
          FIELDS=(20,1,A)                 
 REFORMAT FIELDS=(F1:1,80,                 
                  F2:1,10)                 
 INCLUDE COND=(1,5,CH,GE,81,5,CH,         
           AND,1,5,CH,LE,86,5,CH)         
 SORT FIELDS=COPY     
 OUTREC BUILD=(1,80)     
 END                                       
//*     

Code:
AAAAA123456 
BBBBB234567 
DDDDD456789 


For huge data some better solution needs to be found.
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3048
Location: NYC,USA

PostPosted: Tue Aug 13, 2019 7:37 pm
Reply with quote

I assume Data Set2 (Range DS) should not be huge and you know how to get SYMNAMES Dataset using simple BUILD. The Key is 6 bytes long but in your sample data it is shorted to 5 only.
Code:
//S1    EXEC  PGM=SORT                               
//SYMNAMES DD *                                     
RANGE1,'AAAAAABBBBBB'                               
RANGE2,'DDDDDDDDDDDD'                               
//SORTLIST  DD  SYSOUT=*                             
//SYSOUT    DD  SYSOUT=*                             
//SORTOUT   DD  SYSOUT=*                             
//SORTIN   DD *                                     
AAAAAA123456                                         
BBBBBB234567                                         
CCCCCC345678                                         
DDDDDD456789                                         
//SYSIN  DD *                                       
  OPTION COPY                                       
  INCLUDE COND=(1,6,SS,EQ,RANGE1,OR,1,6,SS,EQ,RANGE2)

Output:
Code:
AAAAAA123456
BBBBBB234567
DDDDDD456789
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Tue Aug 13, 2019 8:59 pm
Reply with quote

Rohit Umarjikar wrote:
I assume Data Set2 (Range DS) should not be huge and you know how to get SYMNAMES Dataset using simple BUILD.
1) Given SYMNAMES syntax sample is wrong
2) Who, and where is going to create/generate the SYMNAMES list?

Rohit Umarjikar wrote:
The Key is 6 bytes long but in your sample data it is shorted to 5 only.
Does it make any difference in approach?

Rohit Umarjikar wrote:
Code:
                                     
  INCLUDE COND=(1,6,SS,EQ,RANGE1,OR,1,6,SS,EQ,RANGE2)
What if the number of ranges is varying, let's say from 1 to 1000?
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3048
Location: NYC,USA

PostPosted: Tue Aug 13, 2019 9:05 pm
Reply with quote

Quote:
2) Who, and where is going to create/generate the SYMNAMES list?
If you would have paid attention then I said TS should use BUILD in prior to generate the SYMNAMES before using this solution. If don't know then TS can come back with it and I will help.

Quote:
Does it make any difference in approach?

Yes, I wouldn't prefer to have very few records here which makes INCLUDE simpler and it was a correction to TS to refer the right keys with the sample data provided.

Quote:
What if the number of ranges is varying, let's say from 1 to 1000?
Right now its not. Even if it is then simply divide RANGE1 to two records in SYMNAMES and modify the INCLUDE COND accordingly.
Code:
//S1    EXEC  PGM=SORT                                     
//SYMNAMES DD *                                             
RANGE1,'AAAAAA'                                             
RANGE2,'BBBBBB'                                             
RANGE3,'CCCCCC'                                             
RANGE4,'DDDDDD'                                             
//SORTLIST  DD  SYSOUT=*                                   
//SYSOUT    DD  SYSOUT=*                                   
//SORTOUT   DD  SYSOUT=*                                   
//SORTIN   DD *                                             
AAAAAA123456                                               
BBBBBB234567                                               
LLLLLL345678                                               
DDDDDD456789                                               
//SYSIN  DD *                                               
  OPTION COPY                                               
  INCLUDE COND=((1,6,CH,GE,RANGE1,AND,1,6,CH,LE,RANGE2),OR,
                (1,6,CH,GE,RANGE3,AND,1,6,CH,LE,RANGE4))   

Output:
Code:
AAAAAA123456
BBBBBB234567
DDDDDD456789
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3048
Location: NYC,USA

PostPosted: Wed Aug 14, 2019 12:22 am
Reply with quote

The Complete JCL with SYMNAMES creation.
Code:
//S1       EXEC PGM=SYNCTOOL                                   
//TOOLMSG  DD SYSOUT=*                                         
//DFSMSG   DD SYSOUT=*                                         
//IN       DD *                                                 
AAAAAABBBBBB                                                   
CCCCCCDDDDDD                                                   
//OUT      DD DISP=(,PASS),DSN=&&SYM,LRECL=80,RECFM=FB,         
//            UNIT=SYSDA,SPACE=(1,(1),RLSE)                     
//TOOLIN   DD *                                                 
  RESIZE FROM(IN) TO(OUT) TOLEN(6) USING(CTL1)                 
//CTL1CNTL DD *                                                 
  OUTFILE BUILD=(C'RANGE',SEQNUM,4,ZD,C',',C'''',1,6,C'''',62X),
                 OMIT=(1,6,CH,EQ,C' ')                         
//S2    EXEC  PGM=SORT                                         
//SYMNAMES DD DSN=&&SYM,DISP=SHR                               
//SORTLIST  DD  SYSOUT=*                                       
//SYSOUT    DD  SYSOUT=*                                       
//SORTOUT   DD  SYSOUT=*                                       
//SORTIN   DD *                                                 
AAAAAA123456                                                   
BBBBBB234567                                                     
LLLLLL345678                                                     
DDDDDD456789                                                     
//SYSIN  DD *                                                   
  OPTION COPY                                                   
  INCLUDE COND=((1,6,CH,GE,RANGE0001,AND,1,6,CH,LE,RANGE0002),OR,
                (1,6,CH,GE,RANGE0003,AND,1,6,CH,LE,RANGE0004))   

Output:
Code:
AAAAAA123456
BBBBBB234567
DDDDDD456789
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Wed Aug 14, 2019 12:32 am
Reply with quote

Code:
//*============================
//* SELECT BETWEEN             
//*============================
//BETWEEN  EXEC PGM=SYNCSORT   
//*                           
//SYSOUT   DD  SYSOUT=*       
//*                           
//INPUT    DD  *               
AAAAAA123456                   
BBBBBB234567                   
CCCCCC345678                   
DDDDDD456789                   
EEEEEE123456                   
FFFFFF234567                   
GGGGGG345678                   
HHHHHH456789                   
IIIIII123456                   
JJJJJJ234567                   
KKKKKK345678                   
LLLLLL456789                   
MMMMMM123456                   
NNNNNN234567                   
OOOOOO345678                   
PPPPPP456789                   
QQQQQQ123456                   
RRRRRR234567                   
SSSSSS345678                   
TTTTTT456789                   
UUUUUU123456                   
VVVVVV234567                   
WWWWWW345678                   
XXXXXX456789                   
YYYYYY123456                   
ZZZZZZ234567                   
999999345678                   
888888456789                   
777777123456                   
666666234567                   
555555345678                   
444444456789                           
333333123456                           
222222234567                           
111111345678                           
000000456789                           
$$$$$$123456                           
######234567                           
@@@@@@345678                           
//*                                     
//SELECTOR DD  *                       
AAAAAABBBBBB                           
DDDDDDDDDDDD                           
PPPPPPQQQQQQ                           
$$$$$$$$$$$$                           
222222333333                           
XXXXXXYYYYYY                           
KKKKKKLLLLLL                           
888888999999                           
//*                                     
//SORTOUT  DD  SYSOUT=*                 
//*                                     
//SYSIN    DD  *                       
 JOINKEYS F1=INPUT,                     
          FIELDS=(20,1,A)               
 JOINKEYS F2=SELECTOR,                 
          FIELDS=(20,1,A)               
 REFORMAT FIELDS=(F1:1,80,             
                  F2:1,12)             
 INCLUDE COND=(1,6,CH,GE,81,6,CH,       
           AND,1,6,CH,LE,87,6,CH)       
 SORT FIELDS=COPY                       
 OUTREC BUILD=(1,80)                   
 END                                   
//*     

Code:
AAAAAA123456
BBBBBB234567
DDDDDD456789
KKKKKK345678
LLLLLL456789
PPPPPP456789
QQQQQQ123456
XXXXXX456789
YYYYYY123456
999999345678
888888456789
333333123456
222222234567
$$$$$$123456
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Wed Aug 14, 2019 12:37 am
Reply with quote

Rohit Umarjikar wrote:
Quote:
Does it make any difference in approach?

Yes, I wouldn't prefer to have very few records here which makes INCLUDE simpler and it was a correction to TS to refer the right keys with the sample data provided.

If the TS is not able to adjust the general approach to his own needs and specification, that means he's at the level of copy-and-paste-ready-to-use-solution.
Discussion at this level must continue at the Beginners Forum.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Wed Aug 14, 2019 12:41 am
Reply with quote

Rohit Umarjikar wrote:
Quote:
2) Who, and where is going to create/generate the SYMNAMES list?
If you would have paid attention then I said TS should use BUILD in prior to generate the SYMNAMES before using this solution. If don't know then TS can come back with it and I will help.

TS has clearly explained that an existing file with required ranges must be used. Who is going to convert it to SYMNAMES for no reason???

What if the keys are long (100 bytes?), and/or non-character, etc.???

What if multiple ranges (1000?) need to be used?
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Wed Aug 14, 2019 1:44 am
Reply with quote

The specific of the original topic (which is critical compared to other sort tasks): it requires some implementation of many-to-many record comparison, one by one, with varying number of records in each part, and (desirable): easy to change key type, and key size, and key position within each dataset.

The solution involving SYMNAMES doesn't make any difference from simple sequential comparison of ONE dataset to the LIMITED and FIXED number of keys, either those keys are coded as part of SORT control statements, or "hidden" inside the SYMNAMES list.

Using SYMNAMES doesn't make any difference or improvement except shortening a bit the SORT control statements themselves; it doesn't help to achieve the original specific task of many-to-many comparison.
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 3048
Location: NYC,USA

PostPosted: Wed Aug 14, 2019 1:44 am
Reply with quote

You are making your own assumption and I can not code for the whole world.

What I have suggested will work with that TS wants at this point of time and not what you assume. If TS wants something else then let it come from TS not from you. I don't prefer to have cartesian product created without any reason here and it will have more EXCP when you hit millions of records.

No need to make own assumption and have a no meaning conversation further. If you continue to replay back and forth (off topic talks since already solutions are posted) someone will lock this post soon and TS have no chance to communicate further and that unfair.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Wed Aug 14, 2019 1:58 am
Reply with quote

Rohit Umarjikar wrote:
I don't prefer to have cartesian product created without any reason here and it will have more EXCP when you hit millions of records.


In order to get the desired result one mandatory needs to perform pairs of comparisons as many as
(size of input) * (number of ranges)
either they are coded inside the original datasets, or "converted" via SYMNAMES into sequential fixed logical comparisons listed as SORT control statements.

Whatever solution is used, exactly the same "millions of records" must be handled!
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Wed Aug 14, 2019 7:56 pm
Reply with quote

The straightforward one-step solution given in previous example works fine for reasonable size of input dataset (let's say, up to several hundreds of thousand records, 100,000-300,000). The total number of ranges is also not expected to be very huge, but up to 1,000 ranges can be considered as acceptable amount.

That method does really produce the cartesian product in between, but it does not mean that we must avoid this thing as if it was a devil's creation. This way works fine assuming we are within a reasonable amount of data to handle; so, why not to use the simple method when it really works?

Definitely, there are other cases in this world when huge amount of records must be handled. Let's say, starting from 100M and more input records, and/or starting from 100K of valid ranges. In that case straightforward method with cartesian product highly likely would fail. Another method is needed, to process huge input dataset only once. This can be done; the only requirement: the input must be presorted by the key field.

The range file also needs to be sorted, but it can be done as part of processing, since one extra manipulation with range file is needed by this method. Two JCL steps needed, and most likely it cannot be avoided AFAIK.

Code:
//**********************************************************************
//* SELECT BETWEEN - FROM A HUGE FILE                                   
//**********************************************************************
//*                                                                     
//*=====================================================================
//* SORT AND SPLIT RANGE LIST INTO TWO SEPARATE LISTS: FROM/TO VALUES   
//*=====================================================================
//*                                                                     
//RANGES   EXEC PGM=SYNCSORT                                           
//*                                                                     
//SYSOUT   DD  SYSOUT=*                                                 
//*                                                                     
//SORTIN   DD  *                                                       
999999777777                                                           
222222444444                                                           
DDDDDDDDDDDD                                                           
AAAAAABBBBBB                                                           
PPPPPPQQQQQQ                                                           
$$$$$$$$$$$$                                                           
XXXXXXYYYYYY                                                           
KKKKKKLLLLLL                                                           
//*                                                                     
//FROMLIST DD  DISP=(NEW,PASS),                                         
//             UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),                     
//             DSN=&&FROMLIST                                           
//TOLIST   DD  DISP=(NEW,PASS),                                         
//             UNIT=SYSDA,SPACE=(TRK,(50,50),RLSE),                     
//             DSN=&&TOLIST                                             
//*                                                                     
//SYSIN    DD  *                                                       
*                                                                       
 INCLUDE COND=(1,6,CH,LE,7,12,CH)  eliminate bad ranges                 
*                                                                       
 SORT FIELDS=(1,6,CH,A)            sort left range ascending           
*                                                                       
 OUTFIL FNAMES=(FROMLIST),         create left range list               
        BUILD=(1,6,74X'00')        provide merging before any real data
*                                                                       
 OUTFIL FNAMES=(TOLIST),           create right range list             
        BUILD=(7,6,74X'FF')        provide merging after any real data 
*                                                                       
 END                                                                   
//*                                                                     
//*=====================================================================
//* MERGE HUGE INPUT, AND TWO RANGE FILES; ALL MUST BE SORTED           
//*=====================================================================
//BETWEEN  EXEC PGM=SYNCSORT                                           
//*                                                                     
//SYSOUT   DD  SYSOUT=*                                                 
//*                                                                     
//SORTIN01 DD  DISP=(OLD,DELETE),DSN=&&FROMLIST    left ranges         
//SORTIN02 DD  DISP=(OLD,DELETE),DSN=&&TOLIST      right ranges         
//SORTIN03 DD  *                                   input data (huge)   
$$$$$$123456                                                           
######234567                                                           
@@@@@@345678                                                           
AAAAAA123456                                                           
BBBBBB234567                                                           
CCCCCC345678                                                           
DDDDDD456789                                                           
EEEEEE123456                                                           
FFFFFF234567                                                           
GGGGGG345678                                                           
HHHHHH456789                                                           
IIIIII123456                                                           
JJJJJJ234567                                                           
KKKKKK345678                                                           
LLLLLL456789                                                           
MMMMMM123456                                                           
NNNNNN234567                                                           
OOOOOO345678                                                           
PPPPPP456789                                                           
QQQQQQ123456                                                           
RRRRRR234567                                                           
SSSSSS345678                                                           
TTTTTT456789                                                           
UUUUUU123456                                                           
VVVVVV234567                                                           
WWWWWW345678                                                           
XXXXXX456789                                                           
YYYYYY123456                                                           
ZZZZZZ234567                                                           
000000456789                                                           
111111345678                                                           
222222234567                                                           
333333123456                                                           
444444456789                                                           
555555345678                                                           
666666234567                                                           
777777123456                                                           
888888456789                                                           
999999345678                                                           
//*                                                                   
//SELECTED DD  SYSOUT=*                                               
//*                                                                   
//SYSIN    DD  *                                                       
*                                                                     
 MERGE FIELDS=(1,6,CH,A,  merge using record keys                     
               7,1,CH,A)  place range records before/after data groups
*                                                                     
 OUTREC IFTHEN=(WHEN=GROUP,            detect groups of allowed ranges
                BEGIN=(7,1,CH,EQ,X'00'),  marker of left range record 
                END=(7,1,CH,EQ,X'FF'),    marker of right range record
                PUSH=(81:7,1))            mark group records as valid 
*                                                                     
 OUTFIL FNAMES=(SELECTED),                                             
        INCLUDE=(7,1,CH,NE,X'00',         exclude left markers         
             AND,7,1,CH,NE,X'FF',         exclude right markers       
            AND,81,1,CH,EQ,X'00')         include only valid records   
*                                                                     
 END                                                                   
//*

Code:
$$$$$$123456 
AAAAAA123456 
BBBBBB234567 
DDDDDD456789 
KKKKKK345678 
LLLLLL456789 
PPPPPP456789 
QQQQQQ123456 
XXXXXX456789 
YYYYYY123456 
222222234567 
333333123456 
444444456789
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2010
Location: USA

PostPosted: Wed Aug 14, 2019 8:10 pm
Reply with quote

Lewis Carroll wrote:
"Two added to one--if that could but be done,"
It said, "with one's fingers and thumbs!"
Recollecting with tears how, in earlier years,
It had taken no pains with its sums.

"The thing can be done," said the Butcher, "I think.
The thing must be done, I am sure.
The thing shall be done! Bring me paper and ink,
The best there is time to procure."
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   This topic is locked: you cannot edit posts or make replies. View Bookmarks
All times are GMT + 6 Hours
Forum Index -> SYNCSORT

 


Similar Topics
Topic Forum Replies
No new posts Extracting Variable decimal numbers f... DFSORT/ICETOOL 17
No new posts To get the count of rows for every 1 ... DB2 3
No new posts SFTP Issue - destination file record ... All Other Mainframe Topics 2
No new posts Access to non cataloged VSAM file JCL & VSAM 18
No new posts Need help for File Aid JCL to extract... Compuware & Other Tools 23
Search our Forums:

Back to Top