IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

Join files where value in one is between two values in other


IBM Mainframe Forums -> DFSORT/ICETOOL
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
sumannath

New User


Joined: 20 Mar 2017
Posts: 7
Location: India

PostPosted: Thu Aug 25, 2022 3:37 pm
Reply with quote

I have two files with the following data:

File 1:
Code:

----+----1----+----2
100101 110010       
150001 154999       
530000 550000       
130101 134999       
410010 419999       

and
File 2:
Code:

----+----1
146598   
356452   
225698   
531254   
100998   
413265   
654214   
418754   
151247   
544155   
236547   


I have to get records from file 2 which fall between any of the limits (values) of file 1.

I have written the following sort jcl and it seems to work:
Code:

//STEP10   EXEC PGM=SORT         
//SYSPRINT DD SYSOUT=*           
//SYSOUT   DD SYSOUT=*           
//SORTJNF1 DD *                 
100101 110010                   
150001 154999                   
530000 550000                   
130101 134999                   
410010 419999                   
/*                               
//SORTJNF2 DD *                 
146598                           
356452                           
225698                           
531254                           
100998                           
413265                           
654214                           
418754                           
151247                           
544155                           
236547                           
/*                               
//JNF1CNTL DD *                 
 INREC BUILD=(1,15,C'X')                                   
/*                                                         
//JNF2CNTL DD *                                           
 INREC BUILD=(1,7,C'X')                                   
/*                                                         
//SORTOUT  DD SYSOUT=*                                     
//SYSIN    DD *                                           
  JOINKEYS FILES=F1,FIELDS=(16,1,A)                       
  JOINKEYS FILES=F2,FIELDS=(8,1,A)                         
  REFORMAT FIELDS(F1:1,14,F2:1,6)                         
  SORT FIELDS=COPY                                         
  OUTFIL INCLUDE=(15,6,ZD,GT,1,6,ZD,AND,15,6,ZD,LT,8,6,ZD)
//                                                         


The sort first finds the cartesian product of the two files and then prints the records where the value is between the limits:
o/p:
Code:

100101 110010 100998
150001 154999 151247
530000 550000 531254
530000 550000 544155
410010 419999 413265
410010 419999 418754


My question is, if files increase in size, file1 goes to 100 limits and file2 has a million records, is this a good approach? If not, what can be a better approach to solve this.
Back to top
View user's profile Send private message
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8651
Location: Dubuque, Iowa, USA

PostPosted: Thu Aug 25, 2022 6:31 pm
Reply with quote

Quote:
My question is, if files increase in size, file1 goes to 100 limits and file2 has a million records, is this a good approach? If not, what can be a better approach to solve this.
As long as you have unlimited time and memory, using the cartesian product will resolve the issue. But 100 million records is a lot to go through and can take a lot of time (and memory). And what happens if you have 500 records in file 1 and 2 million in file 2 -- now you're having to go through 1 billion records. Writing a program will take resources but you'll probably save enough by not using cartesian products that the programming time will be minor comparatively.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 1614

PostPosted: Thu Aug 25, 2022 6:45 pm
Reply with quote

sumannath wrote:

My question is, if files increase in size, file1 goes to 100 limits and file2 has a million records, is this a good approach?

Cartesian product is almost never a good solution.

sumannath wrote:
If not, what can be a better approach to solve this.

If F1 size is really not greater than 100 records, I would add an extra step before the main sort, to generate dynamically the required INCLUDE statement for further SORT step.

Any tool can be used for that purpose, even the SORT itself.
Code:
//*============================
//GENCOND EXEC PGM=SORT
// . . . . . . . .
//SORTIN DD *                 
100101 110010                   
150001 154999                   
530000 550000                   
130101 134999                   
410010 419999                   
//*
//SORTOUT DD DISP=(NEW,PASS),SPACE=(TRK,(10,10)),
//           DSN=&&INCLUDE
//SYSIN   DD  *
 SORT FIELDS=COPY
 OUTFIL REMOVECC,
        HEADER1=(C' INCLUDE COND=(1,6,CH,EQ,X''00'', DUMMY COND'),
        BUILD=(10:C'OR,1,6,ZD,GT,',1,6,
               C',AND,1,6,ZD,LT,',8,6,C',',
            80:X),           assure LRECL=80
        TRAILER1=(10:C'OR,8,6,CH,EQ,X''00'') DUMMY COND',
                /,C' SORT FIELDS=COPY')
//*
//*============================
//STEP10 EXEC PGM=SORT,COND=(0,NE,GENCOND)
// . . . . . . . . . . .
//SORTIN   DD *                 
146598                           
356452                           
225698                           
531254                           
100998                           
413265                           
654214                           
418754                           
151247                           
544155                           
236547                           
//*
//SORTOUT  DD SYSOUT=*       
//SYSIN   DD  DISP=(OLD,DELETE),DSN=&&INCLUDE
//*
//*============================
//
Back to top
View user's profile Send private message
sumannath

New User


Joined: 20 Mar 2017
Posts: 7
Location: India

PostPosted: Fri Aug 26, 2022 10:44 am
Reply with quote

Your solution is working perfectly and is generating the required sort card with the includes dynamically built.

sergeyken wrote:

If F1 size is really not greater than 100 records, I would add an extra step before the main sort, to generate dynamically the required INCLUDE statement for further SORT step.


Wanted to understand why do you state the really not greater than 100 records? Will it take a significant time if the records in file 1 are about 1000 ?
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 1614

PostPosted: Fri Aug 26, 2022 3:58 pm
Reply with quote

sumannath wrote:
Your solution is working perfectly and is generating the required sort card with the includes dynamically built.

sergeyken wrote:

If F1 size is really not greater than 100 records, I would add an extra step before the main sort, to generate dynamically the required INCLUDE statement for further SORT step.


Wanted to understand why do you state the really not greater than 100 records? Will it take a significant time if the records in file 1 are about 1000 ?


This is what you wrote:
Quote:
file1 goes to 100 limits

There should be a limit (not described explicitly) on the total size of SORT control statements. I doubt if SORT would accept 100,000 or more lines of comparison in INCLUDE statement.

But you can try.
Back to top
View user's profile Send private message
Rohit Umarjikar

Global Moderator


Joined: 21 Sep 2010
Posts: 2945
Location: NYC,USA

PostPosted: Fri Aug 26, 2022 5:21 pm
Reply with quote

Why complicate and design using DFSORT? But you can try one more approach.
1. Using resize split each range into two records and tag unique names.
2. In next step use the output from step1 in the sysnames and use these range tags to code in INCLUDE COND accordingly. The only problem is too many OR range in the INCLUDE.
3. This will eliminate Cartesian JOINKEYS.

However, I would agree with Robert , Easy to write program of your choice , simple for maintenance and provides clear understanding for someone who is new and code doesn’t have to be replaced if either data set size increases in future , think about 5 or 10 years later. You already asked if 1000 range to check against millions so it’s best to ahead programmatic way.

You can run both solution and compare testing stats and then decide based on the performance stats which one suits the need.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 1614

PostPosted: Fri Aug 26, 2022 5:51 pm
Reply with quote

Using SYMNAMES works exactly as my example, but only adds unneeded complexity into this simple process. icon_confused.gif
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> DFSORT/ICETOOL

 


Similar Topics
Topic Forum Replies
No new posts JOIN STATEMENT PERFORMANCE. DFSORT/ICETOOL 12
No new posts Converting ASCII values to COMP-3 (ZD... JCL & VSAM 2
No new posts Merge files with a key and insert a b... DFSORT/ICETOOL 6
No new posts Generate output lines (SYSIN card for... DFSORT/ICETOOL 4
No new posts COBOL program logic for comparisipn o... COBOL Programming 2
Search our Forums:

Back to Top