|
View previous topic :: View next topic
|
| Author |
Message |
sumannath
New User

Joined: 20 Mar 2017 Posts: 8 Location: India
|
|
|
|
I have two files with the following data:
File 1:
| Code: |
----+----1----+----2
100101 110010
150001 154999
530000 550000
130101 134999
410010 419999
|
and
File 2:
| Code: |
----+----1
146598
356452
225698
531254
100998
413265
654214
418754
151247
544155
236547
|
I have to get records from file 2 which fall between any of the limits (values) of file 1.
I have written the following sort jcl and it seems to work:
| Code: |
//STEP10 EXEC PGM=SORT
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SORTJNF1 DD *
100101 110010
150001 154999
530000 550000
130101 134999
410010 419999
/*
//SORTJNF2 DD *
146598
356452
225698
531254
100998
413265
654214
418754
151247
544155
236547
/*
//JNF1CNTL DD *
INREC BUILD=(1,15,C'X')
/*
//JNF2CNTL DD *
INREC BUILD=(1,7,C'X')
/*
//SORTOUT DD SYSOUT=*
//SYSIN DD *
JOINKEYS FILES=F1,FIELDS=(16,1,A)
JOINKEYS FILES=F2,FIELDS=(8,1,A)
REFORMAT FIELDS(F1:1,14,F2:1,6)
SORT FIELDS=COPY
OUTFIL INCLUDE=(15,6,ZD,GT,1,6,ZD,AND,15,6,ZD,LT,8,6,ZD)
//
|
The sort first finds the cartesian product of the two files and then prints the records where the value is between the limits:
o/p:
| Code: |
100101 110010 100998
150001 154999 151247
530000 550000 531254
530000 550000 544155
410010 419999 413265
410010 419999 418754
|
My question is, if files increase in size, file1 goes to 100 limits and file2 has a million records, is this a good approach? If not, what can be a better approach to solve this. |
|
| Back to top |
|
 |
Robert Sample
Global Moderator

Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
|
|
|
|
| Quote: |
| My question is, if files increase in size, file1 goes to 100 limits and file2 has a million records, is this a good approach? If not, what can be a better approach to solve this. |
As long as you have unlimited time and memory, using the cartesian product will resolve the issue. But 100 million records is a lot to go through and can take a lot of time (and memory). And what happens if you have 500 records in file 1 and 2 million in file 2 -- now you're having to go through 1 billion records. Writing a program will take resources but you'll probably save enough by not using cartesian products that the programming time will be minor comparatively. |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2286 Location: USA
|
|
|
|
| sumannath wrote: |
My question is, if files increase in size, file1 goes to 100 limits and file2 has a million records, is this a good approach? |
Cartesian product is almost never a good solution.
| sumannath wrote: |
| If not, what can be a better approach to solve this. |
If F1 size is really not greater than 100 records, I would add an extra step before the main sort, to generate dynamically the required INCLUDE statement for further SORT step.
Any tool can be used for that purpose, even the SORT itself.
| Code: |
//*============================
//GENCOND EXEC PGM=SORT
// . . . . . . . .
//SORTIN DD *
100101 110010
150001 154999
530000 550000
130101 134999
410010 419999
//*
//SORTOUT DD DISP=(NEW,PASS),SPACE=(TRK,(10,10)),
// DSN=&&INCLUDE
//SYSIN DD *
SORT FIELDS=COPY
OUTFIL REMOVECC,
HEADER1=(C' INCLUDE COND=(1,6,CH,EQ,X''00'', DUMMY COND'),
BUILD=(10:C'OR,1,6,ZD,GT,',1,6,
C',AND,1,6,ZD,LT,',8,6,C',',
80:X), assure LRECL=80
TRAILER1=(10:C'OR,8,6,CH,EQ,X''00'') DUMMY COND',
/,C' SORT FIELDS=COPY')
//*
//*============================
//STEP10 EXEC PGM=SORT,COND=(0,NE,GENCOND)
// . . . . . . . . . . .
//SORTIN DD *
146598
356452
225698
531254
100998
413265
654214
418754
151247
544155
236547
//*
//SORTOUT DD SYSOUT=*
//SYSIN DD DISP=(OLD,DELETE),DSN=&&INCLUDE
//*
//*============================
// |
|
|
| Back to top |
|
 |
sumannath
New User

Joined: 20 Mar 2017 Posts: 8 Location: India
|
|
|
|
Your solution is working perfectly and is generating the required sort card with the includes dynamically built.
| sergeyken wrote: |
If F1 size is really not greater than 100 records, I would add an extra step before the main sort, to generate dynamically the required INCLUDE statement for further SORT step.
|
Wanted to understand why do you state the really not greater than 100 records? Will it take a significant time if the records in file 1 are about 1000 ? |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2286 Location: USA
|
|
|
|
| sumannath wrote: |
Your solution is working perfectly and is generating the required sort card with the includes dynamically built.
| sergeyken wrote: |
If F1 size is really not greater than 100 records, I would add an extra step before the main sort, to generate dynamically the required INCLUDE statement for further SORT step.
|
Wanted to understand why do you state the really not greater than 100 records? Will it take a significant time if the records in file 1 are about 1000 ? |
This is what you wrote:
| Quote: |
| file1 goes to 100 limits |
There should be a limit (not described explicitly) on the total size of SORT control statements. I doubt if SORT would accept 100,000 or more lines of comparison in INCLUDE statement.
But you can try. |
|
| Back to top |
|
 |
Rohit Umarjikar
Global Moderator

Joined: 21 Sep 2010 Posts: 3109 Location: NYC,USA
|
|
|
|
Why complicate and design using DFSORT? But you can try one more approach.
1. Using resize split each range into two records and tag unique names.
2. In next step use the output from step1 in the sysnames and use these range tags to code in INCLUDE COND accordingly. The only problem is too many OR range in the INCLUDE.
3. This will eliminate Cartesian JOINKEYS.
However, I would agree with Robert , Easy to write program of your choice , simple for maintenance and provides clear understanding for someone who is new and code doesn’t have to be replaced if either data set size increases in future , think about 5 or 10 years later. You already asked if 1000 range to check against millions so it’s best to ahead programmatic way.
You can run both solution and compare testing stats and then decide based on the performance stats which one suits the need. |
|
| Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2286 Location: USA
|
|
|
|
Using SYMNAMES works exactly as my example, but only adds unneeded complexity into this simple process.  |
|
| Back to top |
|
 |
|
|
 |
All times are GMT + 6 Hours |
|