My question is: if the files increase in size, file1 grows to its 100-record limit and file2 has a million records, is this a good approach? If not, what would be a better approach to solve this?
Quote:
My question is: if the files increase in size, file1 grows to its 100-record limit and file2 has a million records, is this a good approach? If not, what would be a better approach to solve this?
As long as you have unlimited time and memory, using the cartesian product will resolve the issue. But 100 million record combinations (100 x 1 million) is a lot to go through and can take a lot of time (and memory). And what happens if you have 500 records in file 1 and 2 million in file 2? Now you're having to go through 1 billion combinations. Writing a program will take resources, but you'll probably save enough by not using the cartesian product that the programming time will be minor by comparison.
My question is: if the files increase in size, file1 grows to its 100-record limit and file2 has a million records, is this a good approach?
Cartesian product is almost never a good solution.
sumannath wrote:
If not, what would be a better approach to solve this?
If F1 really is no larger than 100 records, I would add an extra step before the main sort to dynamically generate the required INCLUDE statement for the subsequent SORT step.
Any tool can be used for that purpose, even the SORT itself.
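To make the idea concrete, here is a rough, untested sketch of such a pre-step. All data set names are placeholders, and the layouts are assumptions: file1 is assumed to carry the low key in columns 1-5 and the high key in columns 6-10 as unsigned, zero-padded digits, and the key to be tested in file2 is assumed to be in columns 21-25 (ZD). The first step copies file1 and builds one OR'ed range comparison per record, wrapping everything into an INCLUDE statement (the always-false 1,1,CH,NE,1,1,CH comparisons simply absorb the leading and trailing OR); the second step uses the generated statements as its SYSIN.

Code:
//GENSYSIN EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DISP=SHR,DSN=your.file1.ranges
//SORTOUT  DD DSN=&&SYSIN2,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(TRK,(1,1)),DCB=(RECFM=FB,LRECL=80)
//SYSIN    DD *
  OPTION COPY
* Assumed FILE1 layout: low key in 1-5, high key in 6-10,
* unsigned zero-padded digits. Assumed FILE2 key: 21-25, ZD.
  OUTFIL FNAMES=SORTOUT,REMOVECC,
   HEADER1=(' INCLUDE COND=(1,1,CH,NE,1,1,CH,OR,'),
   BUILD=(C'  (21,5,ZD,GE,',1,5,
          C',AND,21,5,ZD,LE,',6,5,C'),OR,',80:X),
   TRAILER1=('  1,1,CH,NE,1,1,CH)',/,' SORT FIELDS=COPY')
//*
//SELECT   EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DISP=SHR,DSN=your.file2.data
//SORTOUT  DD DSN=your.matched.output,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//SYSIN    DD DSN=&&SYSIN2,DISP=(OLD,DELETE)

With 100 (or even 1,000) ranges the generated INCLUDE stays small; the eventual limiting factor is the size of the generated control statement, not the number of records in file2.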
Your solution works perfectly and generates the required sort card, with the INCLUDE conditions built dynamically.
sergeyken wrote:
If F1 really is no larger than 100 records, I would add an extra step before the main sort to dynamically generate the required INCLUDE statement for the subsequent SORT step.
I wanted to understand why you say "really no larger than 100 records". Will it take significant time if file 1 has about 1,000 records?
This is what you wrote:
Quote:
file1 grows to its 100-record limit
There must be some limit (not explicitly documented) on the total size of SORT control statements. I doubt SORT would accept 100,000 or more lines of comparisons in an INCLUDE statement.
Why complicate the design by doing this in DFSORT? Still, here is one more approach you can try.
1. Using RESIZE, split each range record into two records and tag them with unique names.
2. In the next step, use the output from step 1 as SYMNAMES and use those range tags when coding the INCLUDE COND, as sketched after this list. The only problem is the large number of OR'ed ranges in the INCLUDE.
3. This will eliminate Cartesian JOINKEYS.
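As a rough illustration of steps 1 and 2 (all names, tags, and positions below are hypothetical and untested), the output of the split-and-tag step could be a SYMNAMES data set that maps each tag to its range constant, plus a symbol for the file2 key field:

Code:
* Hypothetical SYMNAMES built from FILE1, one LOnnnn/HInnnn pair per range
LO0001,+100
HI0001,+200
LO0002,+350
HI0002,+400
* Key of FILE2, assumed at columns 21-25, zoned decimal
TRAN_KEY,21,5,ZD

The selection step then points its SYMNAMES DD at that data set and references the tags in the (generated or hand-coded) INCLUDE, one OR'ed pair per range:

Code:
  SORT FIELDS=COPY
  INCLUDE COND=((TRAN_KEY,GE,LO0001,AND,TRAN_KEY,LE,HI0001),OR,
                (TRAN_KEY,GE,LO0002,AND,TRAN_KEY,LE,HI0002))

Note that the INCLUDE itself still grows with the number of ranges, so this mainly improves readability rather than removing the size limit mentioned above.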
However, I would agree with Robert: a program in the language of your choice is easy to write, simple to maintain, and gives a clear picture to someone who is new, and the code doesn't have to be replaced if either data set grows in the future (think 5 or 10 years ahead). You already asked about 1,000 ranges to check against millions of records, so it's best to go the programmatic way.
You can run both solutions, compare the test statistics, and then decide based on the performance figures which one suits the need.