Joined: 06 Jun 2008 Posts: 8671 Location: Dubuque, Iowa, USA
My question is, if files increase in size, file1 goes to 100 limits and file2 has a million records, is this a good approach? If not, what can be a better approach to solve this.
As long as you have unlimited time and memory, using the cartesian product will resolve the issue. But 100 million records is a lot to go through and can take a lot of time (and memory). And what happens if you have 500 records in file 1 and 2 million in file 2 -- now you're having to go through 1 billion records. Writing a program will take resources but you'll probably save enough by not using cartesian products that the programming time will be minor comparatively.
Why complicate and design using DFSORT? But you can try one more approach.
1. Using resize split each range into two records and tag unique names.
2. In next step use the output from step1 in the sysnames and use these range tags to code in INCLUDE COND accordingly. The only problem is too many OR range in the INCLUDE.
3. This will eliminate Cartesian JOINKEYS.
However, I would agree with Robert , Easy to write program of your choice , simple for maintenance and provides clear understanding for someone who is new and code doesn’t have to be replaced if either data set size increases in future , think about 5 or 10 years later. You already asked if 1000 range to check against millions so it’s best to ahead programmatic way.
You can run both solution and compare testing stats and then decide based on the performance stats which one suits the need.