I am trying to compare two files on a key and build one output file based on these conditions: when a match is found, write the file-1 content (1,59) followed by the file-2 content (29,09). When no match is found, write the file-1 content (1,59) followed by '0000.0000'.
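The matching rules above amount to a left outer join with a filler value. The following Python sketch only illustrates that logic; it is not the sort-product control statements. It assumes, hypothetically, that the key sits in columns 1-10 of both files (the post does not say where the key is), and it reads (29,09) as "position 29, length 9", which matches the 9-byte filler '0000.0000'.

```python
from collections import defaultdict

KEY = slice(0, 10)       # HYPOTHETICAL key position: columns 1-10
F1_PART = slice(0, 59)   # file-1 columns 1-59  -> (1,59)
F2_PART = slice(28, 37)  # file-2 columns 29-37 -> (29,09)
FILLER = "0000.0000"     # written when a file-1 record has no match

def build_output(file1, file2):
    """Left outer join: every file-1 record appears; matches pull file-2 data."""
    by_key = defaultdict(list)
    for rec in file2:
        by_key[rec[KEY]].append(rec)
    out = []
    for rec1 in file1:
        matches = by_key.get(rec1[KEY], [])
        if matches:
            # One output record per matching file-2 record.
            for rec2 in matches:
                out.append(rec1[F1_PART] + rec2[F2_PART])
        else:
            out.append(rec1[F1_PART] + FILLER)
    return out

# Hypothetical sample records.
f1 = ["KEY0000001".ljust(59, "A"), "KEY0000002".ljust(59, "B")]
f2 = ["KEY0000001".ljust(28, "x") + "1234.5678"]
for line in build_output(f1, f2):
    print(line)
```

Note the inner loop: if a key occurs several times in file-2, one file-1 record produces several output records.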
When I execute the aforementioned code, the output has millions of records even though my input files contain no more than a few hundred thousand records. It looks like a lot of duplicate records are being built in the output file. When I include SUM FIELDS=NONE and omit records that have spaces in columns 1 to 59, to drop duplicate and invalid records, I get the desired output.
I really don't understand what caused so many millions of records in the output file. Any expert advice would help. Thanks in advance.
As a simple test: if each file has 10 records that all have the same key value, you'll get 100 records in the output, i.e. 10 times more than in either input file. The more records share an identical key, the more the output gets multiplied (a Cartesian product).
So, if you have thousands of records (where duplicate keys can occur), it's no surprise to get millions of records in the output.
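The arithmetic behind the explosion can be checked with a small Python sketch on hypothetical key lists: for each key, the number of paired output records is (count in file 1) × (count in file 2), summed over all keys.

```python
from collections import Counter

def joined_count(keys1, keys2):
    """Paired output records a join on key produces: sum over keys of n1 * n2.
    Unpaired file-1 records (written with the filler) would come on top of this."""
    c1, c2 = Counter(keys1), Counter(keys2)
    return sum(n * c2[k] for k, n in c1.items())

print(joined_count(["K1"] * 10, ["K1"] * 10))      # 100
print(joined_count(["K1"] * 1000, ["K1"] * 1000))  # 1000000
```

A single key repeated 1,000 times in each file is enough to produce a million output records.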
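1) In each file, assign a unique ID to every record within each group of records sharing the same original key.
2) JOIN the records on the combined key: the original key plus the just-assigned unique ID. If one file has more records with the same original key than the other file, the extra records will not match, because their assigned unique IDs have no counterpart.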
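The two steps can be sketched in Python as follows. This is an illustration only, assuming (hypothetically) that records are (key, data) tuples. In DFSORT the per-key numbering would typically be done with SEQNUM and its RESTART=(p,m) option; check your sort product's manual for the exact syntax.

```python
from collections import defaultdict

def add_seq(records):
    """Step 1: attach a sequence number that restarts at 1 for each key."""
    seen = defaultdict(int)
    out = []
    for key, data in records:
        seen[key] += 1
        out.append(((key, seen[key]), data))
    return out

def join_on_combined_key(file1, file2, filler="0000.0000"):
    """Step 2: left outer join on (key, seq); unmatched file-1 rows get filler."""
    lookup = dict(add_seq(file2))  # combined keys are unique, so a dict is safe
    result = []
    for ckey, data1 in add_seq(file1):
        result.append((ckey[0], data1, lookup.get(ckey, filler)))
    return result

# Hypothetical data: 3 records with key K1 in file 1, 2 in file 2.
f1 = [("K1", "a1"), ("K1", "a2"), ("K1", "a3")]
f2 = [("K1", "b1"), ("K1", "b2")]
for row in join_on_combined_key(f1, f2):
    print(row)
```

A plain join on K1 would give 3 × 2 = 6 records. The combined key gives 3: two matched pairs plus one file-1 record written with the filler.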
The implementation details should be obvious. If not, then:
1) RTFM
2) GOTO beginners forum