Weird results with JOINKEYS

Balaryan · New User Joined: 20 Nov 2009 Posts: 27 Location: chennai

Hi There,

I am trying to compare two files on a key and build one output file under these conditions: when a match is found, write the content of file-1 (col 1,59) followed by file-2 (29,09); when no match is found, write file-1 (col 1,59) followed by '0000.0000'.

RahulG31 · Active User Joined: 20 Dec 2014 Posts: 446 Location: USA

As a simple test: if each file has 10 records and all of them carry the same key value, you'll get 100 records in the output, i.e. 10 times as many as in either input file. The more records share a key, the more the output is multiplied (a Cartesian product per key value).

So if you have thousands of records where duplicates can occur, millions of records in the output are no surprise.
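To make the arithmetic concrete, here is a minimal Python sketch (not sort control statements) that counts how many paired records a key-equality join produces. The function name `joined_record_count` and the sample keys are made up for illustration; the only assumption is that every record of one file is paired with every record of the other file that has the same key.

```python
from collections import Counter


def joined_record_count(keys_f1, keys_f2):
    """Count the paired records a key-equality join would emit.

    For each key value, every file-1 record is paired with every
    file-2 record carrying the same key, so the counts multiply.
    """
    count_f1 = Counter(keys_f1)
    count_f2 = Counter(keys_f2)
    # Counter returns 0 for keys missing from file-2, so unmatched keys add nothing.
    return sum(n1 * count_f2[key] for key, n1 in count_f1.items())


# 10 records per file, all with the same key: 10 * 10 pairs.
print(joined_record_count(["AAA"] * 10, ["AAA"] * 10))  # prints 100
```

With 1,000 duplicates of one key in each file, the same calculation gives 1,000,000 paired records.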

Balaryan · New User Joined: 20 Nov 2009 Posts: 27 Location: chennai

Hi Rahul,

Just curious, is there any way to have only the actual unique records without using 'SUM FIELDS=NONE' in this process?

sergeyken · Posted: Wed Oct 18, 2017 10:43 pm

As discussed many times in this forum, the typical approach is:

1) In both input datasets, assign a unique ID (sequence number) to each record within its group of equal keys, restarting the numbering for each new key value

AAA -> AAA+001
AAA -> AAA+002
AAA -> AAA+003
BBB -> BBB+001
BBB -> BBB+002
CCC -> CCC+001
CCC -> CCC+002
CCC -> CCC+003
CCC -> CCC+004
. . . etc. . . . . . . (for both input files)

2) JOIN the records on the combined key, i.e. the original key plus the unique ID just assigned. If one file has more records with the same original key than the other, the surplus records will not match, because their assigned IDs have no counterpart in the other file.
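The two steps can be sketched in Python as follows. This only illustrates the logic and is not the sort control statements themselves. The helper names `add_sequence` and `join_on_key_and_sequence` and the sample records are hypothetical; the '0000.0000' filler and the rule of keeping every file-1 record come from the original question. Unlike a sort product, this sketch does not need the input grouped by key, because it keeps one counter per key value.

```python
from collections import defaultdict


def add_sequence(records):
    """Step 1: tag each (key, data) record with a per-key counter 1, 2, 3, ..."""
    seen = defaultdict(int)
    tagged = []
    for key, data in records:
        seen[key] += 1
        tagged.append(((key, seen[key]), data))
    return tagged


def join_on_key_and_sequence(file1, file2, filler="0000.0000"):
    """Step 2: join on (key, sequence); unmatched file-1 records get the filler."""
    lookup = dict(add_sequence(file2))
    return [
        (key, data1, lookup.get((key, seq), filler))
        for (key, seq), data1 in add_sequence(file1)
    ]


# Hypothetical sample data: three AAA records in file-1, only two in file-2.
file1 = [("AAA", "a1"), ("AAA", "a2"), ("AAA", "a3"), ("BBB", "b1")]
file2 = [("AAA", "x1"), ("AAA", "x2"), ("BBB", "y1"), ("BBB", "y2")]

for row in join_on_key_and_sequence(file1, file2):
    print(row)
```

Each file-1 record appears exactly once in the result: the third AAA record has no AAA+003 partner in file-2 and so receives the filler, and the surplus BBB+002 record of file-2 is simply not used.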

The implementation details should be obvious. If not, then:
1) RTFM
2) GOTO beginners forum

Balaryan · New User Joined: 20 Nov 2009 Posts: 27 Location: chennai

Thanks Sergeyken. I got it.