Balaryan Warnings : 2 New User
Joined: 20 Nov 2009 Posts: 27 Location: chennai
|
Hi There,
I am trying to compare two files on a key and build one output file under these conditions: when a match is found, write the content of file 1 (position 1, length 59) followed by the content of file 2 (position 29, length 9). When no match is found, write the content of file 1 (position 1, length 59) followed by '0000.0000'.
| Code: |
OPTION COPY
JOINKEYS FILE=F1,FIELDS=(01,10,A,12,04,A)
JOINKEYS FILE=F2,FIELDS=(01,10,A,12,04,A)
REFORMAT FIELDS=(F1:1,59,?,F2:29,09)
OUTREC IFTHEN=(WHEN=(60,01,CH,EQ,C'1'),
BUILD=(01:01,39,X,41:40,20,61:C'0000.0000')),
IFTHEN=(WHEN=(60,01,CH,EQ,C'B'),
BUILD=(01:01,39,40:51,09,50:49,11,61:61,09)),
IFOUTLEN=69
|
When I run the code above, the output contains millions of records, even though my input files do not exceed a few hundred thousand records. It looks like a lot of duplicate records are being built in the output file. When I add SUM FIELDS=NONE and exclude records with spaces ' ' in columns 1 to 59, to drop duplicate and invalid records, I get the desired output.
I really don't understand what causes these millions of records in the output file. Any expert advice would help. Thanks in advance. |
|
RahulG31
Active User
Joined: 20 Dec 2014 Posts: 446 Location: USA
|
As a simple test: if you have 10 records in each file, all with the same key value, you'll get 100 records in the output, i.e. 10 times more than in either input file. The more records share the same key, the more the output is multiplied (Cartesian product).
So, if you have thousands of records (where duplicates can happen), it's no surprise to see millions in the output. |
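The multiplication is easy to reproduce outside DFSORT. The sketch below is a plain-Python model of a keyed join that keeps paired records only; it is not DFSORT syntax, and the function name `join_on_key` and the sample records are made up for illustration.

```python
from collections import defaultdict


def join_on_key(file1, file2):
    """Pair every file1 record with every file2 record that has the same key.

    Each record is a (key, data) tuple. This models a paired-records-only
    join on a key; it is an illustration, not DFSORT code.
    """
    # Index the second file by key so each lookup is a dictionary access.
    by_key = defaultdict(list)
    for key, data in file2:
        by_key[key].append(data)

    joined = []
    for key, data1 in file1:
        # Every file2 record with this key produces one output record.
        for data2 in by_key.get(key, []):
            joined.append((key, data1, data2))
    return joined


# 10 records per file, all sharing one key value
f1 = [("AAA", f"F1-{i}") for i in range(10)]
f2 = [("AAA", f"F2-{i}") for i in range(10)]

print(len(join_on_key(f1, f2)))  # 10 * 10 = 100
```

With unique keys on at least one side the output can never exceed the size of the other file; the blow-up comes only from keys that repeat in both files.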
|
Balaryan Warnings : 2 New User
Joined: 20 Nov 2009 Posts: 27 Location: chennai
|
Hi Rahul,
Just curious, is there any way to get only the actual unique records in this process without using 'SUM FIELDS=NONE'? |
|
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2288 Location: USA
|
As discussed many times in this forum, the typical approach is:
1) Assign a unique ID (sequence number) to each input record within each group of equal keys, in both input datasets:
AAA -> AAA+001
AAA -> AAA+002
AAA -> AAA+003
BBB -> BBB+001
BBB -> BBB+002
CCC -> CCC+001
CCC -> CCC+002
CCC -> CCC+003
CCC -> CCC+004
... etc. (for both input files)
2) JOIN the records on the combined keys, including the newly assigned unique ID as part of the key. If one file has more records with the same original key than the other, the extra records will not match, because their assigned unique IDs have no counterpart.
The implementation details should be obvious. If not, then:
1) RTFM
2) GOTO beginners forum |
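The two steps can be sketched in plain Python. This is a model of the logic only, not DFSORT control statements; `add_sequence` and `join_on_key_and_seq` are hypothetical helper names, and the records are assumed to arrive already sorted by key, as in the AAA/BBB/CCC listing.

```python
from itertools import groupby


def add_sequence(records):
    """Step 1: number records 1, 2, 3, ... within each run of equal keys.

    Assumes `records` is a list of (key, data) tuples already sorted by key.
    Returns ((key, seq), data) tuples.
    """
    numbered = []
    for key, group in groupby(records, key=lambda rec: rec[0]):
        for seq, (_, data) in enumerate(group, start=1):
            numbered.append(((key, seq), data))
    return numbered


def join_on_key_and_seq(file1, file2):
    """Step 2: join on (key, seq), so each record pairs with at most one partner."""
    lookup = dict(add_sequence(file2))
    return [
        (key, data1, lookup[(key, seq)])
        for (key, seq), data1 in add_sequence(file1)
        if (key, seq) in lookup
    ]


f1 = [("AAA", "a1"), ("AAA", "a2"), ("AAA", "a3"), ("BBB", "b1")]
f2 = [("AAA", "x1"), ("AAA", "x2"), ("BBB", "y1"), ("BBB", "y2")]

for row in join_on_key_and_seq(f1, f2):
    print(row)
```

For this sample a join on the original key alone would produce 3 x 2 + 1 x 2 = 8 rows. With the sequence number in the key it produces 3, and the third AAA record of the first file and the second BBB record of the second file stay unmatched.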
|
Balaryan Warnings : 2 New User
Joined: 20 Nov 2009 Posts: 27 Location: chennai
|
Thanks Sergeyken. I got it. |
|