I was wondering if anybody could help me with a minor niggle I have....
I've created a Data Mining job that starts with a list of account numbers and builds up the different characteristics for each account from lots of different input files.
As not all of the accounts have a data on the other files, when I update a field on the base file, I create a new file of matched records and then have to re-merge (sort-dedupe) the original base file with the new file so I've always got the same number of records I started out with :-
Is there any way to code the above in 1 step rather than the 2 that i'm currently using ?
The above works perfectly fine, but I'm now up to 17 of these in my datamining job and it's starting to add quite a bit of time/cpu onto the job when it runs.
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
Steve,
Will you have duplicate keys on 1,16 in either of your input data sets here : MINED.DATA.BASE/ORDER.TOTALS? If no, then the below might help.
If the field from ORDER.TOTALS has to be written to output only if you find a match in it, and if you like to retain the original data when no match is found you could do it with a JOIN UNPAIRED,F1 in your first step. Currently you do not have a JOIN statement that means - return only matching records. With JOIN UNPAIRED,F1 you get all the records from F1 plus F2 fields when a match is found.
And you might need an INREC IFTHEN to selectively OVERLAY the F2 field only when a match is found. To know whether a match is found, you might want to use the matchmarker ("?") in your REFORMAT that assumes values "1" - File1 Only, "2" - File2 Only OR 'B' - Both.
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
Yes, you should retain the original value in REFORMAT and then in the INREC IFTHEN, evaluate if it is a matched record, and then OVERLAY the data from F2. So only for matched records the field will get overwritten. For non-matching records, the field will remain unchanged.
This is untested, but it would look something like this: