Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Well, you are most of the way there. You have eliminated all of the duplicated records of length 80, but now you may need a second pass of the data selecting only the first duplicate on the 4 byte key.
Probably not the most elegant or efficient way, but for a quick fix it should work. I am pretty sure that Frank or Kolusu will give a better solution a little later in the day.
Joined: 14 Mar 2007 Posts: 8797 Location: Welsh Wales
Well, based on your post, the data given and the required result, the code below worked for me.
The first select removes all duplicate records based on the file length of 80. So any unchanged records are deleted. This leaves the task of finding only records with the same key but a different data content.
The second select will pick only the first duplicate record of those that have an identical key value, but a different data content.
OK... I' ll bite
why not simply concatenate the two files in the proper sequence ...
the content of the concatenation of the two input files is the same as the OUT1 content
Here the file is 80 and assumed all are character fields. For ex: if the length of input file was 1000 and had all types of variables such as PD, ZD, BI, C etc., Do I have to include all individual variables after "ON"
i.e. like below? Will it work? if not is there any easier way?
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
That is NOT the correct syntax for multiple ON fields. The correct syntax would be:
ON(p,m,f) ON(p,m,f) ON(p,m,f)
However, you cannot use a 100 byte PD field. The maximum length you can use for a PD field is 16 bytes = 31 digits. A 100 byte PD field would have 199 digits ... kind of ridiculous.
As to whether you'd need to break down the large field into smaller fields of different formats, that depends on whether or not each field is "normalized". For example, if your PD fields have a mixture of F, C and D signs, then you'd need to treat them as PD. If they all just had C signs, then you could treat them as BI.
You DO NOT need CTL1CNTL ... a COPY operator automatically does a COPY operation. You don't have to tell it to do one with SORT FIELDS=COPY. You could write this as:
Code:
COPY FROM(IN1) TO(OUT1)
COPY FROM(IN2) TO(OUT1)
But using the concatenated data sets as input would eliminate the need for these COPY operators anyway.