Hi all,
I have one query on sort.
I have records (in sets of three) in a file that contains duplicates after a rerun of a particular step. Let me elaborate:
There is a Stepxxx which produces a dataset abc.xyz
All the above six records are in the same output dataset abc.xyz.
Now, how do I sort the above file so that I get only the latest run's records in the output file aaa.xyz?
For example, I should get the records with date 2005.12.12.57 (the H, D and I records, identified by the character after the date) in the output, because 57 is greater than 30 in the duplicates created by the rerun. The date is the only field that changes after a rerun.
If you don't get it, I will try explaining more.
Thanks
sunny
Joined: 15 Mar 2005 Posts: 17 Location: Toronto, Canada
Hi Sunnyk,
If I understand your issue correctly, here is how you can get only the latest run records into an output file from the 'so called' duplicate records file.
If the date stamp is the same for all the records you want, you can code that value in the INCLUDE statement of SORT. With SORT FIELDS=COPY, all the records with this date stamp are written in the same order as in the input file.
The SYSIN DD statement would be:
//SYSIN DD *
  SORT FIELDS=COPY
  INCLUDE COND=(5,13,CH,EQ,C'2005.12.12.57')
//
If you want every record whose date stamp is greater than or equal to '2005.12.12.57', the SYSIN DD statement would be:
//SYSIN DD *
  SORT FIELDS=COPY
  INCLUDE COND=(5,13,CH,GE,C'2005.12.12.57')
//
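To make the difference between the EQ and GE conditions concrete, here is a small Python model of what the two INCLUDE statements select. It is only a sketch of the logic, not DFSORT syntax. The layout (date stamp in columns 5-17) comes from the INCLUDE statements above; the sample records, in particular the D and I records, are hypothetical, and the comparison assumes the date stamp is fixed-width so that character order matches run order.

```python
# Model of SORT FIELDS=COPY with INCLUDE COND=(5,13,CH,op,C'2005.12.12.57').
# Sample data is hypothetical; only the column positions come from the thread.
RECORDS = [
    "12342005.12.12.30H732832938jkdkdsdk1111",
    "12342005.12.12.30D-hypothetical-detail",
    "12342005.12.12.30I-hypothetical-info",
    "12342005.12.12.57H732832938jkdkdsdk1111",
    "12342005.12.12.57D-hypothetical-detail",
    "12342005.12.12.57I-hypothetical-info",
]


def include(records, start, length, keep):
    """Copy, in input order, the records whose field passes `keep`.

    `start` is a 1-based column, as in a DFSORT control statement.
    """
    selected = []
    for rec in records:
        field = rec[start - 1:start - 1 + length]
        if keep(field):
            selected.append(rec)
    return selected


STAMP = "2005.12.12.57"

# EQ: only records carrying exactly this date stamp.
eq_copy = include(RECORDS, 5, 13, lambda field: field == STAMP)

# GE: this date stamp or any later one (plain character comparison).
ge_copy = include(RECORDS, 5, 13, lambda field: field >= STAMP)

for rec in eq_copy:
    print(rec)
```

Both variants keep the H, D, I order of the input because nothing is sorted. Both also require the date stamp to be hardcoded in the control statement.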
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
sunnyk,
I assume that you don't want to hardcode the actual most current timestamp as Siva suggests since the timestamp will change each time you do the run.
You talk about the records being duplicates. "Duplicates" means that a pair of records has the same values in a particular field or fields. In your case, the pairs of records have different timestamps so they are obviously not duplicates on the timestamp. So I'll assume that the other fields besides the timestamp (for example, 1234 and H732832938jkdkdsdk1111 for the H pair) are what make the records duplicates. Given that assumption, you can use the following DFSORT/ICETOOL job to get the record with the latest timestamp for each pair of "duplicate" records:
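The idea of keeping the latest timestamp for each pair of "duplicate" records can be modelled in a few lines of Python. This is a sketch of the logic only, not DFSORT/ICETOOL syntax and not necessarily how the job does it. It assumes the layout implied earlier in the thread (timestamp in columns 5-17, everything else forming the duplicate key); the D and I sample records are hypothetical.

```python
# Keep, for each duplicate key, the record with the latest timestamp.
# Assumed layout: columns 1-4 and 18 onward form the key, columns 5-17 hold the timestamp.
RECORDS = [
    "12342005.12.12.30H732832938jkdkdsdk1111",
    "12342005.12.12.30D-hypothetical-detail",
    "12342005.12.12.30I-hypothetical-info",
    "12342005.12.12.57H732832938jkdkdsdk1111",
    "12342005.12.12.57D-hypothetical-detail",
    "12342005.12.12.57I-hypothetical-info",
]


def key_of(rec):
    """Everything except the timestamp identifies a duplicate pair."""
    return rec[0:4] + rec[17:]


def stamp_of(rec):
    """The 13-character timestamp in columns 5-17."""
    return rec[4:17]


def latest_per_key(records):
    # Two stable sorts: timestamp descending first, then key ascending,
    # which orders records by key with the newest timestamp first in each group.
    ordered = sorted(records, key=stamp_of, reverse=True)
    ordered = sorted(ordered, key=key_of)

    # Keep only the first record of each key group (the newest one).
    kept = []
    previous_key = None
    for rec in ordered:
        if key_of(rec) != previous_key:
            kept.append(rec)
            previous_key = key_of(rec)
    return kept


latest = latest_per_key(RECORDS)
for rec in latest:
    print(rec)
```

Because the records are sorted on the key, the output comes out in key order (D, H, I in this sample) rather than in the original H, D, I order.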
Hi Frank,
Thanks for your quick response, but the problem is only half solved. I actually want the output in H, D, I sequence, i.e. the same as the input dataset. As your output shows, it is sorted on that field too (field number 18). So is there any way to keep it as it is, in H/D/I sequence?
Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
Quote:
is there any way to keep it as it is in H/D/I sequence.
Yes, by adding a sequence number we can sort on to get the records back in their original order, but it will take a couple more passes over the data. Here's the DFSORT/ICETOOL job to do it:
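The sequence-number idea can be sketched in Python as a model of the passes involved. Again, this is only an illustration of the logic, not DFSORT/ICETOOL syntax. The layout (timestamp in columns 5-17, the rest forming the duplicate key) is assumed from the earlier posts, and the D and I sample records are hypothetical.

```python
# Latest record per duplicate key, returned in original input order.
# Assumed layout: columns 1-4 and 18 onward form the key, columns 5-17 hold the timestamp.
RECORDS = [
    "12342005.12.12.30H732832938jkdkdsdk1111",
    "12342005.12.12.30D-hypothetical-detail",
    "12342005.12.12.30I-hypothetical-info",
    "12342005.12.12.57H732832938jkdkdsdk1111",
    "12342005.12.12.57D-hypothetical-detail",
    "12342005.12.12.57I-hypothetical-info",
]


def latest_in_input_order(records):
    # Pass 1: tag every record with a sequence number (1, 2, 3, ...).
    numbered = list(enumerate(records, start=1))

    # Pass 2: order by key, newest timestamp first, and keep the first
    # record of each key group. Two stable sorts give that ordering.
    numbered.sort(key=lambda pair: pair[1][4:17], reverse=True)
    numbered.sort(key=lambda pair: pair[1][0:4] + pair[1][17:])
    kept = []
    previous_key = None
    for seq, rec in numbered:
        key = rec[0:4] + rec[17:]
        if key != previous_key:
            kept.append((seq, rec))
            previous_key = key

    # Pass 3: sort back on the sequence number and drop it again.
    kept.sort(key=lambda pair: pair[0])
    return [rec for _, rec in kept]


restored = latest_in_input_order(RECORDS)
for rec in restored:
    print(rec)
```

The extra passes are the cost of restoring the order: one to add the sequence number and one to sort back on it after the duplicates have been removed.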