I have a question about sorting.
I have records (in sets of three) in a file that contains duplicates after a rerun of a particular step. Let me elaborate:
There is a step, Stepxxx, which produces the dataset abc.xyz.
After the rerun, all six records (the H, D and I records from each run) end up in the same output dataset, abc.xyz.
How can I sort this file so that the output file aaa.xyz contains only the records from the latest run?
For example, I should get the records with the date 2005.12.12.57 (the ones with H, D and I after the date parameter) in the output, because 57 is greater than the 30 in their duplicates; the file contains both sets after the rerun. This is the only parameter that changes after a rerun.
If that isn't clear, I'll try to explain in more detail.
I assume that you don't want to hardcode the actual most current timestamp as Siva suggests since the timestamp will change each time you do the run.
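To illustrate the point about not hardcoding the timestamp, here is a minimal Python sketch (not DFSORT) that finds the newest timestamp in the data itself and keeps only the records carrying it. The `Record` layout (type, duplicate key, timestamp) and the sample values are hypothetical, since the real record format isn't shown in the thread, and it assumes the timestamps are fixed-width so that comparing them as strings matches chronological order.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical layout: the thread does not show the real record format.
    rec_type: str  # 'H', 'D' or 'I'
    key: str       # the fields that make two records "duplicates"
    stamp: str     # run timestamp, e.g. '2005.12.12.57'


def latest_run_only(records: list[Record]) -> list[Record]:
    """Keep only the records written by the most recent run.

    The newest timestamp is derived from the data, so nothing is hardcoded.
    Assumes fixed-width timestamps, so string comparison gives the same
    order as chronological comparison.
    """
    if not records:
        return []
    newest = max(r.stamp for r in records)
    return [r for r in records if r.stamp == newest]


if __name__ == "__main__":
    # Hypothetical sample: the first run stamped .30, the rerun stamped .57.
    data = [
        Record("H", "1234", "2005.12.12.30"),
        Record("D", "5678", "2005.12.12.30"),
        Record("I", "9012", "2005.12.12.30"),
        Record("H", "1234", "2005.12.12.57"),
        Record("D", "5678", "2005.12.12.57"),
        Record("I", "9012", "2005.12.12.57"),
    ]
    for rec in latest_run_only(data):
        print(rec)
```

This keeps every record from the single latest run, which works when each rerun rewrites the whole set; the per-key approach in the reply handles files where only some records were duplicated.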
You talk about the records being duplicates. "Duplicates" means that a pair of records has the same values in a particular field or fields. In your case, the pairs of records have different timestamps so they are obviously not duplicates on the timestamp. So I'll assume that the other fields besides the timestamp (for example, 1234 and H732832938jkdkdsdk1111 for the H pair) are what make the records duplicates. Given that assumption, you can use the following DFSORT/ICETOOL job to get the record with the latest timestamp for each pair of "duplicate" records:
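The logic described here (treat everything except the timestamp as the duplicate key, then keep the record with the latest timestamp in each group) can be sketched in Python as below. This is an illustration of the idea, not the DFSORT/ICETOOL job itself; the `Record` layout and the choice of `(rec_type, key)` as the duplicate key are hypothetical, and timestamps are assumed to be fixed-width so they compare correctly as strings.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical layout: type, duplicate key, run timestamp.
    rec_type: str
    key: str
    stamp: str


def latest_per_key(records: list[Record]) -> list[Record]:
    """For each group of "duplicate" records, keep the one with the latest stamp.

    Records are duplicates when everything except the timestamp matches,
    modelled here as (rec_type, key). Like a sort-based job, the result
    comes back in key order, not in input order.
    """
    best: dict[tuple[str, str], Record] = {}
    for r in records:
        k = (r.rec_type, r.key)
        # Fixed-width timestamps compare correctly as strings.
        if k not in best or r.stamp > best[k].stamp:
            best[k] = r
    return sorted(best.values(), key=lambda r: (r.rec_type, r.key))
```

Because the survivors are ordered by key, input records in H, D, I sequence come out as D, H, I in this sketch.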
Thanks for your quick response, but the problem is only half solved. I want the output in H, D, I sequence, i.e. the same order as the input dataset, but your output is also sorted on that field (field number 18). Is there any way to keep the records in their original H/D/I sequence?
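One common way to keep the input order is to tag each record with its input sequence number, pick the latest record per duplicate key, and then sort the survivors back on that sequence number rather than on the key; DFSORT can generate such a number with its SEQNUM parameter. The Python sketch below shows only the idea, not DFSORT syntax, and uses the same hypothetical `Record` layout (type, duplicate key, fixed-width timestamp).

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical layout: type, duplicate key, run timestamp.
    rec_type: str
    key: str
    stamp: str


def latest_per_key_in_input_order(records: list[Record]) -> list[Record]:
    """Keep the newest record of each duplicate group, in original input order.

    Each record's position in the input acts as its sequence number;
    survivors are emitted sorted by that number instead of by key.
    """
    newest_pos: dict[tuple[str, str], int] = {}
    for pos, r in enumerate(records):
        k = (r.rec_type, r.key)
        prev = newest_pos.get(k)
        # Fixed-width timestamps compare correctly as strings.
        if prev is None or r.stamp > records[prev].stamp:
            newest_pos[k] = pos
    # Sorting on the saved sequence numbers restores the input order.
    return [records[pos] for pos in sorted(newest_pos.values())]
```

With the rerun's records appended after the first run's, the survivors are the rerun's H, D and I records, still in H, D, I order.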