I have a file with sample records as shown below. The data is in batch format, with the first record being the batch header and the last record being the batch trailer. Each batch can have multiple invoices. An invoice header starts with '1' and an invoice line starts with '2'. Is there any way to remove the duplicate invoices (including their line items)? In my example, the details for Inv# 111 are repeated twice. I want them removed so that I won't face a duplicate invoice error while loading them.
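To make the layout concrete, here is a minimal Python sketch of how such a file could be split into invoice groups. It is only an illustration, not the sort job discussed in this thread. The `Invoice` class and `split_batch` function are made-up names, and the sketch assumes that any record not starting with '1' or '2' is a batch header or trailer, which the post does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Invoice:
    header: str                                  # the '1' record
    lines: list = field(default_factory=list)    # the '2' records that follow it


def split_batch(records):
    """Split records into (batch-level records, invoice groups).

    Assumption: column 1 is the record type; '1' = invoice header,
    '2' = invoice line, anything else = batch header/trailer.
    """
    batch_records, invoices = [], []
    for rec in records:
        if rec.startswith("1"):
            invoices.append(Invoice(header=rec))   # a header opens a new group
        elif rec.startswith("2") and invoices:
            invoices[-1].lines.append(rec)         # a line belongs to the latest header
        else:
            batch_records.append(rec)              # batch header, trailer, or orphan
    return batch_records, invoices
```

Each `Invoice` then holds one header with its line items, which is the unit the rest of the thread argues about.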
Maybe I have misunderstood the requirement, but based on the comment below from the OP, I think he wanted to check for duplicate sets of records:
Is there any way to remove the duplicate invoices (including its line items)?
For example, if I change the input records as below and re-run the same job, it still produces the same output, even though the entire "recordset" wasn't a duplicate. The job, in turn, would remove the second set of records with header '111'.
Like I said, I may have misunderstood the requirement, but I thought the OP wanted to compare line items as well.
Do you want to retain this group entirely, since not all the records under this group are a 1 to 1 match with the earlier group? In that case, would the output simply be a copy of the input as is?
Another way of eliminating the duplicates: in the second set of records, the header and the first detail record are a perfect match with the earlier group, so you delete them and move the mismatched detail record to the prior group, like this?
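The second option above (drop the repeated header and the matching detail records, and move the mismatched detail record into the earlier group) could be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual job: `invoice_number` and `merge_duplicate_invoices` are hypothetical names, and the invoice number is assumed to be the first field after the record-type byte, which the thread does not confirm.

```python
def invoice_number(header):
    # Hypothetical layout: type byte '1', then the invoice number as the next field.
    return header[1:].split()[0]


def merge_duplicate_invoices(records):
    out = []          # plain records, or a list holding one invoice group
    groups = {}       # invoice number -> its group (header plus lines)
    current = None    # group that '2' records are currently attached to
    merging = False   # True while reading a repeated invoice
    for rec in records:
        if rec.startswith("1"):
            num = invoice_number(rec)
            merging = num in groups
            if merging:
                current = groups[num]            # repeated header is dropped
            else:
                current = groups[num] = [rec]
                out.append(current)
        elif rec.startswith("2") and current is not None:
            # In a repeated invoice, skip lines the first group already has.
            if not (merging and rec in current):
                current.append(rec)
        else:
            current = None                       # batch header/trailer: keep as is
            out.append(rec)
    flat = []
    for item in out:
        flat.extend(item if isinstance(item, list) else [item])
    return flat
```

Note that the whole file is held in memory here, because a line from a later group may need to be written next to an earlier one. A repeated header that differs from the first in some other field is dropped silently.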
I should have originally provided expected output based on my understanding of the requirement.
Yes, from the sample input I provided, I was thinking all the records needed to be retained, as at least one of the detail records didn't match. I'm afraid so: I thought the OP was asking about a 1 to 1 match. I know that if this is what the OP wanted, then it's going to be tricky, and that's why I asked the OP about the maximum number of detail records.
Again, this is my understanding of the requirement and I could be wrong.
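For comparison, the 1 to 1 interpretation (a group is removed only when its header and every detail record match an earlier group) could look like this in Python. This is a sketch under assumptions rather than the job from this thread: `drop_exact_duplicate_groups` is a made-up name, records not starting with '1' or '2' are assumed to be batch header/trailer records, and the comparison is order-sensitive, so the same lines in a different order count as a different group.

```python
def drop_exact_duplicate_groups(records):
    seen = set()    # groups already written, stored as tuples of records
    out = []
    group = []      # the invoice currently being collected

    def flush():
        # Write the collected group only if no identical group was written before.
        if group:
            key = tuple(group)
            if key not in seen:
                seen.add(key)
                out.extend(group)
            group.clear()

    for rec in records:
        if rec.startswith("1"):
            flush()                 # a new header closes the previous group
            group.append(rec)
        elif rec.startswith("2") and group:
            group.append(rec)
        else:
            flush()                 # batch header/trailer closes any open group
            out.append(rec)
    flush()
    return out
```

Only one group is buffered at a time, so in this form no fixed maximum number of detail records is needed. With the modified sample described above, where the second '111' group has one detail record that differs, the output is a straight copy of the input.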
You get given an invoice by a car-repair company. It has a number on it, which will be unique for that company. Their copy of the invoice will be needed to produce their accounts/book-keeping and for their tax purposes (potentially many other things as well, stock for instance, blah, blah).
There can be more than one "thing" on the invoice. Like replacement parts, labour. Each "thing" will be one "line" on the invoice.
So, if there is a file with duplicate invoices, it should have all the lines matching the original invoice.
So, to this point, the requirement is the option as understood by Kolusu.
What if the invoice numbers are not unique? Data-entry error? Or the fact that a three-digit invoice number doesn't give a lot of room for continuing uniqueness. Or maybe data-entry error with the "lines"?
TS hasn't asked for any of this to be covered in any way, so matching the invoice number and removing everything relating to the "duplicate", if it exists, is what he says he wants. Still with Kolusu's understanding.
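As a rough illustration of that reading (first occurrence of an invoice number wins, and every later header with the same number is dropped together with its lines), here is a short Python sketch. It is not the sort job itself. `invoice_number` and `drop_duplicate_invoices` are hypothetical names, and the invoice number is assumed to be the first field after the record-type byte.

```python
def invoice_number(header):
    # Hypothetical layout: type byte '1', then the invoice number as the next field.
    return header[1:].split()[0]


def drop_duplicate_invoices(records):
    seen = set()        # invoice numbers already written
    skipping = False    # True while inside a repeated invoice
    out = []
    for rec in records:
        if rec.startswith("1"):
            num = invoice_number(rec)
            skipping = num in seen      # repeated number: drop header and its lines
            seen.add(num)
        elif not rec.startswith("2"):
            skipping = False            # batch header/trailer is always kept
        # '2' records inherit the decision made for their header
        if not skipping:
            out.append(rec)
    return out
```

The line items of the repeated invoice are never compared, so a later group that differs in its detail records is still removed. The set of seen numbers is also never reset, so if a file held several batches the check would span all of them.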