I have a requirement to remove duplicates from the input file.
The input file has two fields, Account and Action. In the sample below, "spaces" means the Action field is blank.
The sample input is as follows:
Acct1 Action1
Acct1 spaces
Acct1 Action1
Acct2 spaces -- For account 2
Acct2 spaces -- 2 records with no action
Acct3 Action3a -- For account 3
Acct3 Action3b -- 2 records with different actions
Acct4 Action4
Acct5 spaces
Conditions are:
1) If an account is duplicated in the file (one record with Action info and another without), the record with Action info is extracted. (Acct1 in the above example)
2) If an account is duplicated and neither record has Action info, then only one record is extracted. (Acct2 in the above example)
3) If there are duplicate records on the account key and they have different Actions, then both records should be extracted. (Acct3 in the above example)
The expected output is:
Acct1 Action1
Acct2 spaces
Acct3 Action3a
Acct3 Action3b
Acct4 Action4
Acct5 spaces
An account may have more than two records, with the same or different Actions.
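
As a rough illustration of the rules above (not tied to any particular sort utility), the logic could be sketched in Python as follows. The function name `dedupe_records`, the (account, action) tuple layout, and the use of an empty or all-blank string for "spaces" are assumptions made for the sketch. It also assumes that repeated identical Actions for one account collapse to a single record, as with Acct1 in the sample.

```python
def dedupe_records(records):
    """Apply the three conditions to (account, action) pairs.

    Hypothetical helper: a blank or whitespace-only action stands
    for "spaces" in the sample input.
    """
    # account -> distinct non-blank actions, in first-seen order.
    # Plain dicts keep insertion order (Python 3.7+), so accounts
    # come out in the order they first appear in the input.
    actions_by_account = {}
    for account, action in records:
        action = action.strip()
        actions = actions_by_account.setdefault(account, [])
        if action and action not in actions:
            actions.append(action)

    output = []
    for account, actions in actions_by_account.items():
        if actions:
            # Conditions 1 and 3: keep every distinct non-blank action
            # and drop the blank records for that account.
            output.extend((account, a) for a in actions)
        else:
            # Condition 2 (and single blank records): keep one blank record.
            output.append((account, ""))
    return output


sample = [
    ("Acct1", "Action1"), ("Acct1", ""), ("Acct1", "Action1"),
    ("Acct2", ""), ("Acct2", ""),
    ("Acct3", "Action3a"), ("Acct3", "Action3b"),
    ("Acct4", "Action4"),
    ("Acct5", ""),
]

for account, action in dedupe_records(sample):
    print(account, action or "spaces")
```

Because the sketch groups by account rather than comparing adjacent records, it does not need the input to be sorted, and it handles any number of records per account.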