I have a dataset with account numbers and information about the accounts. Each account number can appear multiple times. The account numbers should be anonymized in such a way that a given account number always gets the same replacement value. So, if account number 0912345678 appears 5 times, all five occurrences should get the same value, for instance gr762389qw (just an example).
Is the data set sorted by the account number? If so, the logic is pretty simple -- read the first record, assign the replacement value, write to output, then loop reading next record, checking to see if it matches the previous account number and if so using the same replacement value (if not, assign a new one), and writing.
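To make the sorted case concrete, here is a minimal sketch of that control-break logic. It is written in Python rather than COBOL purely to show the flow. The function name `anonymize_sorted`, the `(account, payload)` record layout, and the `ANON` plus sequence number replacement format are all assumptions for illustration, not anything from the original question.

```python
from itertools import count


def anonymize_sorted(records):
    """Yield (replacement, payload) for records sorted by account number.

    `records` is assumed to be an iterable of (account, payload) tuples;
    the layout is hypothetical and only serves the illustration.
    """
    sequence = count(1)          # source of new replacement values
    previous_account = object()  # sentinel that never equals a real account
    replacement = None
    for account, payload in records:
        if account != previous_account:
            # Account number changed: assign a new replacement value.
            replacement = f"ANON{next(sequence):08d}"
            previous_account = account
        # Same account as the previous record: reuse the replacement.
        yield replacement, payload


if __name__ == "__main__":
    sample = [
        ("0912345678", "first"),
        ("0912345678", "second"),
        ("1000000001", "third"),
    ]
    for row in anonymize_sorted(sample):
        print(row)
```

Only the previous account number and its replacement are held in memory, which is why the sorted case needs no table at all.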
If the data set is not sorted by account number, you will need to build a table in your COBOL program of old account numbers and their replacement values. The logic then becomes: read a record, look in the table to see if the account number is already there (if not, add it and assign a replacement value), use the replacement value, and write the record. You will also need logic to detect when the table fills up, and depending upon the number of values you are replacing, you might find this approach takes a LOT of time.
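The table approach for unsorted input could look roughly like the sketch below, again in Python just to show the logic. The names `anonymize_unsorted`, `TableFullError`, and `max_entries`, the record layout, and the replacement format are assumptions for illustration. Note that a Python dict is a hash table, so lookups are fast on average; a COBOL table searched sequentially would cost far more per record, which is where the run time concern comes from.

```python
class TableFullError(Exception):
    """Raised when the lookup table has no room for a new account number."""


def anonymize_unsorted(records, max_entries=100_000):
    """Yield (replacement, payload) for records in any order.

    `records` is assumed to be an iterable of (account, payload) tuples.
    `max_entries` stands in for the fixed size of a COBOL table; the
    default here is an arbitrary placeholder.
    """
    table = {}  # old account number -> replacement value
    for account, payload in records:
        replacement = table.get(account)
        if replacement is None:
            # Account not seen before: check capacity, then add it.
            if len(table) >= max_entries:
                raise TableFullError(
                    f"table full at {max_entries} entries; "
                    f"cannot add another account number"
                )
            replacement = f"ANON{len(table) + 1:08d}"
            table[account] = replacement
        yield replacement, payload


if __name__ == "__main__":
    sample = [("B", "first"), ("A", "second"), ("B", "third")]
    for row in anonymize_unsorted(sample):
        print(row)
```

If the table is expected to grow large, sorting the file by account number first and using the simpler sequential logic may be the more practical route.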