I have a interesting querry I need to perform on a single file where there are two records that should always be in sequence together. The first 6 bytes of the first record are always 400000. The first 6 bytes of the second record should always be 400070. Occasionally there is a 400000 record followed by another 400000 record. I need to find all of these sprinkled in a file of 9 million records. It seems like I should be able to do this with a syncsort, but I'm struggling to find the right syntax.
So the file has records with a key like the following:
400000
400070
400000
400070
I'm looking for the occurance when this happens, I need to pull out that first 400000 record when it is followed by a second 400000 record.
Actually, I want identify the records that do NOT follow that pattern. So I'm expecting 400000 to be followed by 400070 every time. on the occurance where 400000 is followed by another 400000, then I want to pull out that first 400000 record. Not sure if that makes sense or not. It's not easy to explain.
Looking at this from different angle and maybe using a little more information to develop matching keys. I split the data out into 2 seperate files. All the 400 records in 1 file. All the 400070 records in another file. Each record has additional pieces of information that can be used as keys to match the 400000 and 400070 together, the write out the 400000 that doesn't have a matching 400070 to its own file, like an XSUM.
Each record has an account number and a transaction number that should match 1:1 for each 400000 and 400070 combo. Could I used a Sort Fields=None,XSUM type of function. Here's the clincher, the account number is in the same place on both files, but the transaction number is in different locations. Like below.
400000____Account#_Transaction#
400070____Account#_______Transaction#
I should be able to match the 2 records together identifying key field locations for F1 and F2, then we I don't get a match, write out the exception record to a seperate file.
I'm so close on this, but I'm just not strong with the syntax.
I think I got what I'm looking for using the following.
SORT FIELDS=COPY
JOINKEYS FILE=F1,FIELDS=(5,3,A,11,15,A,928,8,A),SORTED,NOSEQCK
JOINKEYS FILE=F2,FIELDS=(5,3,A,11,15,A,53,8,A),SORTED,NOSEQCK
JOIN F1,UNPAIRED,ONLY
END
First screw up was not accounting for the 4 bytes at the beginning of the VB file. Once I got thru beating myself up for that mistake, and I fixed all the key positions relative starting positions it worked.
I'm still validating my results, but the resulting Output file includes only records from F1 that did not have matching records in F2. Note: I took the original file that contained both 400000 and 400070 and split them into 2 files, one for each record type. I knew then that these two files are sorted in the same order and followed the same sort sequence. Then I could do a Join key on each file seperately to identify when the first file file found a record that didn't have a match in the second file, and then F1 UNPAIRED ONLY wrote out just those records from F1 that didn't have matches in F2.
You guys have been a great help! Thank you for responding and helping!