Search file for records that don't follow expected sequence

sdrolz · New User Joined: 18 Apr 2016 Posts: 4 Location: USA

I have a interesting querry I need to perform on a single file where there are two records that should always be in sequence together. The first 6 bytes of the first record are always 400000. The first 6 bytes of the second record should always be 400070. Occasionally there is a 400000 record followed by another 400000 record. I need to find all of these sprinkled in a file of 9 million records. It seems like I should be able to do this with a syncsort, but I'm struggling to find the right syntax.

So the file has records with a key like the following:
400000
400070
400000
400070

I'm looking for the occurance when this happens, I need to pull out that first 400000 record when it is followed by a second 400000 record.

400000
400000
400070
400000
400070[/b]

Joerg.Findeisen · Posted: Tue Dec 14, 2021 10:53 pm

You can use SORT JOINKEYS for that task. I assume you want to print the first record when your condition is met, right?

Joerg.Findeisen · Posted: Tue Dec 14, 2021 11:26 pm

Sample:

sdrolz · New User Joined: 18 Apr 2016 Posts: 4 Location: USA

Actually, I want identify the records that do NOT follow that pattern. So I'm expecting 400000 to be followed by 400070 every time. on the occurance where 400000 is followed by another 400000, then I want to pull out that first 400000 record. Not sure if that makes sense or not. It's not easy to explain.

sdrolz · New User Joined: 18 Apr 2016 Posts: 4 Location: USA

Looking at this from different angle and maybe using a little more information to develop matching keys. I split the data out into 2 seperate files. All the 400 records in 1 file. All the 400070 records in another file. Each record has additional pieces of information that can be used as keys to match the 400000 and 400070 together, the write out the 400000 that doesn't have a matching 400070 to its own file, like an XSUM.
Each record has an account number and a transaction number that should match 1:1 for each 400000 and 400070 combo. Could I used a Sort Fields=None,XSUM type of function. Here's the clincher, the account number is in the same place on both files, but the transaction number is in different locations. Like below.
400000____Account#_Transaction#
400070____Account#_______Transaction#

I should be able to match the 2 records together identifying key field locations for F1 and F2, then we I don't get a match, write out the exception record to a seperate file.

I'm so close on this, but I'm just not strong with the syntax.

Joerg.Findeisen · Posted: Wed Dec 15, 2021 11:12 am

See if that code snippet is of help for you.

sergeyken · Posted: Wed Dec 15, 2021 3:11 pm

General approach needs to take into account also real Account#, and TransactionID

1) use

sdrolz · New User Joined: 18 Apr 2016 Posts: 4 Location: USA

I think I got what I'm looking for using the following.

SORT FIELDS=COPY
JOINKEYS FILE=F1,FIELDS=(5,3,A,11,15,A,928,8,A),SORTED,NOSEQCK
JOINKEYS FILE=F2,FIELDS=(5,3,A,11,15,A,53,8,A),SORTED,NOSEQCK
JOIN F1,UNPAIRED,ONLY
END

First screw up was not accounting for the 4 bytes at the beginning of the VB file. Once I got thru beating myself up for that mistake, and I fixed all the key positions relative starting positions it worked.

I'm still validating my results, but the resulting Output file includes only records from F1 that did not have matching records in F2. Note: I took the original file that contained both 400000 and 400070 and split them into 2 files, one for each record type. I knew then that these two files are sorted in the same order and followed the same sort sequence. Then I could do a Join key on each file seperately to identify when the first file file found a record that didn't have a match in the second file, and then F1 UNPAIRED ONLY wrote out just those records from F1 that didn't have matches in F2.

You guys have been a great help! Thank you for responding and helping!

Joerg.Findeisen · Posted: Thu Dec 16, 2021 5:05 am

As you are searching for records out of sequence, I would consider not to expect all input as really SORTED.