View previous topic :: View next topic
|
Author |
Message |
sdrolz
New User

Joined: 18 Apr 2016 Posts: 4 Location: USA
|
|
|
|
I have a interesting querry I need to perform on a single file where there are two records that should always be in sequence together. The first 6 bytes of the first record are always 400000. The first 6 bytes of the second record should always be 400070. Occasionally there is a 400000 record followed by another 400000 record. I need to find all of these sprinkled in a file of 9 million records. It seems like I should be able to do this with a syncsort, but I'm struggling to find the right syntax.
So the file has records with a key like the following:
400000
400070
400000
400070
I'm looking for the occurance when this happens, I need to pull out that first 400000 record when it is followed by a second 400000 record.
400000
400000
400070
400000
400070[/b] |
|
Back to top |
|
 |
Joerg.Findeisen
Senior Member

Joined: 15 Aug 2015 Posts: 1388 Location: Bamberg, Germany
|
|
|
|
You can use SORT JOINKEYS for that task. I assume you want to print the first record when your condition is met, right? |
|
Back to top |
|
 |
Joerg.Findeisen
Senior Member

Joined: 15 Aug 2015 Posts: 1388 Location: Bamberg, Germany
|
|
|
|
Sample:
Code: |
400000 #1
400000 #2
400070
400000 #3
400070
400000 #4
400000 #5
400070 |
Output:
Code: |
400000 #1 has a match in line 00000001
400000 #4 has a match in line 00000006 |
|
|
Back to top |
|
 |
sdrolz
New User

Joined: 18 Apr 2016 Posts: 4 Location: USA
|
|
|
|
Actually, I want identify the records that do NOT follow that pattern. So I'm expecting 400000 to be followed by 400070 every time. on the occurance where 400000 is followed by another 400000, then I want to pull out that first 400000 record. Not sure if that makes sense or not. It's not easy to explain. |
|
Back to top |
|
 |
sdrolz
New User

Joined: 18 Apr 2016 Posts: 4 Location: USA
|
|
|
|
Looking at this from different angle and maybe using a little more information to develop matching keys. I split the data out into 2 seperate files. All the 400 records in 1 file. All the 400070 records in another file. Each record has additional pieces of information that can be used as keys to match the 400000 and 400070 together, the write out the 400000 that doesn't have a matching 400070 to its own file, like an XSUM.
Each record has an account number and a transaction number that should match 1:1 for each 400000 and 400070 combo. Could I used a Sort Fields=None,XSUM type of function. Here's the clincher, the account number is in the same place on both files, but the transaction number is in different locations. Like below.
400000____Account#_Transaction#
400070____Account#_______Transaction#
I should be able to match the 2 records together identifying key field locations for F1 and F2, then we I don't get a match, write out the exception record to a seperate file.
I'm so close on this, but I'm just not strong with the syntax. |
|
Back to top |
|
 |
Joerg.Findeisen
Senior Member

Joined: 15 Aug 2015 Posts: 1388 Location: Bamberg, Germany
|
|
|
|
See if that code snippet is of help for you.
Code: |
//WHATEVER EXEC PGM=SORT
//F1 DD *
400000 #1
400000 #2
400070
400000 #3
400070
400000 #4
400000 #5
400070
/*
//F2 DD .. <* exactly the same as DD:F1
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SYSIN DD *
OPTION COPY
JOINKEYS F1=F1,FIELDS=(21,8,A)
JOINKEYS F2=F2,FIELDS=(21,8,A)
REFORMAT FIELDS=(F1:1,10,F2:1,10)
OUTREC OVERLAY=(40:SEQNUM,8,ZD)
OUTFIL FNAMES=(SORTOUT),
REMOVECC,
INCLUDE=(1,6,BI,EQ,11,6,BI),
BUILD=(11,10,X,C'has a match in line',X,40,8,ZD,M11)
END
/*
//JNF1CNTL DD *
INREC IFTHEN=(WHEN=INIT,OVERLAY=(21:SEQNUM,8,ZD,START=0))
END
/*
//JNF2CNTL DD *
INREC IFTHEN=(WHEN=INIT,OVERLAY=(21:SEQNUM,8,ZD,START=1))
END
/* |
|
|
Back to top |
|
 |
sergeyken
Senior Member

Joined: 29 Apr 2008 Posts: 2217 Location: USA
|
|
|
|
General approach needs to take into account also real Account#, and TransactionID
1) use
Code: |
INREC IFTHEN=(WHEN=(1,6,CH,EQ,C’400000’),
BUILD=(…align TransactionID….)) |
2) optionally use (to be on safe side)
Code: |
SORT FIELDS=(Account#…,TransactionID…),EQUALS |
3) identify correct groups
Code: |
OUTREC IFTHEN=(WHEN=GROUP,
BEGIN=(1,6,CH,EQ,C’400000’),
END=(1,6,CH,EQ,C’400070’), - maybe not needed, to detect orphaned 400000….
PUSH=(100:ID=8,SEQ=2)) |
4) split the result into two temporary datasets (OUTFIL FNAMES=…): for 400000, and for 400070
5) using JOINKEYS on ID values, you can easily detect the non-matching 400000 lines, together with their Account#, and TransactionID
6) use SYNCTOOL instead of SYNCSORT to perform all stages as a single JCL step |
|
Back to top |
|
 |
sdrolz
New User

Joined: 18 Apr 2016 Posts: 4 Location: USA
|
|
|
|
I think I got what I'm looking for using the following.
SORT FIELDS=COPY
JOINKEYS FILE=F1,FIELDS=(5,3,A,11,15,A,928,8,A),SORTED,NOSEQCK
JOINKEYS FILE=F2,FIELDS=(5,3,A,11,15,A,53,8,A),SORTED,NOSEQCK
JOIN F1,UNPAIRED,ONLY
END
First screw up was not accounting for the 4 bytes at the beginning of the VB file. Once I got thru beating myself up for that mistake, and I fixed all the key positions relative starting positions it worked.
I'm still validating my results, but the resulting Output file includes only records from F1 that did not have matching records in F2. Note: I took the original file that contained both 400000 and 400070 and split them into 2 files, one for each record type. I knew then that these two files are sorted in the same order and followed the same sort sequence. Then I could do a Join key on each file seperately to identify when the first file file found a record that didn't have a match in the second file, and then F1 UNPAIRED ONLY wrote out just those records from F1 that didn't have matches in F2.
You guys have been a great help! Thank you for responding and helping! |
|
Back to top |
|
 |
Joerg.Findeisen
Senior Member

Joined: 15 Aug 2015 Posts: 1388 Location: Bamberg, Germany
|
|
|
|
As you are searching for records out of sequence, I would consider not to expect all input as really SORTED. |
|
Back to top |
|
 |
|
|