Hi, I have an FB file with LRECL=80 as shown below. I need to find a particular 4-character string that appears multiple times within one record. For example, the string "2501" appears 3 times in the 1st record, 3 times in the 2nd, 4 times in the 3rd, and so on. How can I get this count?
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
I have to say a count is much easier to produce, and much easier to understand afterwards. If you just list the hits, a human still has to do the counting, since they all have the same value.
Have a search for HUMPHREY in the DFSORT forum. That is for characters, but the same will apply for "strings" with a bit of extra calculation.
Note that my results are different from yours. For CBA and NSA, the "2" at the start of the first 2501 you count comes from the "key", so it would seem "unusual" to count it just because the data happens to start with "501".
Another approach, for fixed-length records. You can have a maximum of 18 "2501"s (excluding the key from your 80 bytes), so append 18 Xs to the end of each record. Then do the FINDREP, replacing the search string with a new string that is one byte shorter. This "shifts" the 18 added Xs one position to the left for each "hit", padding with spaces on the right. After the FINDREP, discover how many of the 18 Xs are left in their original positions.
One advantage of this over the PARSE is no limit of 100 that can be counted (doesn't matter in this example).
Note that, as they stand, both this and the other method are "destructive" of their input records.
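The X-shifting idea above might be sketched roughly as follows. This is untested and only illustrative: the columns used for the Xs (81-98) and for the result (99-100), the replacement string C'250', and the two-digit character count are my own choices, not taken from the original job. It assumes the key is in positions 1-6 and the data in 7-80.

```
 OPTION COPY
* Append 18 Xs, then replace each 4-byte hit by a 3-byte string so
* everything to the right shifts left by one per hit.
 INREC IFTHEN=(WHEN=INIT,OVERLAY=(81:18C'X')),
       IFTHEN=(WHEN=INIT,
               FINDREP=(IN=C'2501',OUT=C'250',
                        STARTPOS=7,ENDPOS=80)),
* The rightmost X still in 81-98 tells how many shifts happened.
       IFTHEN=(WHEN=(98,1,CH,EQ,C'X'),OVERLAY=(99:C'00')),
       IFTHEN=(WHEN=(97,1,CH,EQ,C'X'),OVERLAY=(99:C'01')),
       IFTHEN=(WHEN=(96,1,CH,EQ,C'X'),OVERLAY=(99:C'02')),
       IFTHEN=(WHEN=(95,1,CH,EQ,C'X'),OVERLAY=(99:C'03')),
       IFTHEN=(WHEN=(94,1,CH,EQ,C'X'),OVERLAY=(99:C'04')),
       IFTHEN=(WHEN=(93,1,CH,EQ,C'X'),OVERLAY=(99:C'05')),
       IFTHEN=(WHEN=(92,1,CH,EQ,C'X'),OVERLAY=(99:C'06')),
       IFTHEN=(WHEN=(91,1,CH,EQ,C'X'),OVERLAY=(99:C'07')),
       IFTHEN=(WHEN=(90,1,CH,EQ,C'X'),OVERLAY=(99:C'08')),
       IFTHEN=(WHEN=(89,1,CH,EQ,C'X'),OVERLAY=(99:C'09')),
       IFTHEN=(WHEN=(88,1,CH,EQ,C'X'),OVERLAY=(99:C'10')),
       IFTHEN=(WHEN=(87,1,CH,EQ,C'X'),OVERLAY=(99:C'11')),
       IFTHEN=(WHEN=(86,1,CH,EQ,C'X'),OVERLAY=(99:C'12')),
       IFTHEN=(WHEN=(85,1,CH,EQ,C'X'),OVERLAY=(99:C'13')),
       IFTHEN=(WHEN=(84,1,CH,EQ,C'X'),OVERLAY=(99:C'14')),
       IFTHEN=(WHEN=(83,1,CH,EQ,C'X'),OVERLAY=(99:C'15')),
       IFTHEN=(WHEN=(82,1,CH,EQ,C'X'),OVERLAY=(99:C'16')),
       IFTHEN=(WHEN=(81,1,CH,EQ,C'X'),OVERLAY=(99:C'17')),
       IFTHEN=(WHEN=NONE,OVERLAY=(99:C'18'))
```

With no hits the X in position 98 is untouched, so the count is 00; each hit moves the last X one position to the left, which is why the tests run from 98 down to 81. The data in 7-80 has been shortened by the FINDREP, so the record is no longer the original input.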
If you really just want them listed, then what you have shown is all you need. I would put ABSPOS=7 on the first PARSE so it doesn't look in the "key". Bear in mind you need as many parsed fields as the logical/actual maximum number of occurrences.
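For illustration only, a listing PARSE with ABSPOS=7 on the first field could look something like the sketch below. The original PARSE is not shown in the thread, so the STARTAT/FIXLEN form, the number of %nn fields, and the BUILD layout are all assumptions of mine, and this is untested.

```
 OPTION COPY
* ABSPOS=7 makes the first search begin after the 6-byte key.
* Each following %nn carries on from where the previous one ended.
 INREC PARSE=(%01=(ABSPOS=7,STARTAT=C'2501',FIXLEN=4),
              %02=(STARTAT=C'2501',FIXLEN=4),
              %03=(STARTAT=C'2501',FIXLEN=4),
              %04=(STARTAT=C'2501',FIXLEN=4)),
       BUILD=(1,6,X,%01,X,%02,X,%03,X,%04)
```

Only four parsed fields are shown to keep it short; for this data you would continue up to %18. A parsed field whose string is not found should come out as blanks, which is what leaves the counting to the human reader.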
As enrico indicates, there is an easy way for you to answer your first question.
You use ADDPOS and SUBPOS when you want to adjust the starting point of the next PARSE by a known value. The "known value" doesn't have to give you an exact "hit"; it only has to let the PARSE find the correct starting point, either when that is before the end of the previous PARSE (including whatever defined the end of that PARSE) or when some bytes are known not to be needed.
You might want to include what has delimited the previous PARSE as data in the next PARSE for instance, so use SUBPOS. Basically, anything you need that starts before the end point of the previous PARSE.
You might want to "skip" the prefix of a field, so ADDPOS. Basically, anything you need that starts further than the end point of the previous PARSE.
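As a rough, untested sketch of both cases, assume a hypothetical record that starts with something like NAME=VALUE;XX-REST; (this layout, the delimiters, and the field lengths are invented purely for the example):

```
 OPTION COPY
* %01 stops before the '='. The next PARSE would normally start
*     after the '='.
* %02 uses SUBPOS=1 to step back one byte, so the '=' that ended
*     %01 becomes the first byte of %02.
* %03 uses ADDPOS=3 to step over the three-byte prefix 'XX-'.
 INREC PARSE=(%01=(ENDBEFR=C'=',FIXLEN=8),
              %02=(SUBPOS=1,ENDBEFR=C';',FIXLEN=12),
              %03=(ADDPOS=3,ENDBEFR=C';',FIXLEN=12)),
       BUILD=(%01,%02,%03)
```

The intent is that %01 holds NAME, %02 holds =VALUE and %03 holds REST, each padded with blanks to its FIXLEN.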
Got an update for this. A bug fix, as the original FINDREP had no ENDPOS (so it could get false hits if counting numerics). And a simplification, to use a binary count value instead of the character one.
The binary method can simply count up to 255, and going beyond that would not be onerous.
Code:
//HUMPHREY EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTOUT DD SYSOUT=*
//SYMNAMES DD *
*
* The character or string to be counted
*
SEARCH-CHAR-OR-STRING,C'1'
*
* Contents of ARBITRARY-STRING are irrelevant, as long as one byte
* shorter than SEARCH-CHAR-OR-STRING.
*
ARBITRARY-STRING,C''
*
* In binary, from zero to the maximum to be counted. If line is filled,
* multiple constants will be needed.
*
COUNTS-18,X'000102030405060708090A0B0C0D0E0F101112'
*
* Important - define the entire input record here.
*
INPUT-RECORD,*,80,CH
* Define DUMMY... immediately after the record; it is "POSITION'ed" to
* later.
*
DUMMY-FOR-FIRST-BYTE-AFTER-RECORD,*,1,CH
*
* Fields prefixed OUTPUT- are just for the example, not part
* of the solution.
*
OUTPUT-SEPARATOR,25,1,CH
OUTPUT-COUNT,26,2,ZD
OUTPUT-LAST-BYTE,80,1,CH
*
* To locate to first byte after end of record.
*
POSITION,DUMMY-FOR-FIRST-BYTE-AFTER-RECORD
*
* The following must have the length of the count constant(s), else
* some danger of a false hit.
*
EXTENSION-FOR-COUNTS-UNDERFLOW,*,19,CH
*
* If the constant for the count required splitting (see above)
* then each part of the constant requires a unique starting position
* immediately after the end of the previous one.
*
EXTENSION-FOR-COUNTS,*,=,CH
*
* This will hold the count, in binary of length one, starting from zero
* and going up to maximum needed, to a maximum of 255.
*
EXTENSION-FOR-COUNTS-1ST-BYTE,=,1,BI
//SYMNOUT DD SYSOUT=*
//SYSIN DD *
 OPTION COPY
* IFOUTLEN is just for the example, not part of the solution.
*
 INREC IFOUTLEN=80,
* Establishes the count constant(s) in the extension to the input
* record.
*
       IFTHEN=(WHEN=INIT,
               OVERLAY=(EXTENSION-FOR-COUNTS:COUNTS-18)),
* Does the required search, in the example from position 7 to position
* 80, change these to what is needed.
*
       IFTHEN=(WHEN=INIT,
               FINDREP=(IN=SEARCH-CHAR-OR-STRING,
                        OUT=ARBITRARY-STRING,
                        STARTPOS=7,
                        ENDPOS=80)),
* The appended constant has been shifted left by the FINDREP for each
* instance of the character/string found, so that the first byte of the
* extended counts points to the count of times shifted.
* The leading part of the appended count constant has "underflowed"
* into the place established for it. The count of successful searches
* is just sitting there in a one-byte binary field.
*
       IFTHEN=(WHEN=INIT,
               OVERLAY=(OUTPUT-COUNT:
                        EXTENSION-FOR-COUNTS-1ST-BYTE,
                        TO=ZD,
                        LENGTH=2)),