ABINAYATHULASI
New User
Joined: 28 May 2016 Posts: 3 Location: India
I have a file with a single record spanning multiple lines. There are many such records in that file.
E.g. (note: the single record spans from H1 to S1; it may run to 15,000 lines in a single record):
H1.......
H2........
H3......
D1......
D2.......
DN ST .........
DN BZ.........
DH.......
D1....
D2.....
S1........
I want to validate whether the DN ST line alone matches what I pass in the control card. If it matches, I have to write the entire block to the output file; if it doesn't match, ignore the block.
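To make the requirement concrete, here is a minimal Python sketch of the logic being asked for (illustrative only; the actual solution on z/OS would be COBOL or DFSORT). The record prefixes `H1`, `DN ST`, and `S1` are taken from the example above; the `key` parameter stands in for the value passed via the control card.

```python
def filter_groups(records, key):
    """Keep only the groups (H1 .. S1) whose 'DN ST' record contains key."""
    kept, group, matched = [], [], False
    for rec in records:
        if rec.startswith("H1"):            # header: start a new group
            group, matched = [rec], False
        else:
            group.append(rec)
            if rec.startswith("DN ST") and key in rec:
                matched = True              # this group satisfies the test
            if rec.startswith("S1"):        # trailer: group is complete
                if matched:
                    kept.extend(group)      # write the entire block
                group = []
    return kept

data = ["H1 a", "D1 x", "DN ST ABC", "S1 end",
        "H1 b", "D1 y", "DN ST XYZ", "S1 end"]
result = filter_groups(data, "ABC")
```

This buffers one group at a time, which is the crux of the problem: the `DN ST` record may appear anywhere in a group of up to 15,000 records, so a decision cannot be made until the matching record (or the trailer) is seen.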
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10889 Location: italy
Quote:
I have a file with a single record spanning multiple lines. There are many such records in that file.
it might be wiser for You to review Your understanding of the terminology used for z/OS/MVS data management.
z/OS data management is RECORD oriented,
so ...
You have RECORDS, not LINES,
which can be identified as header/trailer/detail records,
and which, according to the header/trailer/detail, can be seen as different logical groups.
Nic Clouston
Global Moderator
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
More terminology - it is a 'data set', not a 'file'.
Is that really one record, or is it a group of related records? Perhaps with header(s) and maybe a trailer?
If it is a group of records then you should peruse the DFSORT forum for similar requirements.
Abid Hasan
New User
Joined: 25 Mar 2013 Posts: 88 Location: India
Hello,
Based on what is shared, and if I understand it correctly, the requirement is to read a dataset and test each record for a certain value. All of the records being tested WILL BE part of a group, which can be uniquely identified by an existing header and trailer record. There can be multiple groups, and each group can have as many as 15k records.
If the value being tested matches, then the entire group of records needs to be written to output; otherwise the group is discarded.
If the above understanding is correct, AND you're looking for a COBOL solution - which will be much simpler to code than DFSORT - the underlying challenge is to hold the group of records in place until the test is complete. So far the TS has not stated where the record to be tested will be present, so the assumption is that it can occur anywhere in the group.
Define a table large enough to hold the entire group of records, say 15000.
Read each record; when you identify a header, start writing to the table. At the same time, test whether YOUR condition holds true. If it does, set a flag, write the complete table accumulated to this point to the output DS, and then write subsequent records directly to output (no need to fill the table any more until the trailer record for this group is encountered).
Once the trailer record is encountered, stop writing to output, initialize the table, and repeat the complete process as stated earlier.
In the case of DFSORT, I think at best 2 passes of the data will be required, and presumably the code will be a little complex; the reason remains the same - finding an algorithm to uniquely identify the data to be grouped and written.
Edit: The complexity increases if the position of the value to be checked is not fixed in the record.
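The buffer-then-stream algorithm described above can be sketched as follows (a Python simulation of the COBOL table logic, for illustration; the `H1`/`DN ST`/`S1` prefixes are assumptions carried over from the TS's example):

```python
def copy_matching_groups(records, key, out):
    """Buffer each group in a table until the test record is seen; once the
    condition matches, flush the table and stream the rest of the group."""
    table, streaming = [], False
    for rec in records:
        if rec.startswith("H1"):            # header: start a fresh table
            table, streaming = [rec], False
        elif streaming:
            out.append(rec)                 # condition already met: write directly
        else:
            table.append(rec)
            if rec.startswith("DN ST") and key in rec:
                out.extend(table)           # flush the buffered group
                table, streaming = [], True
        if rec.startswith("S1"):            # trailer: reset for the next group
            table, streaming = [], False
```

The flag (`streaming`) plays the same role as the COBOL switch described above: once it is set, records bypass the table entirely until the trailer arrives.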
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10889 Location: italy
Quote:
Define a table large enough to hold the entire group of records, say 15000.
simpler to access the same dataset using two file definitions:
synchronize the file1/file2 reads on the same header;
use file1 to keep reading until the next header, remembering whether the current block must be copied;
if so, read file2 and write until the next header,
otherwise read file2 (without writing) until the next header;
repeat until done.
beware ...
You have to process the last block of records remembering that You have an end of file instead of a new header.
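The two-reader idea can be simulated like this (an illustrative Python sketch; on z/OS the two "readers" would be two file definitions/DD statements over the same dataset, not list indices, and the `H1`/`DN ST` prefixes are assumptions from the TS's example):

```python
def block_bounds(records):
    """Return (start, end) index pairs, one per block, where each block
    runs from an H1 header up to the next H1 (or end of file)."""
    starts = [i for i, r in enumerate(records) if r.startswith("H1")]
    return list(zip(starts, starts[1:] + [len(records)]))

def two_reader_copy(records, key):
    out = []
    for start, end in block_bounds(records):
        block = records[start:end]          # reader 1: scan the whole block
        if any(r.startswith("DN ST") and key in r for r in block):
            out.extend(block)               # reader 2: re-read and copy it
    return out
```

The advantage over the table approach is that no working-storage table is needed at all: reader 1 only has to remember one yes/no decision per block, and the last block falls out naturally because end of file closes it instead of a new header.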
RahulG31
Active User
Joined: 20 Dec 2014 Posts: 446 Location: USA
Abid Hasan, I would say we use both DFSORT and a COBOL program.
Step 1: Use DFSORT with IFTHEN=(WHEN=GROUP,BEGIN=(header identifier),PUSH=(header record, or the part of the header record that uniquely identifies it among the different headers)).
Then OUTFIL with INCLUDE= for the condition you want to check, and BUILD to write out the PUSHed header.
The output of this step should be the uniquely identifiable header records of the groups that satisfy the required condition.
Step 2: Create a COBOL program that uses the original input file as well as the file we got from Step 1 as input. The first step in the COBOL program is to load the file from Step 1 into an internal table, then READ the original input file and match each header against this internal table. If a match is found, write to output.
The advantage over a COBOL-only solution is that you don't have to write and initialize the table multiple times, and it won't be as large a table as it otherwise could be, I suppose.
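The two-step flow can be sketched in Python for illustration (step 1 stands in for the DFSORT WHEN=GROUP/PUSH pass, step 2 for the COBOL program with its internal table; the `H1`/`DN ST` prefixes are assumptions from the TS's example):

```python
def step1_matching_headers(records, key):
    """Step 1 (DFSORT's role): collect the headers of groups whose
    'DN ST' record satisfies the condition."""
    header, hits = None, set()
    for rec in records:
        if rec.startswith("H1"):
            header = rec                    # the PUSHed header for this group
        elif rec.startswith("DN ST") and key in rec:
            hits.add(header)                # this group satisfies the condition
    return hits

def step2_copy_groups(records, headers):
    """Step 2 (the COBOL program's role): copy a group only when its
    header is found in the internal table built from step 1."""
    out, keep = [], False
    for rec in records:
        if rec.startswith("H1"):
            keep = rec in headers           # look the header up in the table
        if keep:
            out.append(rec)
    return out
```

Note the internal table here holds only one header per qualifying group, which is exactly the size advantage described above.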
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
I think we discussed similar topics a few months ago in the DFSORT forum. If we are bringing the sort product into this, then I would think JOINing the original input with a shortened version of itself (created as Rahul mentioned), using a JNFnCNTL, would do this.
But well then, we are discussing a COBOL solution here, and I see enrico's logic has eliminated the need for any working-storage table.
Abid Hasan
New User
Joined: 25 Mar 2013 Posts: 88 Location: India
Hello Rahul/Arun,
You are both spot on with the DFSORT approaches; the only problem I could think of at the time of posting was the need for at least 2 passes of the data. The reason remains the same - a first pass is required to traverse the data once, identify the groups, and pad on an identifier; the rest is an algorithm to play with that and segregate the groups.
Mr. Sorichetti,
Aah, you got me there - a much better approach indeed. I didn't like the solution I'd posted because of the 'undefined/large' size of the table (going by what the TS posted initially). Revisiting the table is a costly affair, though it would have been quick since all operations are in memory. Your solution solves it in a much better way.
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
With Enterprise COBOL V4, there will be no problem storing 15,000 8000-byte records in a table. With V5+, there is no problem storing 15,000 records of any length from a sequential data set.