Reading dataset in Python - New Line characters

vasanthz · Posted: Fri Aug 04, 2023 12:13 am

Hi,

I have a dataset as below, assigned to DDNAME DDIN.
It has 3 records in it, but when I read it with Python, the whole dataset comes back as one single line. How can I make the program treat NL as a new line?

Joerg.Findeisen · Posted: Fri Aug 04, 2023 12:34 am

Shouldn't you add a readline or similar?

Pedro · Posted: Fri Aug 04, 2023 12:41 am

To debug, I think you should print each line. That is, I do not think it is reading every line.

vasanthz · Posted: Fri Aug 04, 2023 12:44 am

The program is reading the input PS file as one single line,

Pedro · Posted: Fri Aug 04, 2023 12:46 am

re: treat NL as new line?

z/OS data sets do not normally have NL characters at the end of the line. For FB datasets, the system knows how long each line is. For VB datasets, the length is in the first half word (?) of the line.

Turn HEX ON to verify that there is indeed a NL character at the end of each line.
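
To make the FB/VB remark concrete, here is a rough sketch of walking variable-length records by their length prefix. It assumes the standard 4-byte record descriptor word (a 2-byte big-endian length that includes the descriptor itself, followed by 2 bytes that are zero for unspanned records). It also assumes the raw bytes are available with any block descriptors already stripped, which Python on z/OS may not give you by default. The function name `split_vb_records`, the sample payloads, and the `cp037` code page are illustrative choices, not something taken from this thread.

```python
import struct

def split_vb_records(raw, encoding="cp037"):
    """Split a buffer of length-prefixed records into decoded strings."""
    records = []
    pos = 0
    while pos < len(raw):
        # Descriptor: 2-byte big-endian length (counts the 4 descriptor
        # bytes too), then 2 bytes that are zero for unspanned records.
        length, _reserved = struct.unpack_from(">HH", raw, pos)
        if length < 4 or pos + length > len(raw):
            raise ValueError(f"bad record length {length} at offset {pos}")
        records.append(raw[pos + 4:pos + length].decode(encoding))
        pos += length
    return records

# Simulated buffer with made-up payloads (hypothetical data).
payloads = ["FIRST", "SECOND RECORD", "3"]
raw = b"".join(
    struct.pack(">HH", len(p) + 4, 0) + p.encode("cp037") for p in payloads
)
vb_lines = split_vb_records(raw)
print(vb_lines)
```

Note that no terminator byte appears anywhere in the buffer; the record boundaries come only from the length prefixes.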

vasanthz · Posted: Fri Aug 04, 2023 12:46 am

How do I tell Python that NL is the new line character? The Python documentation doesn't mention the NL character; it only lists CR, CRLF, and LF.

vasanthz · Posted: Fri Aug 04, 2023 12:50 am

Thank you Pedro and Joerg for looking at this,

Joerg.Findeisen · Posted: Fri Aug 04, 2023 12:57 am

Found the following sample https://community.ibm.com/community/user/ibmz-and-linuxone/discussion/reading-an-mvs-dataset-using-z-open-automation-utility:

all_lines = zoautil_py.datasets.read(...).split("\n")
for line in all_lines:
    pass  # do work here

That would read the complete input and split by NL.

Pedro · Posted: Fri Aug 04, 2023 12:59 am

I do not have experience with python on z/OS, but likely the documentation is incomplete.

I think you need to experiment. Add x'00' (nul) to the end of the line and maybe x'25' (LF) to another. And also whatever hex chars your documentation says CR and CRLF are.
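
One way to run that experiment away from the host is to decode a few candidate terminator bytes and see which ones Python's string methods split on. The sketch below assumes the data is EBCDIC in code page `cp037` (a guess; yours may differ) and uses made-up record contents and names. In that code page x'15' (NL) decodes to U+0085 and x'25' (LF) decodes to U+000A; `str.splitlines()` treats both as line boundaries, while `split("\n")` only sees LF.

```python
# Candidate terminator bytes to try (EBCDIC values, as discussed above).
CANDIDATES = {
    "NL x'15'": b"\x15",
    "LF x'25'": b"\x25",
    "NUL x'00'": b"\x00",
}

def try_terminator(term, encoding="cp037"):
    """Join three sample records with `term`, decode, and split two ways."""
    raw = term.join(s.encode(encoding) for s in ("REC1", "REC2", "REC3"))
    text = raw.decode(encoding)
    return text.split("\n"), text.splitlines()

results = {name: try_terminator(term) for name, term in CANDIDATES.items()}
for name, (by_split, by_splitlines) in results.items():
    print(f"{name}: split -> {len(by_split)}, splitlines -> {len(by_splitlines)}")
```

If the data set really carries x'15' bytes, this would be consistent with `splitlines()` finding three records where `split("\n")` finds one. Whether Python on z/OS decodes the data the same way is not confirmed here.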

Joerg.Findeisen · Posted: Fri Aug 04, 2023 1:04 am

@vasanthz: Can you please try the split() thing out? That would also cover @sergeyken's suggestion with the count of NL's.

vasanthz · Posted: Fri Aug 04, 2023 1:10 am

Pedro · Posted: Fri Aug 04, 2023 1:11 am

re: "Can you please try the split() thing out?"

Vasanth has not described the data set attributes, but normally ISPF will not put \n characters at the end of each line.

For example, if it is FB80 with blksize 3120, when the access method writes out the data, it concatenates 39 logical records into a single physical record, without any \n separator characters.
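
As an illustration of that layout only, here is a minimal sketch that builds a simulated 3120-byte block and cuts it back into 80-byte records by position. The helper name `split_fixed_records`, the sample record text, and the `cp037` code page are assumptions for the example; how Python on z/OS actually hands you the bytes is not shown in this thread.

```python
LRECL = 80      # logical record length from the FB80 example
BLKSIZE = 3120  # block size from the same example

def split_fixed_records(block, lrecl=LRECL, encoding="cp037"):
    """Cut a block into lrecl-sized records; there are no separators to find."""
    if len(block) % lrecl != 0:
        raise ValueError("block length is not a multiple of the record length")
    return [
        block[i:i + lrecl].decode(encoding).rstrip()  # drop blank padding
        for i in range(0, len(block), lrecl)
    ]

# Simulate one full block: 39 blank-padded records, no terminator bytes.
records = [f"RECORD {n:02d}".ljust(LRECL) for n in range(1, BLKSIZE // LRECL + 1)]
block = "".join(records).encode("cp037")

fb_lines = split_fixed_records(block)
print(len(block), len(fb_lines), fb_lines[0], fb_lines[-1])
```

Reading such a block as text and searching for "\n" would find nothing to split on, which matches the single-line symptom described in this thread.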

vasanthz · Posted: Fri Aug 04, 2023 1:16 am

Pedro · Posted: Fri Aug 04, 2023 1:17 am

Check the NULLS setting in your ISPF editor profile. Maybe turning it on will change how Python reads the data.

If I recall correctly, you have to edit each line for the nulls to be added at the end.

Pedro · Posted: Fri Aug 04, 2023 1:20 am

re: The dataset is PS, FB, LRECL 80, Block Size 24000.

Try blksize 80. It is inefficient storage-wise, but if that is what it takes for your program to work....

vasanthz · Posted: Fri Aug 04, 2023 1:27 am

vasanthz · Posted: Fri Aug 04, 2023 1:31 am

Joerg.Findeisen · Posted: Fri Aug 04, 2023 1:35 am

Was/is it possible to apply the split() operation to the one single read record?

It was just a sample; I don't think you need zoautil for that. Please see also the other link regarding GitHub for reading large amounts of data.

vasanthz · Posted: Fri Aug 04, 2023 2:00 am

Thanks Joerg, the link has a package whl file; I will try downloading it.
https://community.ibm.com/community/user/ibmz-and-linuxone/discussion/reading-an-mvs-dataset-using-z-open-automation-utility

The split() call didn't work:
AttributeError: '_io.TextIOWrapper' object has no attribute 'split'

Joerg.Findeisen · Posted: Fri Aug 04, 2023 2:09 am

See if https://stackoverflow.com/questions/9857731/python-read-in-string-from-file-and-split-it-into-values is of any help. Basically it seems you can apply split() to the variable name. I don't know if file_line.split(..) is valid. Give it a try.
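
The AttributeError quoted above arises because split() is a method of strings, not of the file object that open() returns. Here is a small sketch of the difference, using an in-memory stand-in for the file (the sample text and variable names are made up for illustration):

```python
import io

# In-memory stand-in for open(...), which in text mode also returns
# an io.TextIOWrapper.
handle = io.TextIOWrapper(io.BytesIO(b"REC1\nREC2\nREC3\n"), encoding="utf-8")

# Calling handle.split("\n") would raise AttributeError, because file
# objects have no split() method.
can_split_handle = hasattr(handle, "split")

text = handle.read()             # read() returns a str ...
file_lines = text.splitlines()   # ... and str has split() and splitlines()
print(can_split_handle, file_lines)
```

So file_line.split(..) is valid as long as file_line is a string that was read from the file, not the file object itself.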

enrico-sorichetti · Posted: Fri Aug 04, 2023 2:22 am

FB files/datasets do not have any control chars embedded

FB files/datasets should/must be treated as binary files

something along the lines of

Joerg.Findeisen · Posted: Fri Aug 04, 2023 2:29 am

vasanthz · Posted: Fri Aug 04, 2023 8:31 pm

Thanks Enrico for the binary suggestion, but binary data is hard to work with.

The code below works OK; it uses the splitlines method, similar to the one Joerg suggested. The problem is that it reads the whole file and then does the splitting. I will live with this for now and try to get zoautil installed.