IBM Mainframe Forum Index
 
 

Reading dataset in Python - New Line characters


IBM Mainframe Forums -> All Other Mainframe Topics
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 12:13 am

Hi,

I have a dataset as below, assigned to DDNAME DDIN.
It has 3 records in it, but when I read it with Python, the whole dataset comes back as one single line. How can I make the program treat NL as a new line?

Code:
Command ===>                                       
******    ********************************* Top of D
000001    THIS IS A FIRST LINE IN THE INPUT FILE   
000002    THIS IS THE SECOND LINE IN THE INPUT FILE
000003    THIS IS THE THIRD LINE IN THE INPUT FILE 


The program:
Code:
count = 0   
reader = open("//DD:DDIN","r",encoding='cp037')   
for line in reader:                               
    count +=1                                     
reader.close()                                                                     
print(f'Number of lines in the file is {count}') 


Current Output:
Code:
Number of lines in the file is 1


Expected Output:
Code:
Number of lines in the file is 3
Joerg.Findeisen

Senior Member


Joined: 15 Aug 2015
Posts: 1295
Location: Bamberg, Germany

Posted: Fri Aug 04, 2023 12:34 am

Shouldn't you add a readline or similar?
Pedro

Global Moderator


Joined: 01 Sep 2006
Posts: 2569
Location: Silicon Valley

Posted: Fri Aug 04, 2023 12:41 am

To debug, I think you should print each line. That is, I do not think it is reading every line.
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 12:44 am

The program is reading the input PS file as one single line.

Quote:
Shouldn't you add a readline or similar?

I think readline is not required, as this will do the reading:
Code:
for line in reader:


For code:
Code:
reader = open("//DD:DDIN","r",encoding='cp037') 
#######                                         
count = 0                                       
for line in reader:                             
    count +=1                                   
    print(line)                                                             
reader.close()                                                                   
print(f'Number of lines in the file is {count}')


The output is
Code:
THIS IS A FIRST LINE IN THE INPUT FILE  THIS IS THE SECOND LINE IN THE INPUT FILE  THIS IS THE THIRD LINE IN THE INPUT FILE   
Number of lines in the file is 1
Pedro

Global Moderator


Joined: 01 Sep 2006
Posts: 2569
Location: Silicon Valley

Posted: Fri Aug 04, 2023 12:46 am

re: treat NL as new line?

z/OS data sets do not normally have NL characters at the end of the line. For FB datasets, the system knows how long each line is. For VB datasets, the length is in the first half word (?) of the line.

Turn HEX ON to verify that there is indeed a NL character at the end of each line.
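
To make the FB/VB distinction above concrete, here is a minimal sketch of how length-prefixed (VB) records can be pulled apart once you have the raw bytes. It assumes each record still carries its 4-byte record descriptor word: a 2-byte big-endian length that counts the descriptor itself, followed by 2 bytes that are zero for unspanned records. Whether a given Python open mode on z/OS actually hands you those bytes is not confirmed anywhere in this thread, and the function name split_vb_records is made up for the example.

```python
import struct

def split_vb_records(data: bytes) -> list:
    """Split a byte string of RDW-prefixed records into their payloads."""
    records = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < 4:
            raise ValueError(f"truncated descriptor at offset {pos}")
        # First halfword of the descriptor: record length, descriptor included.
        (length,) = struct.unpack_from(">H", data, pos)
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        records.append(data[pos + 4 : pos + length])
        pos += length
    return records

# Two hypothetical records, "ABC" and "HELLO", encoded in cp037.
payloads = ["ABC".encode("cp037"), "HELLO".encode("cp037")]
sample = b"".join(struct.pack(">HH", 4 + len(p), 0) + p for p in payloads)

for rec in split_vb_records(sample):
    print(rec.decode("cp037"))
```

FB data needs no such prefix: every record is exactly LRECL bytes, so the boundaries follow from the record length alone.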
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 12:46 am

How do I tell Python that NL is the new line character? The Python documentation doesn't mention the NL character; it only lists CR, CRLF and LF.
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 12:50 am

Thank you Pedro and Joerg for looking at this,

Quote:
Turn HEX ON to verify that there is indeed a NL character at the end of each line.

There is no NL at the end of each line.
Quote:
z/OS data sets do not normally have NL characters at the end of the line. For FB datasets, the system knows how long each line is. For VB datasets, the length is in the first half word (?) of the line.

I don't know how to tell the program when a record ends :(
Joerg.Findeisen

Senior Member


Joined: 15 Aug 2015
Posts: 1295
Location: Bamberg, Germany

Posted: Fri Aug 04, 2023 12:57 am

Found the following sample at https://community.ibm.com/community/user/ibmz-and-linuxone/discussion/reading-an-mvs-dataset-using-z-open-automation-utility

Code:
all_lines = zoautil_py.datasets.read(...).split("\n")
for line in all_lines:
    # do work here


That would read the complete input and split by NL.
Pedro

Global Moderator


Joined: 01 Sep 2006
Posts: 2569
Location: Silicon Valley

Posted: Fri Aug 04, 2023 12:59 am

I do not have experience with python on z/OS, but likely the documentation is incomplete.

I think you need to experiment. Add x'00' (nul) to the end of the line and maybe x'25' (LF) to another. And also whatever hex chars your documentation says CR and CRLF are.
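
Before editing test bytes into the dataset, it may help to check what Python's cp037 codec turns each candidate byte into. The snippet below is only a sketch of that check; it runs on any platform and says nothing about which byte, if any, the z/OS runtime places at the end of a record.

```python
# EBCDIC (cp037) control bytes and the Unicode characters Python decodes them to.
candidates = {
    "NUL": b"\x00",
    "CR": b"\x0d",
    "NL": b"\x15",
    "LF": b"\x25",
}

decoded = {name: raw.decode("cp037") for name, raw in candidates.items()}

for name, char in decoded.items():
    print(f"{name:3} x'{candidates[name].hex().upper()}' -> U+{ord(char):04X}")
```

With this codec, x'25' decodes to the ordinary line feed (U+000A), while x'15' decodes to U+0085, which is a different character from any of the CR, CRLF and LF line endings that Python's text-mode file reading recognizes.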
Joerg.Findeisen

Senior Member


Joined: 15 Aug 2015
Posts: 1295
Location: Bamberg, Germany

Posted: Fri Aug 04, 2023 1:04 am

@vasanthz: Can you please try the split() thing out? That would also cover @sergeyken's suggestion with the count of NL's.
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 1:10 am

Joerg.Findeisen wrote:
Found the following sample at https://community.ibm.com/community/user/ibmz-and-linuxone/discussion/reading-an-mvs-dataset-using-z-open-automation-utility

Code:
all_lines = zoautil_py.datasets.read(...).split("\n")
for line in all_lines:
    # do work here


That would read the complete input and split by NL.


Thank you for the link, unfortunately we don't have zoautil installed currently. I was thinking of installing it, but it requires APF authorization which I currently don't have access to. I will ask my sysprogs.
Pedro

Global Moderator


Joined: 01 Sep 2006
Posts: 2569
Location: Silicon Valley

Posted: Fri Aug 04, 2023 1:11 am

re: "Can you please try the split() thing out?"

Vasanth has not described the data set attributes, but normally ISPF will not put \n characters at the end of each line.

For example, if it is FB80 with blksize 3120, when the access method writes out the data, it bunches up 39 logical records, concatenated into a single physical record, without any \n separator characters.
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 1:16 am

Quote:
For example, if it is FB80 with blksize 3120, when the access method writes out the data, it bunches up 39 logical records, concatenated into a single physical record, without any \n separator characters.

The dataset is PS, FB, LRECL 80, Block Size 24000.

I can try to put more data into the file and see if it splits it by blocks. Thank you
Pedro

Global Moderator


Joined: 01 Sep 2006
Posts: 2569
Location: Silicon Valley

Posted: Fri Aug 04, 2023 1:17 am

Check the NULLS setting in your ISPF editor profile. Turning it on may change how Python reads the data.

If I recall correctly, you have to edit each line for the nulls to be added at the end.
Pedro

Global Moderator


Joined: 01 Sep 2006
Posts: 2569
Location: Silicon Valley

Posted: Fri Aug 04, 2023 1:20 am

re: The dataset is PS, FB, LRECL 80, Block Size 24000.

Try blksize 80. It is inefficient storage wise, but if that is what it takes for your program to work....
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 1:27 am

Quote:
Try blksize 80. It is inefficient storage wise, but if that is what it takes for your program to work....

I tried it, but it didn't help :( The input file still gets read as a single record.
I guess, as Joerg mentioned, zoautil is essential for working with mainframe datasets.
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 1:31 am

Quote:
Check the NULLS setting in your ISPF editor profile. Turning it on may change how Python reads the data.

Created two files with NULLS ON and NULLS OFF profile setting, still no joy. I will try to print the file in hex and see what information it has
Joerg.Findeisen

Senior Member


Joined: 15 Aug 2015
Posts: 1295
Location: Bamberg, Germany

Posted: Fri Aug 04, 2023 1:35 am

Was/is it possible to apply the split() operation to the one single read record?

It was just a sample, I don't think you need that zoautil util. Please see also the other link regarding GitHub for reading large amounts of data.
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 2:00 am

Thanks Joerg, the link has a package whl file. I will try downloading it.
community.ibm.com/community/user/ibmz-and-linuxone/discussion/reading-an-mvs-dataset-using-z-open-automation-utility

The split() call didn't work when applied to the file object:
AttributeError: '_io.TextIOWrapper' object has no attribute 'split'
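
The error above is what Python reports when split() is called on the file object itself rather than on text read from it. A minimal sketch of the difference, using an in-memory io.StringIO as a stand-in for the opened dataset (the sample text and the "\n" separator are placeholders, not what the dataset actually contains):

```python
import io

# Stand-in for the object returned by open(...) in text mode.
reader = io.StringIO("first\nsecond\nthird")

# File objects have no split() method, hence the AttributeError.
print(hasattr(reader, "split"))

# read() returns a str, and str does have split().
text = reader.read()
lines = text.split("\n")
print(lines)
```

The same applies inside a loop: split() or splitlines() has to be called on each string the file yields, not on the file.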
Joerg.Findeisen

Senior Member


Joined: 15 Aug 2015
Posts: 1295
Location: Bamberg, Germany

Posted: Fri Aug 04, 2023 2:09 am

See if https://stackoverflow.com/questions/9857731/python-read-in-string-from-file-and-split-it-into-values is of any help. Basically it seems you can apply split() to the variable name. I don't know if file_line.split(..) is valid. Give it a try.
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10877
Location: italy

Posted: Fri Aug 04, 2023 2:22 am

FB files/datasets do not have any control chars embedded

FB files/datasets should/must be treated as binary files

something along the lines of

Code:
file = open ("FB80file", "rb")
print(file.read(80))
file.close()


up to you to assign the record read to a byte buffer and loop until end of file
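
A rough sketch of the loop described above: read LRECL bytes at a time until the read comes back empty, decoding each record from EBCDIC. The helper name read_fixed_records is invented for the example, and an io.BytesIO object stands in for the binary file so the snippet runs anywhere. It assumes a binary open of the dataset returns the records back to back, each padded with blanks to 80 bytes, which would need checking on a real system.

```python
import io

def read_fixed_records(stream, lrecl=80, encoding="cp037"):
    """Yield each fixed-length record from a binary stream as decoded text."""
    while True:
        record = stream.read(lrecl)
        if not record:  # empty bytes object means end of file
            break
        # Trailing blanks are padding in an FB record; drop them for display.
        yield record.decode(encoding).rstrip(" ")

# Stand-in for open("FB80file", "rb"): three blank-padded 80-byte records.
texts = ["THIS IS A FIRST LINE", "THIS IS THE SECOND LINE", "THIS IS THE THIRD LINE"]
fake_file = io.BytesIO(b"".join(t.ljust(80).encode("cp037") for t in texts))

count = 0
for line in read_fixed_records(fake_file):
    count += 1
    print(line)
print(f"Number of lines in the file is {count}")
```

Decoding each record as it is read keeps the rest of the program working with ordinary strings, and only one record is held in memory at a time.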
Joerg.Findeisen

Senior Member


Joined: 15 Aug 2015
Posts: 1295
Location: Bamberg, Germany

Posted: Fri Aug 04, 2023 2:29 am

enrico-sorichetti wrote:
file = open ("FB80file", "rb")

Remove the blank between the function name and its arguments. It can make a big difference.
vasanthz

Global Moderator


Joined: 28 Aug 2007
Posts: 1743
Location: Tirupur, India

Posted: Fri Aug 04, 2023 8:31 pm

Thanks Enrico for the binary suggestion, but binary data is hard to work with.

The below code works OK, it uses the splitlines method, similar to the one Joerg suggested. The problem is, it reads the whole file and does the splitting. I will live with this for now and try to get zoautil installed.
Code:
reader = open("//DD:DDIN","r",encoding='cp037')
#######                                         
count = 0                                       
for line in reader:                             
    listrecords = line.splitlines()             
    for record in listrecords:                 
        count +=1                               
        print(record)                           
reader.close()                                 
print(f'Number of lines in the file is {count}')


Output:
Code:
THIS IS A FIRST LINE IN THE INPUT FILE   
THIS IS THE SECOND LINE IN THE INPUT FILE
THIS IS THE THIRD LINE IN THE INPUT FILE 
Number of lines in the file is 3         


Thank you everyone for useful tips and suggestions
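
One plausible explanation for why splitlines() succeeds where the plain for loop does not, assuming the runtime ends each record with the EBCDIC NL byte (x'15') when the dataset is read in text mode: cp037 decodes that byte to U+0085, which str.splitlines() treats as a line boundary, whereas iterating over a text file only breaks on \n, \r and \r\n. The sketch below reproduces that behaviour on any platform with an in-memory stream. It also adds a hypothetical helper, iter_nel_records, that yields records chunk by chunk instead of collecting them all in a list first.

```python
import io

NEL = "\x85"  # what cp037 decodes the EBCDIC NL byte x'15' to

def iter_nel_records(reader, chunk_size=4096):
    """Yield NEL-delimited records from a text stream, one chunk at a time."""
    pending = ""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last NEL is complete; keep the remainder.
        *complete, pending = pending.split(NEL)
        yield from complete
    if pending:  # final record without a trailing NEL
        yield pending

# Simulated dataset: three records, each terminated by x'15' (an assumption).
raw = b"".join(t.encode("cp037") + b"\x15" for t in ("FIRST", "SECOND", "THIRD"))

def open_sample():
    """Return a fresh text-mode reader over the simulated bytes."""
    return io.TextIOWrapper(io.BytesIO(raw), encoding="cp037")

loop_count = sum(1 for _ in open_sample())             # plain iteration
split_count = len(open_sample().read().splitlines())   # splitlines()
records = list(iter_nel_records(open_sample(), chunk_size=4))

print(loop_count, split_count, records)
```

On the simulated data the plain loop sees one line and splitlines() sees three, matching the outputs posted in this thread. If the real dataset turns out to use a different terminator, the NEL constant would need to change accordingly.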
All times are GMT + 6 Hours