How to handle a huge file using REXX EXECIO


IBM Mainframe Forums -> CLIST & REXX
Prosenjit001

New User


Joined: 02 Nov 2011
Posts: 14
Location: India

Posted: Wed Nov 02, 2011 3:45 pm

I have one huge dataset. I want to read that dataset and parse it based on some condition, but I am getting an error because EXECIO is unable to process a dataset that large :( Is there any alternative way to process huge files using REXX?
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

Posted: Wed Nov 02, 2011 3:53 pm

if it just has to be in rexx, instead of a more appropriate utility such as sort,
which has amazing parsing capabilities,
just read one record at a time,
parse,
then write the record.

that way you don't have an input or output stem overflow/size problem
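
The read, parse, write loop described above keeps only one record in memory at a time. Below is a minimal sketch of that pattern, written in Python rather than REXX purely for illustration. The `filter_records` function, the `keep` condition, and the in-memory stand-ins for the datasets are all assumptions, since the thread never states the actual selection rule.

```python
import io


def filter_records(src, dst, keep):
    """Copy records from src to dst one at a time, keeping those that pass `keep`."""
    written = 0
    for line in src:                    # a file object yields one record per iteration
        record = line.rstrip("\n")
        fields = record.split("~")      # parse the tilde-delimited record
        if keep(fields):                # hypothetical condition supplied by the caller
            dst.write(record + "\n")    # write the record straight back out
            written += 1
    return written


# In-memory stand-ins for the input and output datasets (assumption for the demo).
source = io.StringIO("aaaa~bbb~cc~dddd\na~~ cccc~dd\n~aa~cc~ddddd\n")
target = io.StringIO()

# Hypothetical rule: keep only records whose first field is present.
count = filter_records(source, target, keep=lambda fields: fields[0] != "")
print(count)
print(target.getvalue(), end="")
```

Because nothing is accumulated between iterations, memory use stays flat no matter how many records the input holds, which is the point of the advice.
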
Bill O'Boyle

CICS Moderator


Joined: 14 Jan 2008
Posts: 2501
Location: Atlanta, Georgia, USA

Posted: Wed Nov 02, 2011 4:14 pm

REXX was not designed for this amount of I-O.

You should follow Dick's suggestion, use a REXX alternative and abandon this folly....

Mr. Bill
Prosenjit001

New User


Joined: 02 Nov 2011
Posts: 14
Location: India

Posted: Wed Nov 02, 2011 4:20 pm

Actually my problem is that the file layout is not properly defined. Fields in the file are separated by '~', and if a field is missing, its position is left blank.

File looks like -

aaaa~bbb~cc~dddd
a~~ cccc~dd
~aa~cc~ddddd
dbzTHEdinosauer

Global Moderator


Joined: 20 Oct 2006
Posts: 6966
Location: porcelain throne

Posted: Wed Nov 02, 2011 4:23 pm

actually, the problem is your limited skill set.

suggest you spend a few moments looking at the sort manual.
As I said, sort parsing capability is amazing
and the type of parsing that you have indicated, is not hard.
Marso

REXX Moderator


Joined: 13 Mar 2006
Posts: 1353
Location: Israel

Posted: Wed Nov 02, 2011 7:03 pm

Prosenjit001 wrote:
File looks like -
aaaa~bbb~cc~dddd
a~~ cccc~dd
~aa~cc~ddddd

If your short sample is representative, then the layout is properly defined: there are 4 values, each separated by a tilde.

As already advised, use SORT.
Read the "Deconstruct and reconstruct CSV records" chapter in this document: Smart DFSORT Tricks
Use this as a base for your purpose.

If you have SYNCSORT, try it anyway; there are many similarities between the 2 products.
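
The observation that every sample record carries exactly 4 tilde-separated values can be checked mechanically. The sketch below is Python for illustration only, not DFSORT or REXX. The `split_fields` helper is a hypothetical name, and the three records are the ones quoted in the thread.

```python
def split_fields(record, delimiter="~"):
    """Split one record on the delimiter; a missing value comes back as an empty string."""
    return record.split(delimiter)


# The three sample records quoted in the thread.
sample = ["aaaa~bbb~cc~dddd", "a~~ cccc~dd", "~aa~cc~ddddd"]

parsed = [split_fields(record) for record in sample]
for fields in parsed:
    print(len(fields), fields)
```

Each record splits into four fields. The missing ones appear as empty strings, and the leading blank in the third field of the second record is preserved exactly as it was posted.
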
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

Posted: Wed Nov 02, 2011 7:48 pm

Hello,

If you are not able to do what you want with your sort product, this would be a very simple bit of COBOL code. . .

Read
Unstring
Process

That's all. . .
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10889
Location: italy

Posted: Wed Nov 02, 2011 7:52 pm

it would be nice if the TS could explain the requirement better
the input is ok ( pretty easy to understand )
but... what about the expected output ?
enrico-sorichetti

Superior Member


Joined: 14 Mar 2007
Posts: 10889
Location: italy

Posted: Wed Nov 02, 2011 8:00 pm

the suggestion not to use EXECIO to process huge files belongs to the same category as the suggestion not to use huge stems.

Quote:
due to execio unable process that huge dataset

not an EXECIO issue, rather your lack of skills.

EXECIO will process any dataset whatever its size ( if You know how to do it )

the issue of <segmenting> IO operations with EXECIO has been discussed quite a few times

the suggestion is not about capability, but about performance
Ed Goodman

Active Member


Joined: 08 Jun 2011
Posts: 556
Location: USA

Posted: Wed Nov 02, 2011 10:47 pm

A slight hint: The EXECIO statement does NOT have to read all records at once. You can read as few as one record at a time.
jerryte

Active User


Joined: 29 Oct 2010
Posts: 203
Location: Toronto, ON, Canada

Posted: Fri Nov 04, 2011 11:31 pm

Prosenjit,

If you code "EXECIO * DISKR", it copies the entire dataset into memory. The larger the file, the more memory is needed, so a very large file will cause an abend.

I would suggest something like "EXECIO 1000 DISKR", which reads 1000 records at a time. Then check for RC = 2, which means end of file was reached before the requested number of records could be read. Code the logic to process the 1000 records (or fewer at EOF) and then read the next 1000. Reading into a stem variable makes this easy, because element 0 of the stem holds the number of records actually read.
NOTE: you could read one record at a time, but this would take a long time to execute given that your file is large. Do 1000 or more at a time.

Hope this helps.
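
The batching logic described in this post can be sketched as follows. This is Python for illustration, not REXX. The `read_batches` generator and the generated test records are assumptions, and the "short batch" check only plays the role that RC = 2 plays for EXECIO, while the length of each batch plays the role of the stem's element 0.

```python
import io
from itertools import islice


def read_batches(src, batch_size=1000):
    """Yield lists of up to batch_size records until the input is exhausted."""
    while True:
        batch = [line.rstrip("\n") for line in islice(src, batch_size)]
        if not batch:                  # nothing left at all
            return
        yield batch
        if len(batch) < batch_size:    # short batch: end of file was hit during this read
            return


# Hypothetical input of 2500 records standing in for the large dataset.
source = io.StringIO("".join(f"rec{i}~x\n" for i in range(2500)))

sizes = [len(batch) for batch in read_batches(source, 1000)]
print(sizes)
```

Only one batch is held in memory at a time, so the storage needed depends on the batch size rather than on the size of the file.
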
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

Posted: Fri Nov 04, 2011 11:42 pm

Hello,

There is no good reason to process a "huge" file using rexx - no matter how the file is read (little at a time or all at once). . . Indeed, if the file is huge and there is considerable "work" to do with each record, the CPU requirement will be most unattractive (possibly unacceptable to management).

If the sole purpose of your process is to reformat the records, you can do this easily with your sort product or a simple COBOL program that reads a record, UNSTRINGs the record based on the tilde delimiter (~), and writes a new file with the reformatted data.
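
The reformatting step described here (split on the tilde, then write each value into a fixed position) might look like the sketch below. It is Python for illustration rather than COBOL or a sort control statement. The `to_fixed_width` helper, the column widths, and the decision to trim blanks around each value are all assumptions, because the thread never gives the target layout.

```python
def to_fixed_width(record, widths, delimiter="~"):
    """Lay out the delimited values in fixed-width columns, padding with blanks."""
    fields = record.split(delimiter)
    fields += [""] * (len(widths) - len(fields))   # treat absent trailing fields as empty
    # Assumption: surrounding blanks are trimmed and over-long values are truncated.
    # Fields beyond the declared widths are ignored by zip().
    return "".join(value.strip()[:width].ljust(width)
                   for value, width in zip(fields, widths))


WIDTHS = (6, 6, 6, 6)   # hypothetical column widths, not taken from the thread

lines = [to_fixed_width(record, WIDTHS)
         for record in ("aaaa~bbb~cc~dddd", "a~~ cccc~dd", "~aa~cc~ddddd")]
for line in lines:
    print(repr(line))
```

Every output record has the same length, so a missing value simply becomes a blank column instead of shifting the fields that follow it.
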
JPVRoff

New User


Joined: 06 Oct 2009
Posts: 45
Location: Melbourne, Australia

Posted: Thu Nov 10, 2011 11:54 am

dick scherrer wrote:
There is no good reason to process a "huge" file using rexx - no matter how the file is read (little at a time or all at once). . .


Hi Dick,

I guess it all depends on how big 'huge' is. Rexx & EXECIO are very useful tools to write a quick 'n' dirty fix, as it generally doesn't need much in the way of coding and testing.

When you count the number of compiles, and coding time, for a single file manipulation, sometimes Rexx can come out ahead. Depends on the size of the file, I guess.

I have a test set up for reading in files, just to check usage. It's only a small (3000 records) test, but it gives some indication of CPU, etc, use.
For a 3,000 record, 27,000 byte file (averaged over a few runs - probably +/- 2%):
Read 1 at a time - 5292 SRV - 0.063 CPU seconds
Read 10 at a time - 4105 SRV - 0.046 CPU seconds
Read 100 at a time - 4049 SRV - 0.041 CPU seconds
Read 1000 at a time - 4681 SRV - 0.056 CPU seconds
Read all at once - 4812 SRV - 0.057 CPU seconds

By experimentation, I found that 50-200 records at a time, regardless of the record size, was about the most efficient. I say regardless, because if you start getting into very small records (<80 bytes) then you can get small efficiencies by reading in 1000+ records - but not enough to justify testing it.
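
To make the posted figures easier to compare, the short Python sketch below expresses each CPU time relative to the best case. The numbers are copied from the measurements above. The tuple layout and the use of `None` for the "all at once" row are choices made only for this illustration.

```python
# (records per read, SRV figure as posted, CPU seconds as posted); None = all at once
MEASUREMENTS = [
    (1, 5292, 0.063),
    (10, 4105, 0.046),
    (100, 4049, 0.041),
    (1000, 4681, 0.056),
    (None, 4812, 0.057),
]

# The row with the lowest CPU time is the baseline for the comparison.
best = min(MEASUREMENTS, key=lambda row: row[2])

for batch, srv, cpu in MEASUREMENTS:
    label = "all" if batch is None else str(batch)
    extra = (cpu / best[2] - 1) * 100   # percentage above the best CPU time
    print(f"{label:>5} per read: {srv} SRV, {cpu:.3f} CPU s, {extra:.0f}% above the best")
```

On these figures the 100-record read is cheapest on both measures, and reading one record at a time costs roughly half again as much CPU. With a 3,000-record test and a stated run-to-run variation of about 2%, the ranking of the middle rows should be taken as indicative only.
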
dick scherrer

Moderator Emeritus


Joined: 23 Nov 2006
Posts: 19243
Location: Inside the Matrix

Posted: Thu Nov 10, 2011 8:36 pm

Hi Jonathan,

Good to "see" you here :)

Quote:
I guess it all depends on how big 'huge' is.

Much of what i deal with is tens to hundreds of millions of records. . . Rarely is rexx considered. . .

Quote:
When you count the number of compiles, and coding time,

I've kept a library of dozens of little file manipulation programs. Cloning the right model and adding a few lines of code usually works on the first clean compile (sometimes there is a typo or 2 to fix).

With the increased power of the sort products, these are even better for performance. Unfortunately, some organizations (i've been a migrant data worker for 30 years) do not permit use of the new functions :(
don.leahy

Active Member


Joined: 06 Jul 2010
Posts: 767
Location: Whitby, ON, Canada

Posted: Thu Nov 10, 2011 9:29 pm

My own rule of thumb is that if you cannot comfortably use EXECIO * then you should probably consider using something other than Rexx.

As others have noted, you can work around that, but EXECIO n does not scale up very well. Even when you find an optimum value of n, the performance won't be very impressive compared to other approaches.
All times are GMT + 6 Hours