Prosenjit001
New User
Joined: 02 Nov 2011 Posts: 14 Location: India
I have one huge dataset. I want to read that dataset and parse it based on some condition, but I am getting an error because EXECIO is unable to process that huge dataset. Is there any other alternative to process huge files using REXX?
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
if it just has to be in rexx, instead of a more appropriate utility such as sort,
which has amazing parsing capabilities,
just read one record at a time,
parse,
then write the record.
that way you don't have an input or output stem overflow/size problem
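For illustration, a minimal REXX sketch of that record-at-a-time approach (the dataset names and the INDD/OUTDD DD names are placeholders, not taken from the thread):
Code:
/* REXX -- read, parse and rewrite one record at a time              */
"ALLOC F(INDD)  DA('HLQ.INPUT.DATA')  SHR REUSE"    /* placeholder    */
"ALLOC F(OUTDD) DA('HLQ.OUTPUT.DATA') OLD REUSE"    /* placeholder    */
DO FOREVER
  "EXECIO 1 DISKR INDD (STEM in."
  IF RC = 2 THEN LEAVE                /* end of file                  */
  IF RC > 0 THEN EXIT RC              /* any other I/O problem        */
  /* parse / test the record here, then write it out                 */
  out.1 = in.1
  "EXECIO 1 DISKW OUTDD (STEM out."
END
"EXECIO 0 DISKR INDD (FINIS"
"EXECIO 0 DISKW OUTDD (FINIS"
"FREE F(INDD OUTDD)"
Only one input and one output record are ever held in the stems, so the size of the file no longer matters.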
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
REXX was not designed for this amount of I/O.
You should follow Dick's suggestion: use a REXX alternative and abandon this folly....
Mr. Bill
Prosenjit001
New User
Joined: 02 Nov 2011 Posts: 14 Location: India
Actually my problem is that the file layout is not properly defined. Fields in the file are separated by '~', and if some fields are missing then there is just a blank.
File looks like -
aaaa~bbb~cc~dddd
a~~ cccc~dd
~aa~cc~ddddd
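For what it is worth, splitting a record like these is a one-liner in REXX; a minimal sketch using one of the sample lines (the field names fld1..fld4 are invented for illustration):
Code:
/* REXX -- split a '~' delimited record into its four fields         */
record = 'a~~ cccc~dd'                    /* second sample line above */
PARSE VAR record fld1 '~' fld2 '~' fld3 '~' fld4
fld3 = STRIP(fld3)                        /* drop the stray blank     */
/* a missing field simply comes back as an empty string              */
SAY 'fld1=<'fld1'> fld2=<'fld2'> fld3=<'fld3'> fld4=<'fld4'>'
Running this displays fld1=<a> fld2=<> fld3=<cccc> fld4=<dd>.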
dbzTHEdinosauer
Global Moderator
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
actually, the problem is your limited skill set.
suggest you spend a few moments looking at the sort manual.
As I said, sort's parsing capability is amazing,
and the type of parsing that you have indicated is not hard.
Marso
REXX Moderator
Joined: 13 Mar 2006 Posts: 1353 Location: Israel
Prosenjit001 wrote:
File looks like -
aaaa~bbb~cc~dddd
a~~ cccc~dd
~aa~cc~ddddd
If your short sample is representative, then the layout is properly defined: there are 4 values, each separated by a tilde.
As already advised, use SORT.
Read the "Deconstruct and reconstruct CSV records" chapter in this document: Smart DFSORT Tricks.
Use this as a base for your purpose.
If you have SYNCSORT, try anyway; there are many similarities between the two products.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
If you are not able to do what you want with your sort product, this would be a very simple bit of cobol code. . .
Read
Unstring
Process
That's all. . .
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10889 Location: italy
it would be nice if the TS could explain the requirement better
ok for the input ( pretty easy to understand )
but... what about the expected output?
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10889 Location: italy
the suggestion not to use EXECIO to process huge files belongs to the same category as the suggestion about not using huge stems.
Quote:
due to execio unable process that huge dataset
not an EXECIO issue, rather your lack of skills.
EXECIO will process any dataset whatever its size ( if You know how to do it )
the issue about <segmenting> IO operations with EXECIO has been discussed quite a few times
the suggestion is not about capability, but about performance
Ed Goodman
Active Member
Joined: 08 Jun 2011 Posts: 556 Location: USA
A slight hint: The Execio statement does NOT have to read all records at once. You can read as few as one record at a time.
jerryte
Active User
Joined: 29 Oct 2010 Posts: 203 Location: Toronto, ON, Canada
Prosenjit,
If you do "EXECIO * DISKR" then it copies the entire dataset into memory. The larger the file, the more memory is needed. Thus a very large file will cause an abend.
I would suggest doing something like "EXECIO 1000 DISKR", which will read 1000 records at a time. Then check for RC = 2, which means end of file. Code the logic to process the 1000 (or fewer at EOF) and then read the next 1000. Use a stem variable to make it easy.
NOTE: you could read one record at a time, but this would take a long time to execute given that your file is large. Do 1000 or more at a time.
Hope this helps.
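A minimal sketch of that blocked-read approach (the dataset name and the DD name INDD are placeholders):
Code:
/* REXX -- process a large file 1000 records at a time               */
"ALLOC F(INDD) DA('HLQ.INPUT.DATA') SHR REUSE"      /* placeholder    */
eof = 0
DO UNTIL eof
  "EXECIO 1000 DISKR INDD (STEM rec."
  IF RC = 2 THEN eof = 1          /* fewer than 1000 left: last block */
  ELSE IF RC > 0 THEN EXIT RC     /* any other I/O problem            */
  DO i = 1 TO rec.0               /* rec.0 = records actually read    */
    PARSE VAR rec.i f1 '~' f2 '~' f3 '~' f4
    /* ... process the fields ...                                     */
  END
  DROP rec.                       /* clear the stem for the next block */
END
"EXECIO 0 DISKR INDD (FINIS"
"FREE F(INDD)"
Memory use stays bounded at roughly 1000 records, whatever the total file size.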
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hello,
There is no good reason to process a "huge" file using rexx - no matter how the file is read (little at a time or all at once). . . Indeed, if the file is huge and there is considerable "work" to do with each record, the cpu requirement will be most unattractive (possibly unacceptable to management).
If the sole purpose of your process is to reformat the records, you can do this easily with your sort product or with a simple COBOL program that reads a record, UNSTRINGs it on the tilde delimiter (~), and writes a new file with the reformatted data.
JPVRoff
New User
Joined: 06 Oct 2009 Posts: 45 Location: Melbourne, Australia
dick scherrer wrote:
There is no good reason to process a "huge" file using rexx - no matter how the file is read (little at a time or all at once). . .
Hi Dick,
I guess it all depends on how big 'huge' is. Rexx & EXECIO are very useful tools for a quick 'n' dirty fix, as they generally don't need much in the way of coding and testing.
When you count the number of compiles, and coding time, for a single file manipulation, sometimes Rexx can come out ahead. Depends on the size of the file, I guess.
I have a small test set up for checking resource usage when reading in files. It's only 3,000 records, but it gives some indication of CPU use, etc.
For a 3,000 record, 27,000 byte file (averaged over a few runs - probably +/- 2%):
Read 1 at a time - 5292 SRV - 0.063 CPU seconds
Read 10 at a time - 4105 SRV - 0.046 CPU seconds
Read 100 at a time - 4049 SRV - 0.041 CPU seconds
Read 1000 at a time - 4681 SRV - 0.056 CPU seconds
Read all at once - 4812 SRV - 0.057 CPU seconds
By experimentation, I found that 50-200 records at a time, regardless of the record size, was about the most efficient. I say regardless, because if you start getting into very small records (<80 bytes) then you can get small efficiencies by reading in 1000+ records - but not enough to justify testing it.
dick scherrer
Moderator Emeritus
Joined: 23 Nov 2006 Posts: 19243 Location: Inside the Matrix
Hi Jonathan,
Good to "see" you here
Quote:
I guess it all depends on how big 'huge' is.
Much of what I deal with is tens to hundreds of millions of records. . . Rarely is rexx considered. . .
Quote:
When you count the number of compiles, and coding time,
I've kept a library of dozens of little file manipulation programs. Cloning the right model and adding a few lines of code usually works on the first clean compile (sometimes there is a typo or 2 to fix).
With the increased power of the sort products, these are even better for performance. Unfortunately, some organizations (I've been a migrant data worker for 30 years) do not permit use of the new functions.
don.leahy
Active Member
Joined: 06 Jul 2010 Posts: 767 Location: Whitby, ON, Canada
My own rule of thumb is that if you cannot comfortably use EXECIO * then you should probably consider using something other than Rexx.
As others have noted, you can work around that, but EXECIO n does not scale up very well. Even when you find an optimum value of n, the performance won't be very impressive compared to other approaches.