Storing huge volume of data, compare and process

 
IBMMAINFRAMES.com Support Forums -> All Other Mainframe Topics
Pradeep K M

New User


Joined: 13 Jan 2017
Posts: 2
Location: India

Posted: Mon Jan 16, 2017 5:08 pm    Post subject: Storing huge volume of data, compare and process

Hi,

There are 100 flat files (on tape) in total, one created every month since 2009, each with approximately 1 million records. The record length is 200 bytes; let's call this data set1. Every month I will also receive another flat file with the same layout containing approximately 20 thousand records; call it set2. I need to compare set2 with set1 on an 18-byte key and write the matched records to an output file.
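For scale, the volumes stated above work out roughly as follows (decimal units, ignoring tape blocking and any VSAM overhead). Set1 is far too large to hold in memory, while all of set2, or just its keys, fits easily:

```latex
\begin{aligned}
\text{set1} &\approx 100 \times 10^{6}\ \text{records} \times 200\ \text{bytes} = 2 \times 10^{10}\ \text{bytes} \approx 20\ \text{GB} \\
\text{set2} &\approx 2 \times 10^{4}\ \text{records} \times 200\ \text{bytes} = 4 \times 10^{6}\ \text{bytes} \approx 4\ \text{MB} \\
\text{set2 keys only} &\approx 2 \times 10^{4}\ \text{keys} \times 18\ \text{bytes} = 3.6 \times 10^{5}\ \text{bytes} \approx 360\ \text{KB}
\end{aligned}
```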

Notes:

* It will be a monthly process. Set1 changes every month: the oldest of the 100 files drops out of scope and a new file is added.
* Set2 is not static data; it changes every month.
* There are no general criteria I could use to reduce or eliminate volume from the 100 flat files.
* DB2 is out of scope because this needs to be finished quickly. Working with DBAs and getting approvals, access, etc. takes a long time in our company.
* It will be used only in a batch job.


My questions are:

* How should I handle such a huge volume of data efficiently in terms of storage, performance, CPU time, etc.?
* Should I create a single VSAM KSDS one time to hold the data from the 100 flat files (approximately 100M records in total after removing duplicates) and then do the compare? After the comparison I would write the output to a new file, remove the oldest data, and load the new month's file into the VSAM data set. There will also be scenarios where I need to update existing records (in the VSAM case).
* Or is it better to use one combined tape file, or the 100 tape files concatenated, instead of VSAM, which needs disk storage?
* If I use tape files, I feel the efficiency will be low compared to VSAM.
* Is there any method by which I could split the data and work on it, or is there a better idea altogether?

Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 7992
Location: Bellevue, IA

Posted: Mon Jan 16, 2017 6:20 pm    Post subject:

Quote:
* If I use tape files, I feel the efficiency will be low compared to VSAM.
This makes ABSOLUTELY no sense. To create a VSAM data set from your tape data, you will have to read all 100 tape files, sort to remove duplicates, and then define a VSAM data set and load it from the remaining data. Simply reading the 100 tape files and doing your comparisons means you are NOT performing the latter steps of this process, which -- by definition -- means you are increasing efficiency.

Write a program in the language of your choice to read the smaller data set into memory (a COBOL array, for example), and use that to drive your processing. You can load the array in key sequence. This allows you to use a binary search (SEARCH ALL in COBOL) if the tape files are not sorted by key sequence, or merely make one pass through the array for each tape if they are sorted by key sequence. Either way, even adding the time to create the program, you'll use much less time each month than you would by creating a VSAM data set.
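A minimal sketch of that approach, in Python purely so the logic can be run and checked anywhere (the reply suggests a COBOL array; nothing here is z/OS-specific). Assumptions not stated in the thread: each record is a 200-byte fixed-length byte string, the 18-byte key starts at offset 0 (`KEY_OFFSET`), and "matched records" means the set1 records whose key also appears in set2. All function names are hypothetical. `match_unsorted` binary-searches the in-memory key table for every tape record; `match_sorted` assumes the tape file is already in key order and walks the table and the file together in a single pass.

```python
from bisect import bisect_left
from typing import Iterable, Iterator, List

RECORD_LEN = 200  # fixed record length given in the question
KEY_LEN = 18      # key length given in the question
KEY_OFFSET = 0    # ASSUMPTION: the question does not say where the key sits in the record


def key_of(record: bytes) -> bytes:
    """Extract the 18-byte comparison key from one fixed-length record."""
    return record[KEY_OFFSET:KEY_OFFSET + KEY_LEN]


def read_records(path: str) -> Iterator[bytes]:
    """Stream fixed-length records from a file (a stand-in for reading one tape file)."""
    with open(path, "rb") as f:
        while True:
            record = f.read(RECORD_LEN)
            if not record:
                return
            if len(record) != RECORD_LEN:
                raise ValueError("trailing partial record")
            yield record


def load_keys(set2: Iterable[bytes]) -> List[bytes]:
    """Load the small monthly file's keys into memory in key sequence, duplicates removed."""
    return sorted({key_of(r) for r in set2})


def match_unsorted(keys: List[bytes], set1: Iterable[bytes]) -> Iterator[bytes]:
    """Tape file NOT in key order: binary-search the in-memory table for each record."""
    for record in set1:
        k = key_of(record)
        i = bisect_left(keys, k)
        if i < len(keys) and keys[i] == k:
            yield record


def match_sorted(keys: List[bytes], set1: Iterable[bytes]) -> Iterator[bytes]:
    """Tape file already in key order: one pass through the table and the file together."""
    i = 0
    for record in set1:
        k = key_of(record)
        while i < len(keys) and keys[i] < k:  # skip table keys below the current record's key
            i += 1
        if i == len(keys):
            break  # table exhausted: no later record can match
        if keys[i] == k:
            yield record  # i is not advanced, so repeated keys in set1 also match


def rec(key: str) -> bytes:
    """Build a hypothetical 200-byte test record with the key in the first 18 bytes."""
    return key.encode().ljust(KEY_LEN) + b"x" * (RECORD_LEN - KEY_LEN)


if __name__ == "__main__":
    keys = load_keys([rec("K002"), rec("K004")])
    set1 = [rec("K001"), rec("K002"), rec("K003"), rec("K004")]
    print([key_of(r).decode().strip() for r in match_sorted(keys, set1)])  # prints ['K002', 'K004']
```

Each of the 100 tape files would be streamed through one of the two matchers in turn (via something like `read_records`), so only the set2 key table, about 20,000 × 18 bytes, is ever held in memory.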
Pradeep K M

New User


Joined: 13 Jan 2017
Posts: 2
Location: India

Posted: Mon Jan 16, 2017 8:00 pm    Post subject: Reply to: Storing huge volume of data, compare and process

To Robert Sample:

Thanks for the response, and apologies if I didn't convey my question clearly. Is it OK to create the VSAM data set only ONE TIME, initially, from the 100 tape files, and then insert and rewrite records in that same VSAM data set every month using the new tape file?
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 7992
Location: Bellevue, IA

Posted: Mon Jan 16, 2017 8:36 pm    Post subject:

I don't think you've explained nearly enough for an accurate determination to be made. Your original post said that the oldest tape file's data will be dropped each month. How do you determine which records in the VSAM data set are to be dropped each month? If you have a way to determine that, then a VSAM KSDS makes sense. Otherwise, as I pointed out earlier, you'll need to rebuild the VSAM data set every month and that will DEFINITELY be less efficient than just processing the tape files directly.
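To make that condition concrete: suppose, hypothetically, that each record carried the month of the tape file it came from (nothing in the thread says the data has such a field). Then the monthly purge becomes "delete every entry tagged with the oldest month". The sketch uses a Python dict as a stand-in for the KSDS; the function names and the `YYYYMM` month tag are illustrative assumptions, not part of the original question.

```python
from typing import Dict, Iterable, Tuple

KEY_LEN = 18  # ASSUMPTION: the key occupies the first 18 bytes of each record

# Hypothetical keyed store standing in for a KSDS: key -> (source month "YYYYMM", record).
Store = Dict[bytes, Tuple[str, bytes]]


def apply_month(store: Store, month: str, records: Iterable[bytes]) -> None:
    """Insert new keys and rewrite existing ones; the newest month wins for a repeated key."""
    for record in records:
        store[record[:KEY_LEN]] = (month, record)


def drop_month(store: Store, month: str) -> int:
    """Delete every entry whose latest occurrence came from `month`; return how many went."""
    doomed = [k for k, (m, _) in store.items() if m == month]
    for k in doomed:
        del store[k]
    return len(doomed)


def roll_window(store: Store, oldest: str, newest: str, new_records: Iterable[bytes]) -> int:
    """One monthly cycle: drop the month leaving the window, then load the incoming month."""
    dropped = drop_month(store, oldest)
    apply_month(store, newest, new_records)
    return dropped
```

Because a repeated key is rewritten with the newer month, an entry only carries the oldest month if no later file contained that key, so deleting by tag removes exactly the data leaving the window. Even so, `drop_month` must inspect every entry, which for a KSDS means reading it end to end each month; whether that beats simply re-reading the tapes is the trade-off the reply describes.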

There may be other reasons to build a VSAM data set from the tape data -- online processes or other batch jobs that need the data. Without knowing a lot of the specifics, it is not possible for us to say whether or not building a VSAM data set makes sense.
All times are GMT + 6 Hours