Hi. Is there a way to split a large file into smaller files using syncsort/ICETOOL? I'd like to be able to set a certain number of records and have it create as many datasets as it needs based on that number.
Sorry for posting in the wrong forum. I looked a bit at OUTFIL, but that seems to be driven by INCLUDE COND. I just want to say: after x records create file 1, then after the next x records create file 2, and so on. There may be more that OUTFIL can do that I'm not familiar with, so I'll keep looking into that option.
If the number of output datasets is going to be unknown, you may need to build the job dynamically based on the input record count. But that involves multiple passes over the data.
REXX could be a better option: read 'x' records from the input, allocate a new output dataset and write to it, and repeat until end-of-input.
I was able to get what I needed with the following code, for example, but it means knowing how many files I want to create, which in turn means knowing how many total records are in the original file.
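Roughly, the job looks like this - the dataset names, space values, and per-file count below are placeholders, not the real ones, and it relies on OUTFIL SPLIT1R being available at our sort product's level:

//SPLIT    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=MY.INPUT.FILE,DISP=SHR
//OUT1     DD DSN=MY.OUTPUT.FILE1,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//OUT2     DD DSN=MY.OUTPUT.FILE2,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//OUT3     DD DSN=MY.OUTPUT.FILE3,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD *
  SORT FIELDS=COPY
* The first 1000 records go to OUT1, the next 1000 to OUT2, and
* whatever is left goes to the last dataset in the FNAMES list.
  OUTFIL FNAMES=(OUT1,OUT2,OUT3),SPLIT1R=1000
/*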
But even SPLIT1R may not help if the input record count keeps changing every time.
This older topic HERE might be of some interest to you. But I'm sure it can be improved with the newer functions available in sort products these days.
Some record counts would be good. Why do you want to split? Does it matter where the split is, or is there no relationship between records?
Has the file you want to split already been through a SORT (or something else) which can be amended to simply add a file containing the number of records?
Are you able to use the INTRDR for this task (so that JCL and control cards can be generated and submitted to run as a separate job)?
If nothing else, you can have more DD statements than needed, and clean up the unused datasets in a step afterwards.
Actually, working with a co-worker on this, we did end up creating more DD statements than needed and just cleaning them up afterwards.
This all came about because the requestor had a 4-million-record mainframe file they wanted split up to make it easier to load into their SQL Server. So we split it into 1-million-record chunks, creating 5 datasets with the last one having a handful of records, and then got rid of the unused datasets in another step, as you mentioned.
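The cleanup step can be as simple as an IEFBR14 with MOD/DELETE dispositions - the dataset names here are placeholders, and MOD means the DD allocates the dataset if it doesn't already exist, so the delete works whether or not that particular split file was ever created:

//CLEANUP  EXEC PGM=IEFBR14
//* Drop the trailing output datasets that came out empty/unused
//DEL1     DD DSN=MY.OUTPUT.FILE6,DISP=(MOD,DELETE,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,0)
//DEL2     DD DSN=MY.OUTPUT.FILE7,DISP=(MOD,DELETE,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,0)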
This should serve our purposes for the number of times we do this, which isn't often, but it's good to have a way of doing it.
Sure. We're going to expand it out to 10 datasets and make a generic template out of it for our group, but this is what we came up with. Feel free to make it better.
Thanks. The only thing I'd suggest is removing the DCB from the output DD statements. SORT will generate the correct DCB info.
If you later needed some data manipulation that changed the record lengths from the input, you wouldn't need to change the JCL to get the correct output; otherwise you'd have two places to maintain it.
It doesn't matter as it stands, since the records are not changed, but it's better that it doesn't get copied around as it is.
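As a sketch of what I mean (names are placeholders), an output DD with no DCB at all:

//OUT1     DD DSN=MY.OUTPUT.FILE1,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//* No DCB= coded - the sort product supplies RECFM/LRECL/BLKSIZE,
//* so a later change to the record length never needs a JCL edit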
I assumed the number on the SPLIT1R was either for testing or got chopped in the paste...