Splitting 1 dataset into multiple datasets

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

Hi. Is there a way to split a large file into smaller files using SyncSort/ICETOOL? I'd like to set a number of records and have it create as many datasets as it needs based on that number.

Thanks

Bill Woodger · Posted: Wed Apr 16, 2014 9:18 pm

SyncSort topics live in the JCL forum.

Have you looked at the various SPLIT* options of OUTFIL?

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

Sorry for the wrong forum. I looked a bit at OUTFIL, but that seems to be done with INCLUDE conditions. I just want to say: after x records, create a file; then after the next x records, create file 2, and so on. There may be more that OUTFIL can do that I'm not familiar with, so I'll keep looking into that option.

Thanks

Arun Raj · Posted: Wed Apr 16, 2014 9:38 pm

If the number of output datasets is going to be unknown, you may need to build the job dynamically based on the input record count. But this involves multiple passes of the data.

REXX could be a better option: read 'x' records from the input, allocate a new output dataset, write the records into it, and repeat until end of input.
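The read/allocate/write loop described above is language-neutral. A minimal sketch of it follows, in Python rather than REXX, assuming ordinary line-oriented files stand in for datasets. The function name `split_by_count` and the `part001.txt` naming scheme are invented for illustration.

```python
import itertools
from pathlib import Path


def split_by_count(src, out_dir, records_per_file, prefix="part"):
    """Copy `src` into numbered files of at most `records_per_file` lines each."""
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    with open(src, "r", encoding="utf-8") as infile:
        for index in itertools.count(1):
            # Read the next 'x' records; an empty chunk means end of input.
            chunk = list(itertools.islice(infile, records_per_file))
            if not chunk:
                break
            # Create a new output only when there is data to put in it,
            # so the number of outputs never has to be known in advance.
            target = out_dir / f"{prefix}{index:03d}.txt"
            with open(target, "w", encoding="utf-8") as outfile:
                outfile.writelines(chunk)
            created.append(target)
    return created
```

Because each output is created on demand, this approach needs only one pass of the input and leaves no empty outputs behind.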

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

Thanks, I'll check into REXX. Sounds like it's what I want to have happen.

nevilh · Active User Joined: 01 Sep 2006 Posts: 262

Rather than writing your own REXX, why not try IDCAMS REPRO with the SKIP and COUNT parameters?

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

I'm not familiar with REPRO, but I tried the following from an example I found online. Unfortunately it did not work.

Arun Raj · Posted: Wed Apr 16, 2014 11:07 pm

nevilh,

I am afraid IDCAMS REPRO will NOT fit the OP's requirement.

The OP has a large input file to be split into a number of smaller output files, with a fixed 'x' number of records in each output file.

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

I was able to get what I needed with the following code, for example, but it means knowing in advance how many files to create, which in turn means knowing how many total records are in the original file.

Arun Raj · Posted: Wed Apr 16, 2014 11:25 pm

Jay Villaverde,

I think the SPLIT1R parameter is a better alternative for your example above.
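For readers who have not used it: as the sort documentation describes it, SPLIT1R=n writes n records to each output dataset of the OUTFIL group in a single rotation, and any extra records go to the last dataset. The following is a rough Python model of that distribution, not sort control statements. The name `split1r` and its arguments are invented for illustration.

```python
def split1r(records, n, outputs):
    """Model of SPLIT1R=n: n records per output, leftovers to the last one."""
    if n < 1 or outputs < 1:
        raise ValueError("n and outputs must both be at least 1")
    groups = [[] for _ in range(outputs)]
    for position, record in enumerate(records):
        # One rotation only: once the last output is reached, stay there.
        slot = min(position // n, outputs - 1)
        groups[slot].append(record)
    return groups


# Ten records, 3 per output, three outputs: the tenth lands in the last output.
print(split1r(range(1, 11), 3, 3))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
```

Note that the number of outputs is still fixed up front. With fewer input records than expected, the trailing outputs simply stay empty.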

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

Thanks, I will try SPLIT1R, which is more efficient.

Arun Raj · Posted: Wed Apr 16, 2014 11:40 pm

But even SPLIT1R may not help if the input record count keeps changing every time.

This older topic HERE might be of some interest to you. But I'm sure it can be improved with the newer functions available in sort products these days.

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

True, I will still need to have a general idea of the total records going in, but it's a start. I will check out that link.

Thanks

Bill Woodger · Posted: Thu Apr 17, 2014 12:13 am

Some record counts would be good. Why do you want to split? Does it matter where the split is, or is there no relationship between records?

Has the file you want to split already been through a SORT (or something else) which can be amended to simply add a file containing the number of records?

Are you able to use the INTRDR for this task (so that JCL and control cards can be generated and submitted to run by a JOB)?

If nothing else, you can have more DD statements than needed, and clean up the unused datasets in a step afterwards.
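The over-allocate-then-clean-up idea can be sketched as follows. On z/OS this would be a later job step rather than Python; here plain files stand in for datasets and a zero-length file stands in for an unused dataset. The function name `remove_empty_outputs` is invented for illustration.

```python
from pathlib import Path


def remove_empty_outputs(paths):
    """Delete zero-length files from `paths`; return the ones that were kept."""
    kept = []
    for path in map(Path, paths):
        if path.stat().st_size == 0:
            # Nothing was written here, so this output was surplus.
            path.unlink()
        else:
            kept.append(path)
    return kept
```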

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

Actually, working with a co-worker on this, we did end up creating more DD statements than needed and just cleaning up the unused datasets afterwards.

This all came about because the requestor had a 4 million record mainframe file they wanted split up so it would load more easily into their SQL Server. So we split it up into 1-million-record chunks, creating 5 datasets with the last one having a handful of records, and then got rid of the unused datasets in another step, as you mentioned.

This should serve our purposes for the number of times we do this, which isn't often, but it's good to have a way of doing it.
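The arithmetic behind the five datasets can be sketched like this. The exact total of 4,000,012 is invented for the example, since only "4 million" plus "a handful" is known, and `split_plan` is a made-up helper name.

```python
def split_plan(total_records, records_per_dataset):
    """Return (datasets needed, records in the last dataset)."""
    if records_per_dataset < 1:
        raise ValueError("records_per_dataset must be at least 1")
    if total_records == 0:
        return 0, 0
    # Ceiling division: a partial chunk still needs its own dataset.
    count = -(-total_records // records_per_dataset)
    last = total_records - (count - 1) * records_per_dataset
    return count, last


# Hypothetical total: slightly over 4 million records in 1-million chunks.
print(split_plan(4_000_012, 1_000_000))  # (5, 12)
```

Any DD statements beyond the computed count receive no records, and those are the datasets removed in the cleanup step.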

Bill Woodger · Posted: Thu Apr 17, 2014 12:42 am

Good work. Thanks for letting us know.

Can you post the code you came up with? It may be useful for other people in the future.

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

Sure. We're going to expand it out to 10 datasets and make a generic template out of it for our group, but this is what we came up with. Feel free to make it better.

Thanks everyone for their input.

Bill Woodger · Posted: Thu Apr 17, 2014 3:19 am

Thanks. The only thing I'd suggest is removing the DCB from the output DD statements. SORT will generate the correct DCB info.

If you later needed some data manipulation which changed the record lengths from the input, you would not need to change the JCL to get the correct output. Otherwise you'd have two places to maintain it.

It doesn't matter as it stands, since the records are not changed; I mention it just so it doesn't get copied as it is.

I assume the number on the SPLIT1R is either for testing or got chopped in the paste...

Jay Villaverde · New User Joined: 08 Mar 2014 Posts: 27 Location: USA

Thanks for the tip. Yeah, we were just testing SPLIT1R with a smaller file.