Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Hi. Is there a way to split a large file into smaller files using SyncSort/ICETOOL? I'd like to set a record count and have it create as many datasets as it needs based on that count.
Thanks
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
SyncSort topics live in the JCL forum.
Have you looked at the various SPLIT* options of OUTFIL?
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Sorry for posting in the wrong forum. I looked briefly at OUTFIL, but that seems to be done with INCLUDE conditions. I just want to say: after x records create a file, then after the next set of x records create file 2, and so on. There may be more that OUTFIL can do that I'm not familiar with, so I'll keep looking into that option.
Thanks
 |
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2482 Location: @my desk
If the number of output datasets is not known in advance, you may need to build the job dynamically from the input record count, but that involves multiple passes over the data.
REXX could be a better option: read 'x' records from the input, allocate a new output dataset and write them to it, and repeat until end of input.
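A rough sketch of that read/allocate/write loop, written in Python rather than REXX purely to show the shape of the logic. Everything named here is an assumption for illustration: `split_records` is a made-up helper, plain text files stand in for datasets, and the `part1.txt`, `part2.txt` naming is invented. Real dataset allocation on z/OS would look different.

```python
from itertools import islice
from pathlib import Path


def split_records(input_path, out_dir, records_per_file):
    """Copy input_path into numbered files of at most records_per_file lines.

    Hypothetical helper: text files stand in for datasets and the
    part<N>.txt naming scheme is invented for this sketch.
    """
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    with open(input_path, "r", newline="") as src:
        while True:
            # Read the next 'x' records; an empty chunk means end of input.
            chunk = list(islice(src, records_per_file))
            if not chunk:
                break
            # "Allocate" the next output only when there is data for it,
            # so no empty outputs are ever created.
            target = out_dir / f"part{len(created) + 1}.txt"
            with open(target, "w", newline="") as dst:
                dst.writelines(chunk)
            created.append(target)
    return created
```

Because an output is only created once there are records to put in it, the number of outputs never has to be known up front, which is the point of doing this outside the sort step.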
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Thanks, I'll check into REXX. Sounds like it's what I want to have happen.
 |
nevilh
Active User
Joined: 01 Sep 2006 Posts: 262
Rather than write your own REXX, why not try IDCAMS REPRO with the SKIP and COUNT parameters?
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
I'm not familiar with REPRO. I tried the following, based on an example I found online, but unfortunately it did not work.
Code:
//COPYDATA JOB (31000,G5),'COPY',CLASS=O,MSGCLASS=2,LINES=500000,
// NOTIFY=&SYSUID TYPRUN=HOLD
//**********************************************************************
//S1SORT EXEC PGM=IDCAMS,REGION=3072K
//**********************************************************************
//SORTIN DD DISP=SHR,DSN=EDT.TST.RXD.UR078074.D036
//**********************************************************************
//SORTOUT DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.G5.UR078074.D036,
// SPACE=(CYL,(500,100),RLSE,,ROUND),UNIT=TEST,
//* LABEL=RETPD=999,
// DCB=*.SORTIN
//SYSOUT DD SYSOUT=6
//SYSIN DD *
*
REPRO -
INFILE(SORTIN) -
OUTFILE(SORTOUT)
//
 |
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2482 Location: @my desk
nevilh,
I am afraid IDCAMS REPRO will not fit the OP's requirement.
The OP has a large input file to be split into a number of smaller output files, with a fixed number 'x' of records in each output file.
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
I was able to get what I needed with the following code, for example, but it means knowing how many files to create, which in turn depends on knowing the total number of records in the original file.
Code:
//SRTTESTA JOB (31000,G5),'EAS',CLASS=F,MSGCLASS=2,
// NOTIFY=&SYSUID TYPRUN=HOLD
//S1SORT EXEC PGM=SORT,REGION=3072K
//SORTIN DD DISP=SHR,DSN=TST.WAY.G500414.X47CSV
//OUT1 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G500414.X47CSV1,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT2 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G500414.X47CSV2,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT3 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G500414.X47CSV3,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT4 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G500414.X47CSV4,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT5 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G500414.X47CSV5,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//SYSOUT DD SYSOUT=6
//SYSIN DD *
  OPTION COPY
  OUTFIL FNAMES=OUT1,ENDREC=00000020
  OUTFIL FNAMES=OUT2,STARTREC=00000021,ENDREC=00000040
  OUTFIL FNAMES=OUT3,STARTREC=00000041,ENDREC=00000060
  OUTFIL FNAMES=OUT4,STARTREC=00000061,ENDREC=00000080
  OUTFIL FNAMES=OUT5,SAVE
 |
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2482 Location: @my desk
Jay Villaverde,
I think the SPLIT1R parameter is a better alternative for your example above.
Code:
  OPTION COPY
  OUTFIL FNAMES=(OUT1,OUT2,OUT3,OUT4,OUT5),SPLIT1R=20
But with a changing number of input records, your JCL still has to be dynamic, with a varying number of output files.
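To make the SPLIT1R behaviour concrete, here is a small Python model of it. It is only a sketch, assuming that each output receives one contiguous block of n records in a single rotation and that any records left over after the last output has had its n are also written to the last output. The function name `split1r` is invented for illustration and is not part of any sort product.

```python
def split1r(records, n, num_outputs):
    """Model of SPLIT1R=n across num_outputs OUTFIL data sets (illustrative only)."""
    if n < 1 or num_outputs < 1:
        raise ValueError("n and num_outputs must be at least 1")
    outputs = [[] for _ in range(num_outputs)]
    for index, record in enumerate(records):
        # One rotation only: the block number picks the output, capped at
        # the last output, which therefore absorbs any overflow.
        target = min(index // n, num_outputs - 1)
        outputs[target].append(record)
    return outputs


# 90 records, n=20, five outputs: four full blocks, then 10 in the last.
print([len(o) for o in split1r(range(90), 20, 5)])
# 130 records: the last output takes its 20 plus the 30 left over.
print([len(o) for o in split1r(range(130), 20, 5)])
# 30 records: outputs 3 to 5 stay empty.
print([len(o) for o in split1r(range(30), 20, 5)])
```

The second and third cases are the two ways a fixed list of DD names goes wrong when the input size changes: too few outputs makes the last one oversized, too many leaves empty datasets behind.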
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Thanks, I'll try SPLIT1R, which is more efficient.
 |
Arun Raj
Moderator
Joined: 17 Oct 2006 Posts: 2482 Location: @my desk
But even SPLIT1R may not help if the input record count changes from run to run.
This older topic HERE might be of some interest to you. But I'm sure it can be improved with the newer functions available in sort products these days.
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
True, I will still need to have a general idea of the total records going in, but it's a start. I'll check out that link.
Thanks
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Some record counts would be good. Why do you want to split? Does it matter where the split falls, or is there no relationship between records?
Has the file you want to split already been through a SORT (or something else) which can be amended to simply add a file containing the number of records?
Are you able to use the INTRDR for this task (so that JCL and control cards can be generated and submitted to run by a JOB)?
If nothing else, you can have more DD statements than needed, and clean up the unused datasets in a step afterwards.
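As a sketch of the "generate the JCL and control cards" idea, the Python below builds the text of a sort step with just enough output DDs once the record count is known. It is illustrative only: `build_split_step` and its parameters are hypothetical, the DD attributes are trimmed down from the jobs in this thread, naming outputs as prefix plus a number is an assumption, and the one-ddname-per-line continuation should be checked against your sort product's manual. Obtaining the record count and writing the result to the INTRDR are not shown.

```python
import math


def build_split_step(input_dsn, output_prefix, total_records, records_per_file):
    """Return JCL text for a split step with ceil(total / per_file) outputs.

    Hypothetical generator: DD attributes and dataset naming are assumptions.
    """
    if total_records < 1 or records_per_file < 1:
        raise ValueError("record counts must be at least 1")
    outputs = math.ceil(total_records / records_per_file)
    names = [f"OUT{i}" for i in range(1, outputs + 1)]
    lines = [
        "//S1SORT   EXEC PGM=SORT",
        "//SYSOUT   DD SYSOUT=*",
        f"//SORTIN   DD DISP=SHR,DSN={input_dsn}",
    ]
    for i, name in enumerate(names, start=1):
        lines.append(f"//{name:<8} DD DISP=(NEW,CATLG,DELETE),")
        lines.append(f"//            DSN={output_prefix}{i},")
        lines.append("//            SPACE=(CYL,(500,500),RLSE)")
    lines.append("//SYSIN    DD *")
    lines.append("  OPTION COPY")
    lines.append(f"  OUTFIL SPLIT1R={records_per_file},")
    # One ddname per line keeps every card short; a trailing comma
    # marks the statement as continued on the next line.
    for position, name in enumerate(names):
        opener = "FNAMES=(" if position == 0 else "        "
        closer = ")" if position == len(names) - 1 else ","
        lines.append(f"         {opener}{name}{closer}")
    lines.append("/*")
    return "\n".join(lines)


print(build_split_step("TST.WAY.G500414.X47CSV", "TST.WAY.G502015.X47CSV",
                       4_000_123, 1_000_000))
```

The record count of 4,000,123 in the usage line is a made-up figure. The trade-off against over-allocating DDs is an extra pass to count the records before the job can be built.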
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Actually, working with a co-worker on this, we did end up creating more DD statements than needed and just cleaning them up afterwards.
This all came about because the requestor had a 4 million record mainframe file they wanted split up so it would load more easily into their SQL Server. So we split it into 1 million record chunks, creating 5 datasets with the last one holding a handful of records, and then got rid of the unused datasets in another step, as you mentioned.
This should serve our purposes for the number of times we do this, which isn't often, but it's good to have a way of doing it.
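The arithmetic behind that can be sketched in a few lines of Python. The exact record count was not given, so 4,000,007 below is an assumed figure, six coded DD statements is also an assumption, and `output_counts` is a hypothetical helper rather than anything the sort product provides.

```python
def output_counts(total_records, per_file, num_outputs):
    """Records landing in each output when the input is split once, in order."""
    if total_records < 0 or per_file < 1 or num_outputs < 1:
        raise ValueError("invalid arguments")
    counts = []
    remaining = total_records
    for position in range(1, num_outputs + 1):
        if position == num_outputs:
            take = remaining  # the last output takes whatever is left
        else:
            take = min(per_file, remaining)
        counts.append(take)
        remaining -= take
    return counts


# Assumed figures: 4,000,007 input records, 1,000,000 per file, six DDs coded.
counts = output_counts(4_000_007, 1_000_000, 6)
unused = [f"OUT{i}" for i, c in enumerate(counts, start=1) if c == 0]
print(counts)   # four full outputs, a handful in the fifth, none in the sixth
print(unused)   # the ddnames a cleanup step would need to delete
```

Coding a few more DDs than the largest expected input needs keeps the last populated dataset from absorbing overflow, at the cost of the cleanup step.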
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Good work. Thanks for letting us know.
Can you post the code you came up with? It may be useful for other people in the future.
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Sure. We're going to expand it to 10 datasets and make a generic template out of it for our group, but this is what we came up with. Feel free to make it better.
Thanks, everyone, for your input.
Code:
//SPLITDSN JOB (31000,G5),'SPLIT',CLASS=F,MSGCLASS=2,
// NOTIFY=&SYSUID TYPRUN=HOLD
//S1SORT EXEC PGM=SORT,REGION=3072K
//SORTIN DD DISP=SHR,DSN=TST.WAY.G500414.X47CSV
//OUT1 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G502015.X47CSV1,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT2 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G502015.X47CSV2,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT3 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G502015.X47CSV3,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT4 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G502015.X47CSV4,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT5 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G502015.X47CSV5,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//OUT6 DD DISP=(NEW,CATLG,DELETE),
// DSN=TST.WAY.G502015.X47CSV6,
// SPACE=(CYL,(500,500),RLSE,,ROUND),UNIT=TEST,
// LABEL=RETPD=180,
// DCB=*.SORTIN
//SYSOUT DD SYSOUT=6
//SYSIN DD *
  OPTION COPY
  OUTFIL FNAMES=(OUT1,OUT2,OUT3,OUT4,OUT5,OUT6),SPLIT1R=20000
//*
//S2IDCAMS EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//OUT1 DD DSN=TST.WAY.G502015.X47CSV1,DISP=SHR
//OUT2 DD DSN=TST.WAY.G502015.X47CSV2,DISP=SHR
//OUT3 DD DSN=TST.WAY.G502015.X47CSV3,DISP=SHR
//OUT4 DD DSN=TST.WAY.G502015.X47CSV4,DISP=SHR
//OUT5 DD DSN=TST.WAY.G502015.X47CSV5,DISP=SHR
//OUT6 DD DSN=TST.WAY.G502015.X47CSV6,DISP=SHR
//SYSIN DD *
  PRINT INFILE(OUT1) CHARACTER COUNT(1)
  IF LASTCC = 4 THEN DELETE 'TST.WAY.G502015.X47CSV1' PURGE
  PRINT INFILE(OUT2) CHARACTER COUNT(1)
  IF LASTCC = 4 THEN DELETE 'TST.WAY.G502015.X47CSV2' PURGE
  PRINT INFILE(OUT3) CHARACTER COUNT(1)
  IF LASTCC = 4 THEN DELETE 'TST.WAY.G502015.X47CSV3' PURGE
  PRINT INFILE(OUT4) CHARACTER COUNT(1)
  IF LASTCC = 4 THEN DELETE 'TST.WAY.G502015.X47CSV4' PURGE
  PRINT INFILE(OUT5) CHARACTER COUNT(1)
  IF LASTCC = 4 THEN DELETE 'TST.WAY.G502015.X47CSV5' PURGE
  PRINT INFILE(OUT6) CHARACTER COUNT(1)
  IF LASTCC = 4 THEN DELETE 'TST.WAY.G502015.X47CSV6' PURGE
/*
 |
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Thanks. The only thing I'd suggest is removing the DCB from the output DD statements. SORT will generate the correct DCB info.
If you later needed some data manipulation that changed the record lengths from the input, you would not need to change the JCL to get the correct output, and so would not have two places to maintain.
It doesn't matter as it stands, since the records are not changed, but I mention it so that the DCB doesn't get copied as it is.
I assume the number on the SPLIT1R is either for testing or got chopped in the paste...
 |
Jay Villaverde
New User
Joined: 08 Mar 2014 Posts: 27 Location: USA
Thanks for the tip. Yeah, we were just testing SPLIT1R with a smaller file.
 |