abin Active User Joined: 14 Aug 2006 Posts: 198
Hi All,
Please shed some light on this problem.
I have a file with a record length of 5000.
Its layout is:
01 ws-layout.
05 ws-key1 pic x(20).
05 FILLER pic x(20).
05 ws-key2 pic x(10).
05 FILLER pic x(4950).
The file contains more than 35 million records.
I have to split this file into 10 different files, but the key values must not overlap between files.
i.e. if file1 has a record with a given key1, none of the other files may have a record with the same key.
Please let me know if you need more clarification.
Thanks,
Abin
IQofaGerbil Active User Joined: 05 May 2006 Posts: 183 Location: Scotland
More clarification? Yes please!
Examples of inputs and expected outputs would certainly help.
abin Active User Joined: 14 Aug 2006 Posts: 198
Hi,
Thanks for replying.
Input would look like:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
PS: Input file is in sorted order
I want to split this file into two parts.
After the split, the first file should contain
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
and the second file should contain
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
PS: This is only sample data. The actual data contains more than 35 million records, and I want to split it into more than 10 different files, all containing a nearly equal number of records.
Thanks,
Abin
santhunaveen New User Joined: 22 Sep 2006 Posts: 33
Hi,
We can split the file using the OUTFIL option. Check the example below.
Code:
//***********************************************************
//SPLITFLS EXEC PGM=SORT
//***********************************************************
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SORTIN DD DSN=FILE1......,DISP=SHR
//SORTOF01 DD DSN=OUTPUTFILE1.......,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=5000
//SORTOF02 DD DSN=OUTPUTFILE2..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=5000
//SORTOF03 DD DSN=OUTPUTFILE3..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=5000
//SORTOF04 DD DSN=OUTPUTFILE4..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=5000
//* You can code as many output files as you need.
//SYSIN DD *
SORT FIELDS=COPY
OUTFIL FILES=01,ENDREC=200 // for file 1, how many records you want in it
OUTFIL FILES=02,STARTREC=201,ENDREC=400 // for file 2
OUTFIL FILES=03,STARTREC=401,ENDREC=700 // for file 3
OUTFIL FILES=04,STARTREC=701,ENDREC=1000 // for file 4
* ... and so on for the remaining files
/*
Please correct me if I'm wrong.
IQofaGerbil Active User Joined: 05 May 2006 Posts: 183 Location: Scotland
The actual requirement is to split the main file into several files WITHOUT any key spanning two files.
Simply splitting at an arbitrary record number will not satisfy this requirement.
William Thompson Global Moderator Joined: 18 Nov 2006 Posts: 3156 Location: Tucson AZ
abin wrote:
The file contains more than 35 million records.
I have to split this file into 10 different files, but the key values must not overlap between files.
Are the records already sorted by the key?
If you want all 10 files to be nearly the same size, do you know how many records are in each key range?
abin Active User Joined: 14 Aug 2006 Posts: 198
Hi Will,
Yes, the records are sorted by the key.
"If you want all 10 files to be nearly the same size, do you know how many are in each keyrange?"
This, I am afraid, is not predictable. But we could divide the total number of records by the number of files needed. Suppose the input contains 3,000,000 records
and I want 10 files. Then each file should contain roughly 300,000 records.
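The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the division, not part of any SORT job; the function name `split_thresholds` is made up for this example.

```python
def split_thresholds(total_records: int, n_files: int) -> list[int]:
    """Return the n_files - 1 cumulative record counts at which,
    ideally, one output file would end and the next would begin."""
    per_file = total_records // n_files
    return [per_file * i for i in range(1, n_files)]


# 3,000,000 records into 10 files: nine nominal split points.
thresholds = split_thresholds(3_000_000, 10)
print(thresholds)
```

These are only nominal split points. Because a key must not span two files, the real split has to move forward from each threshold to the next key break.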
William Thompson Global Moderator Joined: 18 Nov 2006 Posts: 3156 Location: Tucson AZ
OK, every 300,000 writes you want to wait for a key break and close that file and open a new one, right?
Easily done with a programming language but I don't know if it can be done with sort.
Is sort a requirement or do you have other resources that might be used?
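For comparison, here is a minimal Python sketch of the "count to the target, then wait for a key break" logic described above. It assumes the input is already sorted and that the key is the first 19 characters of each record, as in the sample data; `split_on_key_breaks` and `key_len` are names invented for this illustration, and a production version would stream to files instead of building lists.

```python
from collections.abc import Iterable, Iterator


def split_on_key_breaks(
    records: Iterable[str], target: int, key_len: int = 19
) -> Iterator[list[str]]:
    """Yield groups of at least `target` records (the last group may be
    smaller), closing a group only when the key changes."""
    group: list[str] = []
    prev_key = None
    for rec in records:
        key = rec[:key_len]
        # Start a new group only once the target is reached AND the key changes.
        if len(group) >= target and key != prev_key:
            yield group
            group = []
        group.append(rec)
        prev_key = key
    if group:
        yield group


# Sample data from the thread: one digit repeated 19 times as the key.
digits = "11233345669"
sample = [d * 19 + f"SOMEDATA{i}" for i, d in enumerate(digits, start=1)]
groups = list(split_on_key_breaks(sample, target=5))
print([len(g) for g in groups])
```

With a target of 5, the first group keeps growing until the 3's are finished, so the sample splits into 6 and 5 records, which is the output the original poster asked for.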
abin Active User Joined: 14 Aug 2006 Posts: 198
Hi Will,
"every 300,000 writes you want to wait for a key break and close that file and open a new one, right?"
You are partly right. I don't want to wait for a key break, since this is going to be a batch job.
You are right that we could do it well using a programming language. But if we could do it with SORT, that would be great.
sril.krishy Active User Joined: 30 Jul 2005 Posts: 183 Location: hyderabad
Hi,
I think you can create dynamic control cards to divide the file into 10 files.
Anyway, let's wait for Frank's response. He might come up with a good idea.
Thank you
Krishy
santhunaveen New User Joined: 22 Sep 2006 Posts: 33
Hi IQofaGerbil,
"The actual requirement is to split the main file into several files WITHOUT the keys wrapping over. "
If the file is already sorted, then where is the question of keys wrapping over?
IQofaGerbil Active User Joined: 05 May 2006 Posts: 183 Location: Scotland
Hi back santhunaveen
Well, from the expected output
Quote:
I want to split this file into two parts.
After the split, the first file should contain
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
and the second file should contain
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
What I got from that was that the file should be split only after a key is complete, i.e. after the 3's and before the 4's.
So what Abin does not want is:
//SYSIN DD *
SORT FIELDS=COPY
OUTFIL FILES=01,ENDREC=5 // for file 1 how many records u want to split
OUTFIL FILES=02,STARTREC=6,ENDREC=11 // for file 2
first file
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
second file
3333333333333333333SOMEDATA6 3333333333
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
see where the key (3's) has been split?
Maybe I am wrong but that solution will only work if you know where the keys end/start hence William's line of questioning.
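The problem can be reproduced with a short Python check. This is only a sketch: it assumes the key is the first 19 characters of each record, and `straddling_keys` is a helper name made up here to report keys that land in more than one output file.

```python
def straddling_keys(files: list[list[str]], key_len: int = 19) -> set[str]:
    """Return the keys that appear in more than one of the given files."""
    first_seen: dict[str, int] = {}
    bad: set[str] = set()
    for idx, recs in enumerate(files):
        for rec in recs:
            key = rec[:key_len]
            if first_seen.setdefault(key, idx) != idx:
                bad.add(key)
    return bad


# Sample data from the thread, split at a fixed record number (5).
digits = "11233345669"
sample = [d * 19 + f"SOMEDATA{i}" for i, d in enumerate(digits, start=1)]
fixed_split = [sample[:5], sample[5:]]
print(straddling_keys(fixed_split))

# Splitting after record 6 (the last of the 3's) keeps every key in one file.
key_split = [sample[:6], sample[6:]]
print(straddling_keys(key_split))
```

The fixed split at record 5 reports the key of 3's as straddling both files, while the split at record 6 reports nothing.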
abin Active User Joined: 14 Aug 2006 Posts: 198
Hi Gerbil,
What you said is correct.
Frank Yaeger DFSORT Developer Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
This is very tricky and takes one merge pass and several copy passes, but here's a DFSORT/ICETOOL job that I believe will do what you want:
Code:
//S1 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SORTIN01 DD DSN=... input file (FB/5000)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//T2 DD DSN=&&T2,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//SYSIN DD *
OPTION EQUALS
INREC OVERLAY=(5009:1,19,30,10)
MERGE FIELDS=(5009,29,CH,A)
OUTFIL FNAMES=T1,OVERLAY=(5001:SEQNUM,8,ZD)
OUTFIL FNAMES=T2,NODETAIL,REMOVECC,
SECTIONS=(5009,29,
TRAILER3=(SUBCOUNT=(M11,LENGTH=8)))
//S2 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//SYMNAMES DD *
DIV1,+300000
DIV2,+600000
DIV3,+900000
DIV4,+1200000
DIV5,+1500000
DIV6,+1800000
DIV7,+2100000
DIV8,+2400000
DIV9,+2700000
//T2 DD DSN=&&T2,DISP=(OLD,PASS)
//SPL1 DD DSN=&&S1,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL2 DD DSN=&&S2,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL3 DD DSN=&&S3,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL4 DD DSN=&&S4,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL5 DD DSN=&&S5,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL6 DD DSN=&&S6,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL7 DD DSN=&&S7,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL8 DD DSN=&&S8,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL9 DD DSN=&&S9,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//TOOLIN DD *
COPY FROM(T2) TO(SPL1) USING(CTL1)
COPY FROM(T2) TO(SPL2) USING(CTL2)
COPY FROM(T2) TO(SPL3) USING(CTL3)
COPY FROM(T2) TO(SPL4) USING(CTL4)
COPY FROM(T2) TO(SPL5) USING(CTL5)
COPY FROM(T2) TO(SPL6) USING(CTL6)
COPY FROM(T2) TO(SPL7) USING(CTL7)
COPY FROM(T2) TO(SPL8) USING(CTL8)
COPY FROM(T2) TO(SPL9) USING(CTL9)
//CTL1CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV1)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL1,+',1,8,80:X)
//CTL2CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV2)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL2,+',1,8,80:X)
//CTL3CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV3)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL3,+',1,8,80:X)
//CTL4CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV4)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL4,+',1,8,80:X)
//CTL5CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV5)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL5,+',1,8,80:X)
//CTL6CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV6)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL6,+',1,8,80:X)
//CTL7CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV7)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL7,+',1,8,80:X)
//CTL8CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV8)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL8,+',1,8,80:X)
//CTL9CNTL DD *
INCLUDE COND=(1,8,ZD,GE,DIV9)
OPTION STOPAFT=1
OUTREC BUILD=(C'SPL9,+',1,8,80:X)
//S3 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SYMNAMES DD DSN=&&S1,DISP=(OLD,PASS)
// DD DSN=&&S2,DISP=(OLD,PASS)
// DD DSN=&&S3,DISP=(OLD,PASS)
// DD DSN=&&S4,DISP=(OLD,PASS)
// DD DSN=&&S5,DISP=(OLD,PASS)
// DD DSN=&&S6,DISP=(OLD,PASS)
// DD DSN=&&S7,DISP=(OLD,PASS)
// DD DSN=&&S8,DISP=(OLD,PASS)
// DD DSN=&&S9,DISP=(OLD,PASS)
//SORTIN DD DSN=&&T1,DISP=(OLD,PASS)
//OUT1 DD DSN=... output file1 (FB/5000)
//OUT2 DD DSN=... output file2 (FB/5000)
//OUT3 DD DSN=... output file3 (FB/5000)
//OUT4 DD DSN=... output file4 (FB/5000)
//OUT5 DD DSN=... output file5 (FB/5000)
//OUT6 DD DSN=... output file6 (FB/5000)
//OUT7 DD DSN=... output file7 (FB/5000)
//OUT8 DD DSN=... output file8 (FB/5000)
//OUT9 DD DSN=... output file9 (FB/5000)
//OUT10 DD DSN=... output file10 (FB/5000)
//SYSIN DD *
OPTION COPY
OUTFIL FNAMES=OUT1,
INCLUDE=(5001,8,ZD,LE,SPL1),
BUILD=(1,5000)
OUTFIL FNAMES=OUT2,
INCLUDE=(5001,8,ZD,GT,SPL1,AND,5001,8,ZD,LE,SPL2),
BUILD=(1,5000)
OUTFIL FNAMES=OUT3,
INCLUDE=(5001,8,ZD,GT,SPL2,AND,5001,8,ZD,LE,SPL3),
BUILD=(1,5000)
OUTFIL FNAMES=OUT4,
INCLUDE=(5001,8,ZD,GT,SPL3,AND,5001,8,ZD,LE,SPL4),
BUILD=(1,5000)
OUTFIL FNAMES=OUT5,
INCLUDE=(5001,8,ZD,GT,SPL4,AND,5001,8,ZD,LE,SPL5),
BUILD=(1,5000)
OUTFIL FNAMES=OUT6,
INCLUDE=(5001,8,ZD,GT,SPL5,AND,5001,8,ZD,LE,SPL6),
BUILD=(1,5000)
OUTFIL FNAMES=OUT7,
INCLUDE=(5001,8,ZD,GT,SPL6,AND,5001,8,ZD,LE,SPL7),
BUILD=(1,5000)
OUTFIL FNAMES=OUT8,
INCLUDE=(5001,8,ZD,GT,SPL7,AND,5001,8,ZD,LE,SPL8),
BUILD=(1,5000)
OUTFIL FNAMES=OUT9,
INCLUDE=(5001,8,ZD,GT,SPL8,AND,5001,8,ZD,LE,SPL9),
BUILD=(1,5000)
OUTFIL FNAMES=OUT10,SAVE,
BUILD=(1,5000)
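As a way to follow the logic of the three steps, here is a rough Python emulation of the job as I read it. It is not DFSORT and does not use any DFSORT API. Step S1 numbers the records and records a running count at each key break. Step S2 picks, for each DIVn threshold, the first running count that is greater than or equal to it (the SPLn value). Step S3 routes each record by its sequence number. The function names are invented for this sketch, and the key is assumed to be a plain string per record.

```python
from bisect import bisect_left
from itertools import groupby


def cumulative_counts(keys: list[str]) -> list[int]:
    """S1 (T2 file): running record count at the end of each key group."""
    counts, running = [], 0
    for _, grp in groupby(keys):
        running += sum(1 for _ in grp)
        counts.append(running)
    return counts


def split_points(counts: list[int], divs: list[int]) -> list[int]:
    """S2: for each DIVn, the first running count >= DIVn (the SPLn value)."""
    points = []
    for d in divs:
        i = bisect_left(counts, d)  # first index with counts[i] >= d
        if i < len(counts):  # assumption: skip thresholds past the last record
            points.append(counts[i])
    return points


def file_number(seq: int, points: list[int]) -> int:
    """S3: record `seq` (1-based) goes to file n when SPL(n-1) < seq <= SPLn;
    anything past the last split point goes to the final file."""
    return bisect_left(points, seq) + 1


# Sample keys from the thread, two files, nominal split after record 5.
keys = list("11233345669")
counts = cumulative_counts(keys)
points = split_points(counts, divs=[5])
assignment = [file_number(seq, points) for seq in range(1, len(keys) + 1)]
print(counts, points, assignment)
```

For the sample data the running counts are 2, 3, 6, 7, 8, 10, 11, so the threshold of 5 moves forward to 6 and the files receive 6 and 5 records. If one key group is large enough to cover two thresholds, two split points coincide and the file between them comes out empty in this sketch.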