abin
Active User
Joined: 14 Aug 2006 Posts: 198
Hi All,
Please shed some light on this problem.
I have a file with a record length of 5000.
Its layout is:
01 ws-layout.
   05 ws-key1 pic x(20).
   05 FILLER  pic x(20).
   05 ws-key2 pic x(10).
   05 FILLER  pic x(4950).
The file contains more than 35 million records.
I have to split this file into 10 different files, but the key values must not overlap between files.
That is, if file1 has a record with a given key1 value, none of the other files may have a record with the same key.
Please let me know if more clarification is needed.
Thanks,
Abin
 |
IQofaGerbil
Active User

Joined: 05 May 2006 Posts: 183 Location: Scotland
More clarification? Yes please!
Examples of inputs and expected outputs would certainly help.
 |
abin
Active User
Joined: 14 Aug 2006 Posts: 198
Hi,
Thanks for replying.
The input would look like this:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
PS: The input file is in sorted order.
I want to split this file into two parts.
When I split it, the first file should contain:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
and the second file should contain:
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
PS: This is only sample data. The actual data contains more than 35 million records, and I want to split it into more than 10 different files, all containing a nearly equal number of records.
Thanks,
Abin
 |
santhunaveen
New User
Joined: 22 Sep 2006 Posts: 33
Hi,
We can split the file using the OUTFIL option. See the example below.
Code:
//***********************************************************
//SPLITFLS EXEC PGM=SORT
//***********************************************************
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SORTIN DD DSN=FILE1......,DISP=SHR
//SORTOF01 DD DSN=OUTPUTFILE1.......,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//SORTOF02 DD DSN=OUTPUTFILE2..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//SORTOF03 DD DSN=OUTPUTFILE3..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//SORTOF04 DD DSN=OUTPUTFILE4..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//* You can add as many SORTOFnn DD statements as you need
//SYSIN DD *
* Each OUTFIL copies a fixed range of input records to one file.
* The record counts are arbitrary examples.
  SORT FIELDS=COPY
* File 1
  OUTFIL FILES=01,ENDREC=200
* File 2
  OUTFIL FILES=02,STARTREC=201,ENDREC=400
* File 3
  OUTFIL FILES=03,STARTREC=401,ENDREC=700
* File 4 (the 1000 is just another arbitrary end point)
  OUTFIL FILES=04,STARTREC=701,ENDREC=1000
* ... and so on, one OUTFIL statement per output file
/*
Please correct me if I'm wrong.
 |
IQofaGerbil
Active User

Joined: 05 May 2006 Posts: 183 Location: Scotland
The actual requirement is to split the main file into several files WITHOUT the keys wrapping over.
Simply using an arbitrary record count will not satisfy this requirement.
 |
William Thompson
Global Moderator
Joined: 18 Nov 2006 Posts: 3156 Location: Tucson AZ
abin wrote:
"The file contains more than 35 million records.
I have to split this file into 10 different files, but the key values must not overlap between files."
Are the records already sorted by the key?
If you want all 10 files to be nearly the same size, do you know how many records are in each key range?
 |
abin
Active User
Joined: 14 Aug 2006 Posts: 198
Hi Will,
Yes, the records are sorted by the key.
"If you want all 10 files to be nearly the same size, do you know how many are in each keyrange?"
This, I am afraid, is not predictable. But we could divide the total number of records by the number of files needed. Suppose the input contains 3,000,000 records
and I want 10 files. Then each file should contain roughly 300,000 records.
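The division described here can be sketched in Python. This is only an illustration: the function name `cut_targets` is hypothetical, and rounding down when the total does not divide evenly is an assumption.

```python
from typing import List


def cut_targets(total_records: int, n_files: int) -> List[int]:
    """Return the n_files - 1 cumulative record counts at which a new file would ideally start."""
    if n_files < 1:
        raise ValueError("n_files must be at least 1")
    # Assumption: round down when the total is not evenly divisible.
    per_file = total_records // n_files
    return [per_file * i for i in range(1, n_files)]


if __name__ == "__main__":
    # The example from the post: 3,000,000 records split into 10 files.
    print(cut_targets(3_000_000, 10))
```

These are only ideal cut points. Because a key must not straddle two files, the real cut has to move to the nearest key boundary at or after each target.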
 |
William Thompson
Global Moderator
Joined: 18 Nov 2006 Posts: 3156 Location: Tucson AZ
OK, every 300,000 writes you want to wait for a key break and close that file and open a new one, right?
Easily done with a programming language but I don't know if it can be done with sort.
Is sort a requirement or do you have other resources that might be used?
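The key-break approach described above might look like the following Python sketch. It assumes the input is already sorted by key. The name `assign_files` and its parameters are hypothetical, and a real program would write each record to the matching output file instead of yielding it.

```python
from typing import Callable, Iterable, Iterator, Tuple


def assign_files(
    records: Iterable[str],
    key: Callable[[str], str],
    target: int,
    max_files: int,
) -> Iterator[Tuple[int, str]]:
    """Yield (file_number, record) pairs for key-sorted input.

    A new file is started only at a key break, and only once the current
    file already holds at least `target` records. The last file takes
    whatever remains, so no more than `max_files` files are used.
    """
    file_no = 1
    written = 0  # records assigned to the current file so far
    previous_key = None
    for record in records:
        current_key = key(record)
        is_key_break = previous_key is not None and current_key != previous_key
        if is_key_break and written >= target and file_no < max_files:
            file_no += 1
            written = 0
        yield file_no, record
        written += 1
        previous_key = current_key
```

With the 11-record sample from earlier in the thread, `key=lambda r: r[:19]`, `target=5` and `max_files=2`, records 1 to 6 go to file 1 and records 7 to 11 go to file 2, because the switch is delayed until the 3's are finished. Only one record and two counters are held in memory, which matters at 35 million records of 5000 bytes.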
 |
abin
Active User
Joined: 14 Aug 2006 Posts: 198
Hi Will,
"every 300,000 writes you want to wait for a key break and close that file and open a new one, right?"
You are partly right. I don't want to wait for a key break, since this is going to be a batch job.
You are right that we could do it well using a programming language. But if we could do it with SORT, that would be great.
 |
sril.krishy
Active User
Joined: 30 Jul 2005 Posts: 183 Location: hyderabad
Hi,
I think you can generate the control cards dynamically to divide the file into 10 files.
Anyway, let's wait for Frank's response. He might come up with a good idea.
Thank you
Krishy
 |
santhunaveen
New User
Joined: 22 Sep 2006 Posts: 33
Hi IQofaGerbil,
"The actual requirement is to split the main file into several files WITHOUT the keys wrapping over."
If the file is already sorted, then where is the question of keys wrapping over?
 |
IQofaGerbil
Active User

Joined: 05 May 2006 Posts: 183 Location: Scotland
Hi back santhunaveen
Well, from the expected output:
Quote:
I want to split this file into two parts.
When I split it, the first file should contain:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
and the second file should contain:
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
What I got from that was that the file should be split where one key is complete: after the 3's and before the 4's.
So what Abin does not want is:
//SYSIN DD *
  SORT FIELDS=COPY
* File 1 gets the first five records
  OUTFIL FILES=01,ENDREC=5
* File 2 gets everything from record 6 onwards
  OUTFIL FILES=02,STARTREC=6
First file:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
Second file:
3333333333333333333SOMEDATA6 3333333333
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999
See where the key (the 3's) has been split?
Maybe I am wrong, but that solution will only work if you know where the keys end and start, hence William's line of questioning.
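The overlap described above can be checked mechanically. Here is a minimal Python sketch, assuming the key is the first 19 characters as in the sample data. `shared_keys`, `KEYS` and `SAMPLE` are hypothetical names used only for this illustration.

```python
# Rebuild the 11 sample records from the thread: a 19-character key,
# a SOMEDATAn field, a blank, then a 10-character second key.
KEYS = ["1", "1", "2", "3", "3", "3", "4", "5", "6", "6", "9"]
SAMPLE = [k * 19 + f"SOMEDATA{i} " + k * 10 for i, k in enumerate(KEYS, start=1)]


def shared_keys(first, second, key=lambda r: r[:19]):
    """Return the set of keys that occur in both files."""
    return {key(r) for r in first} & {key(r) for r in second}


if __name__ == "__main__":
    # Fixed count split (ENDREC=5 / STARTREC=6): the 3's land in both files.
    print(shared_keys(SAMPLE[:5], SAMPLE[5:]))
    # Split at the key break after record 6: no key is shared.
    print(shared_keys(SAMPLE[:6], SAMPLE[6:]))
```

An empty result means the split satisfies the requirement. Any key in the result has been cut in two by the record count.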
 |
abin
Active User
Joined: 14 Aug 2006 Posts: 198
Hi Gerbil,
What you said is correct.
 |
Frank Yaeger
DFSORT Developer

Joined: 15 Feb 2005 Posts: 7129 Location: San Jose, CA
This is very tricky and takes one merge pass and several copy passes, but here's a DFSORT/ICETOOL job that I believe will do what you want:
Code:
//* Step S1: number every record (sequence number at 5001-5008) into T1
//* and write one running record count per key value into T2.
//S1 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SORTIN01 DD DSN=... input file (FB/5000)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//T2 DD DSN=&&T2,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//SYSIN DD *
  OPTION EQUALS
  INREC OVERLAY=(5009:1,19,30,10)
  MERGE FIELDS=(5009,29,CH,A)
  OUTFIL FNAMES=T1,OVERLAY=(5001:SEQNUM,8,ZD)
  OUTFIL FNAMES=T2,NODETAIL,REMOVECC,
    SECTIONS=(5009,29,
      TRAILER3=(SUBCOUNT=(M11,LENGTH=8)))
/*
//* Step S2: for each target DIVn, take the first running count in T2
//* that is >= DIVn and write it out as a symbol SPLn.
//* The DIVn values match the 3,000,000-record, 10-file example.
//S2 EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//SYMNAMES DD *
DIV1,+300000
DIV2,+600000
DIV3,+900000
DIV4,+1200000
DIV5,+1500000
DIV6,+1800000
DIV7,+2100000
DIV8,+2400000
DIV9,+2700000
/*
//T2 DD DSN=&&T2,DISP=(OLD,PASS)
//SPL1 DD DSN=&&S1,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL2 DD DSN=&&S2,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL3 DD DSN=&&S3,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL4 DD DSN=&&S4,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL5 DD DSN=&&S5,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL6 DD DSN=&&S6,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL7 DD DSN=&&S7,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL8 DD DSN=&&S8,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL9 DD DSN=&&S9,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//TOOLIN DD *
COPY FROM(T2) TO(SPL1) USING(CTL1)
COPY FROM(T2) TO(SPL2) USING(CTL2)
COPY FROM(T2) TO(SPL3) USING(CTL3)
COPY FROM(T2) TO(SPL4) USING(CTL4)
COPY FROM(T2) TO(SPL5) USING(CTL5)
COPY FROM(T2) TO(SPL6) USING(CTL6)
COPY FROM(T2) TO(SPL7) USING(CTL7)
COPY FROM(T2) TO(SPL8) USING(CTL8)
COPY FROM(T2) TO(SPL9) USING(CTL9)
/*
//CTL1CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV1)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL1,+',1,8,80:X)
/*
//CTL2CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV2)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL2,+',1,8,80:X)
/*
//CTL3CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV3)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL3,+',1,8,80:X)
/*
//CTL4CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV4)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL4,+',1,8,80:X)
/*
//CTL5CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV5)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL5,+',1,8,80:X)
/*
//CTL6CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV6)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL6,+',1,8,80:X)
/*
//CTL7CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV7)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL7,+',1,8,80:X)
/*
//CTL8CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV8)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL8,+',1,8,80:X)
/*
//CTL9CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV9)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL9,+',1,8,80:X)
/*
//* Step S3: copy each record to OUT1-OUT10 according to the SPLn range
//* its sequence number falls in. BUILD=(1,5000) drops the work fields.
//S3 EXEC PGM=ICEMAN
//SYSOUT DD SYSOUT=*
//SYMNAMES DD DSN=&&S1,DISP=(OLD,PASS)
// DD DSN=&&S2,DISP=(OLD,PASS)
// DD DSN=&&S3,DISP=(OLD,PASS)
// DD DSN=&&S4,DISP=(OLD,PASS)
// DD DSN=&&S5,DISP=(OLD,PASS)
// DD DSN=&&S6,DISP=(OLD,PASS)
// DD DSN=&&S7,DISP=(OLD,PASS)
// DD DSN=&&S8,DISP=(OLD,PASS)
// DD DSN=&&S9,DISP=(OLD,PASS)
//SORTIN DD DSN=&&T1,DISP=(OLD,PASS)
//OUT1 DD DSN=... output file1 (FB/5000)
//OUT2 DD DSN=... output file2 (FB/5000)
//OUT3 DD DSN=... output file3 (FB/5000)
//OUT4 DD DSN=... output file4 (FB/5000)
//OUT5 DD DSN=... output file5 (FB/5000)
//OUT6 DD DSN=... output file6 (FB/5000)
//OUT7 DD DSN=... output file7 (FB/5000)
//OUT8 DD DSN=... output file8 (FB/5000)
//OUT9 DD DSN=... output file9 (FB/5000)
//OUT10 DD DSN=... output file10 (FB/5000)
//SYSIN DD *
  OPTION COPY
  OUTFIL FNAMES=OUT1,
    INCLUDE=(5001,8,ZD,LE,SPL1),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT2,
    INCLUDE=(5001,8,ZD,GT,SPL1,AND,5001,8,ZD,LE,SPL2),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT3,
    INCLUDE=(5001,8,ZD,GT,SPL2,AND,5001,8,ZD,LE,SPL3),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT4,
    INCLUDE=(5001,8,ZD,GT,SPL3,AND,5001,8,ZD,LE,SPL4),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT5,
    INCLUDE=(5001,8,ZD,GT,SPL4,AND,5001,8,ZD,LE,SPL5),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT6,
    INCLUDE=(5001,8,ZD,GT,SPL5,AND,5001,8,ZD,LE,SPL6),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT7,
    INCLUDE=(5001,8,ZD,GT,SPL6,AND,5001,8,ZD,LE,SPL7),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT8,
    INCLUDE=(5001,8,ZD,GT,SPL7,AND,5001,8,ZD,LE,SPL8),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT9,
    INCLUDE=(5001,8,ZD,GT,SPL8,AND,5001,8,ZD,LE,SPL9),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT10,SAVE,
    BUILD=(1,5000)
/*
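To make the logic of the three steps easier to follow, here is a rough Python model of what the job appears to do. It is only a sketch for reasoning about the split points, not a replacement for the DFSORT job. The names `split_points` and `split_by_points` are hypothetical, and the records are held in memory, which would not be practical for the real file.

```python
from bisect import bisect_left
from itertools import accumulate, groupby
from typing import Callable, List, Sequence


def split_points(
    records: Sequence[str], key: Callable[[str], str], targets: Sequence[int]
) -> List[int]:
    """Model of steps S1 and S2: pick one split point per target."""
    # S1: one running record count per key value (the contents of T2).
    group_sizes = [sum(1 for _ in group) for _, group in groupby(records, key=key)]
    running_totals = list(accumulate(group_sizes))
    # S2: for each target, the first running count that is >= the target.
    points = []
    for target in targets:
        hit = next((total for total in running_totals if total >= target), None)
        if hit is not None:  # assumption: a target past the end of the file is skipped
            points.append(hit)
    return points


def split_by_points(records: Sequence[str], points: Sequence[int]) -> List[List[str]]:
    """Model of step S3: route each record by its 1-based sequence number."""
    files: List[List[str]] = [[] for _ in range(len(points) + 1)]
    for seq, record in enumerate(records, start=1):
        # Index of the first split point >= seq; records past the last point go to the final file.
        files[bisect_left(points, seq)].append(record)
    return files
```

For the 11-record sample with a single target of 5, the running counts are 2, 3, 6, 7, 8, 10, 11, so the split point is 6 and the files hold 6 and 5 records. Every split point is the last record of a key group, which is why no key ends up in two files.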