IBM Mainframe Forum Index
 
 

Splitting a file into different files using SORT


IBM Mainframe Forums -> JCL & VSAM
abin

Active User


Joined: 14 Aug 2006
Posts: 198

Posted: Mon Mar 12, 2007 5:31 pm

Hi All,

Please shed some light on this problem.

I have a file with a record length of 5000.

Its layout is:
01 ws-layout.
    05 ws-key1 pic x(20).
    05 FILLER  pic x(20).
    05 ws-key2 pic x(10).
    05 FILLER  pic x(4950).

The file contains more than 35 million records.

I have to split this file into 10 different files, but no key value may appear in more than one file.

That is, if file 1 has a record with a given key1 value, none of the other files may have a record with the same key.

Please let me know if more clarification is needed.

Thanks,
Abin
IQofaGerbil

Active User


Joined: 05 May 2006
Posts: 183
Location: Scotland

Posted: Mon Mar 12, 2007 7:07 pm

More clarification? Yes please!

Examples of inputs and expected outputs would certainly help.
abin

Active User


Joined: 14 Aug 2006
Posts: 198

Posted: Tue Mar 13, 2007 10:04 am

Hi,

Thanks for replying.

The input would look like:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999

PS: The input file is in sorted order.

I want to split this file into two parts.
After the split, the first file should contain:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
and the second file should contain:
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999

PS: This is only sample data. The actual data contains more than 35 million records, and I want to split it into more than 10 different files, all containing a nearly equal number of records.

Thanks,
Abin
santhunaveen

New User


Joined: 22 Sep 2006
Posts: 33

Posted: Tue Mar 13, 2007 3:44 pm

Hi,

We can split the file using the OUTFIL option. See the example below.

Code:

//***********************************************************
//SPLITFLS EXEC PGM=SORT
//***********************************************************
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SORTIN DD DSN=FILE1......,DISP=SHR
//SORTOF01 DD DSN=OUTPUTFILE1.......,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//SORTOF02 DD DSN=OUTPUTFILE2..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//SORTOF03 DD DSN=OUTPUTFILE3..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//SORTOF04 DD DSN=OUTPUTFILE4..............,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(1,1),RLSE),
// RECFM=FB,LRECL=20
//* You can define as many output files as you need.
//SYSIN DD *
  SORT FIELDS=COPY
* Each OUTFIL selects a range of input record numbers.
* Records 1-200 go to file 1, 201-400 to file 2, 401-700 to file 3.
  OUTFIL FILES=01,ENDREC=200
  OUTFIL FILES=02,STARTREC=201,ENDREC=400
  OUTFIL FILES=03,STARTREC=401,ENDREC=700
* The last OUTFIL omits ENDREC so it takes all remaining records.
* For more files, insert further OUTFIL statements before it.
  OUTFIL FILES=04,STARTREC=701
//*


Please correct me if I'm wrong.
IQofaGerbil

Active User


Joined: 05 May 2006
Posts: 183
Location: Scotland

Posted: Tue Mar 13, 2007 4:31 pm

The actual requirement is to split the main file into several files WITHOUT the keys wrapping over.

Simply using an arbitrary record count will not satisfy this requirement.
William Thompson

Global Moderator


Joined: 18 Nov 2006
Posts: 3156
Location: Tucson AZ

Posted: Tue Mar 13, 2007 5:18 pm

abin wrote:
"The file contains more than 35 million records.
I have to split this file into 10 different files. But none of these files should have overlapping key values."

Are the records already sorted by the key?
If you want all 10 files to be nearly the same size, do you know how many records are in each key range?
abin

Active User


Joined: 14 Aug 2006
Posts: 198

Posted: Tue Mar 13, 2007 5:59 pm

Hi Will,

Yes, the records are sorted by the key.

"If you want all 10 files to be nearly the same size, do you know how many are in each keyrange?"

That, I am afraid, is not predictable. But we could divide the total number of records by the number of files needed. Suppose the input contains 3,000,000 records and I want 10 files: each file should then contain approximately 300,000 records.
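
As a rough illustration of that arithmetic, the Python sketch below computes the cumulative target counts for a given total and number of files, and prints them in the "name,+value" form that DFSORT symbol (SYMNAMES) definitions use. The function name div_symbols and the DIVn symbol prefix are assumptions chosen for the example; the totals are the ones from the paragraph above.

```python
def div_symbols(total_records, n_files):
    """Return 'DIVn,+value' lines for the n_files - 1 cumulative targets.

    Hypothetical helper: integer division is used, so any remainder
    simply ends up in the last file.
    """
    per_file = total_records // n_files
    return [f"DIV{i},+{per_file * i}" for i in range(1, n_files)]


if __name__ == "__main__":
    # 3,000,000 records split 10 ways gives targets at 300000, 600000, ...
    print("\n".join(div_symbols(3_000_000, 10)))
```

Only nine targets are needed for ten files, because the last file takes whatever remains after the ninth boundary.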
William Thompson

Global Moderator


Joined: 18 Nov 2006
Posts: 3156
Location: Tucson AZ

Posted: Tue Mar 13, 2007 6:08 pm

OK, every 300,000 writes you want to wait for a key break and close that file and open a new one, right?
Easily done with a programming language but I don't know if it can be done with sort.
Is sort a requirement or do you have other resources that might be used?
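
A minimal sketch of that key-break logic in Python, assuming the records are already sorted by key. The function name split_on_key_breaks, the 19-byte key position, and the shortened sample records are hypothetical and only serve the illustration. Note that this simple version does not cap the number of output files; it just closes a file at the first key break after the target count is reached.

```python
from itertools import groupby


def split_on_key_breaks(records, key, target):
    """Yield lists of records, closing a list only at a key break
    once it holds at least `target` records (input must be sorted)."""
    chunk = []
    for _, group in groupby(records, key=key):
        chunk.extend(group)          # a key group is never divided
        if len(chunk) >= target:
            yield chunk
            chunk = []
    if chunk:                        # leftover records form the last file
        yield chunk


# Shortened stand-ins for the sample data: a 19-byte key plus some payload.
digits = "11233345669"
sample = [d * 19 + f"SOMEDATA{i}" for i, d in enumerate(digits, start=1)]

if __name__ == "__main__":
    for n, part in enumerate(split_on_key_breaks(sample, lambda r: r[:19], 5), 1):
        print(f"file {n}: {len(part)} records")
```

With a target of 5, the first file is closed only after the last of the 3's, so it ends up with six records rather than five.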
abin

Active User


Joined: 14 Aug 2006
Posts: 198

Posted: Tue Mar 13, 2007 6:25 pm

Hi Will,

"every 300,000 writes you want to wait for a key break and close that file and open a new one, right?"

You are partly right. I don't want to wait for a key break, since this is going to be a batch job.

You are right that we could do it easily using a programming language. But if we could do it with SORT, that would be great. :)
sril.krishy

Active User


Joined: 30 Jul 2005
Posts: 183
Location: hyderabad

Posted: Tue Mar 13, 2007 6:36 pm

Hi,
I think you can generate the control cards dynamically to divide the file into 10 files.

Anyway, let's wait for Frank's response. He might come up with a good idea.

Thank you
Krishy
santhunaveen

New User


Joined: 22 Sep 2006
Posts: 33

Posted: Tue Mar 13, 2007 7:44 pm

Hi IQofaGerbil,

"The actual requirement is to split the main file into several files WITHOUT the keys wrapping over. "

If the file is already sorted, then how can the keys wrap over?
IQofaGerbil

Active User


Joined: 05 May 2006
Posts: 183
Location: Scotland

Posted: Tue Mar 13, 2007 9:20 pm

Hi back santhunaveen

Well, from the expected output

Quote:

I want to split this file into two parts.
After the split, the first file should contain:
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
3333333333333333333SOMEDATA6 3333333333
and the second file should contain:
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999



What I took from that was that the file should be split only where a key is complete, that is, after the 3's and before the 4's.
So what Abin does not want is:


//SYSIN DD *
  SORT FIELDS=COPY
* File 1 gets records 1-5; file 2 gets records 6-11.
  OUTFIL FILES=01,ENDREC=5
  OUTFIL FILES=02,STARTREC=6,ENDREC=11

first file
1111111111111111111SOMEDATA1 1111111111
1111111111111111111SOMEDATA2 1111111111
2222222222222222222SOMEDATA3 2222222222
3333333333333333333SOMEDATA4 3333333333
3333333333333333333SOMEDATA5 3333333333
second file
3333333333333333333SOMEDATA6 3333333333
4444444444444444444SOMEDATA7 4444444444
5555555555555555555SOMEDATA8 5555555555
6666666666666666666SOMEDATA9 6666666666
6666666666666666666SOMEDATA10 6666666666
9999999999999999999SOMEDATA11 9999999999


See where the key (the 3's) has been split?

Maybe I am wrong, but that solution only works if you know where the keys start and end, hence William's line of questioning.
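
The same point can be checked mechanically. The short Python sketch below, using a hypothetical helper keys_split_by_fixed_count and one-character stand-ins for the sample keys, reports which keys land on both sides of a cut made after a fixed number of records.

```python
def keys_split_by_fixed_count(keys, cut):
    """Return the keys that appear both before and after a cut
    placed after the first `cut` records."""
    first, second = keys[:cut], keys[cut:]
    return sorted(set(first) & set(second))


# One character per record, standing in for the 11 sample keys.
sample_keys = list("11233345669")

if __name__ == "__main__":
    print(keys_split_by_fixed_count(sample_keys, 5))  # ['3']
    print(keys_split_by_fixed_count(sample_keys, 6))  # []
```

A cut after record 5 divides the 3's, while a cut after record 6 falls on a key break, which is only known once the data has been inspected.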
abin

Active User


Joined: 14 Aug 2006
Posts: 198

Posted: Tue Mar 13, 2007 9:25 pm

Hi Gerbil,

What you said is correct.
Frank Yaeger

DFSORT Developer


Joined: 15 Feb 2005
Posts: 7129
Location: San Jose, CA

Posted: Tue Mar 13, 2007 10:59 pm

This is very tricky and takes one merge pass and several copy passes, but here's a DFSORT/ICETOOL job that I believe will do what you want:

Code:

//S1    EXEC  PGM=ICEMAN
//SYSOUT   DD  SYSOUT=*
//SORTIN01 DD DSN=...  input file (FB/5000)
//T1 DD DSN=&&T1,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//T2 DD DSN=&&T2,UNIT=SYSDA,SPACE=(CYL,(5,5)),DISP=(,PASS)
//SYSIN    DD    *
  OPTION EQUALS
  INREC OVERLAY=(5009:1,19,30,10)
  MERGE FIELDS=(5009,29,CH,A)
  OUTFIL FNAMES=T1,OVERLAY=(5001:SEQNUM,8,ZD)
  OUTFIL FNAMES=T2,NODETAIL,REMOVECC,
    SECTIONS=(5009,29,
      TRAILER3=(SUBCOUNT=(M11,LENGTH=8)))
//S2    EXEC  PGM=ICETOOL
//TOOLMSG   DD  SYSOUT=*
//DFSMSG    DD  SYSOUT=*
//SYMNAMES DD *
DIV1,+300000
DIV2,+600000
DIV3,+900000
DIV4,+1200000
DIV5,+1500000
DIV6,+1800000
DIV7,+2100000
DIV8,+2400000
DIV9,+2700000
//T2 DD DSN=&&T2,DISP=(OLD,PASS)
//SPL1 DD DSN=&&S1,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL2 DD DSN=&&S2,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL3 DD DSN=&&S3,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL4 DD DSN=&&S4,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL5 DD DSN=&&S5,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL6 DD DSN=&&S6,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL7 DD DSN=&&S7,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL8 DD DSN=&&S8,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//SPL9 DD DSN=&&S9,UNIT=SYSDA,SPACE=(TRK,(1,1)),DISP=(,PASS)
//TOOLIN   DD    *
COPY FROM(T2) TO(SPL1) USING(CTL1)
COPY FROM(T2) TO(SPL2) USING(CTL2)
COPY FROM(T2) TO(SPL3) USING(CTL3)
COPY FROM(T2) TO(SPL4) USING(CTL4)
COPY FROM(T2) TO(SPL5) USING(CTL5)
COPY FROM(T2) TO(SPL6) USING(CTL6)
COPY FROM(T2) TO(SPL7) USING(CTL7)
COPY FROM(T2) TO(SPL8) USING(CTL8)
COPY FROM(T2) TO(SPL9) USING(CTL9)
//CTL1CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV1)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL1,+',1,8,80:X)
//CTL2CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV2)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL2,+',1,8,80:X)
//CTL3CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV3)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL3,+',1,8,80:X)
//CTL4CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV4)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL4,+',1,8,80:X)
//CTL5CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV5)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL5,+',1,8,80:X)
//CTL6CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV6)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL6,+',1,8,80:X)
//CTL7CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV7)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL7,+',1,8,80:X)
//CTL8CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV8)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL8,+',1,8,80:X)
//CTL9CNTL DD *
  INCLUDE COND=(1,8,ZD,GE,DIV9)
  OPTION STOPAFT=1
  OUTREC BUILD=(C'SPL9,+',1,8,80:X)
//S3  EXEC  PGM=ICEMAN
//SYSOUT    DD  SYSOUT=*
//SYMNAMES DD DSN=&&S1,DISP=(OLD,PASS)
//         DD DSN=&&S2,DISP=(OLD,PASS)
//         DD DSN=&&S3,DISP=(OLD,PASS)
//         DD DSN=&&S4,DISP=(OLD,PASS)
//         DD DSN=&&S5,DISP=(OLD,PASS)
//         DD DSN=&&S6,DISP=(OLD,PASS)
//         DD DSN=&&S7,DISP=(OLD,PASS)
//         DD DSN=&&S8,DISP=(OLD,PASS)
//         DD DSN=&&S9,DISP=(OLD,PASS)
//SORTIN DD DSN=&&T1,DISP=(OLD,PASS)
//OUT1 DD DSN=... output file1 (FB/5000)
//OUT2 DD DSN=... output file2 (FB/5000)
//OUT3 DD DSN=... output file3 (FB/5000)
//OUT4 DD DSN=... output file4 (FB/5000)
//OUT5 DD DSN=... output file5 (FB/5000)
//OUT6 DD DSN=... output file6 (FB/5000)
//OUT7 DD DSN=... output file7 (FB/5000)
//OUT8 DD DSN=... output file8 (FB/5000)
//OUT9 DD DSN=... output file9 (FB/5000)
//OUT10 DD DSN=... output file10 (FB/5000)
//SYSIN    DD    *
  OPTION COPY
  OUTFIL FNAMES=OUT1,
    INCLUDE=(5001,8,ZD,LE,SPL1),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT2,
    INCLUDE=(5001,8,ZD,GT,SPL1,AND,5001,8,ZD,LE,SPL2),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT3,
    INCLUDE=(5001,8,ZD,GT,SPL2,AND,5001,8,ZD,LE,SPL3),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT4,
    INCLUDE=(5001,8,ZD,GT,SPL3,AND,5001,8,ZD,LE,SPL4),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT5,
    INCLUDE=(5001,8,ZD,GT,SPL4,AND,5001,8,ZD,LE,SPL5),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT6,
    INCLUDE=(5001,8,ZD,GT,SPL5,AND,5001,8,ZD,LE,SPL6),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT7,
    INCLUDE=(5001,8,ZD,GT,SPL6,AND,5001,8,ZD,LE,SPL7),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT8,
    INCLUDE=(5001,8,ZD,GT,SPL7,AND,5001,8,ZD,LE,SPL8),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT9,
    INCLUDE=(5001,8,ZD,GT,SPL8,AND,5001,8,ZD,LE,SPL9),
    BUILD=(1,5000)
  OUTFIL FNAMES=OUT10,SAVE,
    BUILD=(1,5000)
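
For readers who want to trace the logic of the job above without running it, here is a rough Python approximation of its three steps. It is a sketch under assumptions, not a translation of DFSORT behavior: the names split_points and route are hypothetical, the sample uses one-character keys, and a target that is never reached is silently skipped here, whereas in the job the corresponding SPLn symbol would simply not be generated.

```python
from bisect import bisect_left
from itertools import groupby


def split_points(keys, targets):
    """Steps S1 and S2: running record count at each key break, then the
    first running count >= each target (the role of the SPLn symbols)."""
    cumulative, total = [], 0
    for _, group in groupby(keys):
        total += sum(1 for _ in group)
        cumulative.append(total)
    points = []
    for target in targets:
        i = bisect_left(cumulative, target)
        if i < len(cumulative):      # assumption: skip unreachable targets
            points.append(cumulative[i])
    return points


def route(records, key, targets):
    """Step S3: record number n (1-based) goes to the first file whose
    split point is >= n; anything past the last point goes to the final file."""
    points = split_points([key(r) for r in records], targets)
    files = [[] for _ in range(len(points) + 1)]
    for seq, rec in enumerate(records, start=1):
        files[bisect_left(points, seq)].append(rec)
    return files


# One character per record, standing in for the 11 sample keys.
sample_keys = list("11233345669")

if __name__ == "__main__":
    print(split_points(sample_keys, [5]))
    print([len(f) for f in route(sample_keys, lambda r: r, [5])])
```

With a single target of 5 on the sample keys, the split point moves out to record 6, the last of the 3's, so the two outputs hold six and five records.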
All times are GMT + 6 Hours