DFSORT - splitting a large file by groups


IBM Mainframe Forums -> DFSORT/ICETOOL
Bruce Malcolm

New User


Joined: 13 Dec 2012
Posts: 3
Location: UK

Posted: Wed Dec 19, 2012 4:53 pm

Hi,

My requirement is to split some large files (50 million records, lrecl 250) into smaller files of approximately 5 million records. No records are to be discarded.
I've looked at using one of the SPLIT options in DFSORT, but these seem to work based on relative record.
My data consists of groups of records that are related (the grouping could consist of a few records or many hundreds) and the data is in group order. My smaller output files need to keep the groups intact - I can't split the data for a given group across two files.
Is there a DFSORT option that could be used for this?
(I know a simple COBOL program could be written for this purpose, but want to explore the DFSORT route first).

thanks, Bruce.
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Wed Dec 19, 2012 5:02 pm

Yes, it can be done, with code. There is an example here, if I can find it.
Bill Woodger

Posted: Wed Dec 19, 2012 5:24 pm

Have a look at this one. It is a somewhat lengthy thread (which shows the benefits of good answers). If this is not a reasonable fit for your requirement, let us know.
Bruce Malcolm

Posted: Wed Dec 19, 2012 8:01 pm

Thanks for that reply Bill.

I think this solution would require prior knowledge of the key values, so that they could be added to the DFSORT code?

I don't have that. I just want to split the file once I've reached 5,000,000 records and have also reached the end of all the data for the current key (group). That might mean an additional 100 or so records in a file, but that isn't an issue; keeping the data for one group together is what matters.

thanks, Bruce.
Bill Woodger

Posted: Wed Dec 19, 2012 8:08 pm

The "key values" are whatever you are using to define the group.

If you don't have prior knowledge of what defines a group, then it is going to be tricky doing anything other than a simple split, so I'm confused.

Can you post some sample data demonstrating the "grouping" required?
Bruce Malcolm

Posted: Wed Dec 19, 2012 8:23 pm

The key is the first 8 bytes of the record (the record is fixed-length, 224 bytes).
The key is numeric, so it could range from 00000001 to 99999999.
The file will already be sorted in ascending key order, but the keys might not increase in straightforward increments of 1.
For example, the first 112 records might have a key of 00000006, the next 44 records a key of 00000008, the next 97 records a key of 00000011, and so on.

thanks, Bruce.
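
For reference, the rule described above (switch to a new output file only on a key change, and only once the current file has reached the target count) can be sketched outside DFSORT. This is a minimal Python sketch under stated assumptions, not anything posted in the thread: the function name `assign_files`, the tiny sample records and the target of 4 are all hypothetical stand-ins for the real 224-byte records and the 5,000,000 target.

```python
def assign_files(records, target, key_len=8):
    """Yield (file_number, record) pairs for records already in key order.

    A new file is started only when the key changes AND the current file
    already holds at least `target` records, so no group is ever split.
    """
    file_no = 1
    count = 0          # records routed to the current file so far
    prev_key = None
    for rec in records:
        key = rec[:key_len]
        if prev_key is not None and key != prev_key and count >= target:
            file_no += 1
            count = 0
        count += 1
        prev_key = key
        yield file_no, rec


# Hypothetical stand-in data: 3, 2 and 4 records for three ascending keys.
sample = ["00000006-a"] * 3 + ["00000008-b"] * 2 + ["00000011-c"] * 4
file_numbers = [file_no for file_no, _ in assign_files(sample, target=4)]
print(file_numbers)  # [1, 1, 1, 1, 1, 2, 2, 2, 2]
```

The first file ends up with 5 records rather than 4 because the second key's group is kept whole, which is the "additional 100 or so records" behaviour accepted in the requirement.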
Bill Woodger

Posted: Wed Dec 19, 2012 9:40 pm

So, have a look at this.

Allocates a sequence number, incremented by two (to turn 5m into 10m).

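Uses 1,8 to define the GROUP and PUSHes the leading digits of the sequence number on the group's first record to all records of the GROUP. With a 10-digit sequence number incremented by two, 5 million records correspond to a sequence value of 10 million, so the first three digits count the 5-millions; the first four digits, as checked in the code here, change every 500,000 records.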

For 50 million records at 5 million per file you'll need at least ten OUTFILs, deciding whether the "overflow" is to go in the last file or you want a separate, additional one for such circumstances.

"Tested" with very small numbers of records (shift the four digits being checked to the right for different volumes).

Code:
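//BIGSPLT  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTOF01 DD SYSOUT=*
//SORTOF02 DD SYSOUT=*
//SORTOFOV DD SYSOUT=*
//SORTOUT  DD SYSOUT=*
//SYSIN    DD *
  OPTION COPY
* WHEN=INIT appends a 10-digit sequence number (0, 2, 4, ...) at
* 225-234, just past the 224-byte record. WHEN=GROUP starts a group
* on each new key in 1-8 and pushes the first four digits of the
* group's first sequence number to every record of the group.
  INREC IFTHEN=(WHEN=INIT,
                 OVERLAY=(225:SEQNUM,10,ZD,
                             START=0,
                             INCR=2)),
        IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,8),PUSH=(225:225,4))
* Select on the pushed digits; BUILD removes the work field again.
  OUTFIL FILES=01,INCLUDE=(225,4,ZD,EQ,00),
             BUILD=(1,224)
  OUTFIL FILES=02,INCLUDE=(225,4,ZD,EQ,01),
             BUILD=(1,224)
* SAVE takes every record not selected by the other OUTFILs.
  OUTFIL FILES=OV,SAVE,
             BUILD=(1,224)
//SORTIN   DD *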
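
The sequence-number technique in this job can be checked off the mainframe with a small simulation. The following is a hedged Python sketch, not DFSORT itself: `seq_text` and `pushed_digits` are hypothetical names, the one-letter keys stand in for the 8-byte key, and the small-scale run uses a 3-digit sequence number so that the effect is visible with a dozen records.

```python
def seq_text(record_index, incr=2, width=10):
    """Text of the sequence number for a 0-based record index (START=0)."""
    return f"{record_index * incr:0{width}d}"


def pushed_digits(keys, digits, incr=2, width=10):
    """For each record, the leading digits pushed from its group's first record."""
    pushed = None
    prev_key = None
    result = []
    for index, key in enumerate(keys):
        if key != prev_key:          # KEYBEGIN: a new key starts a new group
            pushed = seq_text(index, incr, width)[:digits]
            prev_key = key
        result.append(pushed)        # PUSH: the whole group shares one value
    return result


# Small-scale run: 3-digit sequence numbers, first two digits checked,
# so the pushed value steps up for groups starting at record index 5, 10, ...
keys = ["A"] * 3 + ["B"] * 4 + ["C"] * 2 + ["D"] * 3
print(pushed_digits(keys, digits=2, width=3))

# Full-scale arithmetic for a 10-digit sequence number incremented by 2.
print(seq_text(500_000)[:4])     # 0001
print(seq_text(5_000_000)[:4])   # 0010
print(seq_text(5_000_000)[:3])   # 001
```

In the small-scale run, group B starts at index 3 and so stays with the first value even though it runs past index 5; groups C and D start at indexes 7 and 9 and take the second value. That is how a group is kept whole across a boundary.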
Skolusu

Senior Member


Joined: 07 Dec 2007
Posts: 2205
Location: San Jose

Posted: Wed Dec 19, 2012 11:20 pm

Your input is 50 million records and you want smaller files of 5 million records each, so that makes 10 files. Any records beyond that go into an additional file.

Use the following DFSORT JCL which will give you the desired results.

Code:

//STEP0100 EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DISP=SHR,DSN=Your Input FB 250 Byte file
//OUT01    DD SYSOUT=*
//OUT02    DD SYSOUT=*
//OUT03    DD SYSOUT=*
//OUT04    DD SYSOUT=*
//OUT05    DD SYSOUT=*
//OUT06    DD SYSOUT=*
//OUT07    DD SYSOUT=*
//OUT08    DD SYSOUT=*
//OUT09    DD SYSOUT=*
//OUT10    DD SYSOUT=*
//LEFTOVER DD SYSOUT=*
//SYSIN    DD *
  OPTION COPY
* 1st IFTHEN: number the records within each key (1-8) at 255-262
* 2nd IFTHEN: give each block of 5,000,000 records an ID at 251-253
* 3rd IFTHEN: push the ID on a key's first record to the whole key
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(1,8),PUSH=(255:SEQ=8)),
        IFTHEN=(WHEN=GROUP,RECORDS=5000000,PUSH=(251:ID=3)),
        IFTHEN=(WHEN=GROUP,BEGIN=(255,8,ZD,EQ,1),PUSH=(251:251,3))

* BUILD=(1,250) removes the work fields from each output record
  OUTFIL FNAMES=OUT01,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,01)
  OUTFIL FNAMES=OUT02,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,02)
  OUTFIL FNAMES=OUT03,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,03)
  OUTFIL FNAMES=OUT04,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,04)
  OUTFIL FNAMES=OUT05,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,05)
  OUTFIL FNAMES=OUT06,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,06)
  OUTFIL FNAMES=OUT07,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,07)
  OUTFIL FNAMES=OUT08,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,08)
  OUTFIL FNAMES=OUT09,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,09)
  OUTFIL FNAMES=OUT10,BUILD=(1,250),INCLUDE=(251,3,ZD,EQ,10)
* SAVE takes every record not selected by the other OUTFILs
  OUTFIL FNAMES=LEFTOVER,SAVE,BUILD=(1,250)
//*
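
To see how the three IFTHEN clauses interact, the routing can be imitated in a few lines of Python. This is only a sketch of one reading of the job, not DFSORT output: `block_ids` and `dd_name` are hypothetical names, the one-letter keys stand in for the 8-byte key, a block size of 5 stands in for 5,000,000, and it assumes the ID starts at 1 and rises by 1 for each block of records.

```python
def block_ids(keys, block_size):
    """For each record, the block ID pushed from the first record of its key."""
    pushed_id = None
    prev_key = None
    seq = 0
    result = []
    for index, key in enumerate(keys):
        seq = 1 if key != prev_key else seq + 1   # 1st IFTHEN: SEQ within key
        block_id = index // block_size + 1        # 2nd IFTHEN: ID per block
        if seq == 1:                              # 3rd IFTHEN: BEGIN on SEQ = 1
            pushed_id = block_id                  # ...and push that ID onward
        prev_key = key
        result.append(pushed_id)
    return result


def dd_name(file_id, files=10):
    """DD name selected for an ID: OUT01..OUTnn, anything else to LEFTOVER."""
    return f"OUT{file_id:02d}" if 1 <= file_id <= files else "LEFTOVER"


keys = ["A"] * 3 + ["B"] * 4 + ["C"] * 2 + ["D"] * 3
ids = block_ids(keys, block_size=5)
print(ids)                               # [1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print([dd_name(i) for i in (1, 10, 11)]) # ['OUT01', 'OUT10', 'LEFTOVER']
```

Key B starts inside the first block and runs two records into the second, yet all of its records carry ID 1, so the first output holds 7 records rather than 5. Each output is therefore only approximately the block size, as the original question allowed.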
All times are GMT + 6 Hours