I am trying to write a sort that splits a file at a particular record identifier (IVR000000), counts the record sets (that is, the IVR000000 records, not the number of rows), and puts that count in the trailer record at the end.
My File looks like below:
IVR000000~2036~9567183~E~Mrs.~Syt Est~~C (RECORD STARTING)
PTX000000~47790432~N~MTF~DSC~7291202~CI Har (RECORD END)
IVR000000~6248~9762202~E~~Jltes~~Cmtes (NEW RECORD STARTING)
IVR999999~000000013 (TRAILER RECORD)
Each record set starts with 'IVR000000', so when the split happens, all rows up to the RECORD END must stay in the same file, and each new record set must start with 'IVR000000'.
I am not sure whether this can be done with SORT, ICETOOL, or some other utility.
Still not clear - your first example shows a trailer saying 13 whereas there are only 2 IVR000000 records present. Your second example shows only one set being extracted whereas it is implied ALL IVR000000 sets are to be selected. If you want only one set, then what are the criteria for distinguishing that set from the other sets?
Actually, in the first example the trailer should show 2, not 13 (my mistake).
In the second example I gave only 4 IVR000000 record sets. If they are split into two files, say 3 sets in File1 and 1 set in File2, then the record-set count for each file must also be calculated and inserted as that file's trailer: 3 and 1 respectively.
In reality I have more than 3-4 lakh (300,000-400,000) IVR000000 record sets, which I need to split into 4-5 files: file1 contains 1 lakh (100,000) record sets, file2 contains 1 lakh, and so on, with the nth file containing the remaining record sets.
There are no criteria for selecting IVR000000 record sets (no record should be excluded; the big file is only split into 4-5 files). The only condition is that every new client record starts with an IVR000000 entry and its details are on the subsequent lines, up to the next IVR000000. Together these lines form one record set, which must not get mixed with the details of another IVR000000.
Hope this time I am clear. Thank you all for your patience.
If still there is any confusion, please let me know.
Is your file made up entirely of IVR...PTX... records as the groups, plus the trailer, where you want to know the number of groups?
Then you want to put approximately "n" records in each of three files, with the rest in a fourth file? The reason for "approximately" is that you don't want to split a group across files? And each file needs a trailer generated?
One more thing: the file is a VB file with LRECL=2052, so putting the indicator in the last bytes might not work.
Yes Bill, my entire file is made up of records with prefixes like IVR, PTX, PLN, TRX, ..., with a trailer record at the end. I want to split it so that all of these are kept in the split files, and the split files can be fed into the next job just as the big file was, but with a lower volume of records.
We plan to have 100,000 record sets in one file, and it could be the case that the last 2-3 files will not have any data if there are fewer records.
For variable-length records, you extend at the "front" of the record.
If you have only one type of group, the only thing you have to worry about is accidentally including the trailer in a group. So you can OMIT COND= the trailer.
Add a sequence number using INREC with IFTHEN=(WHEN=INIT,...). Use IFTHEN=(WHEN=GROUP,...) with your group-starter value as the BEGIN condition. PUSH the sequence number to the position of the sequence number. This will give all the records in the group the same sequence number.
Have four OUTFILs. First can be LT 100000 in the sequence number. Second GE 100000 and LT 200000, third GE 200000 and LT 300000, fourth with SAVE (to catch any remainder).
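To make the logic of the steps above easier to follow, here is a rough Python model of it. This is not DFSORT syntax, only a sketch of the idea: drop the trailer, number every record, copy ("push") the group header's number onto every record in its group, then route each record by the range its pushed number falls in, with the last file catching the remainder. The function name `split_by_pushed_seq` and its parameters are made up for illustration, and the sketch assumes the input begins with an IVR000000 header.

```python
HEADER = "IVR000000"   # group-starter prefix, as given in the thread
TRAILER = "IVR999999"  # file-trailer prefix, as given in the thread


def split_by_pushed_seq(lines, limit, n_files):
    """Hypothetical model: route whole groups to files by the header's number."""
    outputs = [[] for _ in range(n_files)]
    seq = 0      # per-record sequence number (models the WHEN=INIT step)
    pushed = 0   # header's number, shared by its group (models PUSH)
    for line in lines:
        if line.startswith(TRAILER):
            continue  # the file trailer is omitted up front
        seq += 1
        if line.startswith(HEADER):
            pushed = seq  # a new group starts here
        # Range test on the pushed number; the last file acts like SAVE.
        index = min(pushed // limit, n_files - 1)
        outputs[index].append(line)
    return outputs


sample = [
    "IVR000000~a", "PTX000000~a1",
    "IVR000000~b", "PTX000000~b1", "PLN000000~b2",
    "IVR000000~c",
    "IVR999999~000000003",
]
for number, content in enumerate(split_by_pushed_seq(sample, 2, 3), start=1):
    print(number, content)
```

Because every record carries its header's number rather than its own, a group is never divided between files. That is also why each file ends up holding only approximately `limit` records.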
Thanks, Bill, for the suggestion, but it's not working. I also dug out some info and tried many things with the BEGIN/GROUP approach, but the file is not getting split. [edit: the word is 'split', not 'splitted', which does not exist]
I am using the below code and also tried many other combinations:
The input file I am using is the same one I shared earlier, with 4 'IVR000000' record sets. In the first OUTFIL I am getting all rows from the input file except the trailer, and in the 2nd OUTFIL I am getting the whole input file including the trailer.
BUILD is required to eliminate the SEQ columns that were inserted; otherwise the data is shifted by 3 columns.
The trailer in the 2nd split file has count 5 (because the Last ID covers all 5 record sets), whereas it should be 000002 (3 in the first file and 2 in the second file).
I am using the Last ID as the trailer count, so it is correct for the first split file, but for each later file the Last ID is the running total of IDs up to that split file. So if I can subtract 3 (the record sets in the first file) from the Last ID of the 2nd file, I get the desired output.
For example, if split1 gets record sets 1-100, split2 gets 101-200, and split3 gets 201-270, then I should subtract 100 from the Last ID in the split2 trailer and 200 from the Last ID in the split3 trailer. Hope I am clear.
Please let me know if there is any other way to get the desired result, and whether there are any issues if the last file produced is empty.
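The subtraction described in this post can be checked with a few lines of Python. This is only an arithmetic sketch, not sort code: `trailer_counts` is a made-up helper, and it assumes each split file reports the highest group ID it received. An empty split file is not modelled here.

```python
def trailer_counts(last_ids):
    """Hypothetical helper: per-file group count from cumulative Last IDs."""
    counts = []
    previous = 0  # nothing precedes the first split file
    for last_id in last_ids:
        counts.append(last_id - previous)  # subtract the previous file's Last ID
        previous = last_id
    return counts


# Last IDs for split1 (1-100), split2 (101-200), split3 (201-270)
print(trailer_counts([100, 200, 270]))  # [100, 100, 70]
```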
You missed my suggestion to add a sequence number (WHEN=INIT in INREC) and to PUSH that sequence number for your GROUP. Include an additional temporary extension which is one for your IVR-headers and zero for anything else. Then TOT/TOTAL in the TRAILER1 will give you the count.
Isn't it easier just to OMIT the file-trailer right up front? Or do you want to check the values on it?
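The one-or-zero flag idea can be pictured with a short Python sketch (again a model of the logic, not DFSORT syntax). The helper names are hypothetical, and the 9-digit zero-padded trailer layout is an assumption taken from the sample trailer `IVR999999~000000013` earlier in the thread.

```python
HEADER = "IVR000000"  # group-starter prefix, as given in the thread


def count_groups(records):
    """Sum a flag that is 1 for an IVR header and 0 for any other record."""
    flags = [1 if record.startswith(HEADER) else 0 for record in records]
    return sum(flags)  # plays the role of the total in the trailer


def trailer_line(records):
    """Assumed trailer layout: prefix, '~', then a 9-digit zero-padded count."""
    return f"IVR999999~{count_groups(records):09d}"


split_file = [
    "IVR000000~a", "PTX000000~a1",
    "IVR000000~b", "PTX000000~b1", "PLN000000~b2",
]
print(trailer_line(split_file))  # IVR999999~000000002
```

Since the flag is totalled separately for each output file, no subtraction of the previous files' counts is needed, and an empty file simply totals zero.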
Thanks a lot Bill. I have achieved what I wanted with your suggestions and help in a single sort step.
As per your suggestion, I tried the single OMIT instead of checking in every OUTFIL, and also tried using the sequence number to form the group, but the results were not correct. Maybe that is because I am performing multiple things in a single sort step.
My final sort is as below: