How to split large record length file using STARTAFT in SORT

Figo1988 · New User Joined: 19 Mar 2024 Posts: 6 Location: Canada

I have a dataset of length 30000, where each record lists many file names with a max length of, let's assume, 14 bytes. Each file name always starts after a unique set of characters, let's assume X'A0B0C0' or C'µ^{'. The number of file names in each record is unknown, and the number of records in the input file is also unknown.

INPUT DATA

µ^{ FIGO.ABC.DATA1 asjhdajsdg µ^{ FIGO.ABC.DATA2 ahgsdgahshgdasd
hjgashgda µ^{ FIGO.ABC.DATA3 asdasgdhgaskdhagsdjhgasdjahgsdjggajd
hgasdgµ^{ FIGO.ABC.DATA4 asgd µ^{ FIGO.ABC.DATA5 ahsgdgsadjgajgd

EXPECTED RESULT

FIGO.ABC.DATA1
FIGO.ABC.DATA2
FIGO.ABC.DATA3
FIGO.ABC.DATA4
FIGO.ABC.DATA5

My half-baked solution
******************
I proceeded with OUTFIL, though I could use INREC.
I can use ,/, to write to the next line and
use PARSE=(%01=(STARTAFT=X'A0B0C0',FIXLEN=14)), but I can't use REPEAT as I don't know how many file names will be listed in each record.
Also, as I don't know how many records are in the input file, I don't know how many parsed fields (%02, %03, ... %n) I need to use in my BUILD.

I have already achieved this in COBOL, but just wanted to know whether it can be done in SORT.
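For cross-checking output counts outside of SORT, the extraction described in this post can be sketched in a few lines of Python. This is only a reference sketch, not the COBOL program or the SORT solution from this thread. The function name `extract_names` is hypothetical, and it assumes that blanks between the marker and the name are skipped and that a name ends at the next blank or after the maximum length, whichever comes first.

```python
from typing import Iterable, Iterator

MARKER = "µ^{"      # the C'µ^{' marker from the post
MAX_NAME_LEN = 14   # 14 bytes in the example data

SAMPLE = [
    "µ^{ FIGO.ABC.DATA1 asjhdajsdg µ^{ FIGO.ABC.DATA2 ahgsdgahshgdasd",
    "hjgashgda µ^{ FIGO.ABC.DATA3 asdasgdhgaskdhagsdjhgasdjahgsdjggajd",
    "hgasdgµ^{ FIGO.ABC.DATA4 asgd µ^{ FIGO.ABC.DATA5 ahsgdgsadjgajgd",
]


def extract_names(records: Iterable[str], marker: str = MARKER,
                  max_len: int = MAX_NAME_LEN) -> Iterator[str]:
    """Yield one name per marker occurrence, in input order."""
    for record in records:
        pos = record.find(marker)
        while pos != -1:
            start = pos + len(marker)
            # Assumption: skip blanks after the marker, then take
            # everything up to the next blank, capped at max_len.
            rest = record[start:].lstrip(" ")
            name = rest.split(" ", 1)[0][:max_len]
            if name:
                yield name
            # Continue searching after the current marker.
            pos = record.find(marker, start)


if __name__ == "__main__":
    for name in extract_names(SAMPLE):
        print(name)
```

Counting the yielded names for the real input gives a number that can be compared with both the COBOL output and the SORT output.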

Joerg.Findeisen · Posted: Wed Mar 27, 2024 10:36 am

I have provided a solution in https://ibmmainframes.com/about68962.html some time ago.

Figo1988 · New User Joined: 19 Mar 2024 Posts: 6 Location: Canada

Much appreciated, and thanks for the super fast response, Joerg!!!

But may I know the reason for choosing %199 as the REPEAT factor? By the way, I had already found your earlier post, but that same REPEAT factor of %199 is what confused me into thinking that my request was a different one, and I still think it is.

However, I ran your idea, and although the results look good, I am really not sure whether it processed all 1 million records of my input file, as the total number of output records differs from what my COBOL program produced.

By the way, it was weird to see that the job took almost a minute to run. I am not sure whether that is because we are processing in INREC.

sergeyken · Posted: Thu Mar 28, 2024 3:48 am

Figo1988 · New User Joined: 19 Mar 2024 Posts: 6 Location: Canada

Hi sergeyken,

Thanks for asking!!!

I am not sure if you had a chance to look at my input data and expected result.

Also, the solution provided by Joerg is a brilliant one and exactly what I am looking for.
I was just curious to know whether it is actually processing all of my 1 million records.

But coming to your question: what I meant by length is that the dataset has LRECL 30000.
Inside the dataset there are a million records.
In each record, production file names are listed at random positions.
In reality the file names have a max length of 44 bytes, but in my example I used only 14 to give the idea.

Thanks!!!

dneufarth · Posted: Thu Mar 28, 2024 4:40 am

Are the counts not in DFSMSG SYSOUT?

Figo1988 · New User Joined: 19 Mar 2024 Posts: 6 Location: Canada

Hi dneufarth,

Thanks for the response!!!

INSERT 2368872, DELETE 0
RECORDS - IN: 1211964, OUT: 23808360
OUT : DELETED = 21797170, REPORT = 0, DATA = 2011190
OUT : TOTAL IN = 23808360, TOTAL OUT = 2011190

I can see that it processed all 1211964 records in my input.
This helps me dig deeper into why the results are different.
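The counters quoted above can be sanity-checked with simple arithmetic. The sketch below assumes that every record reaching OUTFIL is counted as deleted, report, or data, and that TOTAL OUT is report plus data. The quoted numbers are consistent with that reading, but the meaning of each counter should be confirmed against the message documentation. The function name `outfil_counts_consistent` is hypothetical.

```python
def outfil_counts_consistent(total_in: int, deleted: int, report: int,
                             data: int, total_out: int) -> bool:
    """Check that the OUTFIL counters add up under the stated assumption."""
    return (deleted + report + data == total_in
            and report + data == total_out)


# Counters copied from the job output quoted in this post.
RECORDS_IN = 1_211_964
OUTFIL_TOTAL_IN = 23_808_360
OUTFIL_DELETED = 21_797_170
OUTFIL_REPORT = 0
OUTFIL_DATA = 2_011_190
OUTFIL_TOTAL_OUT = 2_011_190

if __name__ == "__main__":
    print(outfil_counts_consistent(OUTFIL_TOTAL_IN, OUTFIL_DELETED,
                                   OUTFIL_REPORT, OUTFIL_DATA,
                                   OUTFIL_TOTAL_OUT))
    # Average number of extracted names per input record.
    print(round(OUTFIL_DATA / RECORDS_IN, 2))
```

On these figures the job wrote roughly 1.66 names per input record on average, which is the number to compare against the COBOL output count.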

Thanks a lot for all the responses!!!
Happy to join the world's best forum for mainframe.

Joerg.Findeisen · Posted: Thu Mar 28, 2024 10:42 am

Figo1988 · New User Joined: 19 Mar 2024 Posts: 6 Location: Canada

Thanks Joerg!!!

Now I understand why my results were different from COBOL.

And I didn't know the %199 factor was arbitrary. So basically we have to come up with the REPEAT factor based on the total LRECL, which is 30000 in my case, and the length of the string that needs to be extracted, which in my case would be 44 for the file name, excluding the unique characters given in STARTAFT and ENDBEFR.

So that would be 30000/44 ≈ 681, or rounded up to 700 to be on the safe side.
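The estimate above can be written as a small calculation. This sketch assumes the REPEAT factor has to cover the largest number of names that can physically fit in one record, so each occurrence is counted as the marker plus the shortest possible name. The function name `repeat_upper_bound` and the minimum name lengths tried here are illustrative assumptions, not values from the thread.

```python
def repeat_upper_bound(lrecl: int, marker_len: int, min_name_len: int) -> int:
    """Most marker+name pairs that can fit into one record of lrecl bytes."""
    return lrecl // (marker_len + min_name_len)


if __name__ == "__main__":
    # The estimate from the post: names only, all 44 bytes long.
    print(repeat_upper_bound(30000, 0, 44))
    # Counting the 3-byte marker as well, still with 44-byte names.
    print(repeat_upper_bound(30000, 3, 44))
    # Assumption: real names could be as short as the 14-byte example.
    print(repeat_upper_bound(30000, 3, 14))
```

If names shorter than 44 bytes can occur, the bound is driven by the shortest name rather than the longest, so it can be considerably higher than 681. Checking the actual maximum number of markers per record in the real data would give a tighter figure.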

This is where I was a little hesitant about the REPEAT factor from your earlier thread, and a little lazy about having such a huge REPEAT factor in the BUILD.

Thanks a ton again, Joerg.

You are a genius anyway.

But I am happy to hear any other ideas that don't need a huge REPEAT factor.

sergeyken · Posted: Fri Mar 29, 2024 12:34 am

Figo1988 · New User Joined: 19 Mar 2024 Posts: 6 Location: Canada

sergeyken,

I think I was looking for help, not for free advice.

Perhaps you should check with Joerg on how exactly he understood a simple mainframe term.

It's clear enough that when I said "I HAVE A DATASET OF LENGTH 30000", I actually meant LRECL.

Nobody measures the length of a dataset by the RECORD COUNT, and in fact I also mentioned that I have a million records inside.

Anyway, if you are interested in helping, I would suggest replying with new ideas. If not, please remain silent instead of advising with PROVERBS.

No hard feelings, please!!!!

Also, I feel this topic can be closed if there are no new ideas.

Thanks again, all.