Merging lines of text. Can SORT do this?

Claes Norreen · Posted: Fri Apr 27, 2007 5:38 pm

Hi experts!

I've got a dataset with a key and 1 to n lines of text, and it looks something like this:

KEY1 Text
KEY2 Text (ends with X'0D')
Text
KEY3 Text (ends with X'0D')
Text (ends with X'0D')
Text

The Text will vary in length, to a maximum of 80 chars per line. I want to merge all lines of text into one record per key, removing trailing spaces and the X'0D' indicator in the process. Can I do this with SORT?

Frank Yaeger · Posted: Fri Apr 27, 2007 8:22 pm

Is the input RECFM=VB and LRECL=80 or something else?

You show some of the lines ending with X'0D' and some not. Is that correct or does every line end with X'0D'? If not, what are the rules for which lines end with X'0D' and which lines don't?

Would the output RECFM be VB? What would the output LRECL be?

If the input is:

KEY1 Text1
KEY2 Text2
Text3
KEY3 Text4
Text5
Text6

What would the expected output be?

Claes Norreen · Posted: Fri Apr 27, 2007 10:05 pm

Hi Frank,

Let's say the input is VB 104.

Every key has at least one record, showing the key value (20 chars) and the first 80 chars of the text. If the text is longer than 80 chars, the last char is X'0D', and the text continues in the next record, but WITHOUT the key value. A record has the X'0D' indicator only if the text continues in the next record.

The output will be VB 424 (as there can be at most 5 text lines of 80 chars per key).

Sample output:
KEY1 Text1
KEY2 Text2 Text3
KEY3 Text4 Text5 Text6
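
The continuation rule described above can be sketched in Python (this is not DFSORT, just an illustration of the grouping logic). It rests on several assumptions: records are shown as plain strings without the RDW, "\r" stands in for X'0D', the key occupies the first 20 characters of a keyed record, and segments are joined with a single space as in the sample output. The function names merge_continuations and render are hypothetical.

```python
CONT = "\r"    # stands in for the X'0D' continuation indicator (assumption)
KEY_LEN = 20   # key width stated in the thread


def merge_continuations(records):
    """Group records into (key, [text segments]) using the X'0D' rule."""
    groups = []
    continuing = False  # True when the previous record ended with X'0D'
    for rec in records:
        if continuing:
            text = rec  # continuation records carry no key
        else:
            groups.append((rec[:KEY_LEN].rstrip(), []))
            text = rec[KEY_LEN:]
        continuing = text.endswith(CONT)
        if continuing:
            text = text[:-1]  # drop the X'0D' indicator
        groups[-1][1].append(text.rstrip(" "))  # drop trailing spaces
    return groups


def render(groups):
    """Join key and segments with one space, as in the sample output."""
    return [" ".join([key] + segs) for key, segs in groups]


sample = [
    "KEY1".ljust(KEY_LEN) + "Text1",
    "KEY2".ljust(KEY_LEN) + "Text2" + CONT,
    "Text3",
    "KEY3".ljust(KEY_LEN) + "Text4" + CONT,
    "Text5" + CONT,
    "Text6",
]
for line in render(merge_continuations(sample)):
    print(line)
```

A record starts a new key only when the previous record did not end with the indicator, so no key pattern is needed for the grouping itself.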

Frank Yaeger · Posted: Fri Apr 27, 2007 10:39 pm

I still need you to clarify some things about the structure of the various records.

KEY1 text1

No continuation, so it has the 4-byte RDW, a 20-byte key, and 80 characters of text, or can it be less than 80 characters of text? In other words, would this type of record always be 104 bytes (4+20+80) or could it be less than 104 bytes (e.g. 4+20+50)?

KEY2 text2 (ends with X'0D')
text3

Continuation, so the first line has the 4-byte RDW, a 20 byte key, 80 characters of text and a X'0D'? That would be 105 bytes so the LRECL would have to be VB 105? Or would it only have 79 characters of text followed by the X'0D' to get VB 104? Would this type of line always have the X'0D' in position 105 (or 104?) or could it have less than 80 characters with the X'0D' earlier on, e.g. 4-byte RDW, a 20 byte key, 50 characters of text and a X'0D'?

Would the second line always be padded out to 80 characters (4-byte RDW + 80 bytes), or could it have less than 80 characters (e.g. 4-byte RDW + 50 bytes)?

You don't actually want a space between the key and each text segment - right?

Finally, is the key identifiable in some way (e.g. it starts with 'KEY')? If not, how do we know when we have the start of a line of text? Is it just that the previous line did not end with X'0D'?

Claes Norreen · Posted: Sat Apr 28, 2007 12:12 am

The text length can vary, to a maximum of 80 chars per record (including the X'0D'). If a line is continued, the X'0D' can appear anywhere in the text. Here's an example (where # denotes an X'0D'):

KEY1Text1
KEY2Text2#
continuation of text2#
blanks can even be padded before X'0D' #
endofthiskey
KEY3#
This is possible too...!#
#
<blank line>
KEY4End of example

So none of the lines are padded out to the full record length.

I can identify the key, let's just keep it fairly simple and use KEY in the first three letters.

In fact, the output must be slightly different than I first explained. Still VB 424, but I'd like the following layout:

RDW + Key (20) + Text1 (80) + [Text2 (80)] + [Text3 (80)] + [Text4 (80)] + [Text5 (80)].

Thanks for your time, Frank. I hope this clarifies it?
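
The requested layout can be sketched in Python to make the length arithmetic concrete (4 + 20 + 5 x 80 = 424). This is an illustration only, not the DFSORT job: it assumes ASCII bytes rather than EBCDIC, assumes every text slot that is present is padded to the full 80 bytes, and uses the standard RDW form (a 2-byte big-endian length that includes the RDW itself, followed by 2 zero bytes). The name build_record is hypothetical.

```python
import struct

KEY_LEN, SLOT_LEN, MAX_SLOTS = 20, 80, 5


def build_record(key, segments, encoding="ascii"):
    """Return RDW + key (20) + one 80-byte slot per text segment."""
    if len(key) > KEY_LEN:
        raise ValueError("key longer than 20 characters")
    if not 1 <= len(segments) <= MAX_SLOTS:
        raise ValueError("expected 1 to 5 text segments")
    if any(len(seg) > SLOT_LEN for seg in segments):
        raise ValueError("text segment longer than 80 characters")
    # Pad the key and every present slot with blanks to its fixed width.
    body = key.ljust(KEY_LEN) + "".join(seg.ljust(SLOT_LEN) for seg in segments)
    data = body.encode(encoding)
    # RDW: record length including the 4-byte RDW, then two zero bytes.
    rdw = struct.pack(">HH", 4 + len(data), 0)
    return rdw + data


rec = build_record("KEY2", ["Text2", "Text3"])
print(len(rec))  # 4 + 20 + 2 * 80 = 184
```

A key with all five text lines gives the maximum record length of 424 bytes, which matches the stated output LRECL.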

Frank Yaeger · Posted: Sat Apr 28, 2007 2:32 am

Here's a DFSORT/ICETOOL job that will do what you asked for. You'll need z/OS DFSORT V1R5 PTF UK90007 or DFSORT R14 PTF UK90006 (April, 2006) in order to use DFSORT's PARSE function. If you don't have the April, 2006 PTF, ask your System Programmer to install it (it's free). For complete details on all of the new DFSORT and ICETOOL functions available with the April, 2006 PTF, see:


Claes Norreen · Posted: Sat Apr 28, 2007 12:00 pm

Wow, almost can't wait till Monday...!

Thanks Frank! I hope I challenged you a little bit there...? ;-)

Frank Yaeger · Posted: Sat Apr 28, 2007 8:15 pm

Claes Norreen · Posted: Mon Apr 30, 2007 12:23 pm

Hi Frank,

It works very well, except for lines containing nothing but X'0D', in which case SORT gives CC=16 and the error message is:

William Thompson · Posted: Mon Apr 30, 2007 12:52 pm

Look at the Sort manual; there is a parameter that handles that.

Claes Norreen · Posted: Mon Apr 30, 2007 12:55 pm

Thanks, VLSCMP did the trick ;-)
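
As far as I understand the DFSORT documentation, VLSCMP tells DFSORT to pad short variable-length compare fields with binary zeros instead of treating a too-short record as an error. The Python sketch below only illustrates that idea. The compare field at position 5, length 3 (a test for 'KEY' right after the RDW) is an assumption, since the job itself is not shown in this thread, and compare_field is a hypothetical name.

```python
def compare_field(record, start, length, pad_short):
    """Extract a compare field at 1-based position `start` from a record.

    With pad_short=True a field that runs past the end of the record is
    padded with binary zeros (the VLSCMP idea); otherwise it is an error.
    """
    field = record[start - 1:start - 1 + length]
    if len(field) < length:
        if not pad_short:
            raise ValueError("compare field extends past end of record")
        field = field.ljust(length, b"\x00")
    return field


# A record holding nothing but X'0D': a 4-byte RDW plus 1 data byte.
short_rec = b"\x00\x05\x00\x00" + b"\x0d"

padded = compare_field(short_rec, 5, 3, pad_short=True)
print(padded == b"KEY")  # the padded field simply fails the test

try:
    compare_field(short_rec, 5, 3, pad_short=False)
except ValueError as err:
    print("error:", err)
```

With padding, a record that contains only X'0D' just compares unequal to the key pattern and is handled as a continuation line.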

Claes Norreen · Posted: Thu May 03, 2007 2:44 pm

DFSORT reduced CPU usage by a factor of 8 compared to the application program that handled this task before!

Thanks again!