Merging records, but not all the time


IBM Mainframe Forums -> DFSORT/ICETOOL
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Sat Feb 17, 2018 4:20 pm

Sample of input file, lrecl=121:

Code:
1
 +------+------+------+--------+-------+-------+------+-----+-----+
 -------------------------------------------------+-------------------------------------------------+
 | @@@@ | @@@@ | @@@@ |     @@ |  @@@@ |    @= | @@@@ | @@@ | @@@ |
 @@@@@@@@@                                       | @@@@@@@                                         |
 +------+------+------+--------+-------+-------+------+-----+-----+
 -------------------------------------------------+-------------------------------------------------+
 | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   |
 @@@@ @@@@@@@@@@                                 | @@@@@@@@@@                                      |
 | #### |   ## |   ## | ####.# | ##:## |  ##.# | @    | @@  | *   |
 @@@@@@@@                                        | @@@@@@@ @@@@ @@@@@                              |
 | #### |    # |   ## | ####.# | ##:## | ###.# | @    | @   | *   |
 @@@@@                                           | @@@@@@@@@@@ @@@@@@@@@@@@@@@@ @@@                  |
 | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   |
 @@@@@@@@@@ @@@@@@ @@@@                           | @@@@@@@@                                        |
 | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   |
 +------+------+------+--------+-------+-------+------+-----+-----+
 -------------------------------------------------+-------------------------------------------------+ 
1


If there weren't any ASA control characters, i.e. just a series of odd-even records, I could (yes, I've read Smart DFSORT Tricks) figure out how to remove the trailing blanks from the odd ones and merge the even ones onto them. But how do I go about it now that these poxy ASA characters are there, without going into a multi-pass scenario? The process is currently done by an edit macro running in batch, but that's rather CPU-intensive. I could write a little PL/I program, but if it's easy to do with Sort, I'd be ever so happy.

The expected output looks like this, with an LRECL that's "large" enough:
Code:
1
 +------+------+------+--------+-------+-------+------+-----+-----+-------------------------------------------------+-------------------------------------------------+
 | @@@@ | @@@@ | @@@@ |     @@ |  @@@@ |    @= | @@@@ | @@@ | @@@ | @@@@@@@@@                                       | @@@@@@@                                         |
 +------+------+------+--------+-------+-------+------+-----+-----+-------------------------------------------------+-------------------------------------------------+
 | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   | @@@@ @@@@@@@@@@                                 | @@@@@@@@@@                                      |
 | #### |   ## |   ## | ####.# | ##:## |  ##.# | @    | @@  | *   | @@@@@@@@                                        | @@@@@@@ @@@@ @@@@@                              |
 | #### |    # |   ## | ####.# | ##:## | ###.# | @    | @   | *   | @@@@@                                           | @@@@@@@@@@@ @@@@@@@@@@@@@@@@ @@@                 |
 | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   | @@@@@@@@@@ @@@@@@ @@@@                           | @@@@@@@@                                        |
 +------+------+------+--------+-------+-------+------+-----+-----+-------------------------------------------------+-------------------------------------------------+
1


And just in case you wonder: the end-of-column markers for the last two columns do not line up because the data is UTF-8 encoded. In the world of the white boxes everything will look OK again.

As for the inevitable "Why not output full records in the first place?"

Two main reasons:
  1. this output file was later added to a program
  2. the regression test framework is built around files with an LRECL=121
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2022
Location: USA

Posted: Sun Feb 18, 2018 8:10 am

Which ASA control characters should be recognized?

For instance, the '-' in lines 3, 7, ... is one of the control characters.
The '+' in the third line from the bottom is a control character, too.

If only '1' is to be considered, it seems simple. Otherwise more specification details are needed.
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2022
Location: USA

Posted: Sun Feb 18, 2018 8:58 am

FYI:
Code:

ASA Character   Action                                      ASCII Equivalent
----------------------------------------------------------------------------
blank           Advance 1 line (single spacing)             CR LF
1               Advance to next page (form feed)            CR FF
2–9, A, B, C    Advance to vertical tab stop                CR VT (approximately)
0               Advance 2 lines (double spacing)            CR LF LF
-               Advance 3 lines (triple spacing)            CR LF LF LF
+               Do not advance any lines before printing,
                overstrike previous line with current line  CR
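For anyone who wants the table above in code form, here is a minimal Python sketch of the same mapping. The names `ASA_TO_ASCII` and `asa_prefix` are made up for illustration, and treating an unrecognized character as single spacing is my assumption, not something the table specifies.

```python
# Mapping copied from the table above. Channel skips (2-9, A-C) are
# approximated with a vertical tab, as the table itself notes.
ASA_TO_ASCII = {
    " ": "\r\n",        # single spacing
    "1": "\r\f",        # new page (form feed)
    "0": "\r\n\n",      # double spacing
    "-": "\r\n\n\n",    # triple spacing
    "+": "\r",          # overstrike previous line
}
for ch in "23456789ABC":
    ASA_TO_ASCII[ch] = "\r\v"


def asa_prefix(record: str) -> str:
    """Return the ASCII control sequence for the character in column 1."""
    # Assumption: anything not in the table is treated as single spacing.
    return ASA_TO_ASCII.get(record[:1], "\r\n")
```

This also shows why the question matters: a data line that happens to start with '-' or '+' is indistinguishable from a carriage-control line unless the rules say which characters count.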
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Sun Feb 18, 2018 10:24 pm

Oops, mea culpa, mea maxima culpa!

The current output file, like all others, is FBA with an LRECL=121. However, the '-' that is, among others, present in line 3 of the first sample is not an ASA control character, but part of the to-be-merged line of data. The only characters that should be exempted from the merge process are the '1' Form Feed characters; the minuses on lines 3, 7 & 18 are data. The '+' is the result of an erroneous cut-and-paste; it should have been preceded by a space!
In other words, a '1' form feed is always followed by 2n lines of data, of which the "even" lines need to be merged with the preceding "odd" lines, where the first "odd" line is the first line after the line with the '1' ASA character.
expat

Global Moderator


Joined: 14 Mar 2007
Posts: 8797
Location: Welsh Wales

Posted: Mon Feb 19, 2018 12:48 pm

Ages since I've played with sort, but could you not just EXCLUDE records with 1 in col 1 during the process?
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Tue Feb 20, 2018 9:55 am

I tried to rearrange the sample input manually to get to the expected output and ended up with this (looks like an extra output record).
Code:
1
  +------+------+------+--------+-------+-------+------+-----+-----+-------------------------------------------------+-------------------------------------------------+
  | @@@@ | @@@@ | @@@@ |     @@ |  @@@@ |    @= | @@@@ | @@@ | @@@ | @@@@@@@@@                                       | @@@@@@@                                         |
  +------+------+------+--------+-------+-------+------+-----+-----+-------------------------------------------------+-------------------------------------------------+
  | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   | @@@@ @@@@@@@@@@                                 | @@@@@@@@@@                                      |
  | #### |   ## |   ## | ####.# | ##:## |  ##.# | @    | @@  | *   | @@@@@@@@                                        | @@@@@@@ @@@@ @@@@@                              |
  | #### |    # |   ## | ####.# | ##:## | ###.# | @    | @   | *   | @@@@@                                           | @@@@@@@@@@@ @@@@@@@@@@@@@@@@ @@@                |
  | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   | @@@@@@@@@@ @@@@@@ @@@@                          | @@@@@@@@                                        |
  | #### |   ## |   ## | ####.# | ##:## | ###.# | -    | @@  | *   |
  +------+------+------+--------+-------+-------+------+-----+-----+-------------------------------------------------+-------------------------------------------------+ 
1
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Tue Feb 20, 2018 4:13 pm

Arun Raj wrote:
I tried to rearrange the sample input manually to get to the expected output and ended up in this (looks like an extra output record).

An extra record?

Basically the output of the program is a dataset that contains several hundred split-into-two-lines-per-row tables, separated by '1' ASA control characters. A, let's call it, "table section" consists of

  1. a FF record (ASA = '1')
  2. a separator ('+---+...') line
  3. a heading ('| ID | ....') line
  4. another separator line
  5. 1 (to 10) data lines
  6. another separator line
  7. 1 (to 10) data lines
  8. another separator line
  9. 1 (to 10) data lines
  10. a final separator line

and because the lines are longer than the historical 121 characters used in the other output files from this particular program (and we don't want to change the regression testing framework), all records (separator/heading/data) are split into two parts. The first part occupies, after a ' ' (blank ASA control character in column 1), the next 66 characters of the "odd" records; the second occupies (at the moment) characters 1-105 of the "even" records. We're processing UTF-8 encoded data with PL/I that will be transferred back to the white box world, where the "unaligned" '|' characters will again fall in line in programs that can deal with UTF-8. The actual data in these columns cannot exceed 47 bytes (which is already a problem: we have UTF-8 encoded strings of 55 bytes, to be dealt with at a later stage).

The current solution runs an edit macro in batch that simply finds a '1' in column 1, takes (for n=1...x, where x is the last line before the next new page) line 2n-1 following it, strips off the trailing blanks and concatenates line 2n to it. Obviously this happens after copying the LRECL=121 dataset to one that can hold the longest resulting LRECL, which in the current set-up would be, by a back-of-a-napkin calculation, LRECL=198. In reality a VB(259) dataset is used.

Probably less than a hundred lines of PL/I could do it, but it would be nice to see a solution using SORT. Without the '1' ASA characters a simple RESIZE, followed by removing chars 68-121 and stripping any trailing blanks, would do the trick, and that's what I would do in PL/I, or will, if nothing comes up here ;)
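For comparison, the pairing logic described above (what the edit macro does, and what a small PL/I program would do) can be sketched in a few lines of Python. This is only an illustration: `merge_sections` and its `first_len` parameter are hypothetical names, the 67-column cut follows the "chars 68-121" description, and keeping an unpaired odd line as-is is an assumption.

```python
def merge_sections(records, first_len=67):
    """Join each odd/even pair of lines that follows a '1' record.

    records   -- iterable of fixed-length lines, ASA character in column 1
    first_len -- columns kept from the odd line (blank ASA + 66 data chars)
    """
    out = []
    pending = None                      # odd line waiting for its partner
    for rec in records:
        if rec[:1] == "1":              # form feed: pass through, restart pairing
            if pending is not None:     # assumption: keep an unpaired odd line
                out.append(pending)
                pending = None
            out.append(rec.rstrip())
        elif pending is None:
            pending = rec[:first_len]   # drop the unused tail of the odd line
        else:
            out.append(pending + rec.rstrip())
            pending = None
    if pending is not None:
        out.append(pending)
    return out
```

With a shortened `first_len` for readability, `merge_sections(["1", " AB   ", "CD    ", "1"], first_len=3)` gives `["1", " ABCD", "1"]`.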
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Tue Feb 20, 2018 7:07 pm

Quote:
An extra record?
Just wanted to confirm it is a typo while pasting: field 1 in the 'table' seems to have 5 '####'s in the sample input, but in the output there are only 4 of them. Anyway, the last post above explains the requirement in detail. Thank you.
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Tue Feb 20, 2018 10:00 pm

Can you try something like this? I did not give much thought to it, and can probably make it better if I find some time later.
Code:
//STEP01  EXEC PGM=SORT       
//SYSOUT    DD SYSOUT=*       
//SORTIN    DD *             
1                             
AAA                           
BBB                           
CCC                           
DDD                           
1                             
EEE                           
FFF                           
GGG                           
HHH                           
1                             
//SORTOUT   DD SYSOUT=*       
//SYSIN     DD *                                                   
 OPTION COPY                                                       
 INREC  IFTHEN=(WHEN=(1,1,CH,NE,C'1'),                             
       OVERLAY=(81:SEQNUM,8,ZD,81:81,8,ZD,MOD,+2,M11,LENGTH=1))     
 OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(81,1,CH,EQ,C'1'),                 
          PUSH=(82:1,3))                                           
 OUTFIL IFOUTLEN=80,                                               
          OMIT=(81,1,CH,EQ,C'1'),                                   
        IFTHEN=(WHEN=(81,1,CH,EQ,C'0'),BUILD=(82,3,1,3,80:X))     

SORTOUT
Code:
1     
AAABBB
CCCDDD
1     
EEEFFF
GGGHHH
1     

EDIT: Logic at a high level: the INREC assigns a sequence number to the 'data records' (all except the '1' records), and the 'MOD,+2' logic replaces the sequence numbers with alternating '1' and '0' IDs for each pair of data records. The OUTREC propagates the data from the first record of each pair onto the second. The OUTFIL omits the first record of each pair (record 2 already has the data from record 1 attached).
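To make the three phases easier to follow, here is a rough Python model of them. It is not DFSORT, just a sketch of the same idea: the function names are hypothetical, the flag is kept as a separate value instead of living in column 81, and the padding to 80 bytes is left out.

```python
def inrec(records):
    """Tag data records '1'/'0' alternately; '1' records get a blank flag."""
    seq = 0
    for rec in records:
        if rec[:1] != "1":
            seq += 1                     # the sequence counts data records only
            yield rec, str(seq % 2)      # MOD 2: '1' = odd, '0' = even
        else:
            yield rec, " "


def outrec(tagged, width=3):
    """Carry the first `width` bytes of each flag-'1' record forward."""
    pushed = " " * width
    for rec, flag in tagged:
        if flag == "1":                  # a new group begins here
            pushed = rec[:width].ljust(width)
        yield rec, flag, pushed


def outfil(grouped, width=3):
    """Drop flag-'1' records; prefix flag-'0' records with the pushed bytes."""
    for rec, flag, pushed in grouped:
        if flag == "1":
            continue
        yield (pushed + rec[:width]) if flag == "0" else rec


def run(records):
    return list(outfil(outrec(inrec(records))))
```

Feeding it the SORTIN records from the job above reproduces the SORTOUT shown there, minus the trailing blanks.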
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Wed Feb 21, 2018 1:47 am

Thanks, I'll have to open the Tricks manual and check out what's happening, and how to adapt it to what I need.

I'll get back if I get stuck.
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Wed Feb 21, 2018 2:44 am

Just a hint for others:

Code:
//SYSIN     DD *
 OPTION COPY
 INREC  IFTHEN=(WHEN=(1,1,CH,NE,C'1'),
       OVERLAY=(81:SEQNUM,8,ZD,81:81,8,ZD,MOD,+2,M11,LENGTH=1))
 OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(81,1,CH,EQ,C'1'),
          PUSH=(82:1,3))
 OUTFIL IFOUTLEN=80,
          OMIT=(81,1,CH,EQ,C'1'),
        IFTHEN=(WHEN=(81,1,CH,EQ,C'0'),BUILD=(82,3,1,3,80:X))

add
Code:
//
before the OUTREC statement, and later before the OUTFIL statement, and you can see the intermediate data. Doing so made it very easy, a hell of a lot easier than going through the manual, for me to understand what the above is doing. ;)
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Wed Feb 21, 2018 3:58 am

Just another omission in the original and updated problem statement: there are also a number of "section separator lines". Still, the following:

Code:
//MRGNTOP EXEC PGM=SORT
//*
//SYSOUT    DD SYSOUT=*
//*
//SORTIN    DD DSN=&SYSUID..LN,
//             DISP=SHR
//*
//SORTOUT   DD SYSOUT=*
//*
//SYSIN     DD *
 OPTION COPY
 INREC  IFTHEN=(WHEN=(1,1,CH,EQ,C'1',OR,
                      1,6,CH,EQ,C' Sect-'),
       OVERLAY=(122:C' ')),
        IFTHEN=(WHEN=NONE,
       OVERLAY=(122:SEQNUM,8,ZD,122:122,8,ZD,MOD,+2,M11,LENGTH=1))
*
 OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(122,1,CH,EQ,C'1'),
          PUSH=(123:1,67))
*
 OUTFIL FTOV,
          OMIT=(122,1,CH,EQ,C'1'),
        IFTHEN=(WHEN=(122,1,CH,EQ,C'0'),BUILD=(123,67,1,121)),
        IFTHEN=(WHEN=(122,1,CH,EQ,C' '),BUILD=(1,121))

Does exactly what's required, and rather a lot faster than the current edit macro, while using a hell of a lot less CPU!

Only puzzled by one issue: why does the data ending up in SYSOUT, when viewed in SDSF using the SE line command, have an LRECL=194?
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Wed Feb 21, 2018 8:02 am

You might want to add an IFOUTLEN parameter to the OUTFIL (as shown in my example) to limit the output record length. From a quick look I believe your OUTREC is extending the record length to 189 and then the FTOV adds the RDW to it to make it 193.
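A small Python check of that arithmetic, for what it's worth. The function name is hypothetical, and the figures are simply the ones quoted in this post (PUSH to column 123 for 67 bytes, 4-byte RDW), not values taken from DFSORT messages.

```python
def lrecl_after_push(push_start, push_len, rdw=4):
    """Return (fixed length after PUSH, length once a 4-byte RDW is added)."""
    fixed_len = push_start + push_len - 1   # last column touched by the PUSH
    return fixed_len, fixed_len + rdw


# PUSH=(123:1,67) from the job above
print(lrecl_after_push(123, 67))
```

That accounts for 189 and 193; it does not explain the 194 seen in SDSF.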
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Wed Feb 21, 2018 5:16 pm

Arun Raj wrote:
You might want to add an IFOUTLEN parameter to the OUTFIL (as shown in my example) to limit the output record length. From a quick look I believe your OUTREC is extending the record length to 189 and then the FTOV adds the RDW to it to make it 193.

I would have expected the 193, but the "PROF" command while in SDSF's edit shows 194.
Nic Clouston

Global Moderator


Joined: 10 May 2007
Posts: 2455
Location: Hampshire, UK

Posted: Wed Feb 21, 2018 7:30 pm

1 byte for the ASA character - possibly?
Arun Raj

Moderator


Joined: 17 Oct 2006
Posts: 2481
Location: @my desk

Posted: Wed Feb 21, 2018 10:28 pm

prino - If the output is written to a data set you should see the expected results. DFSORT messages in SYSOUT would show the same LRECL in both cases.

I ran a small test and when the output is routed to SYSOUT, regardless of DFSORT/IDCAMS/IEBGENER, I 'see' the extra one byte in SYSOUT output.
Back to top
View user's profile Send private message
sergeyken

Senior Member


Joined: 29 Apr 2008
Posts: 2022
Location: USA

Posted: Thu Feb 22, 2018 10:43 pm

prino wrote:
Oops, mea culpa, mea maxima culpa!

The current output file, like all others, is FBA with an LRECL=121. However, the '-' that is, among others, present in line 3 of the first sample is not an ASA control character, but part of the to-be-merged line of data. The only characters that should be exempted from the merge process are the '1' Form Feed characters; the minuses on lines 3, 7 & 18 are data. The '+' is the result of an erroneous cut-and-paste; it should have been preceded by a space!
In other words, a '1' form feed is always followed by 2n lines of data, of which the "even" lines need to be merged with the preceding "odd" lines, where the first "odd" line is the first line after the line with the '1' ASA character.

From your original post(s) none of these statements were obvious. Readers are free to guess whatever they want.
When you need exact answers, you should ask the exact questions.
prino

Senior Member


Joined: 07 Feb 2009
Posts: 1306
Location: Vilnius, Lithuania

Posted: Fri Feb 23, 2018 3:22 am

At some stage I might add an IFOUTLEN= statement, but right now I've got another far more serious problem, which has nothing to do with SORT (and cannot be solved by it).

The output file, after a further bit of post-processing to convert UTF-8 into RTF "\uNNNN" escapes, does not display CJK characters correctly (neither in M$ Word nor in LO Writer) because
  • the "Courier New" font does not contain CJK characters,
  • the substitution fonts (different for Word & Writer) have wholly incompatible font metrics,
  • LO Writer has a font-reset bug, making the layout even worse.
All times are GMT + 6 Hours