If there weren't any ASA control characters, i.e. just a series of odd-even records, I could (yes, I've read Smart DFSORT Tricks) figure out how to remove trailing blanks from the odd ones and merge the even ones onto them, but how do I go about it now that these poxy ASAs are there, without going into a multi-pass scenario? The process is currently done by an edit macro running in batch, but that's kind of CPU intensive. I could write a little PL/I program, but if it's easy to do with Sort, I'd be ever so happy.
The expected output looks like this, with an LRECL that's "large" enough:
And just in case, the end-of-column markers for the last two columns do not match up as the data contains UTF-8 encoded data. In the world of the white boxes everything will look OK again.
As for the inevitable "Why not output full records in the first place?"
Two main reasons:
this output file was later added to a program
the regression test framework is built around files with an LRECL=121
ASA Character   Action                                        ASCII Equivalent
----------------------------------------------------------------------------
blank           Advance 1 line (single spacing)               CR LF
1               Advance to next page (form feed)              CR FF
2–9, A, B, C    Advance to vertical tab stop                  CR VT (approximately)
0               Advance 2 lines (double spacing)              CR LF LF
-               Advance 3 lines (triple spacing)              CR LF LF LF
+               Do not advance any lines before printing,     CR
                overstrike previous line with current line
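The table above can be read as a simple lookup. The following is a minimal Python sketch, not anything from the thread: the names `ASA_TO_ASCII`, `asa_prefix` and `asa_to_ascii` are made up for illustration, and the channel-skip characters are all mapped to CR VT, which the table itself marks as approximate.

```python
# Hypothetical helper names; the mapping itself is taken from the table above.
ASA_TO_ASCII = {
    " ": "\r\n",      # single spacing
    "1": "\r\f",      # new page (form feed)
    "0": "\r\n\n",    # double spacing
    "-": "\r\n\n\n",  # triple spacing
    "+": "\r",        # overstrike: no line advance
}
CHANNEL_SKIPS = set("23456789ABC")  # vertical tab stops (approximation)


def asa_prefix(ctl):
    """Return the ASCII control sequence for one ASA control character."""
    if ctl in ASA_TO_ASCII:
        return ASA_TO_ASCII[ctl]
    if ctl in CHANNEL_SKIPS:
        return "\r\v"
    raise ValueError(f"not an ASA control character: {ctl!r}")


def asa_to_ascii(records):
    """Convert FBA-style records (control character in column 1) to one string."""
    parts = []
    for rec in records:
        ctl = rec[:1] or " "  # treat an empty record as single spacing
        parts.append(asa_prefix(ctl) + rec[1:].rstrip(" "))
    return "".join(parts)
```

Note that this treats column 1 of every record as a control character, which is exactly what must not happen for the file discussed in this thread, where only the '1' records carry a real ASA character.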
Joined: 07 Feb 2009 Posts: 1306 Location: Vilnius, Lithuania
Oops, mea culpa, mea maxima culpa!
The current output file, like all others, is FBA with an LRECL=121. However, the '-' that is, among others, present in line 3 of the first sample is not an ASA control character, but part of the to-be-merged line of data. The only characters that should be exempted from the merge process are the '1' Form Feed characters; the minuses on lines 3, 7 & 18 are data. The '+' on line is the result of an erroneous Cut&Paste, it should have been preceded by a space!
In other words, a '1' form feed is always followed by 2n lines of data, of which the "even" lines need to be merged with the preceding "odd" lines, where the first "odd" line is the first line after the line with the '1' ASA character.
Joined: 07 Feb 2009 Posts: 1306 Location: Vilnius, Lithuania
Arun Raj wrote:
I tried to rearrange the sample input manually to get to the expected output and ended up in this (looks like an extra output record).
An extra record?
Basically the output of the program is a dataset that contains several hundred split-into-two-lines-per-row tables, separated by '1' ASA control characters. A, let's call it, "table section" consists of
a FF record (ASA = '1')
a separator ('+---+...') line
a heading ('| ID | ....') line
another separator line
1 (to 10) data lines
another separator line
1 (to 10) data lines
another separator line
1 (to 10) data lines
a final separator line
and because the lines are longer than the historical 121 characters used in the other output files from this particular program (and we don't want to change the regression testing framework), all records (separator/heading/data) are split up into two parts, the first occupying, after a ' ' (blank ASA control character in column 1), the next 66 characters of the "odd" records, the second occupying (at the moment) characters 1-105 of the "even" records. We're processing UTF8 encoded data with PL/I that will be transferred back to the white box world, where the "unaligned" '|' characters will again fall in line in programs that can deal with UTF8. The actual data in these columns cannot exceed 47 bytes (which is already a problem: we have UTF8 encoded strings of 55 bytes, to be dealt with at a later stage).
The current solution is running an edit macro in batch that simply finds a '1' in column 1, takes (for n=1...x, where x is the last line before the next new page) line 2n-1 following that, strips off the trailing blanks and concatenates line 2n to it, obviously all after copying the LRECL=121 dataset to one that can hold the longest resulting LRECL, which in the current set-up would be, back of a napkin calculation, LRECL=198. In reality a VB(259) dataset is used.
Probably less than a hundred lines of PL/I could do it, but it would be nice to see a solution using SORT. Without the '1' ASA characters a simple RESIZE followed by removal of chars 68-121, and stripping of any trailing blanks, would do the trick, and that's what I would do in PL/I, or will, if nothing comes up here ;)
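For readers who want to see the edit-macro procedure spelled out, here is a minimal Python sketch of it. It is not the macro itself: the function name `merge_split_rows` is invented, records are assumed to be plain strings, and stripping the trailing blanks of the merged result is an added assumption (the description only mentions stripping the "odd" line).

```python
def merge_split_rows(records):
    """Merge each odd/even pair of records that follows a '1' form-feed record."""
    merged = []
    pending = None  # stripped "odd" record waiting for its "even" partner
    for rec in records:
        if pending is not None:
            # Expecting the "even" half: column 1 is data here, even if it is
            # '-', '+' or '1', so it is never tested as an ASA character.
            merged.append((pending + rec).rstrip(" "))  # final strip: assumption
            pending = None
        elif rec.startswith("1"):
            # Form-feed record: passed through unchanged (minus trailing blanks).
            merged.append(rec.rstrip(" "))
        else:
            # "Odd" half: strip trailing blanks and wait for the next record.
            pending = rec.rstrip(" ")
    if pending is not None:
        # Unpaired "odd" record; should not occur if every section has 2n lines.
        merged.append(pending)
    return merged
```

Because a '1' is only looked for when an "odd" record is expected, an "even" record that happens to start with '1' is still treated as data.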
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
Quote:
An extra record?
Just wanted to confirm it is a typo while pasting: Field-1 in the 'table' seems to have 5 '####'s in the sample input, but in the output there are only 4 of them. Anyway, the last post above explains the requirement in detail. Thank you.
EDIT: the logic at a high level: The INREC assigns a sequence number to the 'data records' (all except the '1' records). The 'MOD,+2' logic replaces the sequence numbers with alternating '1' and '0' IDs for data record pairs. The OUTREC logic propagates the data of the first record onto the second record of each pair. The OUTFIL omits the first record of each pair (record-2 already has the data from record-1 attached).
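The staged logic described in this EDIT can be mimicked in a few lines of Python, which may help when following it without a mainframe at hand. This is only an illustration of the described stages, not DFSORT control statements: the function names are invented, the sequence number is assumed not to restart at each '1' record (the parity is preserved anyway because every section has 2n data lines), and no data record is assumed to start with '1' in column 1.

```python
def inrec_stage(records):
    """Tag records: None for '1' records, else 1 (first of pair) or 0 (second)."""
    tagged, seq = [], 0
    for rec in records:
        if rec[:1] == "1":
            tagged.append((None, rec))
        else:
            seq += 1
            tagged.append((seq % 2, rec))  # sequence number reduced modulo 2
    return tagged


def outrec_stage(tagged):
    """Propagate the (blank-stripped) first record of a pair onto the second."""
    out, held = [], ""
    for flag, rec in tagged:
        if flag == 1:
            held = rec.rstrip(" ")
            out.append((flag, rec))
        elif flag == 0:
            out.append((flag, held + rec))
        else:
            out.append((flag, rec))
    return out


def outfil_stage(tagged):
    """Drop the first record of each pair; keep '1' records and merged records."""
    return [rec.rstrip(" ") for flag, rec in tagged if flag != 1]
```

Printing the result of `inrec_stage` and `outrec_stage` separately gives the same kind of intermediate view as inspecting the data between the DFSORT stages.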
before the OUTREC and then later before the OUTFIL statements, and you can see the intermediate data. Doing so made it very easy, a hell of a lot easier than going through the manual, for me to understand what the above is doing. ;)
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
You might want to add an IFOUTLEN parameter to the OUTFIL (as shown in my example) to limit the output record length. From a quick look I believe your OUTREC is extending the record length to 189 and then the FTOV adds the RDW to it to make it 193.
Joined: 07 Feb 2009 Posts: 1306 Location: Vilnius, Lithuania
Arun Raj wrote:
You might want to add an IFOUTLEN parameter to the OUTFIL (as shown in my example) to limit the output record length. From a quick look I believe your OUTREC is extending the record length to 189 and then the FTOV adds the RDW to it to make it 193.
I would have expected the 193, but the "PROF" command while in SDSF's edit shows 194.
Joined: 17 Oct 2006 Posts: 2481 Location: @my desk
prino - If the output is written to a data set you should see the expected results. DFSORT messages in SYSOUT would show the same LRECL in both cases.
I ran a small test and when the output is routed to SYSOUT, regardless of DFSORT/IDCAMS/IEBGENER, I 'see' the extra one byte in SYSOUT output.
The current output file, like all others, is FBA with an LRECL=121. However, the '-' that is, among others, present in line 3 of the first sample is not an ASA control character, but part of the to-be-merged line of data. The only characters that should be exempted from the merge process are the '1' Form Feed characters; the minuses on lines 3, 7 & 18 are data. The '+' on line is the result of an erroneous Cut&Paste, it should have been preceded by a space!
In other words, a '1' form feed is always followed by 2n lines of data, of which the "even" lines need to be merged with the preceding "odd" lines, where the first "odd" line is the first line after the line with the '1' ASA character.
From your original post(s) none of these statements were obvious. Readers are free to guess whatever they want.
When you need exact answers, you should ask the exact questions.
Joined: 07 Feb 2009 Posts: 1306 Location: Vilnius, Lithuania
At some stage I might add an IFOUTLEN= statement, but right now I've got another far more serious problem, which has nothing to do with SORT (and cannot be solved by it).
The output file, after a further bit of post-processing to convert UTF8 into RTF "\uNNNN" escapes, does not display correctly for CJK characters (not in M$ Word, not in LO Writer) due to the fact that
the "Courier New" font does not contain CJK characters,
the substitution fonts (different for Word & Writer) have wholly incompatible font metrics,
LO Writer has a font-reset bug, making the layout even worse
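As an aside on the post-processing step mentioned above, the conversion of UTF8 into RTF "\uNNNN" escapes could look roughly like the following Python sketch. It is not the poster's actual code: the function name `utf8_to_rtf` is invented, and it assumes the usual RTF convention that the number is a signed 16-bit decimal UTF-16 code unit followed by one fallback character (here '?').

```python
def utf8_to_rtf(data: bytes) -> str:
    """Convert UTF-8 bytes to RTF text using \\uN escapes for non-ASCII characters."""
    out = []
    for ch in data.decode("utf-8"):
        if ord(ch) < 128:
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch if ch in "\\{}" else ch)
            continue
        # Characters outside the BMP become two escapes (a surrogate pair).
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            unit = int.from_bytes(units[i:i + 2], "big")
            if unit > 32767:
                unit -= 65536  # RTF expects a signed 16-bit value
            out.append(f"\\u{unit}?")  # '?' is the fallback for old readers
    return "".join(out)
```

The escapes only say which characters to draw; which font supplies the glyphs, and with what metrics, is still up to the word processor, which is where the problems listed above come from.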