This topic is intended to clear up some of the widespread confusion about data formats and about how to process different kinds of data correctly.
When I do not understand something, I usually write a tiny test for myself to understand it better, rather than screaming on forums:
"I have a requirement to convert one unknown file to another unknown file. Please, help!!!"
Here is one sample that clarifies the major numeric data conversions, and the considerations behind them, supported by both the SYNCSORT and DFSORT utilities (both are based on the fundamental IBM machine data formats, rather than on the sophisticated/messy/boring data definitions invented in hundreds of programming languages).
Training #2
How to handle a flexible decimal point in numeric input data
There are many questions about this issue with SORT utilities (DFSORT or SYNCSORT, it doesn't matter): how to sort data containing numeric fields in which the decimal point appears at different positions within the field? Take this as an example:
Any attempt at a naive sort using formats like CH, SFF, UFF, ZD, PD, BI, or FI either produces unacceptable results or even causes the SORT utility to fail on unacceptable data, with either a SORT error message or a system ABEND such as the well-known S0C7. The best possible result may look as follows:
In order to produce reasonable and meaningful results, all free-text numeric data that include decimal points, +/- signs, and/or spaces need to be aligned by their decimal point position and/or converted to one of the numeric formats recognized by SORT utilities: SFF, PD, ZD, or FI.
If negative values cannot occur, the unsigned formats UFF and BI can also be used.
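To see why alignment matters, here is a rough Python model of how SFF reads a field. It assumes the rule that SFF keeps only the digits 0-9 and treats the value as negative when a '-' or ')' appears anywhere in the field; the function name `sff_digits` is made up for this illustration and is not part of any SORT product. Because the decimal point is ignored, '1.5' and '1.25' become 15 and 125, and only after the fraction is padded to the same number of digits does the comparison come out right.

```python
def sff_digits(text: str) -> int:
    """Rough model of SFF (assumption): keep digits 0-9, ignore everything else.

    The value is negative if '-' or ')' appears anywhere in the field.
    The decimal point is ignored, so its position is lost.
    """
    digits = "".join(ch for ch in text if ch in "0123456789")
    value = int(digits) if digits else 0
    negative = "-" in text or ")" in text
    return -value if negative else value


# Unaligned: 1.5 and 1.25 become 15 and 125, so 1.5 would sort before 1.25.
print(sff_digits("1.5"), sff_digits("1.25"))
# Aligned to two fraction digits: 150 and 125, which is the correct order.
print(sff_digits("1.50"), sff_digits("1.25"))
```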
Some amateurish methods handle only specific cases of data; after any minor variation in the data, the process either fails or quietly produces wrong results.
Below is a much more general approach to achieving this result in a regular, professional way.
Code:
* Convert free-text numerics to aligned value columns in 4 formats
  INREC IFTHEN=(WHEN=INIT,              apply to any record
      PARSE=(%1=(ABSPOS=1,              detect whole part
                 ENDBEFR=C'.',          ending with dot
                 FIXLEN=12),            max len of the whole part
             %2=(ENDBEFR=BLANKS,        detect fractional part
* optional specific field separators:
*                ENDBEFR=C',',          | support CSV with comma
*                ENDBEFR=C';',          | support CSV non-USA
*                ENDBEFR=X'05',         | support TAB separator
*                ENDBEFR=X'0A',         | support LF separator
*                ENDBEFR=X'0D',         | support CR separator
                 FIXLEN=2)),            max two digits, no more
* Place two parts of number left and right from dec point
      OVERLAY=(14:%1,SQZ=(SHIFT=RIGHT), place whole part
               C'.',                    dec '.' (view only)
               %2,SQZ=(SHIFT=LEFT,      place fraction part
                       TRAIL=C'00')),   pad with zeroes if short
      HIT=NEXT),                        allow second conversion pass
* small trick: second pass of field conversion
        IFTHEN=(WHEN=INIT,              apply to any record
      OVERLAY=(38:14,15,SFF,
                  ADD,+0,               dummy arithmetic to fix dec
                  TO=PD,LENGTH=7,       convert to S9(11)V99 COMP-3
               53:14,15,SFF,
                  TO=FI,LENGTH=4,       convert to S9(7)V99 COMP
               62:14,15,SFF,
                  ADD,+0,               dummy arithmetic to fix dec
                  TO=ZD,LENGTH=8))      convert to S9(6)V99 DISPLAY
*
* Correct sorting by any of 4 converted normalized values
  SORT FIELDS=(14,15,SFF,A)   sort by aligned, normalized SFF value
* SORT FIELDS=(38,07,PD,A)    sort by aligned packed decimal value
* SORT FIELDS=(53,04,FI,A)    sort by aligned binary value
* SORT FIELDS=(62,08,ZD,A)    sort by aligned zoned decimal value
*
* Unpack internal formats to print contents
  OUTREC IFTHEN=(WHEN=INIT,
      OVERLAY=(38:38,7,HEX,             unpack COMP-3 as hexadecimal
               53:53,4,HEX))            unpack COMP as hexadecimal
*
  OUTFIL REMOVECC,
    HEADER1=(C' CH/SFF ',
             C' WHOLE PART .FRACTION ',
             C' SFF->PD ',
             C' SFF->FI ',
             C' SFF->ZD',
           /,C'--as is-----|',
             C'--as is----- ----------|',
             C'--hex---------|',
             C'--hex---|',
             C'--as is-')
  END
The produced output looks much better than any straightforward sort of such free-format data.
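For readers who want to check the logic away from the mainframe, the INREC step above can be approximated in Python. This is a sketch under stated assumptions, not a byte-exact emulation of PARSE/SQZ: `align` and `sff` are hypothetical helper names, a value without a '.' is treated as having an empty fraction, and SFF is simplified to "keep the digits; negative if a '-' or ')' is present". It shows that once the fraction always has the same number of digits, the SFF integer is simply the value times 100, so it sorts correctly, and that '-.9' keeps its sign because SFF looks at the whole aligned field.

```python
WHOLE_LEN = 12   # FIXLEN=12 for %1
FRAC_LEN = 2     # FIXLEN=2 for %2


def sff(text: str) -> int:
    """Simplified SFF (assumption): digits only, negative if '-' or ')' is present."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    value = int(digits) if digits else 0
    return -value if ("-" in text or ")" in text) else value


def align(raw: str) -> str:
    """Approximate the INREC PARSE + OVERLAY step for one input value."""
    before_dot, _, after_dot = raw.partition(".")
    # %1: from column 1 up to the '.', cut to FIXLEN, blanks squeezed out, shifted right
    whole = before_dot[:WHOLE_LEN].replace(" ", "").rjust(WHOLE_LEN)
    # %2: after the '.', up to the first blank, cut to FIXLEN
    frac = after_dot.split(" ", 1)[0][:FRAC_LEN]
    # SQZ=(SHIFT=LEFT,TRAIL=C'00'): append zeroes, keep FIXLEN characters
    frac = (frac + "0" * FRAC_LEN)[:FRAC_LEN]
    return whole + "." + frac          # 15 bytes, as used by 14,15,SFF


values = ["12.5", "-0.99", "3", "100.05", "-12.5", "-.9", "0.7"]
for v in sorted(values, key=lambda v: sff(align(v))):
    print(f"{v:>8} -> '{align(v)}' -> {sff(align(v))}")
```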
This should achieve the result in one pass, in a simpler and better way, especially for the -000.00 and +000.00 values, and it works with any number of decimal places, not just 2.
Note: this looks like a bug in EDIT to me: when '- ' is passed to EDIT(STTTTT), it changes the sign from - to +, which it should not do (e.g. for '-.9' or '- .99'). So I have to copy the sign from the first byte of the original input to avoid that.
There is no difference from what I explained.
One pass is performed in both cases.
The number of fraction digits is adjustable, as expected: 2 was used only as an example; one can change it to whatever is needed.
There is no bug in the conversion of +/- signs by SFF; no extra effort is needed.
I am talking about +/- sign conversion by EDIT, not by SFF. Look at the field starting at offset 20 for the -.9 or -.99 values: for '-' followed by spaces, EDIT produced '+' and zeroes instead of '-' and zeroes. In your example, -0000 and +0000 came out together where they should have been ordered by their signs, but that doesn't make much of a difference.
+0 and -0 are treated as equal values in all IBM-based hardware and software. That's why after any SORT, all +0s and -0s always end up together as one group, and their order within this group stays the same as in the input data (when the SORT option EQUALS is in effect). Otherwise, those values may appear in any order within this group of "zero values".
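The same grouping can be seen in a quick Python analogy (this is not DFSORT itself): `Decimal` keeps a negative zero but compares it equal to positive zero, and Python's `sorted()` is guaranteed stable, which plays the role of EQUALS here. The record labels and values are made up for the illustration.

```python
from decimal import Decimal

# Made-up records: (label, value). '-0.00' and '+0.00' are different strings
# but equal numbers.
records = [
    ("a", Decimal("+0.00")),
    ("b", Decimal("-1.50")),
    ("c", Decimal("-0.00")),
    ("d", Decimal("0.00")),
]

# sorted() is stable: records with equal keys keep their input order (like EQUALS).
ordered = sorted(records, key=lambda rec: rec[1])
print([label for label, _ in ordered])   # ['b', 'a', 'c', 'd']
```

All three zeros form one group and stay in input order; without a stable sort, their relative order would not be guaranteed.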
A '-0' value can only be introduced manually; after any arithmetic operation, even a simple 'MINUSZERO,PD,ADD,0', the result is always converted to '+0'.
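To make the '+0'/'-0' distinction concrete: in packed decimal (COMP-3), the sign lives in the last half-byte, X'C' for plus and X'D' for minus, so +0 and -0 in one byte are X'0C' and X'0D'. The Python sketch below uses hypothetical helpers `to_pd` and `from_pd` to encode and decode that layout. Decoding goes through a Python `int`, which has no negative zero, so re-encoding the result of `+ 0` gives X'0C'. This mimics the behavior described above; it does not emulate the hardware decimal instructions.

```python
def to_pd(value: int, length: int, negative_zero: bool = False) -> bytes:
    """Pack an integer into `length` bytes of packed decimal (COMP-3).

    Two digits per byte, sign in the last half-byte: C = plus, D = minus.
    `negative_zero` only exists to build a '-0' by hand.
    """
    max_digits = 2 * length - 1
    digits = str(abs(value)).rjust(max_digits, "0")
    if len(digits) > max_digits:
        raise OverflowError(f"{value} does not fit in {length} packed bytes")
    sign = "D" if value < 0 or negative_zero else "C"
    return bytes.fromhex(digits + sign)


def from_pd(data: bytes) -> int:
    """Unpack packed decimal; B and D sign half-bytes mean negative.

    A non-decimal digit half-byte makes int() raise ValueError.
    """
    text = data.hex().upper()
    value = int(text[:-1])
    return -value if text[-1] in "BD" else value


minus_zero = to_pd(0, 1, negative_zero=True)
print(minus_zero.hex().upper())                         # 0D
print(to_pd(from_pd(minus_zero) + 0, 1).hex().upper())  # 0C
print(to_pd(-1234567890123, 7).hex().upper())           # 1234567890123D
```

The last line also shows why LENGTH=7 matches S9(11)V99 COMP-3: 7 bytes hold 13 digits plus the sign half-byte.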
Since my last example seems to be too complicated (I mixed several examples together), here is a simplified version: SORT only, without extra conversion to other possible formats.
Code:
* Convert free-text numerics to an aligned SFF value column
  INREC IFTHEN=(WHEN=INIT,              apply to any record
      PARSE=(%1=(ABSPOS=1,              detect whole part
                 ENDBEFR=C'.',          ending with dot
                 FIXLEN=12),            max len of the whole part
             %2=(ENDBEFR=BLANKS,        detect fractional part
                 FIXLEN=12)),           max 12 digits, no more
* Place two parts of number left and right from dec point
      OVERLAY=(14:%1,SQZ=(SHIFT=RIGHT), place whole part
               C'.',                    dec '.' (view only)
               %2,SQZ=(SHIFT=LEFT,      place fraction part
                       TRAIL=C'000000000000'))) fill 0's
* Correct sorting by converted value
  SORT FIELDS=(14,23,SFF,A)   sort by aligned, normalized SFF value
*
  OUTFIL REMOVECC,
    HEADER1=(C' CH/SFF ',
             C' WHOLE PART .FRACTION ',
           /,C'--as is-----|',
             C'--as is----- ----------')
  END
The -00000 value was always there in your sample data; I guess removing it would settle the discussion. Second, SFF and ZD gave different results, even though IBM supposedly does not distinguish -0. Third, why EDIT changed the sign from '-' to '+' in my case remains unresolved.
Usually, the parameter LENGTH=2, 4, or 8 is used to generate 16-, 32-, or 64-bit binary values.
Surprisingly, when LENGTH=8 is used, the SORT utility, before creating the 64-bit binary value, first performs an intermediate conversion whose size depends on the total number of numeric characters in the input field. The test shows that when the input field has fewer than 16 digits, an intermediate conversion to FI,LENGTH=4 is performed.
For example, for BUILD=(1,15,ZD,TO=FI,LENGTH=8) the maximum possible input value would be C'999999999999999' (to be converted to X'00038D7EA4C67FFF' in 64-bit notation).
Nevertheless, the highest value actually accepted is C'000002147483647' (converted to X'000000007FFFFFFF' in 64-bit notation); any larger input value causes the SORT utility to issue the error message "OUTREC ARITHMETIC OVERFLOW" and ABEND U0016.
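The byte values quoted above are easy to double-check with Python's `int.to_bytes`, which produces big-endian two's-complement binary in the same layout as FI. The second helper, `to_fi_via_fullword`, is a hypothetical model of the observed behavior (an intermediate 4-byte result that is then widened); it is not how the SORT product is documented to work, only a way to reproduce the limit seen in the test.

```python
def to_fi(value: int, length: int) -> bytes:
    """Big-endian two's-complement binary, like FI with LENGTH=length.

    int.to_bytes raises OverflowError when the value does not fit.
    """
    return value.to_bytes(length, "big", signed=True)


def to_fi_via_fullword(value: int, length: int) -> bytes:
    """Hypothetical model of the observed behavior: go through a 4-byte FI first."""
    fullword = to_fi(value, 4)                      # fails above 2,147,483,647
    return int.from_bytes(fullword, "big", signed=True).to_bytes(length, "big", signed=True)


print(to_fi(999_999_999_999_999, 8).hex().upper())         # 00038D7EA4C67FFF
print(to_fi_via_fullword(2_147_483_647, 8).hex().upper())  # 000000007FFFFFFF
try:
    to_fi_via_fullword(2_147_483_648, 8)
except OverflowError as exc:
    print("overflow:", exc)
```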