View previous topic :: View next topic
|
Author |
Message |
rajesh1183
New User
Joined: 07 Jan 2008 Posts: 98 Location: Hyderabad
|
|
|
|
Hi,
I am comparing 2 SAS datasets using PROC COMPARE. Here is the code which I used for comapring
Code: |
PROC COMPARE BASE=PROD.AUS_PREMIUM
COMPARE=TEST.AUS_PREMIUM
OUT=TEMPOUT OUTNOEQUAL;
RUN;
PROC PRINT DATA=TEMPOUT;
RUN;
|
default METHOD is EXACT is used for comparision
I get messages in SAS log as below
Code: |
Value Comparison Results for Variables
__________________________________________________________
|| Base Compare
Obs || PRVB PRVB Diff. % Diff
________ || _________ _________ _________ _________
||
24441 || 226177.32 226177.32 1.455E-11 6.434E-15
24553 || 2877354.8 2877354.8 2.328E-10 8.092E-15
__________________________________________________________
|
The data type of PRVB is NUMERIC and it is declared as 18.2
I see no mismatches in the values, but SAS reports that there are differences. Any clue...?
I tried writing the PROD & TEST SAS dataset to a normal PS file and then compared both files using 3.13... I didn't find any differences over there...
Let me know incase if u need any further details... |
|
Back to top |
|
|
Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1042 Location: Richmond, Virginia
|
|
|
|
SAS numerics are always 8-bytes double precision (unless modified via the LENGTH command - very rarely used).
It is your format that is 18.2, which is why you are seeing only 2 digits precision.
The values must differ past that point.
Rerun with a modified format to see more digits. |
|
Back to top |
|
|
rajesh1183
New User
Joined: 07 Jan 2008 Posts: 98 Location: Hyderabad
|
|
|
|
When my data is till 2 decimal points, how come the COMPARE can compare beyond this..?? The unfilled decimal points should be filled with ZEROes right..??
Because, out of some 26,000 records, only 14 records were shown as mismatch.
Could you clear my doubt.. |
|
Back to top |
|
|
Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1042 Location: Richmond, Virginia
|
|
|
|
COMPARE compares the entire value (EXACT), not just what part of it matches your format.
You never showed your calculation code. Did you round your calcs to 2 places?
If not, how do I know that the 18.2 [/u]format matches the precision of your actual values?
Remove the format, or increase it to show more dec places. |
|
Back to top |
|
|
rajesh1183
New User
Joined: 07 Jan 2008 Posts: 98 Location: Hyderabad
|
|
|
|
There is nothing like calculation code done from my end, Just, I declared PVRB with FORMAT 18.2
I assign PVRB with a value read from DB2 table. That column is defined as DECIMAL(19,2).
then I do a PROC SUMMARY as below:-
Code: |
PROC SUMMARY DATA=WORK.DS1;
OUTPUT OUT = WORK.DS2(DROP = _TYPE_ _FREQ_) SUM=;
BY A
B
C
;
VAR PVRB
;
RUN;
|
|
|
Back to top |
|
|
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8696 Location: Dubuque, Iowa, USA
|
|
|
|
Quote: |
The unfilled decimal points should be filled with ZEROes right..?? |
Where on earth did you come up with this misguided, totally wrong concept from??? SAS uses 8-byte floating point numbers and your values are quite unlikely to have nothing but zeroes after the formatted data. |
|
Back to top |
|
|
Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1042 Location: Richmond, Virginia
|
|
|
|
Any chance you're doing these on different platforms; e.g., mainframe and UNIX? |
|
Back to top |
|
|
rajesh1183
New User
Joined: 07 Jan 2008 Posts: 98 Location: Hyderabad
|
|
|
|
All my SAS programs run on Mainframe only... |
|
Back to top |
|
|
Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1042 Location: Richmond, Virginia
|
|
|
|
Even though your numbers match to 2 decimal places, since they are stored in binary, they are not all zero bits to the end, due to the way binary and decimal convert.
Can you input these two differing values in a data step using the $binary format:
Code: |
put @1 a $binary64.; |
and let's see what we've got? |
|
Back to top |
|
|
rajesh1183
New User
Joined: 07 Jan 2008 Posts: 98 Location: Hyderabad
|
|
|
|
Here they are...
From Compare dataset
Code: |
0100010100110111001110000001010100011110101110000101000111101010 - For 24441 observation
0100011000101011111001111010101011000010100011110101110000101000 - For 24553 observation
|
From Base dataset
Code: |
0100010100110111001110000001010100011110101110000101000111101001 - For 24441 observation
0100011000101011111001111010101011000010100011110101110000100111 - For 24553 observation
|
Now, I can see that last 4 bits differ in both the records. Now, can u let me know, how to convert this binary to DEC. |
|
Back to top |
|
|
Phrzby Phil
Senior Member
Joined: 31 Oct 2006 Posts: 1042 Location: Richmond, Virginia
|
|
|
|
BTW - I should have said to use binary64. (sans the $), but it does work with the $ version.
Notice that they differ by just one bit - the rightmost - least significant.
Binary arighmetic:
1001 + 0001 = 1010
0111 + 0001 = 1000
Somehow, due to the way the numbers are aggregated by your PROC SUMMARY, and maybe the order each run is getting the numbers, the accumulated values differ by the least possible amount. I don't know why.
Solution: do not use exact, but rather whatever option COMPARE has that tells it how precisely to compare.
It is always interesting and non-trivial when this happens.
To pursue it further to understand it, can you manually (e.g., in data step code) do the same arithmetic SUMMARY is doing for this var, checking each intermediate value via binary64., to see what's going on? |
|
Back to top |
|
|
|