IBM Mainframe Forum Index
 
Log In
 
IBM Mainframe Forum Index Mainframe: Search IBM Mainframe Forum: FAQ Register
 

SAS: PROC COMPARE


IBM Mainframe Forums -> All Other Mainframe Topics
Post new topic   Reply to topic
View previous topic :: View next topic  
Author Message
rajesh1183

New User


Joined: 07 Jan 2008
Posts: 98
Location: Hyderabad

PostPosted: Thu Apr 22, 2010 5:04 pm
Reply with quote

Hi,

I am comparing 2 SAS datasets using PROC COMPARE. Here is the code which I used for comapring
Code:

PROC COMPARE BASE=PROD.AUS_PREMIUM   
             COMPARE=TEST.AUS_PREMIUM
OUT=TEMPOUT OUTNOEQUAL;                       
RUN;                                           
PROC PRINT DATA=TEMPOUT;                       
RUN;                                           


default METHOD is EXACT is used for comparision

I get messages in SAS log as below
Code:

          Value Comparison Results for Variables         
                                                         
__________________________________________________________
           ||       Base    Compare                       
       Obs ||       PRVB       PRVB      Diff.     % Diff
 ________  ||  _________  _________  _________  _________
           ||                                             
    24441  ||  226177.32  226177.32  1.455E-11  6.434E-15
    24553  ||  2877354.8  2877354.8  2.328E-10  8.092E-15
__________________________________________________________


The data type of PRVB is NUMERIC and it is declared as 18.2

I see no mismatches in the values, but SAS reports that there are differences. Any clue...?

I tried writing the PROD & TEST SAS dataset to a normal PS file and then compared both files using 3.13... I didn't find any differences over there...

Let me know incase if u need any further details...
Back to top
View user's profile Send private message
Phrzby Phil

Senior Member


Joined: 31 Oct 2006
Posts: 1042
Location: Richmond, Virginia

PostPosted: Thu Apr 22, 2010 5:13 pm
Reply with quote

SAS numerics are always 8-bytes double precision (unless modified via the LENGTH command - very rarely used).
It is your format that is 18.2, which is why you are seeing only 2 digits precision.
The values must differ past that point.
Rerun with a modified format to see more digits.
Back to top
View user's profile Send private message
rajesh1183

New User


Joined: 07 Jan 2008
Posts: 98
Location: Hyderabad

PostPosted: Thu Apr 22, 2010 5:52 pm
Reply with quote

When my data is till 2 decimal points, how come the COMPARE can compare beyond this..?? The unfilled decimal points should be filled with ZEROes right..??


Because, out of some 26,000 records, only 14 records were shown as mismatch.

Could you clear my doubt..
Back to top
View user's profile Send private message
Phrzby Phil

Senior Member


Joined: 31 Oct 2006
Posts: 1042
Location: Richmond, Virginia

PostPosted: Thu Apr 22, 2010 5:55 pm
Reply with quote

COMPARE compares the entire value (EXACT), not just what part of it matches your format.
You never showed your calculation code. Did you round your calcs to 2 places?
If not, how do I know that the 18.2 [/u]format matches the precision of your actual values?
Remove the format, or increase it to show more dec places.
Back to top
View user's profile Send private message
rajesh1183

New User


Joined: 07 Jan 2008
Posts: 98
Location: Hyderabad

PostPosted: Thu Apr 22, 2010 6:12 pm
Reply with quote

There is nothing like calculation code done from my end, Just, I declared PVRB with FORMAT 18.2

I assign PVRB with a value read from DB2 table. That column is defined as DECIMAL(19,2).

then I do a PROC SUMMARY as below:-

Code:

PROC SUMMARY DATA=WORK.DS1;
OUTPUT  OUT = WORK.DS2(DROP = _TYPE_ _FREQ_) SUM=;
    BY A
         B
         C
    ;
   VAR PVRB
   ;
RUN;
Back to top
View user's profile Send private message
Robert Sample

Global Moderator


Joined: 06 Jun 2008
Posts: 8696
Location: Dubuque, Iowa, USA

PostPosted: Thu Apr 22, 2010 6:19 pm
Reply with quote

Quote:
The unfilled decimal points should be filled with ZEROes right..??
Where on earth did you come up with this misguided, totally wrong concept from??? SAS uses 8-byte floating point numbers and your values are quite unlikely to have nothing but zeroes after the formatted data.
Back to top
View user's profile Send private message
Phrzby Phil

Senior Member


Joined: 31 Oct 2006
Posts: 1042
Location: Richmond, Virginia

PostPosted: Thu Apr 22, 2010 6:26 pm
Reply with quote

Any chance you're doing these on different platforms; e.g., mainframe and UNIX?
Back to top
View user's profile Send private message
rajesh1183

New User


Joined: 07 Jan 2008
Posts: 98
Location: Hyderabad

PostPosted: Thu Apr 22, 2010 6:29 pm
Reply with quote

All my SAS programs run on Mainframe only...
Back to top
View user's profile Send private message
Phrzby Phil

Senior Member


Joined: 31 Oct 2006
Posts: 1042
Location: Richmond, Virginia

PostPosted: Thu Apr 22, 2010 6:42 pm
Reply with quote

Even though your numbers match to 2 decimal places, since they are stored in binary, they are not all zero bits to the end, due to the way binary and decimal convert.
Can you input these two differing values in a data step using the $binary format:

Code:
put @1 a $binary64.;


and let's see what we've got?
Back to top
View user's profile Send private message
rajesh1183

New User


Joined: 07 Jan 2008
Posts: 98
Location: Hyderabad

PostPosted: Thu Apr 22, 2010 8:09 pm
Reply with quote

Here they are...

From Compare dataset
Code:

0100010100110111001110000001010100011110101110000101000111101010 - For 24441 observation
0100011000101011111001111010101011000010100011110101110000101000 - For 24553 observation

From Base dataset
Code:

0100010100110111001110000001010100011110101110000101000111101001 - For 24441 observation
0100011000101011111001111010101011000010100011110101110000100111 - For 24553 observation


Now, I can see that last 4 bits differ in both the records. Now, can u let me know, how to convert this binary to DEC.
Back to top
View user's profile Send private message
Phrzby Phil

Senior Member


Joined: 31 Oct 2006
Posts: 1042
Location: Richmond, Virginia

PostPosted: Thu Apr 22, 2010 9:29 pm
Reply with quote

BTW - I should have said to use binary64. (sans the $), but it does work with the $ version.

Notice that they differ by just one bit - the rightmost - least significant.

Binary arighmetic:
1001 + 0001 = 1010
0111 + 0001 = 1000

Somehow, due to the way the numbers are aggregated by your PROC SUMMARY, and maybe the order each run is getting the numbers, the accumulated values differ by the least possible amount. I don't know why.

Solution: do not use exact, but rather whatever option COMPARE has that tells it how precisely to compare.

It is always interesting and non-trivial when this happens.

To pursue it further to understand it, can you manually (e.g., in data step code) do the same arithmetic SUMMARY is doing for this var, checking each intermediate value via binary64., to see what's going on?
Back to top
View user's profile Send private message
View previous topic :: :: View next topic  
Post new topic   Reply to topic View Bookmarks
All times are GMT + 6 Hours
Forum Index -> All Other Mainframe Topics

 


Similar Topics
Topic Forum Replies
No new posts Compare only first records of the fil... SYNCSORT 7
No new posts Compare two files with a key and writ... SYNCSORT 3
No new posts Compare latest 2 rows of a table usin... DB2 1
No new posts How to compare two rows of same table DB2 11
This topic is locked: you cannot edit posts or make replies. Compare files with duplicates in one ... DFSORT/ICETOOL 11
Search our Forums:

Back to Top