Sivagurunathan Jagadeesan
New User
Joined: 06 Mar 2007 Posts: 32 Location: India
Hi,
I have a general question on efficient file processing. The scenario is given below:
File1 – 265,240,770 records --> main driver file
File2 – 3,489,159 records
I am comparing File1 with File2 on a key field that is unique in File1, and writing the matched and unmatched records to two different output files.
Currently the process is designed in SAS, and it runs for a long time (approx. 5 hrs).
I am planning to redesign it in Easytrieve or COBOL instead of SAS.
Which of the two (Easytrieve or COBOL) will be more efficient in this case? Please let me know.
Thanks,
Siva
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
Maybe you could show your SAS code, so we can see whether we can tune that.
Sivagurunathan Jagadeesan
New User
Joined: 06 Mar 2007 Posts: 32 Location: India
Hi Peter,
I have given the code below; please don't mind the variable names.
OPTIONS OBS=MAX;

/************************************************************/
/* DEFINE FILE 1                                            */
/************************************************************/
DATA FILE1;
  INFILE IWD815A;
  INPUT
    @1   A        $CHAR8.
    @9   KEY      $CHAR12.
    @21  HOLDING  PD8.4
    @29  CO       PD8.2
    @45  UNR      PD8.2
    @63  P        PD8.7
    @71  DATE     $CHAR10.
  ;
  CURRENCY = 'USD';
  MKT = CO + UNR;
  IN_NBR = SUBSTR(KEY,1,8);

/************************************************************/
/* DEFINE FILE 2                                            */
/************************************************************/
DATA FILE2;
  INFILE IWD815B;
  INPUT
    @1   AAAAAAAAA   $CHAR12.
    @13  SAAEC       $CHAR8.
    @25  KEY         $CHAR12.
    @38  DESCRIPTION $CHAR30.
    @158 BBBBBBBCODE $CHAR8.
    @166 SUB_CODE    $CHAR8.
    @265 PRICE       PD7.9
    @437 SIN         $CHAR12.
  ;
  IF KEY = ' ' THEN DELETE;

/********************************************************************/
/* MERGE THE TWO DATASETS ON KEY                                    */
/********************************************************************/
DATA SEC_MERGE_FILE;
  MERGE FILE1 (IN=ONE)
        FILE2 (IN=TWO)
  ;
  BY KEY;
  IF ONE AND TWO THEN
  DO;
    FILE OWD815A;
    PUT
      /* ---> I write 13 variables to the o/p file from the two i/p files */
    ;
  END;
  IF ONE AND NOT TWO THEN
  DO;
    FILE OWD815B;
    PUT
      /* ---> Again, I write 13 variables to the o/p file from the two i/p
         files, for further processing in the next step of the procedure */
    ;
  END;
RUN;
Apart from this, I have a question on numeric precision in SAS. I have a PD field with length PD12.7 for which the decimal portion is distorted. For example:
I/P value - 50000000001489.8945550
O/P value - 50000000001489.8823168
I tried different options but couldn't solve it. Can you let me know whether this can be handled in SAS?
Thanks a lot,
Siva
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
So what I see is that you copy your files into SAS datasets.
Are they sorted on the key?
I don't see index definitions for your SAS datasets, which could speed things up considerably by using a keyed merge.
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8796 Location: Welsh Wales
Also, your sort product would probably be the fastest and most efficient option.
Take a look at the DFSORT forum and see the many examples already available.
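For instance, a DFSORT JOINKEYS job can do the whole match/no-match split in one pass. This is only a sketch: the key positions (9,12 in File1 and 25,12 in File2) come from the layouts posted above, but the DD names, output dataset names, and the record lengths assumed in the REFORMAT statement (80 and 448 bytes) are placeholders, space/DCB parameters are omitted, and you would normally BUILD only the fields you need. SORTED,NOSEQCK tells the joinkeys tasks that the inputs are already in key order.
Code:
//MATCH    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//F1       DD DISP=SHR,DSN=your.file1
//F2       DD DISP=SHR,DSN=your.file2
//MATCHED  DD DSN=your.matched.out,DISP=(NEW,CATLG,DELETE)
//UNMATCH  DD DSN=your.unmatched.out,DISP=(NEW,CATLG,DELETE)
//SYSIN    DD *
* Match File1 (key at 9 for 12) against File2 (key at 25 for 12)
  JOINKEYS FILE=F1,FIELDS=(9,12,A),SORTED,NOSEQCK
  JOINKEYS FILE=F2,FIELDS=(25,12,A),SORTED,NOSEQCK
* Keep pairs and File1-only records; the '?' byte at 529 is 'B' (both) or '1' (F1 only)
  JOIN UNPAIRED,F1
  REFORMAT FIELDS=(F1:1,80,F2:1,448,?)
  OPTION COPY
  OUTFIL FNAMES=MATCHED,INCLUDE=(529,1,CH,EQ,C'B')
  OUTFIL FNAMES=UNMATCH,INCLUDE=(529,1,CH,EQ,C'1')
/*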
Sivagurunathan Jagadeesan
New User
Joined: 06 Mar 2007 Posts: 32 Location: India
Yes, the files are sorted on the key field on which they are merged. I am not aware of 'index definitions' for SAS datasets; I will check this out.
The sort is done using DFSORT. This sort step alone runs for 50 minutes; the input record count is 405,929,659 and the output is approximately 50% of the input.
The SORT is as below:
Code:
SORT FIELDS=(9,12,CH,A)
INCLUDE COND=(62,1,CH,EQ,C'U')

I will check the DFSORT forum for performance improvements.
Thanks,
Siva
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
Code:
LIBNAME SASDATA 'SAS library name';
/* Build an index on the merge key so the dataset can be read with keyed access */
PROC DATASETS NOLIST DETAILS LIBRARY=SASDATA MT=DATA;
  MODIFY sasdataset-name;
  INDEX CREATE index-name = (field1 ... fieldn);
QUIT;
RUN;
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
1. If all you're doing is outputting the files, change DATA SEC_MERGE_FILE; to DATA _NULL_; -- that will keep SAS from building an output WORK data set of your merged data; if it is not needed then you'll save a lot of time.
2. Restructure your code to have one IF statement -- the more statements SAS has to execute, the longer it'll take. This code should help:
Code:
IF ONE
THEN DO;
IF NOT TWO
THEN DO;
FILE OWD815B ;
PUT ... ;
END;
ELSE DO;
FILE OWD815A ;
PUT ... ;
END;
END;
3. If you don't have BUFFERS on your input and output DD statements, you should add at least BUFFERS=100 to each. This will increase the memory requirement, but will speed processing up (drastically if you don't have any BUFFERS now).
Also, learn to use the BBCode feature -- it makes code much more readable.
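As a hypothetical illustration of the buffering idea only: the dataset name below is a placeholder, and BUFNO is the standard JCL DCB subparameter for the number of QSAM buffers, which may or may not be the exact keyword meant here.
Code:
//IWD815A  DD DISP=SHR,DSN=your.input.file1,
//            DCB=BUFNO=100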
raghavmcs
Active User
Joined: 14 Jul 2005 Posts: 105
I have never used SAS compare logic, but I have often seen Easytrieve file-matching logic.
As you have pre-sorted the files, and after implementing the solutions Robert and Peter provided, there could be a close finish between these two methods.
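As a rough illustration, an Easytrieve Plus synchronized-file match might be sketched as below. Only the key positions (9,12 in File1 and 25,12 in File2) come from the layouts posted above; the file, DD and field names and the output FILE attributes are placeholders.
Code:
* Synchronized file processing: both inputs must already be in key order
FILE FILE1
  KEY1     9 12 A
FILE FILE2
  KEY2    25 12 A
FILE OUTMATCH FB(80 8000)
FILE OUTONLY  FB(80 8000)
*
JOB INPUT (FILE1 KEY(KEY1) FILE2 KEY(KEY2))
  IF MATCHED
    PUT OUTMATCH FROM FILE1
  END-IF
  IF FILE1 AND NOT FILE2
    PUT OUTONLY FROM FILE1
  END-IF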
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
So after sorting and building an index, you could use a direct merge to obtain your outputs:
Code:
%MACRO GETKEY(SRC_DS=,KEY_ID=,OK_PROC=,ERR_PROC=);
  /* Keyed lookup into the indexed dataset &SRC_DS using the current &KEY_ID value */
  SET &SRC_DS KEY = &KEY_ID / UNIQUE;
  SELECT (_IORC_);
    WHEN (%SYSRC(_SOK)) DO;        /* key found     */
      LINK &OK_PROC;
    END;
    WHEN (%SYSRC(_DSENOM)) DO;     /* key not found */
      _ERROR_ = 0;
      LINK &ERR_PROC;
    END;
    OTHERWISE DO;
      PUT 'UNEXPECTED ERROR: _IORC_ = ' _IORC_;
    END;
  END;
%MEND GETKEY;

DATA _NULL_;
  SET FILE2;
  %GETKEY(SRC_DS=FILE1,KEY_ID=KEY,
          OK_PROC=OK_PROC,ERR_PROC=ERR_PROC);
  DELETE;
  RETURN;
OK_PROC:
  FILE OWD815A;
  PUT;
  RETURN;
ERR_PROC:
  FILE OWD815B;
  PUT;
  RETURN;
RUN;
neontech
New User
Joined: 08 Dec 2009 Posts: 11 Location: mumbai
Apart from the above techniques, you can try the REXX approach below; it is quite efficient.
Code:
//* job card
//STEP010  EXEC PGM=IKJEFT01,DYNAMNBR=50,
//         PARM='%TWOFILE 1/10 - 1/10'
//SYSPROC  DD DISP=SHR,DSN=SYSS.COMSVCS.ISPCLIB
//SYSTSIN  DD *
//SYSTSPRT DD SYSOUT=*
//FILEONE  DD DSN=Your.file.SUSP9,DISP=SHR
//FILETWO  DD DSN=HMEAT.CHINTAN.TEST.SUSPENSE,DISP=SHR
//ONEONLY  DD DSN=Your.file.BATCH.OUT91,DISP=(NEW,CATLG,DELETE)
//TWOONLY  DD DSN=Your.file.TEST.HETNSE,DISP=(NEW,CATLG,DELETE)
//ONEBOTH  DD DSN=Your.file.BATCH.OUT92,DISP=(NEW,CATLG,DELETE)
//TWOBOTH  DD DSN=Your.file.TEST.OUENSE,DISP=(NEW,CATLG,DELETE)
//ALLDUPS  DD DSN=Your.file.BATCH.OUT93,DISP=(NEW,CATLG,DELETE)
//ALLNONDP DD DSN=You.file.TEST.TEST,DISP=(NEW,CATLG,DELETE)

The file keys are passed as a PARM to the REXX exec:
'%twofile p/l p/l p/l - p/l p/l p/l'
where p is the starting position and l is the length. Let me know if you need any more help.
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8796 Location: Welsh Wales
neontech wrote:
Apart from the above techniques, you can try the REXX code below; it is quite efficient.

I would guess that this is a piece of site-specific code. It sure as s**t doesn't work in my shop.
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10889 Location: italy
Quote:
Apart from the above techniques, you can try using the REXX code below...

The sample You posted is site specific.
You did not post any code, so nobody can take advantage of Your reply.
Quote:
it is quite efficient...

Quite hard to believe. The common agreement is that for more than a couple of thousand records, REXX is quite a dog -- benchmarked!
For file matching, the best option would in all cases be the sort product installed.
A sort (IBM DFSORT, Syncsort, CA Sort, any sort) is always installed; Easytrieve, SAS, File-AID or File Manager might not be.