Sivagurunathan Jagadeesan
New User
Joined: 06 Mar 2007 Posts: 32 Location: India
Hi,
I have a general question on efficient file processing. The scenario is given below:
File1 – 265,240,770 records --> main driver file
File2 – 3,489,159 records
I am comparing File1 with File2 on a key field that is unique in File1, and writing the matched and unmatched records to two different output files.
Currently the process is designed in SAS, and it runs for a long time (approx. 5 hrs).
I am planning to redesign it in Easytrieve or COBOL instead of SAS.
Which of the two (Easytrieve or COBOL) will be more efficient in this case? Please let me know.
Thanks,
Siva
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
Maybe you could show your SAS code, so we can see whether we can tune that.
Sivagurunathan Jagadeesan
New User
Joined: 06 Mar 2007 Posts: 32 Location: India
Hi Peter,
I have given the code below; please don't mind the variable names.
OPTIONS OBS=MAX;

/************************************************************/
/* DEFINE FILE 1                                            */
/************************************************************/
DATA FILE1;
  INFILE IWD815A;
  INPUT
    @1   A        $CHAR8.
    @9   KEY      $CHAR12.
    @21  HOLDING  PD8.4
    @29  CO       PD8.2
    @45  UNR      PD8.2
    @63  P        PD8.7
    @71  DATE     $CHAR10.
  ;
  CURRENCY = 'USD';
  MKT = CO + UNR;
  IN_NBR = SUBSTR(KEY,1,8);

/************************************************************/
/* DEFINE FILE 2                                            */
/************************************************************/
DATA FILE2;
  INFILE IWD815B;
  INPUT
    @1   AAAAAAAAA   $CHAR12.
    @13  SAAEC       $CHAR8.
    @25  KEY         $CHAR12.
    @38  DESCRIPTION $CHAR30.
    @158 BBBBBBBCODE $CHAR8.
    @166 SUB_CODE    $CHAR8.
    @265 PRICE       PD7.9
    @437 SIN         $CHAR12.
  ;
  IF KEY = ' ' THEN DELETE;

/********************************************************************/
/* MERGE THE TWO DATASETS ON KEY                                    */
/********************************************************************/
DATA SEC_MERGE_FILE;
  MERGE FILE1 (IN=ONE)
        FILE2 (IN=TWO)
  ;
  BY KEY;
  IF ONE AND TWO THEN
  DO;
    FILE OWD815A;
    PUT
      /* ---> I write 13 variables to the o/p file from the two i/p files */
    ;
  END;
  IF ONE AND NOT TWO THEN
  DO;
    FILE OWD815B;
    PUT
      /* ---> Again, I write 13 variables to the o/p file from the two i/p
         files, for further processing in the next step of the procedure */
    ;
  END;
RUN;
Apart from this, I have a question on numeric precision in SAS. I have a PD field with length PD12.7 for which the decimal portion is distorted. For example:
I/P value - 50000000001489.8945550
O/P value - 50000000001489.8823168
I tried different options but couldn't solve it. Can you let me know whether this can be handled in SAS?
Thanks a lot,
Siva
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
So what I see is that you copy your files into SAS datasets.
Are they sorted on the key?
I don't see index definitions for your SAS datasets, which could speed things up considerably by using a keyed merge.
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8796 Location: Welsh Wales
Also, your sort product would probably be the fastest and most efficient option.
Take a look at the DFSORT forum and see the many examples already available.
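For instance, a DFSORT JOINKEYS job can do the whole match/no-match split in one pass. This is only a sketch: the key positions (9,12 in File1 and 25,12 in File2) come from the layouts posted above, but the DD names, output dataset names, and the record lengths assumed in the REFORMAT statement (80 and 448 bytes) are placeholders, space/DCB parameters are omitted, and you would normally BUILD only the fields you need. SORTED,NOSEQCK tells the joinkeys tasks that the inputs are already in key order.
Code:
//MATCH    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//F1       DD DISP=SHR,DSN=your.file1
//F2       DD DISP=SHR,DSN=your.file2
//MATCHED  DD DSN=your.matched.out,DISP=(NEW,CATLG,DELETE)
//UNMATCH  DD DSN=your.unmatched.out,DISP=(NEW,CATLG,DELETE)
//SYSIN    DD *
* Match File1 (key at 9 for 12) against File2 (key at 25 for 12)
  JOINKEYS FILE=F1,FIELDS=(9,12,A),SORTED,NOSEQCK
  JOINKEYS FILE=F2,FIELDS=(25,12,A),SORTED,NOSEQCK
* Keep pairs and File1-only records; the '?' byte at 529 is 'B' (both) or '1' (F1 only)
  JOIN UNPAIRED,F1
  REFORMAT FIELDS=(F1:1,80,F2:1,448,?)
  OPTION COPY
  OUTFIL FNAMES=MATCHED,INCLUDE=(529,1,CH,EQ,C'B')
  OUTFIL FNAMES=UNMATCH,INCLUDE=(529,1,CH,EQ,C'1')
/*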
Sivagurunathan Jagadeesan
New User
Joined: 06 Mar 2007 Posts: 32 Location: India
Yes, the files are sorted on the key field on which they are merged. I am not aware of 'index definitions' for SAS datasets; I will check this out.
The sort is done using DFSORT. This sort step alone runs for 50 minutes; the input record count is 405,929,659 and the output is approximately 50% of the input.
The SORT is as below:
Code:
SORT FIELDS=(9,12,CH,A)
INCLUDE COND=(62,1,CH,EQ,C'U')

I will check the DFSORT forum for performance improvements.
Thanks,
Siva
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
Code:
LIBNAME SASDATA 'SAS library name';
/* Build an index on the merge key so the dataset can be read with keyed access */
PROC DATASETS NOLIST DETAILS LIBRARY=SASDATA MT=DATA;
  MODIFY sasdataset-name;
  INDEX CREATE index-name = (field1 ... fieldn);
QUIT;
RUN;
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
1. If all you're doing is outputting the files, change DATA SEC_MERGE_FILE; to DATA _NULL_; -- that will keep SAS from building an output WORK data set of your merged data; if it is not needed then you'll save a lot of time.
2. Restructure your code to have one IF statement -- the more statements SAS has to execute, the longer it'll take. This code should help:
Code:
IF ONE
THEN DO;
IF NOT TWO
THEN DO;
FILE OWD815B ;
PUT ... ;
END;
ELSE DO;
FILE OWD815A ;
PUT ... ;
END;
END;
3. If you don't have BUFFERS on your input and output DD statements, you should add at least BUFFERS=100 to each. This will increase the memory requirement, but will speed processing up (drastically if you don't have any BUFFERS now).
Also, learn to use the BBCode feature -- it makes code much more readable.
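As a hypothetical illustration of the buffering idea only: the dataset name below is a placeholder, and BUFNO is the standard JCL DCB subparameter for the number of QSAM buffers, which may or may not be the exact keyword meant here.
Code:
//IWD815A  DD DISP=SHR,DSN=your.input.file1,
//            DCB=BUFNO=100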
raghavmcs
Active User
Joined: 14 Jul 2005 Posts: 105
I have never used SAS compare logic, but I have often seen Easytrieve file-matching logic.
As you have pre-sorted the files, and after implementing the solutions Robert and Peter provided, there could be a close finish between these two methods.
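As a rough illustration, an Easytrieve Plus synchronized-file match might be sketched as below. Only the key positions (9,12 in File1 and 25,12 in File2) come from the layouts posted above; the file, DD and field names and the output FILE attributes are placeholders.
Code:
* Synchronized file processing: both inputs must already be in key order
FILE FILE1
  KEY1     9 12 A
FILE FILE2
  KEY2    25 12 A
FILE OUTMATCH FB(80 8000)
FILE OUTONLY  FB(80 8000)
*
JOB INPUT (FILE1 KEY(KEY1) FILE2 KEY(KEY2))
  IF MATCHED
    PUT OUTMATCH FROM FILE1
  END-IF
  IF FILE1 AND NOT FILE2
    PUT OUTONLY FROM FILE1
  END-IF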
PeterHolland
Global Moderator
Joined: 27 Oct 2009 Posts: 2481 Location: Netherlands, Amstelveen
So after sorting and building an index, you could use a direct merge to obtain your outputs:
Code:
%MACRO GETKEY(SRC_DS=,KEY_ID=,OK_PROC=,ERR_PROC=);
  /* Keyed lookup into the indexed dataset &SRC_DS using the current &KEY_ID value */
  SET &SRC_DS KEY = &KEY_ID / UNIQUE;
  SELECT (_IORC_);
    WHEN (%SYSRC(_SOK)) DO;        /* key found     */
      LINK &OK_PROC;
    END;
    WHEN (%SYSRC(_DSENOM)) DO;     /* key not found */
      _ERROR_ = 0;
      LINK &ERR_PROC;
    END;
    OTHERWISE DO;
      PUT 'UNEXPECTED ERROR: _IORC_ = ' _IORC_;
    END;
  END;
%MEND GETKEY;

DATA _NULL_;
  SET FILE2;
  %GETKEY(SRC_DS=FILE1,KEY_ID=KEY,
          OK_PROC=OK_PROC,ERR_PROC=ERR_PROC);
  DELETE;
  RETURN;
OK_PROC:
  FILE OWD815A;
  PUT;
  RETURN;
ERR_PROC:
  FILE OWD815B;
  PUT;
  RETURN;
RUN;
neontech
New User
Joined: 08 Dec 2009 Posts: 11 Location: mumbai
Apart from the above techniques, you can try the REXX approach below; it is quite efficient.
Code:
//* job card
//STEP010  EXEC PGM=IKJEFT01,DYNAMNBR=50,
//         PARM='%TWOFILE 1/10 - 1/10'
//SYSPROC  DD DISP=SHR,DSN=SYSS.COMSVCS.ISPCLIB
//SYSTSIN  DD *
//SYSTSPRT DD SYSOUT=*
//FILEONE  DD DSN=Your.file.SUSP9,DISP=SHR
//FILETWO  DD DSN=HMEAT.CHINTAN.TEST.SUSPENSE,DISP=SHR
//ONEONLY  DD DSN=Your.file.BATCH.OUT91,DISP=(NEW,CATLG,DELETE)
//TWOONLY  DD DSN=Your.file.TEST.HETNSE,DISP=(NEW,CATLG,DELETE)
//ONEBOTH  DD DSN=Your.file.BATCH.OUT92,DISP=(NEW,CATLG,DELETE)
//TWOBOTH  DD DSN=Your.file.TEST.OUENSE,DISP=(NEW,CATLG,DELETE)
//ALLDUPS  DD DSN=Your.file.BATCH.OUT93,DISP=(NEW,CATLG,DELETE)
//ALLNONDP DD DSN=You.file.TEST.TEST,DISP=(NEW,CATLG,DELETE)

The file keys are passed as a PARM to the REXX exec:
'%twofile p/l p/l p/l - p/l p/l p/l'
where p is the starting position and l is the length. Let me know if you need any more help.
expat
Global Moderator
Joined: 14 Mar 2007 Posts: 8796 Location: Welsh Wales
neontech wrote:
Apart from the above techniques, you can try the REXX code below; it is quite efficient.

I would guess that this is a piece of site-specific code. It sure as s**t doesn't work in my shop.
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10889 Location: italy
Quote:
Apart from the above techniques, you can try using the REXX code below...

The sample You posted is site specific.
You did not post any code, so nobody can take advantage of Your reply.
Quote:
it is quite efficient...

Quite hard to believe. The common agreement is that for more than a couple of thousand records, REXX is quite a dog -- benchmarked!
For file matching, the best option would in all cases be the sort product installed.
A sort (IBM DFSORT, Syncsort, CA Sort, any sort) is always installed; Easytrieve, SAS, File-AID or File Manager might not be.