Joined: 24 Jul 2012 Posts: 10 Location: United States
Hi all. This is my first post in the forums. I have found valuable information here in the past.
My site uses SYNCSORT instead of DFSORT (for whatever reason). I have written an E15 exit, used in a copy, in COBOL. I believe that the program is being loaded, released and then reloaded each time the next record is passed.
I placed a counter in the program and incremented it by 1 for each record passed. At the end I displayed the counter and it was 0. If this is truly happening, it would explain, I think, the severe run time that I have seen in testing this exit.
Joined: 24 Jul 2012 Posts: 10 Location: United States
I have poured over the SYNCSORT manual backward and forward. The manual doesn't give any details as to any compiler options. Here are the guts of the exit code:
Code:
002000 LINKAGE SECTION.
002100 01 WK-RECORD-FLAG PIC 9(8) COMP.
002200 88 FIRST-REC VALUE 00.
002300 88 MIDDLE-REC VALUE 04.
002400 88 END-REC VALUE 08.
002500 01 WK-IN-REC.
002600 05 WK-IN-REC-BYTE OCCURS 1 TO 16126 TIMES
002700 DEPENDING ON WK-IN-LEN PIC X.
002800 01 WK-OUT-REC.
002900 05 WK-OUT-REC-BYTE OCCURS 1 TO 16126 TIMES
003000 DEPENDING ON WK-OUT-LEN PIC X.
003100 01 WK-UNUSED-1 PIC 9(8) COMP.
003200 01 WK-UNUSED-2 PIC 9(8) COMP.
003300 01 WK-IN-LEN PIC 9(8) COMP.
003400 01 WK-OUT-LEN PIC 9(8) COMP.
003500 01 WK-UNUSED-3 PIC 9(8) COMP.
003600 01 WK-EXITAREA-LEN PIC 9(4) COMP.
003700 01 WK-EXITAREA.
003800 05 WK-EXITAREA-BYTE OCCURS 1 TO 256 TIMES
003900 DEPENDING ON WK-EXITAREA-LEN PIC X.
004000*
004000*
004100 PROCEDURE DIVISION USING WK-RECORD-FLAG,
004200 WK-IN-REC,
004300 WK-OUT-REC,
004400 WK-UNUSED-1,
004500 WK-UNUSED-2,
004600 WK-IN-LEN,
004700 WK-OUT-LEN,
004800 WK-UNUSED-3,
004900 WK-EXITAREA-LEN,
005000 WK-EXITAREA.
005100*
005200 IF END-REC
005300 MOVE 8 TO RETURN-CODE
005400 DISPLAY 'RECORD COUNT ' WK-REC-COUNT
005500 GO TO Z1000
005600 END-IF.
005700*
005800 IF WK-IN-REC (23:1) EQUAL '0'
005900 MOVE 0 TO RETURN-CODE
006000 GO TO Z1000
006100 END-IF.
006400*
006500* EXPAND THE MASTER
006600*
006700 MOVE WK-IN-LEN TO WK-COMPRESSED-LEN.
006800 ADD +4 TO WK-COMPRESSED-LEN.
006900 MOVE WK-IN-REC TO WK-COMPRESSED-DATA.
007000*
007300 CALL 'IMMSEP' USING WK-EXPANDED-MASTER
007400 WK-COMPRESSED-MASTER.
007600 MOVE 16126 TO WK-OUT-LEN.
007700 MOVE WK-EXPANDED-DATA TO WK-OUT-REC.
007800 MOVE 20 TO RETURN-CODE.
007900 ADD +1 TO WK-REC-COUNT.
008000*
008100 Z1000.
008200 GOBACK.
I have also compiled this program with NODYNAM option to static link that called module IMMSEP. This purpose of this exit to take our compressed master file and put it through our expansion routine to access the compressed trailers at the end of the record.
According to STROBE, the two "Most Intensively Executed Procedures" were:
Code:
#IEP ** MOST INTENSIVELY EXECUTED PROCEDURES **
MODULE SECTION LINE PROCEDURE START % CPU TIME
NAME NAME NUMBER NAME LOC SOLO TOTAL
I am just puzzled. To sort a sample dataset of 2000 records, it consumes 0.15 CPU minutes. A master file from a test environment used 10.99 CPU for 140000 records and ran 2.5 hours. Our production master has 1.5 million records!
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
It is not compiler options,
There is a chapter in the SyncSort manual on writing exits, which covers writing them in COBOL.
Peter Holland wrote:
Did you use the right parameters to tell Syncsort it is a Cobol exit
If you check your MODS statement, I think you'll find that you have told SyncSort that it is an Assembler program ( by telling it not to Linkedit it, its already done, without realising it is only possible for Assembler programs to be linkedited on the fly).
As you surmised, you COBOL program is being loaded, executed, effectively unloaded, for every record. This wil be slow, and surprisingly enormously slow.
With the correct MODS you will be able to breath again.
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Well, MODS looks OK, assuming the memory size is correct.
What confuses me is the RECORD statement. You are saying F and length of 300, with output length from the E15 as 300.
Yet your E15 that you have shown is obviously for V-type records.
Depending on your version of SyncSort, the RECORD statement may not even be being used. So I'd guess you are on at least 1.4something. Can you confirm what you have?
In the exit itself, you make your output reocds fixed at 16000-whatever, but for variable records.
You then cut the records down, unconditionally, in OUTREC.
Can we assume this is new code?
What is the RECFM of your input dataset?
LRECL of your input dataset?
What does the CALLed sub-program do, in general terms? IO? DB Access? Normalise multiple OCCURS DEPENDING ON?
Can you show the compile options and the Linkedit/binder output?
Why not provide the 300-byte cutdown records in the exit, and do the CONVERT only (assuming the input is V, otherwise I don't know how your exit would work at all)?
What are you getting on the output file?
Can you show the sysout from the small testng step?
A lot of data is being thrown around by the time you get to 1,500,000 records, because you decompress routine requres the RDW to be present (it seems) and you need the data in the output record.
At the moment, I can't help thinking a simple COBOL program would do a lot less data-tossing, even without the improved IO SORT will give. It may be worth considering. Once you have the exit cracked, it gives you a comparison of times, plus an output file to verify the data from your SORT step.
The OMIT is simple, you have "ohter" records which are just truncated/padded to 300, and then extract the data you actually need, from the master, perhaps wihthout having to de-compress.
Joined: 24 Jul 2012 Posts: 10 Location: United States
Quote:
At the moment, I can't help thinking a simple COBOL program would do a lot less data-tossing, even without the improved IO SORT will give.
First off, I can't help but agree. But I've got some systems guys telling me otherwise so I'm having to chase this rabbit.
In regards to the RECORD statement, that was given to me as a suggestion, but after my previous post, I researched that more and arrived a different RECORD statement.
As for the version of SYNCSORT: SYNCSORT FOR Z/OS 1.3.2.2R
Quote:
What does the CALLed sub-program do, in general terms? IO? DB Access? Normalise multiple OCCURS DEPENDING ON?
Ok, this input file is VB up to a max LRECL 16130 with RDW. It is stored in what we call a compressed format. When compressed, each record has a fixed portion that is also present out to 3000 bytes which is always present. We current create a small extract of this file using this method, but the needed fields have all been in this first 3000 bytes.
Of those 3000 bytes, the last several are flags that indicate which sections of the master file are present. We call them trailers. Each record can have different trailers present depending upon which feature that account uses. For example, account 1 may have trailer 1, 3 and 5. When we store the master record, we push the data of trailer 1 and 3 and 5 together and do not store the "empty" space of trailer 2 and 4. This routine that I call here, IMMSEP, interprets these flags and moves the data in the record out to the proper place in the record which will correspond to COBOL copybook of the master file. Because each account record can have differing trailers, the only predictable way to get to this data in trailers is to pass it through this expansion routine.
When the record is returned from IMMSEP, trailer 1, 3 and 5 data will be at the proper position for each trailer. IMMSEP does expect the RDW to be there, you are correct, so I had to make sure it was there when calling the routine.
I have a requirement to extract data now that exists in one of these trailers. Normally, I would write a COBOL program that reads the master, expands it, formats the extract record and writes it out. But again, there are those in power above me that truly believe this is this most efficient way to do it.
In this test exit, I am moving the last possible trailer, which is final 250 bytes of the expanded 16130 byte record, to the output record as a test. The test output file is correct.
Quote:
Can you show the compile options and the Linkedit/binder output?
Can you show the sysout from the small testng step?
Code:
SYNCSORT FOR Z/OS 1.3.2.2R U.S. PATENTS: 4210961, 5117495 (C) 2007 SYNCSORT INC. DATE=2014/073 TIME=13.37.42
FIDELITY INFO C/O BANCWEST CORP. z/OS 1.13.0
SYNCSORT LICENSED FOR CPU SERIAL NUMBER XXXXX, MODEL XXXX 504 LICENSE/PRODUCT EXPIRATION DATE: 02 JUN 2018
SYSIN :
OMIT COND=(27,1,CH,EQ,C'0') 00001500
SORT FIELDS=COPY 00001600
MODS E15=(IMSRTEXP,40864,,C)
OUTREC FIELDS=(5,22,231,3,259,1,1910,1,1294,4,19C' ',
15881,250),CONVERT
RECORD TYPE=V,LENGTH=(16130,16130,300) 00001800
END 00001900
RECORD COUNT 000000000
WER276B SYSDIAG= 1969406, 2418996, 2418996, 447300
WER164B 9,004K BYTES OF VIRTUAL STORAGE AVAILABLE, MAX REQUESTED,
WER164B 0 BYTES RESERVE REQUESTED, 3,236,352 BYTES USED
WER108I SORTIN : RECFM=VB ; LRECL= 16130; BLKSIZE= 27998
WER237I OUTREC RECORD LENGTH = 300
WER110I SORTOUT : RECFM=FB ; LRECL= 300; BLKSIZE= 27000
WER410B 5,888K BYTES OF VIRTUAL STORAGE AVAILABLE ABOVE THE 16MEG LINE,
WER410B 0 BYTES RESERVE REQUESTED, 100K BYTES USED
WER055I INSERT 0, DELETE 1
WER449I SYNCSORT GLOBAL DSM SUBSYSTEM ACTIVE
WER416B SORTIN : EXCP'S=22,UNIT=3390,DEV=5347,CHP=(F0F1F2F3F4F5,1),VOL=BTLV24
WER416B SORTOUT : EXCP'S=1,UNIT=3390,DEV=522C,CHP=(F0F1F2F3F4F5,1),VOL=BTMV10
WER416B TOTAL OF 23 EXCP'S ISSUED FOR COPYING
WER054I RCD IN 2000, OUT 1999
WER169I RELEASE 1.3 BATCH 0513 TPF LEVEL 2.2
WER052I END SYNCSORT - IMSRTEXJ,PS0002,,DIAG=CC00,5300,E000,006C,84FE,69AA,2648,E6E4
As you can see, I did modify the RECORD statement after studying the manual on that. Didn't really see any improvement though.
AND, I have another interesting bit of information that I was wondering about. When I first started testing this, I did not code a PARM statement on with the EXEC. Obviously SYNCSORT assumed based on defaults COBEXIT=COB2. But the SYNCCOB module was not found. I asked the person that supports the product in my shop, but he hasn't not responded.
Out of curiosity, I added the PARM='COBEXIT=COB1' to the EXEC statement to see what would happen. When I did this, SYNCSORT began executing the routine. BUT, I just don't think COB1 should be correct. We clearly use ENTERPRISE COBOL FOR Z/OS 4.2.0. COB1 is supposed to be: "COBEXIT=COB1 specifies that COBOL exits should use either OS/VS COBOL libraries or no libraries at all." While COB2 says "COBEXIT=COB2, the default, specifies that COBOL exits should use VS COBOL II or Enterprise COBOL for z/OS libraries."
I am wondering if that is not the root of my issue here and that the systems guys need to get the additional SYNCCOB and SYNCCBIF modules added per the documentation. I don't even know where to go to double check the SYNCSORT installed options or defaults.
Your insights and thoughts have been helpful to this point.
Thoughts?
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Thanks for the details.
Yes, I think the problem is the COBEXIT=COB2. I don't know, but my guess at 99.999% is that SYNCCOB or its friend are establishing a reusable COBOL run-time enviroment. Without them, Language Environment is firing up a run-unit, including your programs and lots of others stuff, processing your programs, and closing the run-unit down again, for each record.
If your Support can get COBEXIT=COB2 working such that it finds any SyncSort-supplied modules which it requires, then I think you'll have a runner. SyncSort support will be happy to help if there is some problem in achieving this.
SORT IO is much faster than COBOL IO. There is some overhead in SORT calling the exit. There is some overhead in CALLing your sub-program, but you've minimised that with the static CALL.
Becasue the sub-program requires the RDW, it looks like it was originally for use from Assembler. COBOL records in an FD don't have the RDW, unless you process them as RECFM=U-type and do the "unblocking" yourself (you read, you get a block as a "record", you can see all the RDWs).
There's a horrible trick for getting at the RDW, but no need to go into that unless there is a need.
Because the RDW is not there naturally, you end up with some big MOVEs to firstly prefix the RDW and then remove it. This will be the same in any COBOL program.
I'd suggest a separate version of the sub-program where the record-lengths are passed separately. For the exit, this is then CALLed with the parameters receiving/sending information from/to SORT, and there a no MOVEs related to the data at all.
This different version sub-routine will give any COBOL program a performance boost, as 16,000 x 1,500,000 x 2 bytes are no longer MOVEd around (plus two RDWs I've not counted).
Your 16,000 bytes are also defined with OCCURS DEPENDING ON. There is further overhead for MOVEing those around. It will be faster to MOVE fixed-length data representing the maximum lengths.
The RECORD statement remains interesting. The 300 is not needed (you're not using an E35). I don't think the RECORD is needed at all, since you are not *solely* supplying records from the exit (SORT is passing them to you, read from SORTIN, rather that no SORTIN present and you "doing something to get records" and inserting those into the SORT).
I think this should cover most of the issues, but please let us know how it goes.
Joined: 24 Jul 2012 Posts: 10 Location: United States
Bill,
As an update, my systems guys are working to get the missing modules for SYNCSORT to properly do this with a COBOL exit.
In the meantime, I am attempting to develop an assembler program that will at least solve my immediate need.
I understand that in a COPY, the E15 is actually executed in PHASE 3. If I understand correctly, AFTER INCLUDE/OMIT processing. If so, if the input record has been omitted, does SYNCSORT still execute the E15 exit?
The reason I ask is that I have been used Expeditor to diagnose my assembler program. The very first record on my input file will be omitted by control statements. It looks like in the E15 there is no data at the address of the input record when in expeditor and the only explination I can come up with is that the exit was called even though the record is being omitted.