jzhardy
Active User
Joined: 31 Oct 2006 Posts: 139 Location: brisbane
I had a day off work recently and, being something of a nerd, decided to run some performance tests on a z16.
The test was to convert 256 bytes of text to upper case. The four tests were:
1. Third-party library function - generated in trace.
2. As above, but generated in non-trace.
3. COBOL - using the intrinsic function UPPER-CASE.
4. HLASM module I wrote to exploit the SIMD (vector) instruction set. Cheated slightly by bulk loading INSTR into V0-V15.
The test harness was written in COBOL and followed this form:
Code:
MOVE 10000000 TO WS-CNT
PERFORM UNTIL WS-CNT = ZERO
    CALL <FUNCTION> USING INSTR, OUTSTR
    <repeated 100 times>
    SUBTRACT 1 FROM WS-CNT
END-PERFORM
The results surprised me and made me wonder if I'd made a coding error, but I can't see anything wrong with my approach. WS-CNT was defined as PIC 9(12).
I had to scale up the results for 1 and 2 above because they ran dog slow. Here they are:
1. 100,000 ops => 3.88 seconds, or 1,000,000,000 ops => 38,800.00 seconds (scaled)
2. 100,000 ops => 2.43 seconds, or 1,000,000,000 ops => 24,300.00 seconds (scaled)
3. 1,000,000,000 ops => 46.35 seconds
4. 1,000,000,000 ops => 11.64 seconds
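For anyone who wants to check the scaling, here is a minimal Python sketch of the arithmetic (not part of the original COBOL harness). It assumes the scaled figures are a straight linear extrapolation from 100,000 calls to the 10,000,000 x 100 = 1,000,000,000 calls the harness makes; the labels and the `scale` helper are just illustrative names.

```python
# Scale the measured timings to the harness total of
# 10,000,000 loop iterations x 100 calls = 1,000,000,000 calls,
# then compare per-call cost. Linear scaling is an assumption.
TOTAL_OPS = 10_000_000 * 100

# (label, calls actually measured, elapsed seconds) as reported above
measurements = [
    ("1. library, trace", 100_000, 3.88),
    ("2. library, non-trace", 100_000, 2.43),
    ("3. COBOL UPPER-CASE", TOTAL_OPS, 46.35),
    ("4. HLASM SIMD", TOTAL_OPS, 11.64),
]


def scale(ops, seconds, total_ops=TOTAL_OPS):
    """Linearly extrapolate an elapsed time to total_ops calls."""
    return seconds * total_ops / ops


scaled = {label: scale(ops, secs) for label, ops, secs in measurements}
fastest = min(scaled.values())

for label, secs in scaled.items():
    ns_per_call = secs / TOTAL_OPS * 1e9  # elapsed nanoseconds per call
    ratio = secs / fastest                # slowdown relative to the fastest test
    print(f"{label:<22} {secs:>10.2f} s {ns_per_call:>10.2f} ns/call {ratio:>8.1f}x")
```

On these elapsed figures the HLASM routine works out to roughly 4x faster than the intrinsic function, and both are orders of magnitude ahead of the library calls.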
Some questions that arise from this:
- Does the latest Enterprise COBOL exploit all the SIMD features across intrinsic functions? (I'm kind of guessing yes from the above.)
- Does the common LE exploit SIMD?
- The times above were end-to-end duration, not CPU time. I'll redo the test when I get a chance.
Clearly, if my results above are true, then the z16 is not just 'another CPU'. It's an absolute beast.
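On the elapsed-versus-CPU-time point, here is a minimal Python sketch of measuring both around the same loop. It only illustrates the distinction and says nothing about how the COBOL harness would capture CPU time on z/OS; `time_calls` is a hypothetical helper name, and `str.upper` on a 256-character string merely stands in for the routines under test.

```python
import time


def time_calls(func, arg, calls):
    """Return (elapsed_seconds, cpu_seconds) for `calls` invocations of func(arg)."""
    start_wall = time.perf_counter()  # wall clock: includes waits and other workloads
    start_cpu = time.process_time()   # CPU time charged to this process only
    for _ in range(calls):
        func(arg)
    cpu = time.process_time() - start_cpu
    wall = time.perf_counter() - start_wall
    return wall, cpu


instr = "x" * 256  # stand-in for the 256-byte INSTR
wall, cpu = time_calls(str.upper, instr, 200_000)
print(f"elapsed {wall:.4f} s, CPU {cpu:.4f} s")
```

On a busy system the elapsed figure can be well above the CPU figure, which is why CPU time is the fairer basis for comparing the four tests.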
Allan Winston
New User
Joined: 02 Oct 2021 Posts: 1 Location: United States
Regarding whether advanced processor features are exploited by the COBOL compiler for this specific code fragment, I would recommend using the LIST compiler option in conjunction with various settings of the ARCH compiler option. While an ARCH value of 14 targets the z16 (and includes the features of all preceding processors), it may well be that a lower value for ARCH is all that is needed to generate the fastest code.
jzhardy
Active User
Joined: 31 Oct 2006 Posts: 139 Location: brisbane
Interesting - when I checked the listing, the ARCH level in effect was 7, not 14 as specified in my PARM parameter.
I suspect this is because I was using CWPCMAIN (the Xpediter compiler), which seems to override ARCH. Possibly a local environment setting rather than a limitation in Xpediter.
I'll go back to IGYCRCTL when I get some time ...