exploiting Z16 performance


IBM Mainframe Forums -> PL/I & Assembler
jzhardy

Active User


Joined: 31 Oct 2006
Posts: 137
Location: brisbane

Posted: Tue Aug 08, 2023 3:58 pm

I had a day off work recently and, being something of a nerd, decided to run some performance tests on a Z16.

The test was to convert a 256-byte text field to upper case. The four tests were:

1. Third-party library function, generated in Trace.
2. As above, but generated in non-trace.
3. COBOL, using the intrinsic function UPPER-CASE.
4. An HLASM module I wrote to exploit the SIMD instruction set with vectors. I cheated slightly by bulk loading INSTR into V0-V15.
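The HLASM module itself is not shown in the thread, so the following is only a rough Python illustration of the general idea behind test 4, not the author's code. It assumes the field is EBCDIC (code page 037), walks the 256 bytes in 16-byte chunks (the width of one vector register, so sixteen chunks correspond to V0-V15), and sets bit 0x40 on any byte that falls in a lower-case range. The names `upper_chunk` and `upper_ebcdic` are made up for this sketch.

```python
# Hypothetical sketch: upper-case a 256-byte EBCDIC field 16 bytes at a time,
# mimicking how sixteen 128-bit vector registers would cover the whole field.

VECTOR_BYTES = 16  # width of one z/Architecture vector register in bytes

# EBCDIC (code page 037) lower-case letters sit in three separate ranges:
# a-i, j-r and s-z.
LOWER_RANGES = ((0x81, 0x89), (0x91, 0x99), (0xA2, 0xA9))


def upper_chunk(chunk: bytes) -> bytes:
    """Upper-case one chunk: OR 0x40 into every byte inside a lower-case range."""
    out = bytearray(chunk)
    for i, b in enumerate(out):
        if any(lo <= b <= hi for lo, hi in LOWER_RANGES):
            out[i] = b | 0x40
    return bytes(out)


def upper_ebcdic(field: bytes) -> bytes:
    """Upper-case a whole field, one 16-byte chunk at a time."""
    return b"".join(
        upper_chunk(field[i:i + VECTOR_BYTES])
        for i in range(0, len(field), VECTOR_BYTES)
    )


# Build a 256-byte input padded with EBCDIC spaces (0x40).
instr = "Hello, z16 world!".encode("cp037").ljust(256, b"\x40")
outstr = upper_ebcdic(instr)
print(outstr.decode("cp037").rstrip())
```

A real vector implementation would do the range compare and the OR on all 16 bytes of a register at once; the inner Python loop stands in for that single step.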

The test harness was written in COBOL and followed this form:
Code:
MOVE 10000000 TO WS-CNT
PERFORM UNTIL WS-CNT = ZERO
  CALL <FUNCTION> USING INSTR,OUTSTR
  <repeated 100 times>
  SUBTRACT 1 FROM WS-CNT
END-PERFORM


The results surprised me and made me wonder whether I'd made a coding error, but I can't see anything wrong with my approach. WS-CNT was defined as PIC 9(12).

I had to scale up the results for 1 and 2 above because they ran dog slow: I timed 100,000 operations and extrapolated to 1,000,000,000. Here they are:

1. 100,000 ops => 3.88 seconds, or 1,000,000,000 ops => 38,800.00 seconds (scaled)
2. 100,000 ops => 2.43 seconds, or 1,000,000,000 ops => 24,300.00 seconds (scaled)
3. 1,000,000,000 ops => 46.35 seconds
4. 1,000,000,000 ops => 11.64 seconds
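For anyone who wants to check the scaling and compare the four runs on a common footing, here is a small Python sketch. It uses only the figures quoted above and assumes elapsed time scales linearly with the number of operations, which is an assumption for tests 1 and 2 since they were not actually run to completion.

```python
# Figures copied from the post: (operations measured, elapsed seconds).
measured = {
    "1. library (trace)": (100_000, 3.88),
    "2. library (non-trace)": (100_000, 2.43),
    "3. COBOL UPPER-CASE": (1_000_000_000, 46.35),
    "4. HLASM SIMD": (1_000_000_000, 11.64),
}

TARGET_OPS = 1_000_000_000


def scaled_seconds(ops, seconds, target=TARGET_OPS):
    """Extrapolate elapsed time to `target` operations, assuming linear scaling."""
    return seconds * target / ops


scaled = {name: scaled_seconds(ops, secs) for name, (ops, secs) in measured.items()}
fastest = min(scaled.values())

for name, secs in scaled.items():
    ns_per_op = secs / TARGET_OPS * 1e9  # average elapsed nanoseconds per call
    ratio = secs / fastest               # slowdown relative to the fastest test
    print(f"{name:24s} {secs:>10.2f} s {ns_per_op:>10.2f} ns/op {ratio:>8.1f}x")
```

On these numbers the HLASM version is roughly four times faster than the intrinsic function in elapsed time, and both are orders of magnitude ahead of the library calls.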

Some questions that arise from this:

- Does the latest Enterprise COBOL exploit all the SIMD features across intrinsic functions? (I'm kind of guessing yes from the above.)
- Does the common LE exploit SIMD?
- The times above were end-to-end duration, not CPU time. I'll redo the test when I get a chance.
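The difference between end-to-end duration and CPU time matters because elapsed time also includes waits and time spent on other work in the system. As a generic illustration (not how it would be measured on z/OS, where job-step CPU time would normally come from the job log or SMF), this Python sketch times the same loop both ways; `time_calls` is a hypothetical helper.

```python
import time


def time_calls(func, arg, iterations):
    """Run func(arg) `iterations` times; return (elapsed seconds, CPU seconds)."""
    wall_start = time.perf_counter()  # wall-clock (end-to-end) timer
    cpu_start = time.process_time()   # CPU time consumed by this process only
    for _ in range(iterations):
        func(arg)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


wall, cpu = time_calls(bytes.upper, b"x" * 256, 100_000)
print(f"elapsed: {wall:.4f} s, CPU: {cpu:.4f} s")
```

On a busy machine the two figures can diverge noticeably, which is why CPU time is the fairer basis for comparing the four tests.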

Clearly, if my results above are true, then the Z16 is not just 'another CPU'. It's an absolute beast.
Allan Winston

New User


Joined: 02 Oct 2021
Posts: 1
Location: United States

Posted: Wed Aug 09, 2023 4:18 am

Regarding whether or not advanced processor features are exploited by the COBOL compiler for this specific code fragment, I would recommend using the LIST compiler option in conjunction with various settings of the ARCH compiler option. While an ARCH value of 14 is for the z16 (and all preceding processors), it may well be that a lower value for ARCH is all that is needed to generate the fastest code.
jzhardy

Active User


Joined: 31 Oct 2006
Posts: 137
Location: brisbane

Posted: Wed Aug 09, 2023 5:36 am

Interesting: when I checked the listing, the ARCH level in effect was 7, not 14 as specified in my PARM parameter.

I suspect this is because I was using CWPCMAIN (the Xpediter compiler), which seems to override ARCH. It is possibly a local environment setting rather than a limitation in Xpediter.

I'll go back to IGYCRCTL when I get some time ...
All times are GMT + 6 Hours