Joined: 30 Nov 2013 Posts: 917 Location: The Universe
Over in the CICS forum there is a post that mentions the use of the GETMAIN and FREEMAIN macros and then goes on to claim the STORAGE macro is faster. This is a lie.
The following program produced these messages -
STORAGE REQUIRED 10% MORE CPU TIME THAN GETMAIN/FREEMAIN
GETMTIME = 6.540824, STORTIME = 7.228047
Code:
GETMTEST CSECT
USING *,12
SAVE (14,12),,*
LR 12,15
LA 15,SAVEAREA
ST 13,4(,15)
ST 15,8(,13)
LR 13,15
TIMEUSED STORADR=GETMSTRT,CPU=MIC,LINKAGE=SYSTEM
L 2,=A(64*1024) Number of areas to obtain
LA 3,128 Length of each area
GETMLOOP GETMAIN RU,LV=(3) Address of new area returned in R1
MVC 0(4,1),GETMHDR Chain previous area into the new one
ST 1,GETMHDR New area becomes the list head
BCT 2,GETMLOOP
FREELOOP ICM 1,B'1111',GETMHDR Pick up the list head
BZ GETTIME List empty - all areas freed
MVC GETMHDR,0(1) Unchain the area being freed
FREEMAIN RU,LV=(3),A=(1)
B FREELOOP
GETTIME TIMEUSED STORADR=GETMEND,CPU=MIC,LINKAGE=SYSTEM
L 2,=A(64*1024)
OBTGET STORAGE OBTAIN,LENGTH=(3)
MVC 0(4,1),GETMHDR
ST 1,GETMHDR
BCT 2,OBTGET
OBTFREE ICM 1,B'1111',GETMHDR
BZ OBTTIME
MVC GETMHDR,0(1)
STORAGE RELEASE,LENGTH=(3),ADDR=(1)
B OBTFREE
OBTTIME TIMEUSED STORADR=OBTEND,CPU=MIC,LINKAGE=SYSTEM
LG 1,GETMEND
SG 1,GETMSTRT
STG 1,GETMTIME
LG 0,OBTEND
SG 0,STORSTRT
STG 0,STORTIME
SGR 1,0
SR 0,0
M 0,=F'100'
D 0,GETMTIME+4
ST 1,PERCENT
CVD 1,DWORK
ED PCT,DWORK+6
L 1,GETMTIME+4
CVD 1,DWORK
ED GETMD,DWORK+3
L 1,STORTIME+4
CVD 1,DWORK
ED STORD,DWORK+3
OPEN (PRINT,OUTPUT)
PUT PRINT,MSG
PUT PRINT,MSG2
CLOSE PRINT
L 13,4(,13)
RETURN (14,12),RC=0
SAVEAREA DC 9D'0'
GETMSTRT DC FD'0' Start of GETMAIN/FREEMAIN test
GETMEND DC 0FD'0' End of GETMAIN test; zero duplication, overlays STORSTRT
STORSTRT DC FD'0' Start of STORAGE test (same location as GETMEND)
OBTEND DC FD'0' End of STORAGE test
GETMTIME DC FD'0' GETMAIN/FREEMAIN CPU microseconds
STORTIME DC FD'0' STORAGE OBTAIN/RELEASE CPU microseconds
DWORK DC PL8'0'
PERCENT DC F'0'
GETMHDR DC A(*-*)
PRINT DCB DSORG=PS,MACRF=PM,DDNAME=SYSPRINT,RECFM=VBA,LRECL=126
DC 0D'0'
LTORG ,
MSG DC AL2(MSGL,0),C' STORAGE REQUIRED'
PCT DC 0C' NNN',C' ',X'202120',C'% MORE CPU TIME THAN GETMAIN/F>
REEMAIN'
MSGL EQU *-MSG
MSG2 DC AL2(MSG2L,0),C' GETMTIME ='
GETMD DC 0C' NNN.NNNNNN',C' ',X'202120',C'.',6X'20',C', '
DC C'STORTIME ='
STORD DC 0C' NNN.NNNNNN',C' ',X'202120',C'.',6X'20'
MSG2L EQU *-MSG2
END GETMTEST
The program was run under Hercules, not real hardware. It would be interesting to run it on a regular z/Architecture machine, though I doubt anyone will bother.
The decision about whether to use GETMAIN or STORAGE OBTAIN to obtain virtual storage and FREEMAIN or STORAGE RELEASE to release the storage depends on several conditions:
The address space control (ASC) mode of your program. If it is in AR mode, use the STORAGE macro.
The address space that contains the storage your program wants to obtain or release. If the storage is in an address space other than the primary, use the STORAGE macro.
Whether the program requires a branch entry or a stacking PC entry to the macro service. Using the branch entry on the GETMAIN or FREEMAIN macro is more difficult than using the STORAGE macro. Therefore, you might use STORAGE OBTAIN instead of GETMAIN for ease of coding, for example, when your program:
Is in SRB mode
Is in cross memory mode
Is running with an enabled, unlocked, task mode (EUT) FRR
The branch entry (BRANCH parameter on GETMAIN or FREEMAIN) requires that your program hold certain locks. STORAGE does not have any locking requirement.
If your program runs in an environment where it can issue the FREEMAIN macro (as specified by the conditions listed above), you can use FREEMAIN to free storage that was originally obtained using STORAGE OBTAIN. You can also use STORAGE RELEASE to release storage that was originally obtained using GETMAIN.
Well, of course your decision depends on many things.
Many times the decision comes down to whether some mechanism can be put in place to avoid the use of MVS storage management entirely. Most of my private service functions, for example, require the caller to supply a small (as in 100- or 200-byte) work area, or, by minimizing register usage, use parts of the caller-provided register save area as a work area. Here the decision about storage management is put off to some other agent!
Several years ago I had a program that allocated - like my test program - many thousands of small storage areas that were freed in a block at the end of the program. In testing, this process seemed to require a long time, so I instrumented it using TIMEUSED, as in my test program, and found it was using close to a minute of CPU time! At the time my solution was to dust off a private function that would suballocate little storage areas in larger - 4K - storage blocks, and free those 4K blocks in a group at the end. The one minute of CPU time went down to less than a second!
Years ago, when the BAKR instruction was published as a possible alternative to the IBM "standard" subroutine entry/exit convention, I tried it and found it was much worse! After some thought I realized it was saving 16 32-bit registers AND 16 32-bit access registers in ESA/390. In z/Architecture it saves 16 64-bit registers AND 16 32-bit access registers. Thanks IBM, I'll stick with conventional mechanisms!
In the fine print associated with the development of XPLINK back in the 1990s, you'll notice there was no mention of BAKR, no doubt for the reasons I had developed in my earlier tests. XPLINK did make the point that the "standard" linkage convention required too many registers to be saved and restored, something I now think about in my own work; this often frees up tiny amounts of register save area storage for use as a work area.
Just last week I researched the now hopelessly obsolete Burroughs B5500, a very innovative machine in the 1960s. The discussion started me thinking about register-oriented architectures like System/360 compared to the nearly universal stack architectures we see now. Back then my thought was that a stack machine used portions of the stack as pseudo-registers in storage, which seemed to me at the time slower than real registers. Now I realize stack architectures do not have a wealth of registers, which simplifies (and, for that reason, speeds up) status saving (as in interrupts, or subroutine calls), and this may be why they are now so popular. Well, just a thought.
Thanks a lot for these detailed descriptions and your elaborations.
Just like you, years ago when IBM announced some new instructions regarding 64-bit and z/Arch, I took a shot at some of those things, but decided to keep to my conventional methods.
I don't see a need for relative branch instructions like BRAS/BRXH, or multiply/divide instructions like ML/DL, when programming normal commercial applications, as complex as they may be. I use the same proven techniques - highly structured, well designed - as when programming in COBOL.
As I saw in your profile, you're a retired professional. My deep respect that you still keep yourself busy with application development and systems engineering.
I retired two years ago and now I'm back again on some migration projects involving assembler programming. It's somewhat addictive.
Especially when a snug income is assured.
Seems to me that assembler is not dead at all, as was announced so many years ago. Most of the banks here in Germany still have some programs in stock.
When the relative branch instructions came out in the 1990s, my analysis was that the main beneficiary would be the compilers, which do tend to write relatively large blocks of code compared to Assembler programmers. For a couple of years, say from 2010 to 2013, I got on a jag where I just used relative branch instructions, but then I mostly switched back to conventional BC instructions.
In 1991, when I was unemployed for a considerable period, I learned C, and learned the C qsort library function. By the end of 1991 I was back on real machines and using Assembler again. In 1993 I wrote QSORT. Like qsort, it uses a compare function. After a while, I realized that by using BRC instructions I did not need a base register for the compare function, so I could cut down the registers I used, and the registers I had to save and restore. Shades of XPLINK. QSORT calls the compare function using conventional linkage -
CALL compare,(address1,address2)
Most programs that use QSORT call it multiple times, with multiple compare functions. The compare functions pretty much use a common startup sequence, usually -
SAVE 14
LM 14,15,0(1)
LR 1,15
LA 15,1
* Insert compare code here.
Register 15 has an initial return code, the qsort staples - negative, 0, positive, with the same meanings.
All of the compare functions generally share a common suffix. Sometimes you get to SC0100 as a fall-through when there is no less-than or greater-than branch; SC0200 and SC0300 are generally branched to by JL SC0200 or JH SC0300. The compare function is performance critical, so everything I can save is a step in the right direction. I do not save and restore registers 0 and 1 because QSORT doesn't need them. The actual CALL macro in QSORT is
LR R15,R5
CALL (15),((R9),(R10)),MF=(E,PARMLIST)
Registers 14, 15 and 1 are trashed by the macro, and QSORT does not actually use register 0. Register 5 has the address of the compare routine. Obviously registers 2 through 13 cannot be altered by the compare function, but it doesn't use them anyway.
Now if the compare function needs its own base register, it would be coded differently, probably
USING *,2
SAVE (14,2)
LR 2,15
LA 15,1
...
RETURN (14,2),RC=(15)
Those fractions of seconds do add up. And, surprise, surprise, they're only noticed on mainframes. On toy and baby machines, no one attempts to measure this kind of stuff, so no one cares, because it isn't measured. This is one reason projects tend to gas out more often on toy and baby machines: no one checks this in advance. Measurement tools, though they are disappointing in their repeatability, have been on mainframes for decades, though the discipline to actually use them seems to have disappeared, probably because it is not in CS curricula (because it can't be done on toy and baby machines). These guys seem to depend on Moore's "law" to cover up their lapses.
The raw outer shell in my QSORT isn't all that great. Actually, I stole it from K & R, where it was simplified for demonstration purposes. But I can partly make up for it by being efficient in other places, like the compare function.
What I was trying to demonstrate is YOU HAVE TO NOTICE THE SMALL STUFF. ST 14 is a lot faster than STM 14,2. L 14 is much faster than L 14/LM 0,2. You will see a difference if you call the compare function 100,000 times.
I think this is still an endless debate. I still see an unavoidable necessity to keep a close watch on the processing time of large and complex SQL statements. That could be more critical than some assembler instructions.
These can be measured with Strobe and Omegamon.
But like I said, I'm not the guy who cares about a number of bytes or a few seconds at run time. I don't use bit switches with TM anymore. I see no need to care about that when doing commercial programming. A few seconds, give or take, doesn't earn a victory or a golden watch.
Have a nice weekend, regards, UmeySan
Good day, Peter
Many thanks for the encouragement.
Pleasant weekend, regards, UmeySan