shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Hi,
My production CICS region is throwing a storage violation. I am trying to find the module that caused it. As we know, the abending module need not be the module that caused the storage violation.
I checked the trace, and for one particular task the leading and trailing check zone fields are not the same.
Below is the message from the trace table:
Code:
06002 QR SM 030B SMGF *EXC* Storage_check_failed_on_freemain_request FREEMAIN,00144008,TASK
What I understand from the above message is that a FREEMAIN request has failed; 030B indicates a storage check zone failure and 00144008 is the address.
What I want to find out is:
How do I find the program that issued the GETMAIN?
I have a lot of GETMAINs in the trace. How do I find the GETMAIN associated with this FREEMAIN?
Please help.
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
Look for the GETMAIN with the 144008 address.
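If the trace has been saved as plain text, that search can be scripted. Below is a minimal Python sketch, assuming one trace entry per line. The helper name `find_requests` and the GETMAIN sample line are made up for illustration; only the FREEMAIN line comes from this thread. It is nothing more than a text match on the request name and the address, so the real trace layout at your site may need a different filter.

```python
def find_requests(trace_lines, address, request="GETMAIN"):
    """Return trace lines that mention both the request type and the address."""
    # Drop leading zeros so '144008' also matches '00144008'.
    wanted = address.upper().lstrip("0")
    return [
        line for line in trace_lines
        if request in line.upper() and wanted in line.upper()
    ]


sample = [
    # Hypothetical GETMAIN entry, invented for this example only.
    "05120 QR SM 0C02 SMMG EXIT GETMAIN/OK 00144008",
    # The FREEMAIN exception quoted at the top of the thread.
    "06002 QR SM 030B SMGF *EXC* Storage_check_failed_on_freemain_request"
    " FREEMAIN,00144008,TASK",
]

matches = find_requests(sample, "144008")
for line in matches:
    print(line)
```

Passing `request="FREEMAIN"` would list the FREEMAIN entries for the same address instead.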
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Any suggestions on how to find the predator?
Robert Sample
Global Moderator
Joined: 06 Jun 2008 Posts: 8700 Location: Dubuque, Iowa, USA
I'd start by looking at the program and its call sequence. Perhaps there's a mismatch in DFHCOMMAREA sizes, which can cause the problem. However, storage violations are difficult to resolve precisely because the program causing the problem may not be related to the program that takes the abend.
Look at what other programs executed at the same time -- whether or not this is relatively easy to do depends upon the tools in use at your site. Probably the worst case would be needing to look at the SMF data to determine the programs (but if your site doesn't generate SMF data for CICS, that would be worse), although this could be simpler if your site uses MXG or one of the other SMF management / formatting tools.
Unfortunately, there is no cookbook approach that resolves storage violations -- it requires a combination of debugging expertise with the ability to follow the trace and look at the storage (via IPCS usually) to figure out the root cause.
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Thanks for your comments.
1) We don't use any tools.
2) I just have the trace table with me. I don't have anything else.
3) I am not able to replicate this in test; it works fine there, so I will not be able to run EDF.
I will try, and will post the solution if I find one. Thanks.
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Bill,
Thanks for your comments. I have checked the recovery and protection options; they are the same.
As this is a storage overwrite, I understand that it is caused by a data overflow.
Usually data overflow happens when tables are used. Can this overflow happen during a TS queue write?
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
Are you relying upon hard-coded indices/subscripts in the program to keep the tables from overflowing?
The short answer: regardless of how the Storage Check Zone is being compromised, the cause could be a table overflow, a TSQ WRITE, etc.
It sounds to me like the program is wiping out the low-order SCZ of the program's GETMAINed Working-Storage. I would look at this area of Working-Storage for a possible culprit.
Instead of using hard-coded indices/subscripts for table-maximums, dynamically calculate this maximum at program-start and use this calculated value throughout the logic.
Although I've posted this example numerous times before, here it is again -
Code:
      * Table definition in WORKING-STORAGE
           03  WS-FWORD            PIC  9(08) COMP.
           03  WS-TABLE-REC.
               05  WS-TABLE-ENTRIES OCCURS 500 TIMES
                                   INDEXED BY X-WS-TE, X-WS-TE-MAX
                                   PIC  X(100).

      * At program start: derive the table maximum from the definition
           DIVIDE LENGTH OF WS-TABLE-REC
               BY LENGTH OF WS-TABLE-ENTRIES (1)
               GIVING WS-FWORD.
           SET X-WS-TE-MAX         TO WS-FWORD.
In newer versions of COBOL, specifying the first occurrence (1) in the divisor of the DIVIDE is redundant and can be omitted. It can be left this way (it won't hurt anything), or you can use WS-TABLE-ENTRIES without the (1).
For me, old habits die hard....
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Thanks Bill.
My program is writing a TS queue:
Code:
EXEC CICS WRITEQ TS
     QUEUE('Q1')
     FROM('....')
     Length(x)
What happens if the length of the data is greater than 'X' in the above code? Will it cause a storage violation?
Note: the above WRITEQ statement is executed more than 1500 times.
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
If you're specifying a data-variable from WS for the LENGTH keyword, ensure it's a binary-halfword. If it's packed-decimal (or other than a halfword), the value is accepted as-is and used by the translator for the API. If, for example, the length of the FROM area were 256 (X'0100') but the variable was defined as two-bytes packed-decimal, the length-value passed would be X'256C' and this could wreak all sorts of havoc.
Instead of specifying a LENGTH, omit it and allow the translator to use the LENGTH OF Special-Register of the FROM area. You can then be assured the correct length-value is being used.
Is the FROM area located near the bottom of WS?
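To see why the packed-decimal case goes wrong, here is a rough Python sketch (not CICS or translator code) of the same two bytes read both ways. The helper names `packed_decimal` and `as_halfword` are made up for this illustration, and the sketch assumes a positive value with the usual C sign nibble.

```python
def packed_decimal(value, nbytes):
    """Encode a non-negative integer as packed decimal with a C sign nibble.

    High-order digits that do not fit in the field are truncated.
    """
    ndigits = nbytes * 2 - 1
    digits = str(value).zfill(ndigits)[-ndigits:]  # keep low-order digits
    return bytes.fromhex(digits + "C")


def as_halfword(raw):
    """Read the first two bytes as an unsigned big-endian binary halfword."""
    return int.from_bytes(raw[:2], "big")


field = packed_decimal(256, 2)   # a 2-byte packed field holding 256
misread = as_halfword(field)     # the same bytes taken as a binary halfword

print(field.hex().upper(), misread)
```

The packed field holds X'256C', and taken as a binary halfword that is 9580 rather than the intended 256 (X'0100').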
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Bill,
Apologies for the delay. It took some time for me to find out exactly what is going on.
The violation is thrown in an Assembler program. This program links to another, and the flow goes through several modules. When control gets back, after validation and processing, the Assembler program writes a TS queue. After the TS queue write, the Assembler program issues a FREEMAIN. The storage violation is thrown when that FREEMAIN is issued.
The COMMAREA passed by this program to its successors has several pointers. I believe all these pointers are assigned an address somewhere in the flow. So after a CICS RETURN from this base Assembler program, all the addresses assigned to the COMMAREA pointers, and the area itself, go out of scope, and hence a FREEMAIN is issued by CICS.
One of these FREEMAINs throws an SV.
I don't know if this is an explicit FREEMAIN or an implicit one (by implicit I mean a FREEMAIN issued by CICS after the CICS RETURN is executed).
I know the address it is trying to free, but I don't see a matching GETMAIN with that address.
How do I find the program that issued the GETMAIN for this address?
If the above explanation is not clear, please let me know.
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
In my trace I have the below information:
Code:
74674 QR AP 00E1 EIP EXIT FREEMAIN
74674 QR AP 00E1 EIP ENTRY RETURN
74674 QR AP E160 EXEC ENTRY RETURN
I understand what the above means, but to the right of the above CICS commands I have the below:
Code:
AT X'80140F08',ASM
AT X'80140F08',0,0,ASM
OK 00F4,00000000 ....,00000C04 ...
0004,00144018 .. .,09000E08 ...
ASM
I don't know what the above means. If someone can shed light on it, it will be of great use.
Nic Clouston
Global Moderator
Joined: 10 May 2007 Posts: 2454 Location: Hampshire, UK
Please note: neither CICS nor Assembler EVER throws anything. They issue/display things. Throw is something that Java (and possibly other curly-whirly languages) does.
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
Shankarm,
In the TRACE, look for a matching GETMAIN whose acquired storage address matches that of the address used in the attempted FREEMAIN. If you can't find it, run the transaction under CEDF. One screen past PROGRAM INITIATION of the Assembler program, press <F5> and you're now looking at the start of the program's WS. Subtract 8 from this address and you should be looking at the top SCZ (both top and bottom match). Try to determine the GETMAIN length and find the last byte. The bottom SCZ will be the 8-Bytes after the last-byte addressable by the program. Write this address down somewhere. Just before program termination, go into WS and see if the top and bottom SCZ's are still surrounding the GETMAINed area. If you're missing one (or both), then the application-code needs to be reviewed.
What could be happening is that the dynamic/GETMAIN storage is acquired implicitly under DSECT DFHEISTG (normally assigned to R13) at the start of the Assembler program (similar to COBOL WS). Upon program termination, an implicit FREEMAIN is issued and the SV is raised. It could be that one or both of the SCZ(s) have been written over and therefore, CICS can't fulfil the FREEMAIN. Frequently, it's the low-order SCZ that gets clobbered.
The SIT parm STGRCVY can be activated along with the LE option "CHECK" (via transaction CLER). With STGRCVY, a System Dump is still produced, but the corrupted SCZ(s) are restored from a special subpool stack and the program continues without coming to a screeching halt. However, there is overhead involved with STGRCVY and it should be used only in non-Production environments as SV's should be worked-out before moving the code into Production.
You never did confirm that the LENGTH associated with the WRITEQ TS is a binary-halfword. If it's other than that (let's say 2-Bytes packed-decimal), then instead of CICS using a halfword of X'05DC' (decimal 1500), it uses a packed-decimal value of X'500C' (the high-order 1 is truncated), which is decimal 20492 when read as a halfword. Goodbye bottom SCZ, with a guaranteed SV soon to follow.
Have you gone through the IBM Storage Violation Webinar DEBUG Link, which was provided in an earlier response?
PS. Storage Check Zones are sometimes referred to as "Rabbit Ears".
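The check-zone idea described above can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not the real CICS storage layout: it keeps only the two facts stated in this post (an 8-byte zone on each side of the area, and both zones matching), while the marker value, the sizes and the helper names are invented.

```python
ZONE = 8  # check-zone size in bytes, as described in the post above


def getmain(storage, start, length, tag):
    """Lay out leading zone + user area + trailing zone; return user offset."""
    storage[start:start + ZONE] = tag
    storage[start + ZONE + length:start + 2 * ZONE + length] = tag
    return start + ZONE


def zones_intact(storage, user, length):
    """True when the leading and trailing zones still match each other."""
    lead = storage[user - ZONE:user]
    trail = storage[user + length:user + length + ZONE]
    return lead == trail


storage = bytearray(64)
tag = b"U0000042"                      # made-up 8-byte marker
area = getmain(storage, 0, 16, tag)    # a 16-byte user area
ok_before = zones_intact(storage, area, 16)

# Move 17 bytes into the 16-byte area: one byte lands in the trailing zone.
storage[area:area + 17] = b"X" * 17
ok_after = zones_intact(storage, area, 16)

print(ok_before, ok_after)
```

Nothing fails at the moment of the overrun. The mismatch only shows up when the zones are compared later, which is why the FREEMAIN that reports the violation can be a long way from the code that did the damage.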
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Bill,
I went through the IBM debug link, but my dump doesn't have all the data mentioned in the link.
Storage recovery and protection are active in the region.
In the WRITEQ my program uses a hard-coded number,
e.g. LENGTH(8192).
Currently I am working with my CICS admin team. We are using the storage violation trap to fix this, and I am waiting for a response from the admin.
The problem is that my transaction goes through many modules and the violation is at the end of the transaction. To reach the point where the violation happens, I have to press the Enter key at least 10000 times.
Again, thanks for helping out. I will update the forum shortly.
Bill Woodger
Moderator Emeritus
Joined: 09 Mar 2011 Posts: 7309 Location: Inside the Matrix
Autorepeat, if your "enter" has it, and a piece of old listing paper as a wedge.
enrico-sorichetti
Superior Member
Joined: 14 Mar 2007 Posts: 10886 Location: italy
Unless the TS works in a paperless environment.
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
Shankarm,
Are you writing the TSQ from the Assembler program's DFHEISTG Dynamic-Storage? If you're using R13 (normal default) as the DATAREG assignment in DFHEIENT and you can address byte-01 of the 8192-byte area, you won't trample on the low-order SCZ.
In the translator Prologue code, what is the value of "DC AL2(DFHEIEND-DFHEISTG)"? This is the total-length of Dynamic-Storage and is the length-value passed for Storage Allocation via a CICS GETMAIN, using VCON DFHEAI0.
The calculated DFHEIUSR storage will be the above AL2 minus X'0180' (this is where user Dynamic-Storage begins, doubleword-aligned) or just simply =AL2(DFHEIEND-DFHEISTG).
Bill O'Boyle
CICS Moderator
Joined: 14 Jan 2008 Posts: 2501 Location: Atlanta, Georgia, USA
In the previous post "=AL2(DFHEIEND-DFHEISTG)" needs to be "=AL2(DFHEIEND-DFHEIUSR)" for calculating user Dynamic-Storage.
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Bill,
Below is what I found in the job logs:
Code:
00000A 0458 409+ DC AL2(DFHEIEND-DFHEISTG)
I don't know what the above means. Sorry, I'm new to Assembler.
Code:
000180 95+DFHEIUSR DS 0D
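The two listing lines can be read as plain arithmetic. This reading assumes the usual assembler listing layout (location, then assembled value, then statement number): under that assumption 0458 is the assembled value of the AL2 constant, so DFHEIEND-DFHEISTG is X'0458', and DFHEIUSR sits at offset X'0180' inside DFHEISTG. A small Python sketch of the subtraction Bill described, with made-up variable names:

```python
total_dynamic = 0x0458   # assembled value of DC AL2(DFHEIEND-DFHEISTG)
user_offset = 0x0180     # location of DFHEIUSR within DFHEISTG

# User Dynamic-Storage is what remains after the CICS-owned prefix.
user_dynamic = total_dynamic - user_offset

print(total_dynamic, user_offset, user_dynamic, hex(user_dynamic))
```

If that reading is right, the program's Dynamic-Storage is 1112 bytes in total, of which 728 bytes (X'02D8') are user storage. Both are far smaller than 8192, so under these assumptions the 8192-byte area cannot be sitting inside DFHEISTG itself.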
The issue is resolved now. Below is what I did to resolve it.
Note: in the description below, I am talking about only one ASM program.
- I ran EDF and found which module is throwing the violation. In my case it is an Assembler CICS module.
- In my program (the ASM program) I saw a lot of address variables (pointers) in the COMMAREA. I know that the values for these pointers are assigned during the transaction.
- The transaction goes through 100+ programs. I assumed that the violation happened in the area pointed to by one of the pointers in the ASM program. As we know, the memory is valid until a FREEMAIN is issued in the transaction, and the current ASM program has pointers to memory areas GETMAINed in the transaction. When a CICS RETURN is issued in the program, all the pointers go void, because they are not passed back to the calling program that LINKs (CICS LINK) to the ASM program I am referring to. As the pointers are not valid after the CICS RETURN from the current ASM program, I thought an implicit FREEMAIN would be issued, and when a FREEMAIN is issued, CICS does an SCZ check first. As there was a violation, the SCZ check failed and a storage violation was thrown.
- The above was my assumption, and this was the point where I assumed something and misdirected all the experts in this forum.
- I assumed that a FREEMAIN would be issued implicitly, but I was wrong. A GETMAIN is valid until the transaction is complete or an explicit FREEMAIN is issued.
- But from the CICS trace I found that the violation happens in a FREEMAIN.
- Bill pointed out in one of his posts that an implicit FREEMAIN is issued for dynamic Assembler storage, i.e. the variables in the DSECT.
- So I started to suspect the DSECT variables in the ASM program.
- When I checked, there was a mismatch in variable length. A variable of length 8192 is defined in the LINKAGE SECTION of a COBOL program and its address is assigned to a pointer. The address in this pointer is later mapped by a DSECT in the ASM program I was referring to, but the length of the DSECT was 8191 (one byte short).
- The DSECT was followed by the CSECT. As per the above, 8192 bytes were assigned to 8191. Hence the violation.
- I just put a 'CL1' in the DSECT of the Assembler program, which makes the DSECT 8192 bytes long, and the issue was resolved.
If you are a reader and you have any questions or need any clarification about the above post, please feel free to disturb me. Any update to this thread will trigger a mail to me, and I will try to help if I can.
shankarm
Active User
Joined: 17 May 2010 Posts: 175 Location: India
Even though the issue is resolved, I still have a few unanswered questions.
If this one-byte length difference was causing the violation, the violation should have occurred a long time ago. The ASM program I was referring to in the above post was last modified in 1984. Why did the program run for 25+ years without any issues, and only now abend?
Below are the comments from one of the experts. He is experienced, and I don't actually understand what he is saying.
What I wrote to him:
I told him that there is a length difference in the load modules and asked him whether the production source is correct.
I also told him about the length difference and the CL1 described in the previous post.
I didn't do what he suggests below. Hope this helps you.
He wrote:
Quote:
On Monday, I looked at the load modules for <ASM PGM> in PRODUCTION and TEST. The 8-byte difference in size (A70 vs A78) was due to a CICS stub, not your assembler program. The CSECT for <ASM PGM> was A2C in both load modules. I wouldn't worry about the CICS stub; the old load module was assembled with an old release of CICS.
The thing to be concerned about is the DSECTs (normally implemented by copybooks, includes, etc.). These describe storage not yet acquired and will not impact the size of the generated load module. If you assemble a program with 2 different versions of a copybook, you will generate 2 load modules of the same size. But the load module created using the wrong copybook will probably abend. What will change is the offset in the executable instruction, not the size of the load module.
I notice in your job out on SDSF (COMPLIE JOB43669) you used the NOALIGN assembler parameter. I would take out the CL1 and reassemble it with ALIGN. That should put <variable in DSECT> on a full word boundary.
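The ALIGN suggestion in the quote comes down to boundary rounding. As a rough illustration only, the Python sketch below shows the arithmetic, assuming a field that would otherwise start at offset 8191 right after the short area. `align_up` is a made-up helper and says nothing about what else the ALIGN option does.

```python
def align_up(offset, boundary):
    """Round offset up to the next multiple of boundary (a power of two)."""
    return (offset + boundary - 1) & ~(boundary - 1)


next_fullword = align_up(8191, 4)     # field that would start at 8191
already_aligned = align_up(8192, 4)   # an aligned offset is left alone

print(next_fullword, already_aligned)
```

Rounding 8191 up to a fullword boundary gives 8192, which is the same offset the extra CL1 produces. That may be why the expert treats the two fixes as interchangeable, though the thread does not confirm it.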