Hi, actually I'm a beginner in IMS tuning, and reading docs for me is better, but I already digged through tonns of it, learned numeros new things, but still have no idea what how to locate where does my problem come from.
It would be great if someone hints me where to dig. Some times I even think that it could be an internal IMS defect caused by memory leak...
The problrem is that some times I get 878 abend while batch program processing. The only message displayed is "DFS629I IMS XFP TCB ABEND - SYS 878 IVP1"
Configuration is like this: IMS version 12, z/os 1.13, batch is executed for initial load of DEDB, Fast path 64bit buffer manager is enabled, it's buffer size is maximal 2047M. The number of segments inserted is about 11million, total size of loaded data is 1,9GB.
BMP program is written to make checkpoints either after getting FW or every N inserted segments (N is about 1500).
The only dependency I figured out for now is that if NBA is bigger than 2000 in one BMP, or if there are several BMPs that use total NBA > 2000, and the scope of data inserted is long enough, I'm getting abend after about 200-300 checkpoints.
If NBA is 2000 or less, I do not get abend - at least while loading my 2GB of data, may be if I had 20, I got, who knows...
Forgot to tell about another observation.
(Some clarification about what does the program do:
BMP takes the data from prepared GSAM input file. one record consists of 3 fields: DSN=44char, Type=3-char, data=1917char and each record is inserted in IMS database like 3 segment - root, child, child of child.
I described it so detailed to show that each input record has the same structure, and causes the insertion of the same segmnets of same size.)
And back to BMP. I start the BMP program with NBA=2000. The first FW I get after 2000 input records - as expected. Make checkpoint. Next FW appears after 19xx , the next after even smaller number, and so on.
For example, 100th checkpoint happens 1800 records read after 99th. But on my input data, BMP program ends successfully.
If I use NBA=8000, the behaviour is the same. Sooner or later the interval between checkpoints decreases until about 4000, and 878 abend happens.
This abend says that GetMain failed, so I decided to check what happens with CSA and ECSA. Both of them have a plenty of place - IMS have 1GB, uses 100-160MB, some times up to 600MB. ECSA has about 4500MB, and 2GB used. So, seems that no real lack of memory.
Joined: 06 Jun 2008 Posts: 8696 Location: Dubuque, Iowa, USA
S878 generally has a reason code, which tells much about the reason for the shortage of storage. Look in the job output or the console log to find the reason code.
True, usually s878 comes with 2-digit reason code, but not this time.
May be I lost some thing, see only one line that could have RC -
DFS629I IMS XFP TCB ABEND - SYS 878 IVP1
here is a system log -
Code:
20:32:07.43 JOB00265 00000090 +DFS0542I *EXTENDED CHECKPOINT* DAY=327,TIME=173207146201+0300 IVP1
20:33:12.28 STC00249 00000090 DFS3257I ONLINE LOG NOW SWITCHED - FROM DFSOLP02 TO DFSOLP03 IVP1
20:33:12.29 STC00249 00000090 DFS3257I ONLINE LOG NOW SWITCHED - FROM DFSOLS02 TO DFSOLS03 IVP1
20:33:12.35 JOB00267 00000281 $HASP100 IVPGNJCL ON INTRDR IBMUSER FROM STC00251
IMS12RC1
20:33:12.38 JOB00267 00000290 IRR010I USERID START2 IS ASSIGNED TO THIS JOB.
20:33:12.40 STC00249 00000090 DFS2484I JOBNAME=IVPGNJCL GENERATED BY LOG AUTOMATIC ARCHIVING IVP1
20:33:12.59 JOB00267 00000281 ICH70001I START2 LAST ACCESS AT 20:23:00 ON TUESDAY, NOVEMBER 22, 2016
20:33:12.61 JOB00267 00000090 $HASP373 IVPGNJCL STARTED - INIT 4 - CLASS E - SYS S0W1
20:33:12.62 JOB00267 00000281 IEF403I IVPGNJCL - STARTED - TIME=20.33.12
20:33:12.72 JOB00265 00000090 +DFS0540I *EXTENDED CHECKPOINT* PROGRAM=DOCCARD ,CKPTID= 000025,
REGID=0007,DAY=327,TIME=1733121 IVP1
20:33:12.73 JOB00265 00000090 +DFS0542I *EXTENDED CHECKPOINT* DAY=327,TIME=173312174829+0300 IVP1
20:33:38.13 JOB00267 00000290 - --TIMINGS (MINS.)--
----PAGING COUNTS---
20:33:38.14 JOB00267 00000290 -JOBNAME STEPNAME PROCSTEP RC EXCP TCB SRB CLOCK SERV PG
PAGE SWAP VIO SWAPS
20:33:38.14 JOB00267 00000290 -IVPGNJCL 00 45220 ****** .08 .4 23435 0
0 0 0 0
20:33:38.15 JOB00267 00000090 IEF404I IVPGNJCL - ENDED - TIME=20.33.38
20:33:38.15 JOB00267 00000290 -IVPGNJCL ENDED. NAME-IBMUSER TOTAL TCB CPU TIME= .17
TOTAL ELAPSED TIME= .4
20:33:38.17 JOB00267 00000281 $HASP395 IVPGNJCL ENDED
20:33:38.20 00000281 $HASP309 INIT 4 INACTIVE ******** C=ABCDE
20:50:43.34 STC00249 00000090 DFS2361I XCF GROUP CLOSED SUCCESSFULLY. IVP1
20:50:43.34 STC00172 00000090 HWSD0282I COMMUNICATION WITH DS=IMSDSNM CLOSED; M=DREC
20:50:43.35 STC00249 00000090 DFS3896I ATTEMPTING TO PURGE OLDS BUFFERS IVP1
20:50:47.36 STC00249 00000090 DFS616I OLDS BUFFERS SUCCESSFULLY PURGED IVP1
20:50:47.37 STC00249 00000281 DFS629I IMS XFP TCB ABEND - SYS 878 IVP1
20:50:47.38 STC00249 00000281 DFS629I PSW AT ERROR = *** NO SDWA *** IVP1
20:50:47.45 00000090 IEA045I AN SVC DUMP HAS STARTED AT TIME=20.50.47 DATE=11/22/2016 754
754 00000090 FOR ASIDS(0070,007F,0071,0041)
754 00000090 ERROR ID = SEQ00282 CPU8B ASID0070 TIME20.50.41.6
754 00000090 QUIESCE = YES
20:51:04.45 STC00032 00000090 DSNL511I -DBAG DSNLIENO TCP/IP CONVERSATION FAILED 755
755 00000090 TO LOCATION ::10.0.40.132
755 00000090 IPADDR=::10.0.40.132 PORT=49209
755 00000090 SOCKET=RECV RETURN CODE=1121 REASON CODE=76650446
20:51:04.56 STC00032 00000090 DSNL511I -DBAG DSNLIENO TCP/IP CONVERSATION FAILED 756
756 00000090 TO LOCATION ::10.0.40.22
756 00000090 IPADDR=::10.0.40.22 PORT=53719
756 00000090 SOCKET=RECV RETURN CODE=1121 REASON CODE=76650446
20:51:04.64 STC00032 00000090 DSNL511I -DBAG DSNLIENO TCP/IP CONVERSATION FAILED 757
757 00000090 TO LOCATION ::10.0.40.22
757 00000090 IPADDR=::10.0.40.22 PORT=53893
757 00000090 SOCKET=SENDMSG RETURN CODE=140 REASON CODE=76697242
20:51:36.13 STC00035 00000090 IEA043I SVC DUMP REACHED MAXSPACE LIMIT - MAXSPACE=00000500 MEG
Actually, what happens in log - IMS makes checkpoint, switches online logs, again checkpoint, IVPGNJCL starts that writes online logs to offline logs - and here is a beginning of a problem - look at the time line at left column.
15 minutes later, abend message and IMS starts to fall down.
Today I got a dump again, and now found the reason code. Thanks for hint to look in the job log, more It was in job log of control region, not bmp. Unfortunately, I'm unable to read dumps. The only thing I see that PSW at entry for ABEND is from 31bit address space.
The RC=10, so I went to an old topic ibmmainframes.com/about38154.html unfortunately they do not tell what CSA worked for that case.
I understand that there is something wrong with virtual storage - need to tune CSA/SQA size, or IMS buffer pools. I believe that only buffers that provide space for NBA/OBA should be considered, because changes of NBA change behaviour of the system.
But, in my case we use Fast path 64bit buffer pool manager, and do not need to tune pools manually like needed in older IMS (surely, if I undestand correct how does this new manager works).
The only parameter could be changes in FP64 is pool size(if I haven't missed smth else), we tried different sizes - from 100MB till maximal 2047, no changes.
CSA/SQA were different too - now they are
CSA = 1MB(tried 4) ECSA = 1GB SQA = 1MB(tried 4) ESQA = 50MB
We decreased sizes from 4 to 1 because thought that areas take 8mb below the line giving only other 8mb to other programs, and usually 640KB are enough.
Actually, we could give more space to IMS in CSA or other areas - from 16MB CSA system takes may be 1 MB, user programs hopefully fit in 2, so 10-12MB is real.
But I'd better become sure that exactly this is a cause for abend.
So, folks, are there any minds what I could try locate the problem?
Big thanks from me for any hints.
Joined: 06 Jun 2008 Posts: 8696 Location: Dubuque, Iowa, USA
From the MVS System Codes manual:
Quote:
10
There is not enough virtual private area storage available to satisfy the request. This is due to one of the
following:
v
A program is requesting virtual storage, but not subsequently freeing the storage.
v
The region size specified is too small.
v
An installation exit is requesting virtual storage from a V=R region before initialization of the region has
completed.
The third one is not likely to be the problem since initialization appears to be complete. Therefore, either you're not giving it enough region or a program is not releasing storage. Note that the private area extends from the top of the prefixed storage area (PSA) to the beginning (or bottom end) of the common service area (CSA) -- and hence making changes to your CSA size may be making things worse instead of better.
If the problem is in 24-bit storage, you may have a real problem because there's only so much room below the line and if you're using it all, there's not a lot of places to get more from.
If the problem is in 31-bit storage, then you probably can adjust things (as long as you have enough storage assigned to the LPAR; if not, you may have to change the LPAR storage allocation - stealing from other LPARs if necessary).
The problem will NOT be in 64-bit storage; 64-bit storage allocation works in a completely different way (see memory objects on the IBM web site if you want to find out more) such that many concepts that apply to 24-bit and 31-bit memory have no meaning for 64-bit memory.
Joined: 20 Oct 2006 Posts: 6966 Location: porcelain throne
if it is an old application - written with old cobol - you may have to compile a bunch to 32bit to get them to load above the line.
if your application is old enough, you are looking at a retirement project.