What is the most complex abend you've faced and its solution

elixir1986 · New User Joined: 10 Nov 2015 Posts: 45 Location: USA

I was asked a very generic question in an interview: what is the most complex mainframe abend you've faced, and how did you solve it?

Constructive responses appreciated

Rohit Umarjikar · Posted: Sat Jun 08, 2024 5:00 am

An update to a VSAM file failed because it ran out of extents (a space issue). Since the updates were critical, you cannot trust the data in the VSAM file after such a failure, so the recovery was to delete and redefine the VSAM file, restore its data from the last backup (taken 2-3 hours earlier, before the update cycle began), and then rerun the jobs in the correct sequence, with overrides as necessary. This may be application-specific, but it's something no one wants to happen.

prino · Posted: Sun Jun 09, 2024 3:18 am

After an IDMS resize, our DB keys would exceed 2^31, i.e. they would become negative, as they were represented by "FIXED BIN (31)" in PL/I. Lots of programs used "if DB_KEY < 0" rather than "if DB_KEY = 0", which should have been used from the outset.

We lost a complete business day, as there wasn't a solution, and spent the night scanning, changing, and recompiling several dozen programs.

One of my "stuff a whole working week in two days" sessions.

Pedro · Posted: Sun Jun 09, 2024 7:23 am

elixir1986, they were asking about YOUR experiences. If you reply with somebody else's abend scenario, they will get you on follow-up questions.

Rohit Umarjikar · Posted: Sun Jun 09, 2024 11:05 am

Pedro · Posted: Sun Jun 09, 2024 10:05 pm

re: Constructive responses appreciated

Now that you know such a question can be asked, you can prepare for the subsequent interviews. Write down, now, what your answer would be and then review your notes before every interview so that you can answer the question more successfully.

vasanthz · Posted: Mon Jun 10, 2024 6:20 pm

We were two sysprogs, and my more experienced colleague was on holiday.

All our mainframe tapes were being backed by Dell Data Domain.
During a Data Domain upgrade in the middle of the day, the Data Domain did not come up as expected and we lost access to all mainframe tapes.

Had to route SMF offloads to DASD to prevent data loss. We didn't have HALT in SMFPRM, so that worked to our benefit.

Db2 active logs were not being offloaded to archive logs, so had to add volumes, then add the same volumes to DR replication.
Routed Db2 archive logs to DASD.

Assisted application teams in changing their programs so they don't access tapes.

It was during the daytime, so not a lot of batch was accessing tapes.

A memorable day for sure.

Joerg.Findeisen · Posted: Mon Jun 10, 2024 8:28 pm

@vasanthz: That sounds familiar to me. Always fun to handle.

Calm seas don't make sailors.

Pete Wilson · Posted: Tue Jun 11, 2024 12:49 pm

One of the weirdest things I've encountered was during a data centre move which also included some changes to the DASD setup at the new target site. We had moved all the DASD and were trying to IPL the first LPAR at the new site, but it kept failing and going into a wait state. The wait code indicated some issue with the nucleus, but the nucleus appeared fine and we just couldn't see what the issue was. Even IBM were struggling to determine what it was. In the end, using a one-pack system, we found that in the master catalog SYS1.NUCLEUS showed as having device type 3380, when in fact it had been created on a new 3390 device as part of the build of the new site. It was recataloged correctly, along with all the other datasets on the volume, and the LPAR IPL'd fine. That was a real head scratcher!
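A mismatch like this is the kind of thing a simple cross-check can surface. The Python sketch below is a hypothetical illustration, not what was actually done: it assumes the catalog entries and the real device type of each volume have already been extracted into plain records by some other means. The volser `RES001`, the record layout, and the function name are invented for the example; the two hex values are the device type codes LISTCAT displays for 3380 and 3390.

```python
from dataclasses import dataclass

# Device type codes as displayed by LISTCAT (DEVTYPE field).
DEVTYPE_CODES = {
    "3010200E": "3380",
    "3010200F": "3390",
}


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical record: one cataloged dataset and where the catalog says it lives."""
    dsname: str
    volser: str
    devtype_hex: str


def find_devtype_mismatches(entries, volume_types):
    """Return (dsname, cataloged_type, actual_type) for every entry whose
    cataloged device type disagrees with the volume's real device type.
    Entries on volumes missing from volume_types are skipped."""
    mismatches = []
    for entry in entries:
        cataloged = DEVTYPE_CODES.get(entry.devtype_hex.upper(), "unknown")
        actual = volume_types.get(entry.volser)
        if actual is not None and cataloged != actual:
            mismatches.append((entry.dsname, cataloged, actual))
    return mismatches


# Illustrative data only: the volser and the second dataset name are made up.
entries = [
    CatalogEntry("SYS1.NUCLEUS", "RES001", "3010200E"),  # cataloged as 3380
    CatalogEntry("SYS1.EXAMPLE", "RES001", "3010200F"),  # cataloged as 3390
]
volume_types = {"RES001": "3390"}  # what the volume really is

for dsname, cataloged, actual in find_devtype_mismatches(entries, volume_types):
    print(f"{dsname}: cataloged as {cataloged}, volume is {actual}")
```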