Dump Reading and Testing


IBM Mainframe Forums -> ABENDS & Debugging
Akatsukami

Global Moderator


Joined: 03 Oct 2009
Posts: 1788
Location: Bloomington, IL

Posted: Sat Mar 09, 2013 10:20 pm

A lesson for all the software engineers on this board. Pay attention, kusomushi.

We had a DB2 stored procedure executed as part of a started task that began abending with U0016 several times a day in January. Nobody knew why, nobody was willing to try to determine why and risk failing. It fell to my team.

The message from Language Environment (LE) indicated that the abend was coming from M1234; reading the source code, however, showed that this module did nothing but call the LE abend service CEE3ABD. The call chain showed that M1234 was being called from M5678. As it was called from two places in that module, I had to look at the PMAP to determine which one.

The code was approximately
Code:
ACCEPT WS-DATE-IN FROM DATE
MOVE 'YYMMDD' TO WS-FORMAT-1
MOVE 'YYYYMMDD' TO WS-FORMAT-2
CALL 'MDATE' USING WS-FORMAT-1,
                  WS-FORMAT-2,
                  WS-DATE-IN,
                  WS-DATE-OUT,
                  ERROR-STRUCTURE.
IF some condition THEN
  CALL 'MDATE' USING WS-FORMAT-1,
                     WS-FORMAT-2,
                     WS-DATE-OUT,
                     WS-DATE-OUTER,
                     ERROR-STRUCTURE
  IF ERROR-CODE OF ERROR-STRUCTURE IS NOT EQUAL TO 0 THEN
    MOVE 16 TO ABEND-CODE
    CALL 'M1234' USING ABEND-CODE
  END-IF
END-IF.

(Incidentally, this code antedates z/OS by some years. You wouldn't have to go through this rigmarole to get a system date with a 4-digit year today.)

The error is obvious, but I got the base locator and the offsets of the data items from the DMAP and looked them up in the dump anyway, to confirm my analysis.

The error is not obvious, you say? Well, the first call to MDATE assumes an input format (WS-FORMAT-1) of YYMMDD, as you'll get from ACCEPT...FROM DATE, and gives an output format (WS-FORMAT-2) of YYYYMMDD. So does the second call...but the input date in this call (WS-DATE-OUT) is already in YYYYMMDD format. Read as YYMMDD, 20120101 becomes year 20, month 12, day 01, so it is treated as valid (although it's incorrectly interpreted as 1 December 2020, not as 1 January 2012). But 20130101 becomes year 20, month 13, so it is treated as invalid, and causes M1234 to be called and an abend to occur.
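For anyone who wants to see the misreading without a dump in hand, here is a minimal sketch in Python. It is not MDATE; `parse_yymmdd` and `try_parse` are hypothetical stand-ins, and I am assuming the routine reads only the first six characters of its input and puts every two-digit year in the 2000s.

```python
from datetime import date

def parse_yymmdd(text, century=2000):
    """Hypothetical stand-in for a YYMMDD parser (not the real MDATE).

    Assumption: only the first six characters are examined and the
    two-digit year is added to a fixed century.
    """
    yy = int(text[0:2])
    mm = int(text[2:4])
    dd = int(text[4:6])
    # date() raises ValueError for an impossible month or day
    return date(century + yy, mm, dd)

def try_parse(text):
    """Return the parsed date, or None where the real code would set an error."""
    try:
        return parse_yymmdd(text)
    except ValueError:
        return None

if __name__ == "__main__":
    for value in ("130309", "20120101", "20130101"):
        print(value, "->", try_parse(value))
```

Feeding it an eight-character YYYYMMDD value shows both symptoms: a 2012 date parses to a wrong but valid date, and a 2013 date fails because the "month" is 13.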

Lessons learned:
  • Knowing how to spell COBOL is not the same as knowing how to program in COBOL; I read the source code.
  • Despite analysis aids, being able to read at least an LE-formatted dump is still a necessary skill; ERROR-STRUCTURE is not printed out.
  • Running one test case and getting a return code of zero does not constitute testing. If a date after 31 December 2012 (gotten from, e.g., TICTOC) had been used, the error would have been obvious even to the employees of the defunct Indian outsourcer who last touched the module. But it wasn't, and it wasn't. Moreover, that the date was being interpreted incorrectly should have been noticed. But, hey, it runs to RC=0, so everything must be copacetic, right?
  • This process cannot, absolutely, positively cannot, be done in *Sort.
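To put the testing lesson in concrete terms, here is a rough Python sketch of the double conversion run against more than one date. `convert`, `double_convert` and `classify` are hypothetical names of mine, and the model of MDATE (first six characters only, fixed century of 2000) is an assumption, not the real routine.

```python
from datetime import date

def convert(text, century=2000):
    """Hypothetical MDATE stand-in: YYMMDD in, YYYYMMDD out, None on error."""
    try:
        parsed = date(century + int(text[0:2]), int(text[2:4]), int(text[4:6]))
    except ValueError:
        return None
    return parsed.strftime("%Y%m%d")

def double_convert(run_date):
    """Reproduce the faulty sequence: the second call still claims YYMMDD input."""
    first = convert(run_date.strftime("%y%m%d"))
    return convert(first)

def classify(run_date):
    """Label the outcome for one run date."""
    result = double_convert(run_date)
    if result is None:
        return "abend"   # the real code would call M1234 here
    if result != run_date.strftime("%Y%m%d"):
        return "wrong"   # RC=0, but the date is not the one supplied
    return "ok"

if __name__ == "__main__":
    for d in (date(2012, 1, 1), date(2012, 12, 31), date(2013, 1, 1)):
        print(d.isoformat(), classify(d))
```

A test that compared the output date with the input date would have flagged the 2012 cases as wrong, and any run date in 2013 would have produced the abend long before January.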
Bill Woodger

Moderator Emeritus


Joined: 09 Mar 2011
Posts: 7309
Location: Inside the Matrix

Posted: Sun Mar 10, 2013 4:39 pm

Presumably you changed it to "assign business-/data-date from run-control file to W-DATE-OUTER (OUTER?)".

The date code you've shown can easily be done in DFSORT, but I assume your point is "*Sort should not be your 'weapon of choice' for writing all programs because your version-control for 'JCL' is deficient".

The original coder should have realised that if an error was encountered it could only possibly be from a coding fault. Then double-check the code, find the problem, fix it. Make the code for the abend "different", something to indicate "dumb-ass coding error" rather than a general "date error problem". A diagnostic message should be produced, if possible, but at least stored. Meaningful names for FORMAT-1 and FORMAT-2 (or replacement of the literals with meaningfully-named "constants", if FORMAT-1 and FORMAT-2 are part of a "structure") would have made it more difficult to code incorrectly in the first place. And, when the code is copied, ensure that all the "meaningful" information is correct in its new location.
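As a rough illustration of the named-constants and distinct-error idea, here is a sketch in Python rather than COBOL. Everything in it (`DateFormat`, `CodingFault`, `BadDate`, `convert`) is a hypothetical name for illustration, and the length check is only one assumed way of telling a coding fault apart from a bad date.

```python
from datetime import datetime
from enum import Enum

class DateFormat(Enum):
    """Meaningfully named formats instead of bare literals."""
    YYMMDD = "%y%m%d"
    YYYYMMDD = "%Y%m%d"

class CodingFault(Exception):
    """The caller passed data that cannot match the declared format."""

class BadDate(Exception):
    """The data matches the format's shape but is not a real date."""

def convert(text, source, target):
    """Convert text from the source format to the target format."""
    # The enum member name doubles as the picture of the field,
    # so its length is the expected input length.
    if len(text) != len(source.name):
        raise CodingFault(
            f"{text!r} is {len(text)} characters, but {source.name} was declared"
        )
    try:
        parsed = datetime.strptime(text, source.value)
    except ValueError as exc:
        raise BadDate(f"{text!r} is not a valid {source.name} date") from exc
    return parsed.strftime(target.value)

if __name__ == "__main__":
    print(convert("130309", DateFormat.YYMMDD, DateFormat.YYYYMMDD))
```

With this shape, handing an eight-character date to a call declared as YYMMDD fails at once with the "coding error" exception, whatever the year, instead of waiting for a month 13 to turn up.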
Akatsukami

Global Moderator


Joined: 03 Oct 2009
Posts: 1788
Location: Bloomington, IL

Posted: Sun Mar 10, 2013 6:05 pm

Bill Woodger wrote:
Presumably you changed it to "assign business-/data-date from run-control file to W-DATE-OUTER (OUTER?)".


"Run control date"? What's that? :P In fact, this logic is executed only if a business date passed as a parameter is spaces!

Quote:
The original coder should have realised that if an error was encountered it could only possibly be from a coding fault. Then double-check the code, find the problem, fix it. Make the code for the abend "different", something to indicate "dumb-ass coding error" rather than a general "date error problem". A diagnostic message should be produced, if possible, but at least stored. Meaningful names for FORMAT-1 and FORMAT-2 (or replacement of the literals with meaningfully-named "constants", if FORMAT-1 and FORMAT-2 are part of a "structure") would have made it more difficult to code incorrectly in the first place. And, when the code is copied, ensure that all the "meaningful" information is correct in its new location.


All of the above (with the exception of the diagnostic message; one is generated but not output; I found it in the dump, one of the data that confirmed my analysis). Since I didn't write the code, I have no great impulse to defend it ;) I will note, though, that the original code dates from 1995 (at least according to the standard "flower box" describing it), and has been "enhanced" several times, most recently in 2008 by anonymous software engineers from Satyam. The original code is no longer available (neither are any of the programmers), so it can't be told which flaws are original and which were introduced later by people who didn't understand what they were doing.
Ed Goodman

Active Member


Joined: 08 Jun 2011
Posts: 556
Location: USA

Posted: Mon Mar 11, 2013 7:25 pm

Hey, don't blame the outsourcer. I'd bet the contract didn't SAY that the code had to work on December 31st.
All times are GMT + 6 Hours