I am sure my needs are not unique, but I am having difficulty finding tools to help.
Basically, many people collect data from various sources every day to monitor the performance of our applications. For example, we check the elapsed run time for several critical jobs using Control-D, we use LISTCAT to check the allocation for certain critical files, and we check the transaction counts from the CICS log. All of this is done manually, and the numbers are plugged into a spreadsheet and mailed out to a distribution list every day. The purpose is to let certain people know whether the application has met its service level agreements (e.g. online uptime) and to monitor file sizes and performance issues.
How can I automate this process instead of relying on manual effort?
Nick, you are certainly not alone in your endeavor. We too are starting a long-term process of managing our core business systems and of providing feedback to management as to how well we are meeting our defined Service Levels.
Today, we use a series of jobs that are considered to be part of the "critical path" for the applications. We capture the start and end times of these jobs and feed that information to tables in Oracle. We also track and post the up and down times of CICS regions, as well as the completion times of certain schedules, critical data transmissions, and recently, the status of our scratch tape pool for the VTL. All of this information goes out to the Operations web page, where the discrepancies between the desired service level objectives and the real-time data are highlighted when they are skewed.
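The comparison step described above (captured start and end times checked against service level objectives) can be sketched roughly as follows. This is only an illustration, not the setup actually in use: the job names, the deadlines, and the `JobRun` and `check_slo` names are all hypothetical, and it assumes each job ends on the same calendar day as its deadline.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class JobRun:
    """Start and end time captured for one critical-path job (hypothetical shape)."""
    name: str
    start: datetime
    end: datetime


# Hypothetical service level objectives: latest acceptable completion time per job.
SLO_DEADLINES = {
    "NIGHTLY_POST": time(5, 0),
    "EXTRACT_LOAD": time(6, 30),
}


def check_slo(runs, deadlines):
    """Return one report row per run, flagging jobs that ended after their deadline.

    Assumes a job ends on the same calendar day as its deadline; jobs with no
    defined deadline are reported but never flagged.
    """
    rows = []
    for run in runs:
        deadline = deadlines.get(run.name)
        elapsed_min = (run.end - run.start).total_seconds() / 60
        missed = deadline is not None and run.end.time() > deadline
        rows.append({
            "job": run.name,
            "elapsed_min": elapsed_min,
            "deadline": deadline,
            "missed": missed,
        })
    return rows


if __name__ == "__main__":
    # Sample data standing in for times pulled from the scheduler or a database.
    runs = [
        JobRun("NIGHTLY_POST", datetime(2005, 3, 4, 1, 0), datetime(2005, 3, 4, 4, 30)),
        JobRun("EXTRACT_LOAD", datetime(2005, 3, 4, 5, 0), datetime(2005, 3, 4, 6, 45)),
    ]
    for row in check_slo(runs, SLO_DEADLINES):
        flag = "MISSED" if row["missed"] else "ok"
        print(f'{row["job"]:<14} {row["elapsed_min"]:>6.1f} min  {flag}')
```

Rows with `missed` set are the ones a web page or a mailed report would highlight. The same rows could just as easily be written to a CSV file or a database table in place of the manual spreadsheet.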
Moving beyond this, we are implementing Formula by Managed Objects to be our centralized "monitor of monitors". The status of system tasks, CICS regions, databases, and batch jobs is handled on the mainframes with BMC's AutoOPERATOR and fed to the Formula server. We use BMC's OSPI to run transactions and to report on their success to Formula. Network monitoring comes from NetView. Monitoring of the distributed systems, routers, and switches comes from Tivoli. The DBAs for Oracle just installed a product called Foglight, which will help them monitor the performance and reliability of the entire Oracle environment.
Currently, I am working to install the Managed Objects BEM (Business Experience Manager) product so we can perform synthetic testing of entire applications and look for throughput problems, response problems, system unavailability, etc. All of this will feed into Formula. Next, we will be installing their BSLM (Business Service Level Manager) product, which will be able to condense all of this accumulated information from within Formula and report on it to management.