This blog post covers the key knowledge areas of Incident Management:
- Overview of Incident Management
- Incident Lifecycle
- Incident Escalation
- Functional vs. Hierarchical Escalation
- Benefits, Costs, and Challenges
First things first: before we go deeper into each area, it's best to have a good grasp of the basic terminology. For those of you who plan on taking the ITIL exam, this will be very important to know. For example, what is an “Incident”?
According to ITIL Glossary:
Incident: (Service Operations) An unplanned interruption of an IT Service or a reduction in the quality of an IT Service. Failure of a configuration item that has not yet impacted service is also an incident. For example: Failure of one disk from a mirror set.
Incident Management: (Service Operations) The process responsible for managing the lifecycle of all incidents. The primary objective of incident management is to return the IT Service to users as quickly as possible.
These two definitions are very important to know and are quite frequently asked on the ITIL exam.
Overview of Incident Management
- To reduce or eliminate the effects and occurrences of IT service interruptions, disturbances, or reductions in quality
- Must get users back to work ASAP
- Record, classify, and allocate all incidents to the appropriate specialists
- Progress should be monitored
- ALL incidents must be resolved
- Incidents must be closed only AFTER resolution
Objectives and Benefits of Incident Management
- Reduce impact on business
- Improve user productivity
- Obtain independent, customer-focused incident monitoring
- Make SLA (Service Level Agreement) info readily available
- Lower cost of doing business
- React quickly and efficiently
- Active monitoring of user domains and services
- Keep problems from escalating
- Improve user/customer satisfaction
- Achieve more accurate operations
- Prevent “too many cooks”
We have already defined an “Incident.” Let’s look at a “Problem” now.
Problems cause incidents; incidents do not become problems. A problem is the underlying cause of one or more incidents, and that cause is often unknown when the problem is first identified. For example, suppose one of the hard drives fails. The disk failure is the incident. The problem, however, is the underlying cause of that failure: is it a configuration issue? A hardware issue? It is very important to remember the difference between an incident and a problem.
Incident Records vs. Problem Records:
- Incident Records are end-user focused
- Management reporting on incidents focuses on measuring downtime and business impact
- Problem Records start with IT Management's decision to invest IT resources (and accept possible further end-user downtime) in investigating the root cause and implementing a resolution
- Problem Records focus on Internal IT Processes
- Management reporting on problems focuses on root causes and implementing structural resolutions
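To make the split concrete, here is a minimal Python sketch of the two record types. The class and field names (`IncidentRecord`, `ProblemRecord`, `downtime_minutes`, and so on) are hypothetical and not taken from any ITIL tool, and the sample data is invented for illustration. The point is that one problem record can link to many incident records, while each incident record stays focused on the affected user and the downtime.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IncidentRecord:
    """End-user focused: who was affected, for how long, and how badly."""
    incident_id: str
    reported_by: str
    downtime_minutes: int
    business_impact: str


@dataclass
class ProblemRecord:
    """Internal IT focused: the underlying cause behind one or more incidents."""
    problem_id: str
    root_cause: Optional[str] = None      # often unknown when the record is opened
    structural_fix: Optional[str] = None  # permanent resolution, once implemented
    incidents: list[IncidentRecord] = field(default_factory=list)

    def total_downtime(self) -> int:
        # Downtime attributable to this problem across all linked incidents.
        return sum(i.downtime_minutes for i in self.incidents)


# Hypothetical example: two disk failures traced back to one problem.
prb = ProblemRecord("PRB-1")
prb.incidents.append(IncidentRecord("INC-1", "alice", 30, "Payroll app unavailable"))
prb.incidents.append(IncidentRecord("INC-2", "bob", 45, "File share read-only"))
prb.root_cause = "Faulty disk controller firmware"
print(prb.total_downtime())  # 75
```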
Incident Lifecycle
As an incident manager putting together your Service Desk solution and incident handling process, you will likely work from a typical incident lifecycle like the one described below.
The very first step is Detect and Record: the Service Desk accepts the initial request, and the Incident Record is created. The second step is Classify and Support. For example, is the issue hardware related, software related, or is it a Service Request? The incident is coded by its type, status, and impact on the organization. Is it urgent? Is it critical? What is its priority? How does it relate to the Service Level Agreement? From there, the Service Desk may give the user suggestions, either from a script or based on expertise, to solve the incident or work around the issue, even if only temporarily. If it is a Service Request, it moves to a different process, since Service Requests do not involve the Investigate and Diagnose steps.
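Classification often comes down to combining impact and urgency into a single priority code. The sketch below assumes a three-level scale for each and a five-level priority (Critical to Planning); the `priority` rule and the labels are illustrative assumptions, so substitute the matrix defined in your own SLAs.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Example labels for priority codes 1..5 (assumed; define your own in the SLA).
PRIORITY_NAMES = {1: "Critical", 2: "High", 3: "Medium", 4: "Low", 5: "Planning"}


def priority(impact: Level, urgency: Level) -> int:
    """Combine impact and urgency into a code from 1 (Critical) to 5 (Planning).

    Hypothetical rule: impact + urgency - 1, which reproduces a common
    3x3 impact/urgency matrix (HIGH/HIGH -> 1, LOW/LOW -> 5).
    """
    return int(impact) + int(urgency) - 1


code = priority(Level.HIGH, Level.MEDIUM)
print(code, PRIORITY_NAMES[code])  # 2 High
```

Encoding the rule as a function keeps classification consistent across Service Desk agents, which matters when priority drives SLA targets.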
If there is no known solution for the incident, it is investigated and diagnosed further by the Service Desk group or goes through the escalation process. Once a solution is found, the Resolve and Recover step takes place: the solution is applied and the service is restored. After the incident is resolved, we move on to Closure. The user is asked whether they are satisfied with the resolution; if they are, we close the incident. We might also send a survey or follow up if we wish to improve the process.
Along every step of the way, we should have ongoing tracking and progress monitoring.
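The lifecycle above behaves like a small state machine. The following Python sketch is one way to model it, under stated assumptions: the `Stage` names mirror the steps described here, the `history` list stands in for ongoing tracking, and sending an unsatisfied user's incident back to Investigate and Diagnose is our own assumption, since the process above only describes the satisfied case.

```python
from enum import Enum, auto


class Stage(Enum):
    DETECT_AND_RECORD = auto()
    CLASSIFY_AND_SUPPORT = auto()
    INVESTIGATE_AND_DIAGNOSE = auto()
    RESOLVE_AND_RECOVER = auto()
    CLOSED = auto()
    SERVICE_REQUEST = auto()  # handed off to the separate Service Request process


# Allowed moves between lifecycle stages.
TRANSITIONS = {
    Stage.DETECT_AND_RECORD: {Stage.CLASSIFY_AND_SUPPORT},
    # Known solution -> resolve directly; unknown -> investigate; request -> hand off.
    Stage.CLASSIFY_AND_SUPPORT: {
        Stage.RESOLVE_AND_RECOVER,
        Stage.INVESTIGATE_AND_DIAGNOSE,
        Stage.SERVICE_REQUEST,
    },
    Stage.INVESTIGATE_AND_DIAGNOSE: {Stage.RESOLVE_AND_RECOVER},
    # Assumption: an unsatisfied user sends the incident back to investigation.
    Stage.RESOLVE_AND_RECOVER: {Stage.CLOSED, Stage.INVESTIGATE_AND_DIAGNOSE},
    Stage.CLOSED: set(),
    Stage.SERVICE_REQUEST: set(),
}


class Incident:
    def __init__(self, summary: str) -> None:
        self.summary = summary
        self.stage = Stage.DETECT_AND_RECORD  # the record is created on detection
        self.history = [self.stage]           # ongoing tracking of every step

    def advance(self, new_stage: Stage) -> None:
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.name} to {new_stage.name}")
        self.stage = new_stage
        self.history.append(new_stage)

    def close(self, user_satisfied: bool) -> None:
        # Closure only follows resolution; the user confirms before we close.
        if self.stage is not Stage.RESOLVE_AND_RECOVER:
            raise ValueError("an incident can only be closed after resolution")
        self.advance(Stage.CLOSED if user_satisfied else Stage.INVESTIGATE_AND_DIAGNOSE)


inc = Incident("One disk in a mirror set failed")
inc.advance(Stage.CLASSIFY_AND_SUPPORT)
inc.advance(Stage.INVESTIGATE_AND_DIAGNOSE)
inc.advance(Stage.RESOLVE_AND_RECOVER)
inc.close(user_satisfied=True)
print([s.name for s in inc.history])
```

Rejecting illegal transitions (for example, jumping straight from Detect and Record to Closed) is what enforces the rule that incidents are closed only after resolution.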
Incident Escalation
In a nutshell, incident escalation is passing an issue on to personnel with a higher level of expertise or authority. There are two escalation types: Horizontal (Functional) Escalation and Vertical (Hierarchical) Escalation. This distinction is also asked frequently on the ITIL Foundation exam.
Every organization should have an escalation model. Let’s walk through a simple one.
At a very basic level, we first have Tier 1. This tier is responsible for detecting and recording the incident, which arrives as some type of request from a user, customer, system, application, etc. If the request is a Service Request (for example, a new employee needs a domain account set up), it goes through a Service Request procedure, as mentioned previously. Otherwise, the incident moves to the next step at Tier 1, Support. If the Service Desk is able to resolve the incident, we go straight to Resolve and Recover and close the incident.
Functional vs. Hierarchical Escalation
If the Service Desk is not able to resolve the incident at Tier 1, it is escalated to what we call Tier 2. As mentioned earlier, this escalation can be horizontal or vertical.
- Functional (Horizontal) Escalation transfers the incident to people with a higher level of technical expertise or additional access privileges, such as in-house specialists or, for a network problem, a data center provider outside of our organization.
- Hierarchical (Vertical) Escalation involves more senior levels of management, for example when an incident is critical or is at risk of breaching its SLA.
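The routing decision can be sketched in a few lines of Python. It assumes the ITIL glossary meanings (functional escalation moves the incident to more technical expertise; hierarchical escalation involves more senior management). The `Ticket` fields, the `SPECIALIST_GROUPS` mapping, and the trigger for notifying management (priority 1 or an SLA at risk) are hypothetical choices for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ticket:
    summary: str
    category: str      # hypothetical categories, e.g. "network", "database"
    priority: int      # 1 = most critical
    known_fix: bool    # can Tier 1 resolve it with a script or known solution?
    sla_at_risk: bool = False
    notified: list[str] = field(default_factory=list)


# Hypothetical mapping of categories to Tier 2 specialist groups.
SPECIALIST_GROUPS = {"network": "Network Team", "database": "DBA Team"}


def route(ticket: Ticket) -> str:
    """Return the group that should own the ticket next."""
    # Hierarchical (vertical) escalation: involve management for critical
    # tickets or SLA risk. This runs alongside, not instead of, technical work.
    if ticket.priority == 1 or ticket.sla_at_risk:
        ticket.notified.append("Service Desk Manager")
    # Tier 1 keeps the ticket when a known solution exists.
    if ticket.known_fix:
        return "Tier 1 Service Desk"
    # Functional (horizontal) escalation: hand over to the group with the skills.
    return SPECIALIST_GROUPS.get(ticket.category, "Tier 2 Support")


outage = Ticket("Core switch down", "network", priority=1, known_fix=False)
print(route(outage), outage.notified)  # Network Team ['Service Desk Manager']
```

Keeping the two escalation types as separate checks reflects that they answer different questions: who can fix it, and who needs to know.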
From a Service Desk design standpoint, we would want to develop Horizontal Escalation first, to make sure we have the right personnel to support our users.
Benefits, Costs and Challenges of Incident Management
Let’s take a look at the benefits of Incident Management first:
- Reduced business impact of incidents and problems, including service requests, allowing for greater effectiveness
- It allows you to put in place proactive enhancements and changes that help your business run smoothly
- Business-focused SLA management
- By improving monitoring and performance, we receive better service quality
- Better staff utilization and efficiency
- By learning from incidents and problems, we can reduce future incidents and service requests
- Improved customer/user satisfaction
Everything comes with a price and challenges, and Incident Management is no exception:
- Initial planning and implementation – communication, training process
- Training and ramp-up
- Hardware and software tools that will be used (monitoring system, ticketing system)
- Users/staff bypassing procedures if the process is too complex or too bureaucratic
- Incident backlog – process overload
- Too many escalations
- Lack of clear definitions, SLAs, or commitment from staff and management
As you can see, implementing an incident management process brings many benefits, and every organization that wants to effectively support its internal and external clients should have a clearly defined one.