Thursday, December 9, 2010

ITIL: Incident versus Problem Management


I've noticed that although ITIL has been around for more than 20 years, there is still a lot of confusion about what Incident Management actually is and how it differs from Problem Management.

I hope that the following list, which I'll keep adding to, will help you understand and, if necessary, "sell" these differences. Given more time, I'll start categorizing them under People, Processes, Products, and Partners (or feel free to do this for me).

Feel free to dispute any item on this list, and you're welcome to send me further suggestions to add. It's only a start, and I won't rest until I've found at least 100 differences between these two processes!

| Incident Management | Problem Management |
|---|---|
| Mainly reactive | Mainly proactive |
| Strong focus on the business and user community | Strong focus on IT and technology experts |
| Uses the Known Error Database (KEDB) | Populates the Known Error Database (KEDB) |
| Restores services as quickly as possible | Less emphasis on speed, more emphasis on finding real solutions |
| Not responsible for creating known error records | Responsible for creating known error records |
| Predominantly applies temporary fixes, also known as workarounds or band-aid fixes | Is all about finding more structural, permanent solutions |
| Typically deals with single, individual incidents | Analyzes large volumes of incidents to detect trends and/or patterns |
| Uses mostly people-oriented (non-technical) language | Uses mostly technical language |
| Has a strong relationship with SLAs | Has a strong relationship with OLAs and contracts |
| Processes recurring incidents | Eliminates recurring incidents |
| Frequency and impact of related incidents are typically not taken into account when prioritizing a (new) incident | Frequency and impact of related incidents are typically taken into account when prioritizing a (new) problem |
| Users can log incidents | Users cannot log problems |
| Increases support costs by repeatedly resolving recurring incidents without providing a structural, long-term solution | Reduces support costs by resolving recurring incidents in a structural, long-term manner |
| Multiple incident records may describe the same issue | Problem records should be unique |
| The focus is short-term | The focus is long-term |
| Escalates incidents to other teams (still within the incident management process) to ensure timely service restoration | Submits change requests to the change management process with proposed solutions that eliminate known errors |
| Does not influence the number of incidents reported by users | Does influence the number of incidents reported by users |
| Investigation and diagnosis are often performed in parallel | Investigation and diagnosis are often performed sequentially |
| An incident can be closed even if its cause is unclear (the so-called root cause is often unknown) | A problem cannot be closed without a clear understanding of its root cause |
| Major incident reviews are not mentioned as part of ITIL's incident management process flow (incident model) | Major problem reviews are mentioned as part of ITIL's problem management process flow (problem model) |
| Many incidents may be linked to the same problem | Many problems are typically not linked to the same incident |
| Not responsible for maintaining the Known Error Database (KEDB) | Responsible for maintaining the Known Error Database (KEDB) |
| Does not improve the overall stability of the IT infrastructure | Does improve the overall stability of the IT infrastructure |
| Can boost user satisfaction in the short term | Can boost user satisfaction in the long term |
| Process members are often "static" | Process members are often "dynamic" |
| Most effort comes from lower-level (and typically cheaper) support teams | Most effort comes from higher-level (and typically more expensive) support teams |
| Incident resolution techniques are more repetitive across incidents | Problem resolution techniques are more unique to each problem |
| Often includes full-time roles | Often includes part-time roles |
| Often performed using internal resources | Often performed with the support of external resources |
| Predominantly operates at the user level | Predominantly operates at the enterprise level |
| Has access to many effective commercial off-the-shelf (COTS) incident management systems | Has access to fewer effective commercial off-the-shelf (COTS) problem management systems |
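To make a few of the rows above more concrete (Incident Management uses the KEDB while Problem Management populates it, many incidents can link to one problem, Problem Management looks for patterns across many incidents, and a problem can't be closed without a root cause), here is a minimal Python sketch. The record fields, the "three repeats" threshold, and every function name are hypothetical illustrations, not part of ITIL or of any ITSM product.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical KEDB: (configuration item, symptom) -> workaround
Kedb = Dict[Tuple[str, str], str]


@dataclass
class Incident:
    id: int
    ci: str                           # hypothetical: item the user reported against
    symptom: str
    problem_id: Optional[int] = None  # many incidents may point to one problem


@dataclass
class Problem:
    id: int
    ci: str
    symptom: str
    root_cause: Optional[str] = None
    incident_ids: List[int] = field(default_factory=list)

    def can_close(self) -> bool:
        # A problem is not closed without a known root cause.
        return self.root_cause is not None


def handle_incident(incident: Incident, kedb: Kedb) -> str:
    """Incident management: restore service quickly by *reading* the KEDB."""
    return kedb.get((incident.ci, incident.symptom), "escalate")


def detect_problems(incidents: List[Incident], threshold: int = 3) -> List[Problem]:
    """Problem management: look across many incidents for recurring patterns."""
    counts = Counter((i.ci, i.symptom) for i in incidents)
    problems: List[Problem] = []
    for (ci, symptom), n in sorted(counts.items()):
        if n < threshold:
            continue  # too few repeats to treat as a pattern (assumed rule)
        p = Problem(len(problems) + 1, ci, symptom)
        for i in incidents:
            if (i.ci, i.symptom) == (ci, symptom):
                i.problem_id = p.id       # link incident to the problem
                p.incident_ids.append(i.id)
        problems.append(p)
    return problems


def record_known_error(problem: Problem, workaround: str, kedb: Kedb) -> None:
    """Problem management *populates* the KEDB with a known error record."""
    kedb[(problem.ci, problem.symptom)] = workaround


if __name__ == "__main__":
    kedb: Kedb = {}
    incidents = [Incident(n, "mail-server", "mailbox full") for n in range(1, 4)]
    incidents.append(Incident(4, "printer-2", "paper jam"))

    print(handle_incident(incidents[0], kedb))   # escalate (KEDB is empty)
    problems = detect_problems(incidents)
    record_known_error(problems[0], "archive old mail", kedb)
    print(handle_incident(incidents[0], kedb))   # archive old mail
    print(problems[0].incident_ids)              # [1, 2, 3]
    print(problems[0].can_close())               # False: no root cause yet
```

The split is deliberate: `handle_incident` only reads the KEDB and deals with one record at a time, while `detect_problems` and `record_known_error` work across the whole incident history and write to the KEDB, mirroring the "uses" versus "populates" distinction in the table.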