Transcript of Untitled Prezi
Highly Trained People
High quality Data
Strong Problem Solving Skills
Good Process Compliance
A Service and Quality Culture
A Continuous Improvement ethos
What is a Problem?
Standardised Problem Management Process
Standardised performance measurements
Conforms to IT Service Management framework
Executed by a multi-step process
What is an Incident?
What is a Change?
What's the difference between an Incident and a Problem?
Problem Management manages a problem throughout its lifecycle, from detection to resolution.
The primary objectives of Problem Management are to prevent problems and resulting incidents from happening, to eliminate recurring incidents, and to minimize the impact of incidents that cannot be prevented.
Problem Management includes all activities needed to diagnose the root cause of incidents and to submit change requests to resolve those problems. It also maintains information about problems and workarounds for use by Incident Management.
Effective Problem Management helps to ensure that the availability and quality of services meet business needs. It also helps to save money by reducing the number of incidents and the effort needed to resolve them.
This results in less downtime for the business.
Mandatory for all permanent and temporary staff involved in Problem Management, including, but not limited to:
- ISS staff,
- Business users,
- Third parties (providing IT)
The cause is not usually known at the time a Problem Record is created.
A problem is defined as the cause of one or more incidents.
Definition of a Problem
Quicker time to Recover from Incidents
Improved availability of ISS Services and user experience by reducing the frequency and severity of business impacting interruptions
• Common understanding of the Problem Management Process
Baseline established for improvement measurement
Baseline established for audit purposes
• Significantly increased availability of IT Services by reducing the number and duration of incidents
• Higher productivity of staff by reducing unplanned works caused by incidents through prioritising problem solving activities in line with requirements
• ISS organisation proficient in problem analysis and troubleshooting methodologies providing industry recognised quicker resolution times of problems
• A knowledgebase of known error information enabling reduced resolution times for recurring incidents
• User and Business confidence that a process exists, is maintained and actively managed to keep IT services working as intended, leading to improved User satisfaction with IT Services
• Confidence that all recorded problems will be addressed and progressed in line with the associated priority supported by toolset enabled escalation and quality checks
• The proactive identification of errors and weaknesses in the infrastructure, thus removing potential failures before they impact on Users and Business
All of ISS will be provided with access to Axios assyst providing web access to view problems, obtain statuses and collaborate with other ISS staff.
Failed changes that require investigation by Problem Management
Respond to issues after they occur
Take inputs from Major Incidents and Post Incident Reviews
Need for Incident Management to keep Problem Management in mind during processing and documenting
Analyse Incident trends
Recurring incidents flagged up by Helpdesk staff
Driven by need for Improvement for a given Service
Use FMEA (Failure Mode and Effects Analysis)
Tools to assist with Establishing Root Cause
ITIL® Problem Management describes how to manage problems through the lifecycle but does not describe how to establish root cause.
There is a range of tools and aids available to staff, including:
Kepner Tregoe http://www.kepner-tregoe.com/
Ishikawa (Fishbone Analysis)
For some of these, and others, go to http://www.mindtools.com/pages/main/newMN_TMC.htm#cause
ITIL is a Registered Trade Mark of the Office of Government Commerce in the United Kingdom and other countries.
An Incident is an unplanned interruption to a service or a reduction in the quality of the service being delivered against defined Service Level Targets.
Definition of an Incident
An RFC is a request that has been made to make a technical change to any in-scope environment. It is the method by which changes are recorded, controlled and communicated to the affected, or potentially affected users.
A change can also be defined as the addition, modification, replacement, or removal of a configuration item or asset, or of an attribute of one, as recorded in the Configuration Management Database (CMDB).
Definition of a Request for Change (RFC)
PROBLEM MANAGEMENT WILL COVER:
• Capture important problem information during the incident lifecycle using Situation Appraisal, which adds value to Problem Analysis by the 2nd / 3rd line teams
• Establish the root cause of incidents so that preventive actions can be completed to eliminate recurrence
• Proactive and pre-emptive identification and resolution prior to incidents occurring
• Record known errors, solutions and workarounds
• Request changes to implement preventive solutions resulting from effective problem analysis
Problem Management will:
• Identify and take ownership of problems for all ISS Services
• Take steps to reduce the business impact of incidents and problems that have previously occurred through established workarounds
• Identify the root cause of problems and initiate activity aimed at establishing workarounds and permanent solutions
• Improve user perception of all ISS Services through the reduction of repeat incidents
• Increase the first-time fix rate at the Service Desk, as workarounds can be deployed to speed up IT service restoration
• Proactively use incident and problem trend data to identify business-impacting issues and implement solutions via Change Management to prevent recurrence
• Pursue pre-emptive initiatives aimed at preventing incidents, collaborating with Event Management, suppliers and other processes to detect potential service-impacting issues early
• Produce Problem Management reporting as part of scorecard and dashboard service performance to agreed targets
• Conduct regular assessment of the effectiveness of problem management, identify and implement improvements providing positive business capabilities and outcomes
Changes to effect permanent updates
Updates (i.e. not requiring an RFC)
Workarounds (to effect faster incident recovery in the future)
Minimising the impact of problems affecting availability of IT services whilst minimising expenditure of resource and maintaining the highest level of user satisfaction.
Trend data related to repeat incidents and their impact
Validates workarounds and fixes developed by Problem Management
Requests for Change (RFCs) should be raised during problem resolution so that the impact of unrelated RFCs can be understood and any risks identified and mitigated
Configuration Items must be updated following the resolution of any problem controlled through Change Management
Problems with a capacity related root cause will result in an update to the Capacity Plan
Problem Management identifies risks that require ISS and Corporate governance (as appropriate) under the risk management policy and captures them in the IT Risk Register
Known Error records are held in the Known Error Database (KEDB), which is contained and managed within the Knowledge Management System (KMS)
Inputs come from a range of sources: staff, tools and other processes.
There are multiple ways in which detection can be triggered.
Proactively, ISS will raise problem records following analysis of repeat incidents where the root cause is unknown and the team believes the same issue will continue to recur or persist.
Other proactive detection methods include the detailed analysis of a single incident by the Service Desk that reveals an underlying problem, or an incident that has a wider impact than initially reported.
Automated detection will commonly be via Event Management tools revealing a need to raise a problem record.
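The repeat-incident analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the incident fields (`service`, `category`, `root_cause`), the function name `find_problem_candidates` and the threshold of three are hypothetical and are not taken from Axios assyst or from the ISS process.

```python
from collections import Counter


def find_problem_candidates(incidents, threshold=3):
    """Return (service, category) pairs that recur with no known root cause.

    `incidents` is a list of dicts; the keys used here are assumed for
    illustration and do not reflect any real toolset schema.
    """
    counts = Counter(
        (inc["service"], inc["category"])
        for inc in incidents
        if inc.get("root_cause") is None  # only incidents with unknown cause
    )
    # Keep only the groups that reach the (hypothetical) repeat threshold.
    return sorted(key for key, n in counts.items() if n >= threshold)


# Made-up sample data for demonstration only.
incidents = [
    {"service": "Email", "category": "Login failure", "root_cause": None},
    {"service": "Email", "category": "Login failure", "root_cause": None},
    {"service": "Email", "category": "Login failure", "root_cause": None},
    {"service": "Payroll", "category": "Slow response", "root_cause": None},
    {"service": "Email", "category": "Mailbox full", "root_cause": "Quota too low"},
]

candidates = find_problem_candidates(incidents, threshold=3)
print(candidates)
```

Each pair returned would be a prompt for the team to consider raising a problem record; whether to raise one remains a judgement call, as described above.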
The Problem Initiator raises a Problem Candidate Record in Axios assyst.
Axios assyst date and time stamps all information, and the team links all related incident records. The information captured within problem records can be tailored for each user.
• Unique reference number
• Description of problem
• Problem Champion assigned
• Potential impact of problem
• Voice of the Customer / Actual impact of problem
• IT Service affected
• Problem category
• Problem priority
• Date and time the problem first occurred
• Date and time the problem was first reported
• Reported by
• Scope of problem
• Current workaround
• Root cause
• Problem status e.g. Open / Pending / Resolved / Closed
• Allocation status of problem e.g. Unassigned / Assigned
• Description of activities undertaken to resolve the problem by actionee
• Associated incident record(s)
• Associated change record(s)
• Date and time of resolution, by whom
• Closure/cause category
• Closure date and time, by whom
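As an illustration of how the fields listed above fit together, here is a minimal Python sketch of a problem record. It models only a subset of the fields; the class and attribute names are hypothetical, it does not represent the Axios assyst data model, and deriving the allocation status from whether a Problem Champion is set is an assumption made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class ProblemStatus(Enum):
    # Status values taken from the list above.
    OPEN = "Open"
    PENDING = "Pending"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


@dataclass
class ProblemRecord:
    # Hypothetical attribute names mapped onto the listed fields.
    reference: str                      # unique reference number
    description: str
    service_affected: str
    category: str
    priority: int
    first_occurred: datetime
    first_reported: datetime
    reported_by: str
    champion: Optional[str] = None      # Problem Champion, once assigned
    workaround: Optional[str] = None    # current workaround, if any
    root_cause: Optional[str] = None    # usually unknown at creation
    status: ProblemStatus = ProblemStatus.OPEN
    incident_refs: List[str] = field(default_factory=list)
    change_refs: List[str] = field(default_factory=list)

    @property
    def allocation_status(self) -> str:
        # Assumption: a record counts as assigned once it has a champion.
        return "Assigned" if self.champion else "Unassigned"


# Made-up example record.
record = ProblemRecord(
    reference="PRB-0001",
    description="Repeated login failures on the email service",
    service_affected="Email",
    category="Authentication",
    priority=2,
    first_occurred=datetime(2013, 1, 7, 9, 15),
    first_reported=datetime(2013, 1, 7, 9, 40),
    reported_by="Service Desk",
    incident_refs=["INC-1001", "INC-1002"],
)
print(record.allocation_status, record.status.value)
```

Leaving the root cause and workaround optional reflects the point made earlier that the cause is not usually known when a problem record is created.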