Problem Manager 1, Engineering Operations
Reporting to the Senior Director of Change, Process, and Tooling in the Residential Reliability Engineering Organization, this individual will be responsible for the development, implementation, and continued improvement of reactive and proactive ITIL-standard Problem Management practices. They will identify recurring operational issues, determine root cause, and develop and implement solutions (including shaping the release process) to prevent recurrence. They will also define methodologies for measuring success and identifying improvement opportunities, using key metrics such as ticket state transitions, time in queue, and time to resolve.
Relying heavily on Incident, Change, and Problem ticket data, the manager will also be responsible for aggregating and analyzing telemetry data across the Residential Reliability Product group, building dashboards, and connecting related data points to create a cohesive picture of the health of our Residential operational capabilities for triaging incidents and deploying changes.
The right candidate must value collaboration and transparency and must support a culture, and the associated processes, focused on blameless incident review. These reviews should identify opportunities to permanently solve problems and known errors, and should drive analysis that identifies and proactively solves underlying problems before they impact customers.
- Execute proactive and reactive problem management analysis to minimize future problems.
- Monitor trouble tickets to identify system and application outages that require follow-up analysis.
- Drive data collection for reviews of major outages and track problem tasks to completion.
- Engage with appropriate technical and business teams to assist in the resolution of problem records.
- Review Problem Management policy and knowledge documentation on a regular basis to ensure relevance and accuracy.
- Facilitate task forces aimed at addressing issues with an unknown root cause.
- Manage queues within ServiceNow and monitor all open and aging cases daily.
- Develop and review compliance metrics and KPIs to identify areas where the problem process, policies, and training material can mature.
- Leverage ServiceNow Performance Analytics and ServiceNow reports to identify repeat incidents.
- Bachelor's degree or equivalent
- Engineering, Computer Science
- Generally requires 6-9 years of related experience
- Problem or Incident Management experience
- Root Cause Analysis techniques
- Familiarity with the ServiceNow tool; configuration of reports and dashboards.
- Metrics aggregation from multiple sources using SQL.
- Familiarity with metrics graphing tools such as Tableau or similar BI tools.
- Understanding and parsing large and disparate data sets.
Comcast is an EOE/Veterans/Disabled/LGBT employer