Engineer 2, Web and Mobile App Support
The resource is a member of the Residential Reliability Engineering Support Team responsible for developing and maintaining standard operating procedures (SOPs) specific to our Xfinity Home product. The Incident Manager will ensure that all incidents are identified, triaged and resolved within the Service Level Agreement. Additionally, this position will be responsible for ensuring that all root cause analysis is promptly and properly documented for high severity incidents and delivered to the respective Product owners. This position will interface with Comcast Product, Change, Problem, Release, Engineering, Marketing and Operations Management teams.
- Lead technical investigation and triage of production issues; analyze logs, perform end-to-end investigation including but not limited to network, software and infrastructure issues
- Lead technical outage bridges and engage appropriate resources to drive issues to closure
- Document triage and training procedures (including enhancing existing procedures) for complex application workflows (including APIs and endpoints)
- Draft engineering production support readiness documentation
- Actively manage relationship with key stakeholders, markets and resolver groups
- Respond to service-level issues and work to restore normal service operations as quickly as possible
- Develop procedures for incident triage and management, creation of metrics and measures, and management and administration of monitoring tools
- Oversee the timely execution of scheduled and repeatable processes such as periodic system validations, daily triage, and system monitoring and event log management
- Work with architecture, development and engineering teams to identify root cause for incidents and create an action plan for resolution
- Monitor systems and services for most efficient operation, identifying fault conditions as well as opportunities for further optimization
- Analyze problems in design, configuration, data flow, and data state within a highly complex multi-product provisioning system
- Assist in training and developing junior engineers and offshore resources
- Identify and lead the implementation of creative process and technology solutions within the team
- Provide mentorship and team development opportunities
- Assist in representing Production Support to the organization, ensuring that high availability and the ability to identify customer-facing issues are included in the development or deployment of new products and services
- Identify and recommend opportunities for "clean-slate" process improvement with regard to incident management, fault monitoring, triage procedures and issue escalation
- Maintain escalation and contact lists for mission-critical systems and services
- Consistent exercise of independent judgment and discretion in matters of significance
- Regular, consistent and punctual attendance. Must be able to work nights and weekends, variable schedule(s) as necessary
- Bachelor's degree or equivalent work experience is required.
- Generally requires 3 to 7 years of experience
- Strong understanding of ITIL, with Incident and Problem Management experience.
- Experience defining, implementing, and monitoring IT service level processes.
- Experience in application development and engineering a plus
- An understanding of Cloud infrastructure (Network and Server architecture)
- Experience with monitoring technologies such as OIV, Splunk, Op5 and the Haystack tools is a plus
- Must be able to work nights and weekends as part of an after-hours on-call support schedule
Comcast is an EOE/Veterans/Disabled/LGBT employer