Comcast Careers

Engineer 3, Web and Mobile App Support

Philadelphia, PA
Technology (Technology - IT)

Job Description

Job Summary:

This position is part of the Residential Reliability Engineering Support Team, which is responsible for developing and maintaining standard operating procedures (SOPs) specific to our Xfinity Home product. The Incident Manager will ensure that all incidents are identified, triaged and resolved within the Service Level Agreement (SLA). This position will also be responsible for ensuring that root cause analysis for high-severity incidents is promptly and properly documented and delivered to the respective product owners. The role will interface with Comcast Product, Change, Problem, Release, Engineering, Marketing and Operations Management teams.

Core Responsibilities:

  • Lead technical investigation and triage of production issues: analyze logs and perform end-to-end investigation of network, software, infrastructure and other issues.
  • Document training and triage procedures (including enhancing existing ones) and complex application workflows (including APIs and endpoints).
  • Draft Residential Engineering production support readiness documentation.
  • Actively manage relationships with key stakeholders, markets and resolver groups.
  • Respond to service-level issues and work to restore normal service operations as quickly as possible.
  • Assist in training and developing junior engineers.
  • Identify and lead the implementation of creative process and technology solutions within the team.
  • Provide mentorship and team development opportunities.
  • Assist in representing Production Support to the organization, ensuring that high availability and the ability to identify customer-facing issues are built into the development and deployment of new products and services.
  • Identify and recommend opportunities for "clean-slate" process improvement with regard to incident management, fault monitoring, triage procedures and issue escalation.
  • Develop procedures for incident triage and management, for creating metrics and measures, and for managing and administering monitoring tools.
  • Oversee the timely execution of scheduled, repeatable processes such as periodic system validations, daily triage, system monitoring and event log management.
  • Work with architecture, development and engineering teams to identify root cause for recurring incidents and create an action plan for resolution.
  • Monitor systems and services to ensure efficient operation, identifying fault conditions as well as opportunities for further optimization.
  • Maintain escalation and contact lists for mission-critical systems and services.
  • Exercise independent judgment and discretion consistently in matters of significance.
  • Maintain regular, consistent and punctual attendance. Must be able to work nights, weekends and variable schedules as necessary.

Job Specification:

  • Bachelor's degree in Network Engineering or Business, or equivalent work experience, is required.
  • Strong understanding of ITIL and Incident Management practices.
  • Generally requires 5 to 7 years of experience.
  • 5+ years' experience in an Enterprise 24x7 Network Operations Center or Production Support environment.
  • Minimum 3 years' experience in customer service and in incident and problem management is required.
  • Minimum 3 years' experience defining, implementing, and monitoring IT service level processes.
  • Technical expertise in network and server administration, with hands-on experience.
  • Experience working in large (1,000+ server), complex operations environments.
  • An understanding of Cloud infrastructure (Network and Server architecture).
  • Experience with monitoring technologies such as OIV, Splunk, Op5 and the Haystack tools is a plus.

Comcast is an EOE/Veterans/Disabled/LGBT employer