Reliability Engineer 3
Job Title: Reliability Engineering – 3
Education Qualification & Work Experience Criteria:
- Bachelor's Degree in Engineering (Computer Science or IT or equivalent technical discipline)
- Minimum 5 years of experience in IT, product development, telecom, or an equivalent work environment.
This position is responsible for the lifecycle, modification, and operational support of monitoring tools within a global enterprise environment. The role acts as a liaison to our Infrastructure Team, engineering, and business customers by providing capability assessments, aligning monitoring products to business needs, and engineering technical solutions. Experience with application performance monitoring (APM), end-user experience monitoring, infrastructure monitoring, event correlation, and automated remediation is highly desirable. Projects and other work as directed.
- Architecture, implementation, administration, and operation of monitoring infrastructure for Comcast’s next generation video delivery offering.
- Work with Agile application engineering and operations teams to design and implement monitoring for consumption by a 24x7 Network Operations Center (NOC).
- Work with engineering leadership to tune alarms and response procedures.
- Leverage open source tools and best practices where required to implement world-class systems, infrastructure and application monitoring.
- Experience building and managing APIs, with a good understanding of monitoring platforms
- Familiarity with new machine learning technologies and the ability to apply them to new monitoring solutions
- Oversee the offshore team of monitoring engineers in the day-to-day systems monitoring lifecycle.
- 3+ years’ experience with the Zabbix and/or Nagios monitoring engines (experience with the Op5 commercial package is a plus)
- 3+ years’ experience with Nagios extensions and the Nagios open source community
- Good knowledge of AWS services: Redshift, Kinesis Firehose, S3, SQS, Lambda, API Gateway, DynamoDB, and AWS machine learning.
- Experience with Node.js is a big plus.
- Intermediate to advanced Perl and Bash skills for extending core Nagios functionality and implementing plugins
- Experience monitoring large-scale application deployments (tens of thousands of application nodes across tens of data centers).
- Superior command of the *nix command line and GNU utilities
- Experience managing offshore teams is a plus
- Ability to work in a dynamic environment where applications are deployed multiple times per day is required.
- Dynamic scripting experience with Ruby and/or Python preferred
- Experience designing and implementing dynamic monitoring systems for elastic virtual private clouds (Amazon/OpenStack) is a plus.
- 10-15% travel is required to rotate with the CIEC team and to travel to the USA and contractor sites in India as needed.
Comcast NBCUniversal is an equal opportunity, Veterans, Disabled and LGBT employer.