Comcast Careers

Site Reliability Engineer E4

Chennai, IN
Technology (Technology - IT)

Job Description

Business Unit

CPE Development

Job Title

Eng 4, Software Development Engineer (Software Reliability Engineering)

Position Type

Full Time

Comcast India

Working Location

Chennai, India

Comcast is looking for a talented and dedicated Software Development Engineer to join our cloud networking team. Our SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning of our services. The product line follows Google's SRE methodology and is part of the Xfinity range of products, which aims to enhance the way America views entertainment.

Comcast's development environment is advanced and highly integrated. It uses and develops industry-standard tools that are combined effectively to support a fast-moving, agile development cycle. Balancing stability against agility, operational work against software engineering, and proactive against reactive work, together with effective use of the required infrastructure, allows Comcast to deliver features and products on aggressive timelines.

We are looking for engineers who are thorough and dedicated, who value preparation and documentation, and who are aware of what could go wrong and strongly motivated to prevent it.

Core Responsibilities

As a key technical member of the team, you will:

  • Build, use and improve tools to automate the deployment, administration and monitoring of Splice Machine's cloud-native platform using Docker, AWS, Azure, Google Cloud, Ansible, Influx, Grafana and the ELK stack, among other powerful big data tools.
  • Develop software and processes for better utilization of underlying cloud resources.
  • Work with the development team to harden, enhance and document our systems, establish processes, and generally improve their operability and supportability.
  • Troubleshoot and resolve live production issues by analyzing logs from different sources.
  • Write and maintain scalable system administration and monitoring tools.
  • Have the opportunity to go beyond your normal duties: write for our blog, attend hackathons and conferences, speak at events, contribute to Stack Overflow and open-source development, or pursue anything else you're interested in that adds to our community.

Key requirements and experience include:

● Minimum of 8 years of experience, with an engineering background in Computer Science, Electronics Engineering, Computer Engineering or a related field

● Minimum of 1 year of experience in Site Reliability Engineering

● Strong DevOps background with Linux-based systems

● Demonstrated ability to write programs using Java-based technologies or Scala

● Experience scripting in Python or shell

● Must have experience managing a distributed production application stack in the cloud (AWS, Azure or Google Cloud)

● Docker and Container Orchestration experience

● Basic understanding of software best practices such as Agile-based development, coding guidelines, code reviews, etc.

Preferred Requirements:

● Good knowledge of AWS: EC2 instances, setting up accounts/users, and troubleshooting issues. Knowledge of Azure and Google Cloud is also desired.

● Experience using configuration management tools like Ansible, Chef or Puppet is a big plus

● Experience with NoSQL or other databases

Compliance Disclaimer:

Comcast NBCUniversal is an Equal Opportunity, Veterans, Disabled and LGBT employer.