DevJobs

SRE Production Engineer

Overview
Skills
  • Bash ꞏ 3y
  • Python ꞏ 3y
  • Ruby
  • Go
  • GitHub Actions
  • Jenkins
  • GCP ꞏ 3y
  • Docker
  • Kubernetes
  • Terraform
  • Ansible
  • Puppet
  • Grafana
  • Chef
  • Prometheus
  • GitLab CI
  • CloudFormation
  • Stackdriver
Job Description

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have extensive experience with Google Cloud Platform (GCP) and strong scripting skills in Python and Bash. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our infrastructure and applications.

Key Responsibilities

  • Infrastructure Management:
    • Monitor and maintain scalable and reliable infrastructure on Google Cloud Platform (GCP).
    • Implement scripts in Bash and Python for regular maintenance.
  • Monitoring and Incident Response:
    • Develop and maintain monitoring, alerting, and incident response systems to ensure system reliability and performance.
    • Respond to incidents, troubleshoot issues, and perform root cause analysis to prevent future occurrences.
  • Automation and Scripting:
    • Write and maintain scripts in Python and Bash to automate operational tasks and processes.
    • Develop tools and frameworks to enhance the efficiency and reliability of the engineering team.
  • Performance Optimization:
    • Analyze system performance and implement optimizations to improve efficiency and reduce latency.
    • Conduct capacity planning and load testing to ensure systems can handle peak traffic.
  • Collaboration and Communication:
    • Work closely with development teams to ensure seamless integration and deployment of applications.
    • Provide guidance and support to engineers on best practices for reliability and scalability.
  • Security and Compliance:
    • Ensure systems and applications comply with security policies and industry standards.
    • Implement and maintain security controls to protect data and infrastructure.

Qualifications

  • Education:
    • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • Experience:
    • 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role.
    • Proven experience with Google Cloud Platform (GCP) services, including Compute Engine, Kubernetes Engine, Cloud Storage, and BigQuery, as well as with Elasticsearch and Prometheus.
  • Technical Skills:
    • Proficiency in Python and Bash for scripting and automation.
    • Strong knowledge of infrastructure as code (IaC) tools such as Terraform, Ansible, or CloudFormation.
    • Experience with monitoring and observability tools like Prometheus, Grafana, and Stackdriver.
    • Familiarity with CI/CD pipelines and tools such as Jenkins, GitLab CI, or GitHub Actions.
    • Understanding of networking, security, and system administration principles.
  • Soft Skills:
    • Excellent problem-solving skills and attention to detail.
    • Strong communication and collaboration skills.
    • Ability to work independently and as part of a team in a fast-paced environment.

Preferred Qualifications

  • Experience with containerization and orchestration tools such as Docker and Kubernetes.
  • Knowledge of additional programming languages like Go or Ruby.
  • Experience with configuration management tools like Puppet or Chef.
  • Certification in Google Cloud Platform (e.g., Professional Cloud DevOps Engineer).

Fortinet