DevJobs

SRE Engineer

Overview
Skills
  • Docker ꞏ 5y
  • Kubernetes ꞏ 5y
  • Grafana
  • Cloud platforms ꞏ 3y
  • Datadog
  • Prometheus
Description

Key Responsibilities:

  • Ensure critical systems meet uptime and performance SLAs (Service Level Agreements) and SLOs (Service Level Objectives).
  • Participate in on-call rotations, lead post-mortems, and drive root cause analysis.
  • Implement redundancy, failover, and high availability strategies to keep services running smoothly.
  • Build and maintain robust monitoring, alerting, and observability systems (e.g., Prometheus, Grafana, Datadog).
  • Ensure the security of infrastructure and pipelines by implementing best practices for access control, encryption, and vulnerability management.
  • Collaborate with DevOps/Dev teams to build, maintain, and improve CI/CD pipelines.
  • Have fun with a great team while tackling hard challenges.

Requirements:

  • 5 years of experience designing, deploying, maintaining, and troubleshooting large-scale distributed systems.
  • Hands-on experience with infrastructure services such as caching systems, message queues, distributed storage, and load balancers.
  • Proven experience in building and maintaining monitoring solutions using tools like Prometheus, Grafana, or equivalent platforms.
  • 5 years of hands-on experience with containerization technologies like Docker and orchestration tools like Kubernetes.
  • At least 3 years of experience working with cloud platforms.
  • Understanding of network security principles (e.g., segmentation, firewalls, VPNs, zero trust).
  • Familiarity with securing cloud resources: encryption, security groups, secrets management, etc.
  • Cloud certifications – Advantage
  • Bachelor's degree (Computer Science, Computer Engineering, Data Science) – Advantage
DriveNets