DevOps Tech Lead

Skills
  • Python
  • Go
  • Bash
  • Kafka
  • Redis
  • Elasticsearch
  • Jenkins
  • GitHub Actions
  • CI/CD
  • AWS · 5y
  • Kubernetes
  • Helm
  • Terraform
  • Grafana
  • CloudFormation
  • CloudWatch
  • EKS
  • Lambda
  • Prometheus
  • RDS
  • S3
  • AI-assisted tools
  • ArgoCD
  • FinOps
  • Loki
  • ML pipelines
  • AI
  • AIOps

We are looking for a hands-on DevOps Tech Lead to join our R&D organization and lead the design, scalability, and reliability of our SaaS platform — a large-scale, multi-tenant system fully operated on AWS (EKS, Lambda, S3, RDS, Kafka, Redis, and more).

The DevOps Tech Lead will shape our infrastructure roadmap, optimize CI/CD, guide engineers on best practices, and ensure our system runs securely, efficiently, and at scale, while driving automation and intelligent operations through AI-assisted tools and observability.


Why Join Us:

  • Be a key player in scaling and modernizing a global cyber intelligence SaaS serving leading enterprises.
  • Collaborate with top-tier engineers and architects driving automation and intelligent operations.
  • Take ownership and lead initiatives that directly affect uptime, reliability, and efficiency.
  • Work in an environment that encourages innovation, experimentation, and adoption of AI and automation in day-to-day operations.


Major Responsibilities:

  • Lead the DevOps domain: define architecture, automation strategy, and reliability goals for the entire R&D organization.
  • Own infrastructure scalability and performance: ensure our Kubernetes (EKS)-based environments are resilient, efficient, and cost-optimized.
  • Develop and maintain CI/CD pipelines using GitHub Actions, Jenkins, or ArgoCD to support fast, reliable, and automated delivery.
  • Drive observability and reliability initiatives: monitor system health via Prometheus, Grafana, and CloudWatch; define metrics, alerts, and SLOs.
  • Leverage AI/automation tooling (e.g., anomaly detection, alert classification, cost prediction) to enhance monitoring, response, and efficiency.
  • Manage infrastructure as code (Terraform, Helm, CloudFormation) and enforce IaC best practices.
  • Collaborate with engineering teams to design infrastructure for new services, improve developer experience, and ensure secure deployments.
  • Ensure system uptime and production readiness: lead root cause analysis, incident response, and capacity planning.
  • Mentor DevOps engineers on cloud architecture, observability, and automation excellence.
  • Continuously evaluate emerging technologies, including AI-driven ops tools, to improve scalability, reliability, and delivery velocity.


Requirements:

Must-Have:

  • 5+ years of experience as a DevOps / SRE / Infrastructure engineer, with at least 2 years in a technical leadership role.
  • Proven experience managing large-scale SaaS systems on AWS (EKS, RDS, Kafka, Redis, S3, Lambda, CloudWatch).
  • Deep understanding of Kubernetes architecture and container orchestration at scale.
  • Hands-on experience with Terraform, Helm, and CI/CD automation (GitHub Actions, Jenkins, or ArgoCD).
  • Strong scripting skills in Python, Bash, or Go.
  • Familiarity with monitoring and alerting tools (Prometheus, Grafana, Loki, ELK).
  • Experience using or integrating AI-assisted tools (e.g., for observability, auto-remediation, or developer productivity).
  • Excellent troubleshooting skills and a proactive mindset for reliability and performance optimization.

Nice-to-Have:

  • Experience in multi-environment / multi-tenant SaaS or cybersecurity / threat intelligence systems.
  • Knowledge of AI/ML pipelines or AIOps concepts.
  • Background in cost optimization and FinOps practices.
  • Familiarity with Kafka scaling, Redis clustering, and AWS service-level tuning.


CyberInt