Streaming Infrastructure DevOps Engineer

Overview
Skills
  • Python
  • Bash
  • Kafka
  • Docker
  • Kubernetes
  • Helm
  • Istio
  • Terraform
  • RabbitMQ
  • Grafana
  • Prometheus
  • Confluent Cloud
  • Confluent for Kubernetes
  • GitOps
  • Pulumi
  • GovCloud

Department: Engineering

Location: Tel Aviv

Description

We are looking for an operations-first engineer with deep expertise in running, scaling, automating, and monitoring streaming infrastructure such as Kafka and RabbitMQ.


At Armis, streaming is at the heart of our product. We operate hundreds of streaming applications that transform, aggregate, analyze, and enrich the most valuable data we collect from our clients. We process billions of events and petabytes of raw data daily — and we’re just getting started.

Our mission is to provide a rock-solid, scalable, and secure infrastructure foundation that empowers our engineers to build and operate streaming services with confidence. This includes managing the full lifecycle of Kafka and RabbitMQ clusters, automating deployments, securing system access, and building out observability and monitoring capabilities that scale with our growth.

We are seeking a skilled and motivated DevOps engineer with deep familiarity with the streaming ecosystem to join our elite infrastructure team. If you're excited by the challenge of operating mission-critical systems at scale and optimizing the developer experience through automation and tooling, we'd love to hear from you.


What you'll do:

  • Automate Deployment and Operation
    Oversee deployment of Kafka and RabbitMQ clusters (including Confluent Cloud and Confluent for Kubernetes, CFK). Build automation pipelines to ensure repeatability and resiliency across environments.

  • Monitor and Support Production Systems
    Own production stability of global Kafka clusters. Handle on-call rotations, incident management, troubleshooting, and scaling challenges.

  • Improve Infrastructure Observability
    Build and maintain observability systems: dashboards, alerting pipelines, metrics collection (Prometheus, Grafana, etc.).

  • Optimize System Performance
    Collaborate with peers on benchmarking and optimization initiatives. Work on tuning Kafka brokers, cluster configurations, and runtime parameters.

  • Provide Developer Support and Training (Infra-focused)
    Help developers configure topics, quotas, and consumers appropriately. Train service owners to interpret monitoring data and avoid pitfalls.

  • Develop and Maintain Infrastructure
    Contribute to building infrastructure tools and scripts (IaC, Helm charts, etc.) that make provisioning and managing clusters reliable and efficient.

  • Secure Infrastructure Access
    Configure and maintain secure access patterns across streaming infrastructure, ensuring proper authentication and role-based access controls are enforced for both developers and services.


What we expect:

  • 8+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
  • Deep hands-on Kafka experience, including deploying, maintaining, scaling, and monitoring clusters.
  • Experience with RabbitMQ.
  • Extensive experience with Docker, Kubernetes, Helm, and GitOps-style deployments.
  • Infrastructure as Code experience (Terraform, Pulumi, etc.).
  • Strong skills in scripting and automation (Python, Bash, etc.).
  • Familiarity with Confluent Cloud, Confluent for Kubernetes, and similar tools.
  • Solid understanding of authentication and authorization mechanisms in distributed systems.
  • Production support mindset – with a proven history of troubleshooting and incident resolution.
  • Collaboration and communication skills – especially with dev teams depending on platform support.
  • Experience with Istio Service Mesh (bonus).
  • Experience with GovCloud (bonus).

Bonus Qualities:
  • Mentorship and leadership experience in infrastructure or SRE teams.
  • Contributions to automation or monitoring open-source tooling.
  • Active participation in SRE or DevOps communities.
  • Experience as a conference speaker or internal tech trainer.
  • Technical writing about infrastructure automation or reliability.
Armis