Site Reliability Engineering Manager

No longer accepting applications

Job Type: Hybrid

Experience: Deep experience

Job Position: Cloud/DevOps

Updated: Dec 09, 2025

Location: Hod HaSharon

Salary: N/A

We’re building a lean, high-impact Site Reliability Engineering (SRE) function at the core of our SaaS platform’s production reliability and long-term quality strategy, and we’re looking for a hands-on Team Leader to drive it.

What You’ll Do:

Lead and scale a small SRE team (2–3 total) with end-to-end ownership of observability and diagnostics across production.
Design and implement a central observability platform supporting engineering, support, and NOC teams.
Write production-grade code and automation to enhance system reliability, tooling, and platform resilience.
Drive operational excellence: incident response, alerting, monitoring, and continuous reliability improvements.

Your Toolbox:

Deep experience in SRE or Production Engineering, ideally in cloud-native SaaS environments.
Strong coding skills in languages such as Python, Node.js, or TypeScript; you’re expected to build, not just configure.
Mastery of monitoring, logging, and distributed tracing (e.g., Prometheus, Grafana, OpenTelemetry).
Solid understanding of CI/CD, Kubernetes, infrastructure as code, and scalable operations.
A “builder” mindset: hands-on, practical, and quality-obsessed.

This role is perfect for someone who wants to define and own a strategic reliability function from day one.