DevJobs

Senior SRE

Skills
  • Bash
  • Perl
  • Python
  • Ruby
  • Elasticsearch
  • Kubernetes
  • Terraform
  • Grafana
  • CloudFormation
  • Public Cloud
  • Datadog
  • OpenTelemetry
  • Prometheus
About Atera

Atera is inventing a new way of managing IT end-to-end for IT professionals and teams worldwide.

By creating an AI-powered IT platform, Atera's all-in-one Remote Monitoring and Management (RMM), Helpdesk, Ticketing, and Reporting solution helps more than 23,000 IT pros achieve 10X operational efficiency, cut down time-to-resolution, and deliver better outcomes faster. Located in the heart of Tel Aviv, our team of passionate, like-minded individuals is driven by a shared mission to unleash everyone's potential and constantly innovate. We create an open, transparent, and supportive environment that gives our teams the autonomy, resources, and freedom to thrive.

This is a full-time, onsite (hybrid-remote) role at our Tel Aviv office.

Atera is looking for a motivated senior site reliability engineer to join us and build the framework that lets our engineering operations scale.

Responsibilities:

  • Build tools and automation to monitor system health, performance, and reliability, ensuring quick detection and resolution of any anomalies or issues.
  • Write high-quality infrastructure-as-code that automates provisioning, deployment, and scaling, and delivers effective monitoring, alerting, and logging solutions.
  • Work with other engineers to ensure that new services are well-designed, properly monitored, and have well-defined SLIs and achievable SLOs.
  • Build and maintain observability pipelines using tools like Prometheus, Grafana, OpenTelemetry, and distributed tracing systems.
  • Proactively track our capacity, quotas, and other performance limits to plan for growth.
  • Participate in a 24x7 on-call rotation to handle product availability issues as well as urgent customer support escalations.
  • Investigate and resolve incidents and outages, performing root cause analysis to identify systemic issues and implement preventive measures.
  • Develop and maintain disaster recovery plans and perform regular testing to ensure data integrity and business continuity.

Requirements:

  • 3+ years of experience as an SRE in large-scale, cloud-based production environments.
  • Strong experience in designing, implementing, and managing monitoring processes.
  • Familiarity with observability tools (Prometheus, Grafana, ELK, Datadog, OpenTelemetry).
  • Experience in at least one scripting language (Python, Ruby, Perl, Bash) and infrastructure-as-code technologies (e.g., Terraform, CloudFormation).
  • Strong ability to lead, design, and execute cross-organization projects.
  • Experience in managing container and infrastructure orchestration tools (e.g., Kubernetes, Terraform).
  • Hands-on experience administering public clouds.
  • Background in high-scale, high-throughput telemetry or data ingestion systems (an advantage).
  • Experience designing SLO frameworks from

About Our Benefits

Atera is highly collaborative and, yes, fun! To support you at work (and play), we offer some fantastic perks: ample time to learn from your teammates and contemporaries, time off to relax and recharge, community volunteer days, an annual budget to support your learning & growth, a company-paid trip, and lots more.
Atera Networks