Senior Site Reliability Engineer

No longer accepting applications

Overview

Job Type: Hybrid

Experience: Senior

Job Position: Cloud/DevOps

Updated: Oct 20, 2025

Location: Tel Aviv District

Salary: N/A

Skills

Python
Go
Kafka
Django
Cassandra
MySQL
MongoDB
Redis
Linux
Microservices
GitHub Actions
Jenkins
AWS
Kubernetes
Helm
RabbitMQ
Networking
Splunk
Terraform
SLOs
SQS
Storage
Application Design
Tracing
Metrics
Distributed monitoring
Containers
Compute
New Relic
RDS
Lambda
S3
ElastiCache
EKS
Terraspace

Why Work For Us

Grubhub, part of Wonder Group Inc, is all about connecting hungry diners with our network of over 375,000 merchants nationwide. Innovative technology, user-friendly platforms and streamlined delivery capabilities set us apart and make us an industry leader in the world of online food ordering. When you join our team, you become part of a community that works together to innovate, solve problems, grow, work hard and have a ton of fun in the process!

About the Opportunity:

Grubhub, a leader in connecting diners with restaurants nationwide, is seeking a Senior Site Reliability Engineer to join our Campus and On-Site team. This role is crucial for simplifying the dining experience for students across the US. You will be instrumental in architecting resilient and self-healing solutions, managing AWS infrastructure, closing observability gaps, designing scaling approaches, and shaping incident management processes. Your contributions will span the entire development lifecycle, encompassing the building and maintenance of CI/CD pipelines. Collaboration with other SRE teams is vital for guidance, knowledge sharing, and fostering camaraderie.

About the Team:

Our On-Site SRE team is dedicated to building more resilient and self-healing solutions. You'll contribute to managing AWS infrastructure, addressing observability challenges, designing scalable systems, and refining incident management processes. We emphasize close collaboration with other SRE teams for mutual support, knowledge exchange, and team spirit. You will also partner with service owners to design and build robust CI/CD pipelines and contribute to the long-term architectural vision of our products.

The Day to Day:

As an SRE within the "Runtime Engineering" organization, you will co-own critical production service designs, ensuring their high reliability. You will actively drive improvements in reliability and observability using SLOs and telemetry data. Your responsibilities include developing and enhancing internal tools and automation software to effectively and safely maintain production services. You will also lead reliability-focused practices, including Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Design, and Incident Postmortems. As a senior engineer, you will also be responsible for mentoring junior engineers.

What You'll Need:

Experience:
Senior SRE: 4+ years of experience
SRE II: 2+ years of experience

Technical Skills:
Deep knowledge of CI/CD tools (e.g., Jenkins, GitHub Actions).
Software engineering experience in Python, Go, or a similar object-oriented language.
Proficiency with datastores (MySQL, Mongo, Cassandra, Redis) and message brokers (Kafka/SQS/RabbitMQ).
Experience with Microservice Architecture and Application Design.
Distributed monitoring experience, including SLOs, metrics, and tracing.
Working knowledge of Kubernetes-based software solutions and their ecosystem.
Working knowledge of Cloud technologies (AWS, Compute/Containers, Storage, Linux, networking).

Soft Skills:
Strong technical writing, documentation, and communication skills.
Experience with highly trafficked web-based services.

About Our Tech:

The On-Site tech stack primarily utilizes Python, with some services written in Go, for tooling, automation, and service code. We leverage Django as our primary web framework. For monitoring, we use New Relic and Splunk. Our robust infrastructure is built with Infrastructure as Code (IaC) using Terraspace (wrapped around Terraform). Our services run on Kubernetes, deployed via Helm. Our cloud technologies encompass various AWS services, including EKS, S3, ElastiCache, and Lambda. Data technologies include MongoDB, MySQL (RDS), Redis (ElastiCache), RabbitMQ, and Kafka. CI/CD is managed through Jenkins. The On-Site tech stack handles a significant portion of Grubhub's daily orders and is rapidly growing. Your role will be pivotal in ensuring the platform's scalability to support our continuously expanding customer base, evidenced by the addition of 30 new campuses and a 25% year-over-year increase in order volume.

Perks:

We offer flexible PTO, comprehensive health programs, abundant opportunities for learning and career growth, and engaging events led by our Culture Crew. Grubhub is an equal opportunity employer committed to diversity and inclusion. We value innovation, problem-solving, calculated risk-taking, hard work, and, most importantly, having a lot of fun!
