Lightrun is shaping the future of autonomous application remediation: software that fixes itself. As the market leader in Developer Observability, Lightrun is revolutionizing how developers and SREs address high-impact business challenges. We’re shifting from reactive troubleshooting to real-time, AI-driven remediation, minimizing downtime and keeping mission-critical systems operational.
Recognized by Fast Company and G2, and trusted by industry leaders like Citi, SAP, AT&T, ADP, Booking Holdings, Inditex, and more, Lightrun is redefining observability. Join us to build cutting-edge AI-powered solutions, collaborate with world-class engineers, and shape the future of autonomous software.
We are looking for a strong Senior DevOps Engineer to join the hunt!
Responsibilities
- Actively participate in hands-on technical tasks, contributing to the development, deployment, and maintenance of our observability platform.
- Design, implement, secure, and maintain the infrastructure, CI/CD pipelines, and deployment processes that support the Lightrun platform on SaaS/ST/On-Prem deployments.
- Take full end-to-end responsibility for all infrastructure-related projects.
- Collaborate with the development, SE, QA, and product teams to ensure smooth integration, testing, and deployment of software releases.
- Drive automation and scalability efforts to optimize system performance, reliability, and availability.
- Continuously monitor and improve the observability, monitoring, and logging systems to ensure the stability and performance of the platform.
- Stay up-to-date with the latest industry trends and technologies, and evaluate their potential impact and benefits for Lightrun.
- Manage and maintain Lightrun environments across AWS and other public cloud providers.
Qualifications
- 5+ years of experience in a DevOps role, with a strong background in infrastructure management, CI/CD methodologies, and automation.
- Strong hands-on experience in both cloud environments (such as AWS, Azure, and GCP) and on-premise infrastructure.
- Proficiency in infrastructure management and deployment using cloud platforms and on-premise solutions.
- Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Experience with infrastructure as code tools like Terraform or CloudFormation for both cloud and on-premise environments.
- Proficiency in scripting languages like Python, JavaScript, Go, or Bash.
- Excellent knowledge of Linux-based systems.
- Hands-on experience with monitoring and observability tools like Prometheus, Grafana, Datadog, or the ELK stack for both cloud and on-premise environments.
- Familiarity with agile development practices and the ability to work in a fast-paced, collaborative environment.
- Excellent communication skills and the ability to collaborate effectively with cross-functional teams.
- Ability to work independently.
- Can-do attitude.