Our Company is where we transform vision into reality. It's where ideas become technologies, and cutting-edge technologies become solutions for animal care and management.
We support farmers by providing real-time, actionable information to help them manage their herds. We provide pet owners with smart devices and data that give them a better understanding of their pets' activity and health needs, enriching relationships. We help conservationists safeguard natural environments and wildlife.
Leveraging decades of technological research and development experience across many markets, technologies and species, along with development environments and quality assurance procedures, we're always inventing new ways to look after the health and well-being of animals. That experience keeps us ahead of the curve, with advanced technological solutions that range from enhancing the precious bond between people and their pets to advancing animal healthcare and wildlife preservation.
We are looking for an exceptional Senior Site Reliability Engineer (SRE) to help establish and lead the technical practices of SRE within our CloudOps team. This is a hands-on role for an experienced professional who can implement SRE principles, build frameworks and tools to ensure system reliability, and mentor others in adopting these practices.
If you are passionate about operational excellence, love solving complex technical challenges, and thrive in highly collaborative environments, this is the role for you.
What You’ll Do:
Define and Build the SRE Function
Help to define and implement the SRE principles and practices.
Partner with development and DevOps teams to create Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs) for critical services.
Advocate for and implement system architectures that prioritize reliability, scalability, and fault tolerance.
Develop Automation and Resilience
Build automation tools to reduce toil, streamline operations, and improve reliability using Infrastructure as Code (IaC) tools like Terraform and CrossPlane.
Implement self-healing systems, automate incident detection and response, and integrate chaos engineering practices to test system resilience.
Drive Observability and Monitoring Excellence
Incident Response and Problem Solving
Contribute to Continuous Improvement
Requirements:
Technical Expert
Advanced skills in automation tools like Terraform and proficiency in scripting or programming languages (e.g., Python, Go, Bash).
Problem Solver and Collaborator
Preferred
Familiarity with chaos engineering tools like Gremlin or LitmusChaos.