Responsibilities:
- Build and Maintain Production Systems: Construct and sustain highly available and scalable production systems, ensuring they meet our rigorous standards for reliability and performance.
- Manage Core Production Systems: Oversee core production systems, adapting efficiently to frequent changes and updates.
- Rapid Problem Resolution: Swiftly identify and rectify issues in production systems, with a particular focus on navigating and resolving challenges in complex environments.
- Develop and Optimize Tools: Create and enhance tools to improve our development and operational pipelines, thereby boosting both the efficiency of operations and the effectiveness of software deployments.
- Innovate and Improve Processes: Continuously seek opportunities to advance our technologies and methodologies, ensuring a swift and consistent delivery cycle.
- Large-Scale Observability Platform Management: Own our large-scale observability platform, which processes billions of signals daily, ensuring its performance, scalability, and effectiveness in delivering actionable insights.
- Stay Ahead in Technology: Keep abreast of emerging technologies and market trends, and evaluate and adopt those that can enhance our operational capabilities and maintain our competitive edge.
Requirements:
- Experience and Microservices Architecture: At least 5 years of experience in software development and infrastructure management in a dynamic, global, multi-server environment, with a strong grasp of microservices architecture.
- Python Development and Scripting: Expertise in Python for software development and scripting, essential for automating processes and workflows.
- Infrastructure as Code: Proficiency in infrastructure-as-code tools such as Terraform; familiarity with advanced GitOps-based IaC deployment pipelines is a bonus.
- AWS Cloud Management: Extensive experience in managing Amazon Web Services (AWS) cloud environments, demonstrating expertise in AWS services and practices.
- Kubernetes Orchestration: Experience managing complex, large-scale Kubernetes deployments using managed services such as AWS EKS.
- Code Management and Testing: Solid background in code versioning tools, code review processes, and automated testing frameworks.
- Team Player and Communication Skills: Excellent interpersonal and communication abilities, capable of effective teamwork and collaborative problem-solving.