If you are looking to join a fast-paced team, ready to tackle every challenge with an "Everything is possible" mindset - this is for you!
We're looking for a Platform Engineer who will tackle our hardest architecture, data, and AI/ML challenges. As a core member of the R&D team, you will play a pivotal role in building our infrastructure for scalability, reliability, and efficiency, ensuring the seamless deployment and monitoring of our entire production stack - from compute to Kubernetes clusters and AI models.
Key Responsibilities:
- Infrastructure Management: Manage our cloud environment to ensure scalability, security, and performance.
- AI/ML Lifecycle: Lead the establishment and refinement of MLOps practices to streamline the deployment, monitoring, and management of LLMs and other AI models in production.
- Observability Stack: Implement and manage monitoring and observability solutions to maintain system health and allow for fast debugging of production incidents.
- Data Pipelines and Analytics: Build and maintain data pipelines and analytics databases as part of the architecture to support scale and ease of use.
Requirements:
- 5+ years of experience working with AWS & Azure using IaC tools.
- Deep understanding of Kubernetes and containerized environments.
- Hands-on experience with data pipeline and analytics technologies like Kafka, Airflow, Snowflake.
- Hands-on experience with monitoring & observability tools like Grafana, Prometheus and OpenTelemetry.
- Hands-on experience with PostgreSQL.
- Familiarity with the challenges of deploying AI/ML models (including LLMs) and their solutions - an advantage.