MLOps / DevOps Engineer – Agentic AI Model Development
Location
Hybrid (3 days office / 2 days home)
Reports to: Head of DevOps (works closely with Data Science, Engineering, and AI teams)
About Us
We are building next-generation agentic health AI models: tools that empower clinicians, researchers, and patients by enabling smarter, safer, and more personalized healthcare. You will be part of our DevOps team, collaborating closely with Data Science, Engineering, and AI teams to turn innovative model ideas into robust, production-grade systems.
Role Overview
As an MLOps / DevOps Engineer, you will design, build, and maintain the full lifecycle of machine-learning model development, deployment, and operations. You will bring proven, hands-on MLOps experience with the Databricks platform, combined with strong DevOps skills in automation, infrastructure as code, and CI/CD. This role sits within the DevOps team, working cross-functionally to enable scalable, secure, and efficient AI model delivery as part of our agentic AI platform.
Key Responsibilities
- Architect and build end-to-end MLOps pipelines: data ingestion, feature engineering, model training, validation, deployment, monitoring, and retraining.
- Lead Databricks platform operations: workspace setup, cluster management, job orchestration, Delta Lake, and MLflow integration.
- Partner with Data Science and Engineering teams to take model prototypes into production, ensuring scalability, reproducibility, and maintainability.
- Define and implement Infrastructure as Code (IaC) using Terraform, managing cloud resources (AWS) for MLOps infrastructure.
- Design and manage CI/CD pipelines for model code, data pipelines, and infrastructure changes — integrating automated testing, version control, code review, and deployment.
- Build and maintain monitoring, logging, and alerting systems for deployed ML models (e.g., data drift, model drift, performance degradation).
- Ensure all deployments adhere to security, access control, cost optimization, and reliability standards defined by the DevOps team.
- Evaluate and implement emerging tools, frameworks, and practices in the MLOps/DevOps ecosystem.
- Document architecture, pipelines, and operational workflows; train data science and engineering teams in MLOps best practices.
Required Qualifications & Experience
- 5+ years of experience in MLOps and DevOps.
- Hands-on MLOps experience is essential, covering the full model development lifecycle from experimentation through production, monitoring, and retraining.
- Proven, hands-on experience with Databricks (workspaces, clusters, notebooks, jobs, Delta Lake, MLflow).
- Strong understanding of machine learning operations and lifecycle automation.
- Strong DevOps foundation: Terraform (IaC), CI/CD, Git workflows, automated testing, deployment pipelines, monitoring, and alerting systems.
- Proficiency with AWS Cloud services for compute, storage, and networking.
- Solid programming and scripting skills (Python, Bash, PowerShell).
- Familiarity with data engineering concepts and tools (ETL/ELT, orchestration frameworks, or workflow automation).
- Excellent collaboration and communication skills to work with cross-functional teams and align model delivery with operational standards.
- A strong problem-solving mindset, a focus on scalability and automation, and a commitment to continuous improvement.
Preferred / Advantageous
- Experience with containerization and orchestration (Docker, Kubernetes).
- Knowledge of model governance, compliance, and monitoring frameworks.
- Experience with AI and agentic system development.
- Experience in healthcare, life sciences, or health-tech domains.
Why Join Us
- Be part of a mission to build next-generation agentic health AI models.
- Collaborate with world-class Data Science, Engineering, and AI teams.
- Work with leading technologies: Databricks, AWS, Terraform, CI/CD, and MLOps frameworks.
- Drive architecture and process decisions that shape the future of our agentic AI platform.
- Enjoy a collaborative, learning-driven environment focused on innovation and impact.
What We Value
- Hands-on ownership and curiosity.
- Focus on reliability, reproducibility, and automation.
- Strong cross-team collaboration.
- Passion for AI, data, and building systems that matter.