AI-OPS Engineer – Infrastructure (Azure Focus)
Department: IT Infrastructure – Tools & Collaboration
We are opening a new position in our Infrastructure organization and establishing a dedicated AI-OPS function.
This is a foundational role with real ownership, hands-on responsibility, and close collaboration with development and AI teams.
As an AI-OPS Engineer, you will be responsible for the day-to-day operation, stability, security, and scalability of AI infrastructure across cloud and platform layers, with a strong focus on Azure-based environments.
This role sits at the intersection of Cloud Infrastructure, DevOps, Security, and AI Platforms.
Key Responsibilities
- Operate and maintain AI infrastructure environments (primarily Azure), ensuring availability, performance, and scalability
- Work closely with developers and AI teams to resolve infrastructure-related issues: permissions, security, networking, deployments, and access
- Support deployment, operation, and troubleshooting of AI models and services
- Design and operate Azure-based AI infrastructure, including end-to-end Azure Functions and APIs for AI models (Azure AI Factory)
- Manage Kubernetes / AKS and Docker environments supporting AI workloads
- Implement and maintain Infrastructure as Code using Terraform
- Monitor systems, investigate incidents, and perform root-cause analysis
- Handle cloud security configurations, access control, and compliance requirements
- Participate in defining infrastructure methodologies and best practices for AI workloads
- Take part in infrastructure consolidation and centralization initiatives
Required Experience
- 4–8 years of experience in Cloud Infrastructure Operations, with a strong background in Azure - Must
- Hands-on experience with Linux administration - Must
- Proven experience with Terraform and Infrastructure as Code - Must
- Experience with Kubernetes / AKS and Docker - Must
- Solid understanding of cloud networking and cloud security - Must
- Experience working closely with development teams and debugging cloud issues related to development and deployment - Must
- Scripting experience with Python and/or Bash - Must
- Strong troubleshooting and problem-solving skills
- Certifications in Azure or Kubernetes
- AI is a relatively new domain in the organization: hands-on AI experience is a plus, but a strong infrastructure background with AI exposure or familiarity is acceptable