DevJobs

Cloud Infrastructure & AI Engineer

Overview
Skills
  • Python
  • PowerShell
  • JavaScript
  • C++
  • Node.js
  • Elasticsearch
  • Azure DevOps ꞏ 3y
  • CI/CD
  • AWS ꞏ 5y
  • Docker
  • Terraform
  • Networking
  • Grafana
  • YAML ꞏ 3y
  • OpenAI
  • Anthropic
  • AWS Config
  • RAG
  • Azure OpenAI
  • Prompt engineering
  • CIS benchmarks
  • OpenTelemetry
  • OpenSearch
  • Inspector
  • CloudFormation
  • EKS
  • MCP
  • Loki
  • LLM
  • Lambda
  • KMS
  • IAM
We’ve come a long way since we first opened our doors, but our mission has always stayed the same: to provide world-class solutions for the travel industry. Travel Booster is a constantly improving ERP solution that frees users from the complexities of creation so they can do more, faster.

We are looking for a hands‑on engineer to own our cloud infrastructure and CI/CD pipelines on AWS with Azure DevOps (Pipelines & Releases). You will take over existing infrastructure and pipelines written in PowerShell and Python, stabilize and improve them, and lead AI/LLM enablement across Dev, QA, and Support. That includes building internal agents, integrating with LLMs (via MCP and other frameworks), and driving AI adoption in day‑to‑day workflows. You’ll partner closely with R&D to streamline builds, deployments, environment management, and observability, while also pushing upgrades and optimizations in AWS services.

What will your job look like?

Key Responsibilities

CI/CD & Cloud Infrastructure

  • Own, maintain, and enhance Azure DevOps Pipelines & Releases (multi‑stage YAML/classic) for builds, deployments, and scheduled tasks.
  • Refactor existing PowerShell, JavaScript, and Python pipeline scripts for reliability, security, and performance.
  • Implement pipeline templates, gates, approvals, artifact versioning, and automated rollbacks.
  • Introduce pipeline observability (dashboards, alerts, metrics) and enforce branching, tagging, and release standards.
  • Operate and improve AWS environments (Prod/Stage/Dev/QA), including networking, compute, storage, databases, security, and cost management.
  • Implement Infrastructure as Code (IaC) (Terraform or AWS CloudFormation); manage parameter stores/secrets.
  • Automate environment lifecycle: provisioning, configuration, blue/green & canary releases, and disaster recovery.
  • Drive updates and improvements across services (e.g., EKS/EC2/Lambda, RDS/DynamoDB, ElastiCache, S3, CloudFront/ALB/ELB, Route 53, CloudWatch, Systems Manager, IAM, KMS).
  • Standardize and automate Dev/QA/SIT/UAT environments; ensure parity with Production where relevant.
  • Build ephemeral preview environments per PR; manage test data seeding/anonymization.
  • Collaborate with QA to integrate test orchestration (smoke/regression) into pipelines with gates and quality metrics.
  • Implement monitoring/alerting (CloudWatch, Prometheus/Grafana), logging (CloudWatch Logs/OpenSearch), and distributed tracing (X-Ray/OpenTelemetry).
  • Harden IAM (least privilege, scoped roles), secrets management, key rotation, and compliance checks.
  • Track and optimize AWS spend (rightsizing, autoscaling, lifecycle policies).

AI/LLM Enablement

  • Build and deploy AI-powered features using LLMs (OpenAI, Azure OpenAI, Hugging Face).
  • Implement MCP-based integrations for context-aware workflows.
  • Design and build internal AI agents (for Dev/QA/Support) to assist with triage, runbooks, test data generation, log analysis, and developer productivity.
  • Integrate LLMs via secure patterns (e.g., Model Context Protocol (MCP), retrieval‑augmented generation, prompt engineering, guardrails, and observability).
  • Create tooling that plugs AI into pipelines/workflows (PR summaries, release notes, changelog generation, incident post‑mortems).
  • Establish data governance for AI: access control, PII handling, prompt/data sanitization, and auditability.
  • Write production-grade code for AI workflows, including prompt engineering, function calling, and agent orchestration.
  • Integrate AI services into ERP, web apps, and backend systems.

Requirements:

All you need is...

Required Skills & Qualifications

  • 5+ years working with AWS in production (multi‑account, multi‑env).
  • 3+ years with Azure DevOps (Pipelines/Releases), including YAML pipelines.
  • Strong JavaScript, PowerShell, and Python skills for automation and pipeline tasks.
  • Hands‑on experience with IaC (Terraform or CloudFormation).
  • Solid understanding of networking, security (IAM/KMS), and containers (Docker; EKS preferred) or serverless (Lambda).
  • Experience integrating LLMs/AI into engineering workflows (agents, prompt engineering, RAG, MCP or similar).
  • Proven track record of CI/CD best practices, release orchestration, artifact/version management, and rollback strategies.
  • Experience with OpenAI/Azure OpenAI, Anthropic, or other LLM providers; familiarity with MCP.
  • Knowledge of Node.js build/deploy pipelines (to align with parts of our stack).
  • Exposure to C++ build systems (beneficial for our business logic components).
  • Observability stack: OpenTelemetry, Grafana, Loki/ELK/OpenSearch.
  • Security/compliance tooling (e.g., CIS benchmarks, AWS Config, Inspector).
  • Experience with fine-tuning and deploying models in production.
  • Excellent interpersonal skills; proactive, fast learner, and team player.
  • Knowledge of Agile and Scrum – Advantage
  • Excellent written and verbal English skills – Mandatory

Travel Booster is an equal opportunity employer. We welcome applicants from all backgrounds and are committed to fostering a diverse and inclusive workforce.
Travel Booster