DevOps Team Lead - Applied AI Engineering Group

Skills
  • Bash
  • Python
  • Go
  • Elasticsearch
  • Jenkins
  • GitHub Actions
  • AWS
  • Kubernetes
  • Helm
  • Linkerd
  • Istio
  • Terraform
  • Ansible
  • Grafana
  • EKS
  • Prometheus
  • Pulumi
  • OpenSearch
  • S3
  • Vault
  • VPC
  • Lambda
  • IAM
  • GitLab CI
  • Flux
  • EC2
  • Datadog
  • CloudFormation
  • AWS Secrets Manager
  • ArgoCD
At Dream, we are redefining cyber defense by combining AI and human expertise to create products that protect nations and critical infrastructure. This is more than a job; it’s a Dream job. Dream is where we tackle real-world challenges, redefine AI and security, and make the digital world safer. Let’s build something extraordinary together.

Dream's AI cybersecurity platform applies a novel, multi-layered approach to the evolving security challenges found across the entire infrastructure of the most critical and sensitive networks. Central to Dream's proprietary Cyber Language Models are innovative technologies that provide contextual intelligence for the future of cybersecurity.

At Dream, our talented team, driven by passion, expertise, and innovation, inspires us daily. We are not just dreamers; we are dream-makers.

The Dream Job:

It starts with you - a technical leader who’s passionate about building resilient, automated infrastructure and growing high-performing teams. You care about operational excellence, developer experience, and enabling AI-driven teams to move fast with confidence. You’ll lead the DevOps team in architecting and operating the compute and networking infrastructure that powers our AI platform - from CI/CD pipelines to Kubernetes clusters to observability systems.

If you want to join Dream’s mission and lead a team that builds the infrastructure foundation for mission-critical AI systems, this role is for you.

The Dream-Maker Responsibilities:

  • Lead and grow the DevOps team - hiring, mentoring, and developing engineers while fostering a culture of ownership and continuous improvement.
  • Define compute and networking infrastructure strategy across cloud and on-prem environments; drive architectural decisions that balance reliability, security, cost, and velocity.
  • Own the platform’s deployment, scaling, and operational posture - ensuring systems meet demanding SLAs for government and national-scale customers.
  • Build and evolve CI/CD pipelines for application and service deployments with automated testing, security scanning, and rollback capabilities.
  • Drive infrastructure-as-code practices for compute, networking, and orchestration - ensuring reproducible, auditable, and version-controlled infrastructure across all environments.
  • Enable AI-native operations - support agentic deployment pipelines, self-healing infrastructure, and secure sandboxing for model experimentation.
  • Establish observability, alerting, and incident response practices that provide visibility into system health and enable fast recovery.
  • Partner with Engineering, Data Platform, Data Engineering, and Security teams to align infrastructure capabilities with platform needs.
  • Establish infrastructure characteristics (availability, latency, throughput) that enable data freshness, correctness, and low-latency pathways for AI training/inference, retrieval, and agentic workflows.
  • Ship paved-road developer tooling - shared templates, CI/CD workflows for services, IaC modules for compute and networking, and runbooks - to standardize best practices across engineering teams.
  • Collaborate with Engineering, Data Platform, Data Engineering, Security, Product, AI/ML, Data Science, and Analytics to align infrastructure capabilities with evolving platform and data product needs.

The Dream Skill Set:

  • 8+ years in DevOps, SRE, or infrastructure engineering, with 2+ years leading teams or technical functions. Hands-on experience building and operating infrastructure at scale.
  • Container orchestration - Kubernetes (EKS, on-prem), Helm, service mesh technologies like Istio or Linkerd
  • Cloud & infrastructure - AWS services (EC2, EKS, S3, IAM, VPC, Lambda), hybrid cloud architectures, on-prem infrastructure
  • Infrastructure-as-Code - Terraform, Pulumi, or CloudFormation; GitOps practices with ArgoCD or Flux
  • CI/CD - GitHub Actions, GitLab CI, Jenkins, or similar; artifact management, deployment strategies (blue-green, canary)
  • Observability - Prometheus, Grafana, ELK/OpenSearch, Datadog, or similar; distributed tracing, log aggregation, alerting
  • Security & compliance - secrets management (Vault, AWS Secrets Manager), network security, compliance automation, air-gapped environments
  • Scripting & automation - Python, Bash, Go; configuration management with Ansible or similar

Never Stop Dreaming...:

If you think this role doesn't fully match your skills but you are eager to grow and break glass ceilings, we’d love to hear from you!
Dream Security