DevOps Engineer

Skills
  • DevOps (5 years)
  • CI/CD
  • AWS
  • Azure
  • GCP
  • Kubernetes
  • Administering public clouds
  • Building CI/CD pipelines for applications and micro-services
  • GitOps
  • IaC
  • Linux systems internals and networking
  • Building and maintaining on-premises servers
  • Deep learning frameworks
  • Deep learning training pipelines
  • Managing and optimizing company-wide network infrastructure
  • Working with small edge devices such as Nvidia Jetson devices
Deci AI is on a mission to empower AI developers by providing them with robust tools to create innovative AI-based solutions. Our goal is to ensure that the models they build not only excel in production but also reach their full potential.

Join us in shaping the future of AI, where innovation meets empowerment at every level.

What You’ll Do

You have the opportunity to join a small but crucial team that enables Deci AI's growth while maintaining quality. As a member of this team, you will fully own our rapidly growing production training, optimization, and inference infrastructure. Your work will support our cutting-edge AI technology, and you will collaborate with our research and engineering teams on the latest hardware and software frameworks. This position requires an ownership mindset, a hands-on, can-do approach, and an eagerness to learn while working across multiple projects and technologies.

Requirements:

  • 5+ years of experience as a DevOps Engineer, or equivalent experience
  • A strong background in both traditional DevOps release processes and modern infrastructure, with a thorough understanding of industry best practices such as CI/CD, IaC, and GitOps
  • Experience building CI/CD pipelines for applications and microservices
  • Experience with Kubernetes (K8s)
  • Hands-on experience administering public clouds (AWS, GCP, or Azure)
  • In-depth understanding of Linux systems internals and networking

Preferred qualifications:

  • Experience with deep learning frameworks and deep learning training pipelines
  • Experience building and maintaining on-premises servers
  • Proficiency in managing and optimizing company-wide network infrastructure, ensuring seamless connectivity and efficiency
  • B.Sc. in Computer Science or a similar technical field
  • Experience working with small edge devices such as Nvidia Jetson devices

Responsibilities:

  • Develop, deploy, maintain, scale, and monitor our hybrid (cloud and on-premises) deep learning training research and development environment.
  • Manage our multi-cloud, cloud-native production deployments (SaaS) running on Kubernetes clusters.
  • Build critical production components from scratch.
  • Evaluate new cloud-native technologies and vendor products to continuously improve our SaaS offerings.
  • Work closely with developers and researchers across the company to design and deliver new features, applications, and services. Develop tools and processes to enable them to automate everything.
  • Design and manage different monitoring and observability tools for troubleshooting and resolving production issues.
  • Solve problems in mission-critical services by creating solutions to prevent problem recurrence and automating remediation procedures.
  • Provide operational support for day-to-day activities such as service deployments and the configuration of service interactions.