Staff Backend Engineer – AI Algorithm Platform

Overview
Skills
  • Python
  • Ruby
  • Go
  • AWS
  • Kubernetes
Cloudinary empowers companies to deliver exceptional digital experiences by managing the entire media lifecycle at scale. Within Cloudinary’s R&D, the Research Group leads the development of cutting-edge algorithms for media understanding, generation, and optimization.

We are seeking an experienced Staff Backend Engineer to lead the engineering effort behind our homegrown platform for serving and operating production-grade AI models and AI-based algorithms. This is a mission-critical role for someone passionate about building highly scalable, GPU-aware, cloud-native systems that act as the connective tissue between algorithm research and product innovation. You will play a pivotal part in redesigning and evolving the platform, supporting both research and application teams across the organization, and contributing to MLOps initiatives.

Key Responsibilities

Platform Ownership

  • Own the architecture, stability, scalability, and performance of the platform
  • Design and implement platform features that support both synchronous low-latency and asynchronous compute-heavy algorithm execution
  • Enhance GPU management, scheduling, and resource allocation for optimal performance and cost-efficiency
  • Ensure robust Kubernetes-based deployment and observability for a highly dynamic system

Cross-Team Collaboration

  • Act as the technical bridge between Research and Application teams by translating requirements into scalable system designs
  • Collaborate closely with algorithm developers to streamline model deployment processes
  • Partner with backend engineers (primarily working in Ruby and Go) to integrate the Research Group’s algorithms into Cloudinary services

Engineering Excellence

  • Advocate for high standards in code quality, observability, testing, and security
  • Guide integration efforts for teams consuming the platform’s APIs
  • Provide mentorship, support, and best practices to other engineers interacting with the platform
  • Take part in broader R&D efforts in support of the wider production environment

Platform Extension and MLOps

  • Contribute to the evolution of the MMS platform to support a wider range of algorithmic workloads and model types
  • Help shape tooling and infrastructure for model versioning, rollout, monitoring, and testing
  • Collaborate with DevOps and Infrastructure teams to maintain operational excellence, system observability, and robust infrastructure support

Your Qualifications

  • 8+ years of experience in software engineering, with 3+ years working on infrastructure/platforms involving ML/AI, GPU, or data-heavy systems
  • Proficiency in Python and familiarity with backend languages such as Ruby and/or Go
  • Strong understanding of Kubernetes internals and experience running GPU workloads in production environments
  • In-depth knowledge of AWS services
  • Experience architecting systems that support both real-time and asynchronous processing pipelines
  • Familiarity with the ML lifecycle and MLOps practices, including CI/CD for models, monitoring, and rollback strategies

Bonus Qualifications

  • Experience working in research-driven environments or alongside data scientists, algorithm researchers, and ML engineers
  • Contributions to open-source projects related to model serving, Kubernetes operators, or ML platforms
  • Experience supporting systems with diverse user groups across engineering and research disciplines

Why Join Us?

  • Opportunity to build and scale a one-of-a-kind platform powering state-of-the-art media algorithms
  • Collaborate with world-class research, engineering, and product teams
  • Have a direct impact on product experiences used by millions of developers and end-users
  • Be part of a culture that values creativity, autonomy, and continuous improvement
