DevJobs

Team Leader – AI Datacenter Orchestration Tools

Skills
  • Python
  • Angular
  • Node.js
  • PyTorch
  • React
  • TensorFlow
  • RESTful API
  • Docker
  • Kubernetes
  • Pytest
  • gRPC
  • RCCL
  • Robot Framework
  • ROCm
Description

Location: Tel Aviv

#Hybrid

DriveNets is a leader in high-scale disaggregated networking solutions. Founded in 2015, DriveNets modernizes the way service providers, cloud providers and hyperscalers build networks. DriveNets supports the largest network in the world: more than half of AT&T’s backbone traffic runs on DriveNets’ Network Cloud open disaggregated architecture. Having raised $587 million in three funding rounds, DriveNets is disrupting the networking market from high-scale architecture to AI platforms, and is bringing on board the most talented people. We are seeking people who want to make an impact on the world’s leading communication networks and who are experienced in networking architecture or AI infrastructure solutions.

Job Summary

We are seeking a highly skilled and motivated Team Leader to build and lead a new team dedicated to developing orchestration tools and software solutions for AI datacenters.

The main goal of this team is to design and deliver customer-focused orchestration platforms that simplify the deployment, management, and monitoring of large-scale AI workloads.

This role combines technical leadership with hands-on development, covering the entire AI datacenter ecosystem — including switches, hosts, smart NICs, GPUs, ROCm, and RCCL. The team will primarily develop in Python, complemented by modern full-stack technologies for user interfaces and control systems.

Key Responsibilities

  • Lead and mentor a team of engineers building orchestration tools that manage complex AI datacenter infrastructures.
  • Define the team’s vision, roadmap, and architecture for orchestration solutions that enhance customer experience and operational efficiency.
  • Design and implement distributed control and orchestration systems using Python and full-stack frameworks.
  • Collaborate with networking, compute, and AI acceleration teams to integrate orchestration capabilities across all datacenter components (switches, NICs, GPUs, and software stacks).
  • Work closely with product, QA, and DevOps teams to identify customer requirements and translate them into scalable, production-grade orchestration platforms.
  • Ensure software reliability, scalability, and maintainability through strong design principles, testing, and CI/CD practices.
  • Foster a culture of innovation, technical excellence, and cross-functional collaboration.

Requirements

  • 5+ years of software development experience, including 2+ years in a team leadership or technical lead role.
  • Strong proficiency in Python for backend, orchestration, and systems integration.
  • Proven experience in designing and implementing orchestration or control-plane systems for datacenter or cloud environments.
  • Deep understanding of datacenter infrastructure — networking, compute, storage, or GPU acceleration.
  • Hands-on experience with containers and orchestration frameworks (Docker, Kubernetes, etc.) and with CI/CD pipelines.
  • Excellent problem-solving, leadership, and communication skills.

Preferred Qualifications

  • Experience with AI workloads and GPU software stacks (ROCm, RCCL, PyTorch, TensorFlow).
  • Familiarity with control-plane architectures, distributed systems, or cluster management frameworks.
  • Background in telemetry, resource scheduling, or performance optimization for large-scale systems.
  • Knowledge of microservices, REST/gRPC APIs, and cloud-native architectures.
  • Practical experience with full-stack development (React, Angular, Node.js, or similar).
  • Experience with testing frameworks (pytest, Robot Framework, etc.).