
Senior Developer, AI Inference Storage Systems

Skills
  • Python
  • Rust
  • CUDA
  • NCCL
  • NIXL
  • ROCm
  • ai-dynamo
  • InfiniBand
  • LMCache
  • MPI
  • NVLink
  • RoCE
  • TensorRT-LLM
  • vLLM


Position Overview

We are looking for an experienced senior developer to design and build high-performance storage and networking systems optimized for AI inference workloads, particularly large language models (LLMs). The role involves developing scalable, GPU-accelerated storage infrastructure that integrates tightly with modern AI inference frameworks and distributed architectures.


Key Responsibilities

  • Design and implement scalable storage solutions tailored for AI/ML inference pipelines.
  • Optimize data pipelines, caching, and I/O patterns to maximize GPU utilization and minimize inference latency.
  • Research and prototype innovative storage–compute co-design approaches for transformer-based models.
  • Stay current with advancements in distributed storage, high-performance networking, and AI inference technologies.
  • Contribute to open-source AI infrastructure projects where applicable.
  • Build and operate large-scale distributed systems and ML systems.


Required Qualifications

  • Expert-level Python and proficient Rust programming skills.
  • Strong knowledge of distributed storage architectures, object storage, and high-performance filesystems.
  • Hands-on experience with GPU acceleration technologies (CUDA, NCCL, NIXL, ROCm) and GPU memory management.
  • Familiarity with AI/ML frameworks and transformer model architectures.
  • Excellent problem-solving, debugging, and performance optimization skills.
  • Self-motivated, able to work independently in fast-paced, innovative environments.


Ways to stand out

  • Experience with high-performance networking protocols (InfiniBand, RoCE).
  • Knowledge of HPC technologies (MPI, NVLink).
  • Experience with LLM inference frameworks such as vLLM, TensorRT-LLM, or ai-dynamo.
  • Experience with LMCache (lmcache.ai).

LightBits Labs