
Senior Software Engineer / Architect – AI Inference Storage

Skills
  • Python
  • Rust
  • PyTorch
  • Kubernetes
  • IB
  • iWARP
  • RoCE
  • open-source contributions
  • performance profiling
  • networking stack
  • Linux kernel programming
  • TensorRT
  • drivers
  • Triton
  • block layer
  • benchmarking
  • vLLM
  • RDMA
  • memory management
  • NIXL
  • Linux system programming
  • kernel interfaces
  • GPUDirect
  • GPU direct storage
  • distributed systems
  • async I/O
  • AI frameworks


Position Overview

We are looking for an experienced senior developer to design and build high-performance storage and networking systems optimized for AI inference workloads, particularly large language models (LLMs). This role involves developing scalable, GPU-accelerated storage and networking solutions that integrate tightly with modern AI inference frameworks.

Key Responsibilities

  • Build RDMA data paths (RoCE/IB/iWARP) and integrate with the RDMA software stack.
  • Implement GPU direct storage pipelines (NVIDIA GPUDirect, NIXL, future GPU-access technologies).
  • Design and operate core components of a stateful distributed system (consensus, recovery, failover).
  • Write Rust for high-performance components and Python for automation / AI integration.
  • Apply advanced Linux system programming techniques (async I/O, memory management, kernel interfaces).
  • Prototype → production: own the path from design to real deployments.
  • Mentor engineers and set strong coding and design standards.
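To give a flavor of the async I/O and Python automation work listed above, here is a minimal sketch using only the standard library. The file names, sizes, and the thread-pool offload are illustrative, not part of this role's actual stack; a real data path here would sit on io_uring, O_DIRECT, or RDMA rather than buffered thread-pool reads.

```python
import asyncio
import os
import tempfile

async def read_file(path: str) -> bytes:
    # Offload the blocking read to a worker thread so the event loop
    # stays free to issue the other reads concurrently.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, lambda: open(path, "rb").read())

async def read_all(paths):
    # Issue all reads at once and collect results in submission order.
    return await asyncio.gather(*(read_file(p) for p in paths))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(3):
            p = os.path.join(d, f"shard{i}.bin")  # hypothetical model shards
            with open(p, "wb") as f:
                f.write(bytes([i]) * 1024)
            paths.append(p)
        blobs = asyncio.run(read_all(paths))
        print([len(b) for b in blobs])  # prints [1024, 1024, 1024]
```

The same overlap-many-requests pattern is what higher-performance interfaces such as io_uring or GPUDirect Storage provide at the kernel and device level.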


Required Qualifications

  • Deep experience with RDMA protocols and software stack internals.
  • Proven work with GPU direct storage / GPUDirect / NIXL or similar direct GPU I/O.
  • Strong Rust and Python development skills.
  • Track record in distributed systems (stateful, fault tolerant, horizontally scalable).
  • Advanced Linux systems programming knowledge.


Ways to stand out

  • Kubernetes for deploying and scaling stateful storage.
  • Linux kernel programming (drivers, block layer, networking stack).
  • Familiarity with AI frameworks (PyTorch, vLLM, TensorRT, Triton).
  • Performance profiling, benchmarking at scale.
  • Contributions to open-source projects.
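As a lightweight example of the micro-benchmarking mentioned above, the sketch below times repeated calls with a monotonic clock. The harness and the workload (copying 1 MiB buffers) are illustrative assumptions; benchmarking at scale in this role would target real storage and network paths.

```python
import time

def benchmark(fn, iters: int = 100) -> float:
    # Run fn() iters times and return mean latency in microseconds.
    # perf_counter is monotonic and high-resolution, so it is suitable
    # for short intervals where wall-clock time would be too coarse.
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1e6

if __name__ == "__main__":
    data = b"x" * (1 << 20)                  # 1 MiB payload
    mean_us = benchmark(lambda: bytes(data)) # one buffer copy per call
    print(f"mean latency: {mean_us:.1f} us/op")
```

Real profiling work would add warm-up iterations and report percentiles rather than a single mean, since tail latency usually matters more than the average for inference serving.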
LightBits Labs