Senior Software Engineer / Architect – AI Inference Storage
Position Overview
We are looking for an experienced senior engineer to design and build high-performance storage and networking systems optimized for AI inference workloads, particularly large language models (LLMs). The role involves developing scalable, GPU-accelerated storage and networking solutions that integrate tightly with modern AI inference frameworks.
Key Responsibilities
- Build RDMA data paths (RoCE/IB/iWARP) and integrate with the RDMA software stack.
- Implement GPU direct storage pipelines (NVIDIA GPUDirect, NIXL, future GPU-access technologies).
- Design and operate core components of a stateful distributed system (consensus, recovery, failover).
- Write Rust for high-performance components and Python for automation / AI integration.
- Apply advanced Linux systems programming techniques (async I/O, memory management, kernel interfaces).
- Prototype → production: own the path from design to real deployments.
- Mentor engineers and set strong coding and design standards.
Required Qualifications
- Deep experience with RDMA protocols and software stack internals.
- Proven work with GPU direct storage / GPUDirect / NIXL or similar direct GPU I/O.
- Strong Rust and Python development skills.
- Track record in distributed systems (stateful, fault tolerant, horizontally scalable).
- Advanced Linux systems programming knowledge.
Ways to Stand Out
- Experience with Kubernetes for deploying and scaling stateful storage.
- Linux kernel programming (drivers, block layer, networking stack).
- Familiarity with AI frameworks (PyTorch, vLLM, TensorRT, Triton).
- Performance profiling and benchmarking at scale.
- Contributions to open-source projects.