Senior SW Engineer – AI Infrastructure & Optimization
We are looking for a Senior Software Engineer to help build and optimize large-scale, high-performance GenAI infrastructure and inference systems on Kubernetes.
As AI workloads increasingly move toward Kubernetes-native infrastructure, we are building systems that support distributed inference, performance optimization, reliability, observability, and production-grade deployment at scale.
This role is ideal for an engineer who can reason deeply about systems, performance, tradeoffs, and reliability, and who is comfortable owning difficult technical decisions end-to-end.
You will work across inference serving, distributed systems, optimization, and Kubernetes-native AI infrastructure.
What You’ll Do
- Build and optimize high-performance Kubernetes-native GenAI inference systems
- Work with modern inference stacks such as vLLM, SGLang, TensorRT-LLM, and related tooling
- Work with Kubernetes-native distributed LLM inference frameworks such as llm-d and NVIDIA Dynamo
- Design and implement optimization algorithms and performance improvements
- Improve reliability, observability, deployment, and operational maturity of AI systems
- Make architectural decisions and take ownership of technical outcomes
- Collaborate with a small, senior engineering team focused on performance and production quality
Requirements
- Minimum 5 years of experience as a Software Engineer, with strong software engineering and system design skills
- Programming experience in Go and Python
- Hands-on experience with the Kubernetes ecosystem, including Operators, service meshes, GitOps, Gateway API, and OpenTelemetry
- Experience with cloud platforms
- Strong understanding of optimization algorithms and performance engineering
- Ability to independently drive technical initiatives from concept to production
- Strong systems thinking and debugging skills
- Comfort operating in environments with high autonomy and responsibility
Nice to Have
- Experience with modern LLM inference frameworks such as vLLM, SGLang, or TensorRT-LLM
- Experience with distributed LLM inference frameworks such as llm-d or NVIDIA Dynamo
- Contributions to open-source Kubernetes or ML infrastructure projects
- GPU performance optimization and profiling experience
- Familiarity with CUDA, NCCL, or Triton kernels
- Experience running GenAI systems at scale in production