NVIDIA is seeking a highly skilled Senior Performance Engineer to join our Performance and R&D organizations. In this role, you will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments. You will work closely with hardware, networking, firmware, and software teams to collect, analyze, and interpret performance data from live systems. This is a fast-paced R&D environment where system behavior and requirements evolve rapidly, requiring adaptable engineering solutions and strong analytical thinking.
What You’ll Be Doing
- Profile, benchmark, and analyze AI and HPC workloads on GPU and CPU clusters
- Explore performance characteristics of high-performance networking and collective communications (e.g., NCCL, RDMA, MPI, RoCE)
- Identify performance bottlenecks across networking, compute, memory, and system architecture
- Develop and enhance performance analysis, benchmarking, and diagnostic tools
- Define performance test plans and establish expectations for new technologies and platforms
- Collaborate across hardware, firmware, networking, systems, and software teams to provide actionable performance insights
- Support telemetry collection and data refinement efforts to enable accurate performance analysis
- Maintain high standards for data quality, reproducibility, and traceability of performance results
What We Need To See
- B.Sc. or M.Sc. in Computer Science, Computer Engineering, Software Engineering, or equivalent experience
- 5+ years of experience in performance analysis, systems engineering, or HPC/AI infrastructure
- Demonstrated expertise in performance analysis methodologies
- Hands-on experience with high-performance networking (RDMA, MPI, NCCL, congestion control)
- Strong understanding of system performance metrics (latency, throughput, resource utilization)
- Exposure to hardware, firmware, or embedded telemetry environments
- Strong analytical, problem-solving, and communication skills
- Ability to work effectively in cross-functional, fast-paced R&D teams
Ways To Stand Out From The Crowd
- Knowledge of CUDA, NCCL internals, and congestion control algorithms
- Deep system-level understanding of CPU architectures, GPUs, HCAs, memory, and PCIe
- Experience with NVIDIA GPUs, CUDA, and deep learning frameworks such as PyTorch or TensorFlow
- Experience with cloud platforms
- Proficiency in Python and strong experience working in Linux environments; experience with Bash and C/C++ is a plus
JR2014966