DevJobs

HPC Middleware Developer

Overview
Skills
  • C++ · 5y
  • Python
  • Linux · 3y
  • Ethernet
  • InfiniBand
  • NCCL
  • RDMA
  • Deep learning systems
  • GPU acceleration
  • MPI
  • NSX
We are now looking for a senior HPC software engineer. As a member of our High Performance Computing Software development team, you will be responsible for designing and implementing new protocols and algorithms that deliver the best possible performance on NVIDIA networked supercomputers and datacenters. This role offers an excellent opportunity to deliver production-grade solutions, get hands-on with groundbreaking technology, and work closely with technical leaders on some of the biggest challenges in machine learning, cloud computing, and system co-design.

What You'll Be Doing

The team is responsible for developing high performance communication frameworks and applications running in production on the world’s largest supercomputers and datacenters. The work environment is dynamic and challenging; we are innovating and inventing software products at the forefront of technology in terms of performance, scalability, and features. Our team works closely with networking chip design teams in co-designing new hardware features and software APIs.

What We Need To See

  • 5 years of experience programming in C/C++
  • 3 years of experience with Linux environments and tools
  • Deep knowledge of networking protocols (InfiniBand, Ethernet)
  • Deep knowledge of computer architecture and operating systems
  • Experience with performance optimization
  • MSc in computer science / software engineering (or equivalent experience)

Ways To Stand Out From The Crowd

  • A positive attitude and the ability to work well with others
  • PhD in CS/EE/Math/Physics
  • Knowledge of MPI and high-performance computing
  • Knowledge of RDMA technology
  • Open-source software contributor

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until November 17, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. Being an NVIDIAN means being part of a diverse and supportive environment that encourages everyone to perform at their peak. Come join the team and discover how you can make a lasting impact on the world.

NVIDIA is looking for a Senior Software Architect: a creative, forward-thinking, and practical researcher to improve the framework for large-scale LLM training and inference. As part of our dynamic E2E Architecture group, you will design and optimize systems that drive generative AI workloads, working at the intersection of software and hardware on some of the most advanced GPU clusters in the world. You will define how AI models are deployed and scaled in production using the NVIDIA Spectrum-X Networking Platform, influencing decisions ranging from inter-node communication and compute scheduling to system-level optimization. This is an opportunity to collaborate with best-in-class engineers and researchers and shape the future of generative AI in real-world applications.

Your work will make a lasting impact by enabling generative AI technologies to reach real-world applications and improve global computing capabilities.

What You'll Be Doing

  • Lead research and development of end-to-end networking solutions for distributed AI training and inference at scale, with a focus on job completion time, failure resiliency, telemetry, scheduling, and placement.
  • Analyze current deployments, develop prototypes, and recommend architectural improvements.
  • Stay abreast of the latest research and become the team’s authority on emerging networking techniques and technologies.
  • Design, simulate, and validate new systems using NSX, a novel, scalable network simulator.
  • Develop and test prototypes on large-scale GPU clusters (e.g., Israel-1).
  • Collaborate across hardware, firmware, and software teams to translate ideas into real networking product features.
  • File patents and present research at leading conferences.

What We Need To See

  • M.Sc. or PhD (preferred) in Computer Science, Electrical/Computer Engineering, or a related field, or a B.Sc. with research experience and publications.
  • 5+ years of relevant experience.
  • Deep expertise in networking and communication internals (NCCL, RDMA, congestion control, routing).
  • Strong software engineering skills in C++ and/or Python.
  • Excellent system-level design and problem-solving abilities.
  • Outstanding communication and collaboration skills across technical domains.

Ways To Stand Out From The Crowd

  • Proven passion for solving sophisticated technical problems and delivering impactful solutions.
  • Record of publications in top-tier conferences.
  • Experience designing and building large-scale AI training clusters.
  • Post-PhD research experience.
  • Practical understanding of deep learning systems, GPU acceleration, and AI model execution flows.

JR2007940

JR2007773

Nvidia