Distributed AI Framework Software Engineer (JB-237)

No longer accepting applications

Overview

Job Type: On-site

Experience: unknown

Job Position: AI/ML

Updated: Aug 27, 2024

Location: Center District

Salary: N/A

Skills

C
C++
PyTorch
TensorFlow
Linux
CPU
GPU
TPU

Hod HaSharon | Haifa

Who are we?

Our team at the Huawei Computing Network Innovation Lab is looking for exceptional talent to join us and lead the development of next-generation data centers. We create cutting-edge technologies that combine software and hardware to accelerate compute, storage and networking at large scale. We aim to drive innovation and deliver software-defined infrastructure and algorithms for HPC, AI/ML, and Big Data applications.

We are looking for outstanding candidates with hands-on experience in development and optimization of AI frameworks. If you are a team player with excellent communication skills and motivation to revolutionize application performance, you’re welcome on board!

What will you be doing?

• Work as part of an innovative research team to analyze, develop, test and deploy improvements that enhance Huawei’s distributed AI framework

• Develop optimizations that leverage hardware accelerator capabilities, minimize communication overhead and improve training/inference throughput

• Research state-of-the-art distributed AI training and inference algorithms (e.g. FSDP, DDP) to develop accessible model sharding capabilities

• Profile different distributed AI training strategies, compare parallelization methods, and identify the main bottlenecks to be optimized at the computation and network communication levels

• Work in a distributed computing environment to optimize for both scale-up (multi-device) and scale-out (multi-node) systems

• Utilize advanced concepts such as Uncertainty Quantification, Mixed Precision Computing and Model Sparsity to improve performance and enable training of very large AI models

• Collaborate with partners from top universities and open-source communities to conduct state-of-the-art research

What do we want to see?

• B.Sc. degree in computer science, computer engineering, or a closely related field

• Excellent C/C++ programming and software design skills, including debugging, performance analysis, and testing

• Strong technical skills and experience with developing code in a Linux environment

• Excellent teamwork and interpersonal skills

• Ability to work independently, define project goals and scope, and lead your own development effort

• Innovative thinking

Ways to stand out from the crowd:

• M.Sc. or Ph.D. degree

• Proven track record of conducting and publishing independent research

• Experience in optimizing distributed deep learning pipelines with TensorFlow / PyTorch

• Experience in analyzing workloads on large-scale heterogeneous clusters

• Hands-on experience in developing code to target heterogeneous architectures (e.g. CPU/GPU/TPU)

• Experience in developing and contributing to large open-source libraries

Toga Networks
