Who are we?
Our team at the Huawei Computing Network Innovation Lab is looking for exceptional talent to join us and lead the development of next-generation data centers. We create cutting-edge technologies that combine software and hardware to accelerate compute, storage and networking at large scale. We aim to drive innovation and deliver software-defined infrastructure and algorithms to HPC, AI/ML, and Big Data applications.
We are looking for outstanding candidates with hands-on experience in developing and optimizing AI frameworks. If you are a team player with excellent communication skills and the motivation to revolutionize application performance, you’re welcome on board!
What will you be doing?
- Work as part of an innovative research team to analyze, develop, test and deploy improvements that enhance Huawei’s distributed AI framework
- Develop optimizations that leverage hardware accelerator capabilities, minimize communication overhead and improve training/inference throughput
- Push the boundaries of the state of the art in LLM performance and efficiency, including model compression and quantization
- Analyze, profile and optimize the latest LLM algorithms, and implement them as production-quality software libraries for latency-critical use cases on next-generation hardware
- Work in a distributed computing environment to optimize for both scale-up (multi-device) and scale-out (multi-node) systems
- Utilize advanced concepts such as Uncertainty Quantification, Mixed Precision Computing and Model Sparsity to improve performance and enable training of very large AI models
- Collaborate with partners from top universities and open-source communities to conduct state-of-the-art research
What do we want to see?
- B.Sc. degree in computer science, computer engineering, or a closely related field
- 5+ years of experience in AI kernel and performance optimization
- Excellent C/C++ programming and software design skills, including debugging, performance analysis, and testing
- Strong technical skills and experience with developing code in a Linux environment
- Excellent teamwork and interpersonal skills
- Ability to work independently, define project goals and scope, and lead your own development effort
- Innovative thinking
Ways to stand out from the crowd:
- M.Sc. or Ph.D. degree
- Proven track record of conducting and publishing independent research
- Experience in optimizing distributed deep learning pipelines with TensorFlow / PyTorch
- Experience in analyzing workloads on large-scale heterogeneous clusters
- Hands-on experience in developing code to target heterogeneous architectures (e.g. CPU/GPU/TPU)
- Experience in developing and contributing to large open-source libraries