
AI Application Team Lead

Overview
Skills
  • C++
  • Python
  • PyTorch
  • ML
  • AI · 5y
  • Performance optimization · 5y
  • ML engineering · 5y
  • ML systems · 5y
  • Triton
  • Token pipelines
  • Softmax
  • MLPerf
  • MLIR
  • LLM architectures
  • Kernel profiling
  • Kernel optimization
  • GEMM
  • CUDA
  • Attention
NextSilicon is reimagining high-performance computing (HPC) and AI. Our accelerated compute solutions use intelligent adaptive algorithms to dramatically accelerate supercomputers and carry them into a new generation. We have developed a novel software-defined hardware architecture that is delivering significant advances in both the HPC and AI domains.

At NextSilicon, everything we do is guided by three core values:

  • Professionalism: We strive for exceptional results through unwavering dedication to quality and performance.
  • Unity: Collaboration is key to success. That's why we foster a work environment where every employee can feel valued and heard.
  • Impact: We're passionate about developing technologies that make a meaningful impact on industries, communities, and individuals worldwide.

We are seeking a highly skilled AI Application Team Lead to build and lead a team responsible for developing, running, and optimizing large-scale AI workloads on NextSilicon’s AI hardware platform. This role focuses on benchmarking state-of-the-art models (e.g., LLaMA, DeepSeek), executing MLPerf suites, analyzing system-level performance, and driving cross-stack optimizations across hardware, runtime, and software frameworks.

The ideal candidate combines strong technical depth in AI/ML systems, hands-on experience with LLM workloads, and leadership capability to guide a high-performance engineering team.

Requirements:

  • 5+ years of experience in AI/ML engineering, performance optimization, or ML systems.
  • Deep understanding of LLM architectures, training & inference mechanics, and modern ML frameworks.
  • Strong proficiency in the PyTorch ecosystem, with a specific focus on performance tuning via Triton, CUDA, or MLIR-based compiler frameworks.
  • Hands-on expertise in profiling and optimizing kernels (GEMM, attention, softmax) and token pipelines.
  • Demonstrated experience running or tuning MLPerf or similar large-scale benchmarks.
  • Strong Python and C++ development skills.
  • Proven leadership experience: mentoring, guiding, or managing engineers.

Responsibilities:

  • Lead and mentor a team of AI application and performance engineers.
  • Run and optimize AI workloads (LLaMA, DeepSeek, etc.) and execute MLPerf benchmarks.
  • Analyze end-to-end performance and identify HW/SW bottlenecks.
  • Develop optimization strategies across models, kernels, frameworks, and runtime.
  • Build profiling, debugging, and validation tools for large-scale AI workloads.
  • Collaborate with hardware, compiler, and device software teams to improve performance.