
Senior LLM Agents Architect

Overview
Skills
  • C++
  • Python
  • PyTorch
  • RAG
  • CUDA
  • Nsight Compute
  • Nsight Systems
  • Triton
  • TorchInductor
  • TorchDynamo
  • PTX
  • NVLink
  • LangGraph
  • LangChain
  • InfiniBand
  • CrewAI
  • Codex SDK
  • Codex
  • Claude Code
We don't just build the hardware and software that power the AI revolution; we are building the AI that designs the next generation of both. Our team sits at the intersection of inference software and GPU architecture, creating autonomous LLM-driven systems that reason about hardware, write high-performance CUDA, and automate the complex loops of architectural simulation, analysis, and optimization.

We are looking for a senior LLM Agents Architect to work hands-on with hardware architects, verification engineers, GPU performance experts, and software developers to build end-to-end agent flows that drive significant improvements in kernel optimization, architectural exploration, and developer efficiency.

What You'll Be Doing

  • Design and build agentic AI systems that generate, analyze, and optimize GPU compute kernels — targeting speed-of-light performance on NVIDIA hardware.
  • Collaborate with GPU architects and performance engineers to encode domain expertise — memory hierarchy trade-offs, occupancy tuning, instruction-level reasoning — into agent workflows that rival hand-tuned optimization.
  • Build automated performance forensics agents capable of ingesting large-scale simulation traces and Nsight profiler data to identify bottlenecks and propose architectural or software mitigations.
  • Partner with HW architects to develop agentic flows for GPU architectural studies — enabling rapid what-if analysis across micro-architecture configurations such as cache sizing, memory controller design, and compute unit scaling.
  • Explore agentic approaches to HW/SW co-design challenges, including replacing or augmenting graph-compiler functionality (e.g., TorchInductor) with LLM-driven optimization and code-generation pipelines.
  • Rapidly prototype and thoughtfully productize; integrate with internal services, utilize GPU capabilities, remove bottlenecks, and deliver fitting solutions.
  • Set up an evaluation backbone using offline golden sets and online telemetry to support confident iteration, cost control, and safe improvements.
  • Mentor teams and share insights in agent orchestration, prompting, RAG, and observability, and write documentation and playbooks for NVIDIA's teams.

What We Need To See

  • 7+ years in applied ML/AI or large-scale systems, with 2+ years crafting agentic or LLM-powered applications in production environments.
  • B.Sc. in Computer Science or Electrical Engineering.
  • Solid grounding in computer architecture: memory hierarchies, parallelism models, pipelining, and cache behavior. Specific familiarity with NVIDIA GPU architecture — streaming multiprocessors, warp scheduling, shared/global memory model, and occupancy reasoning — is essential.
  • Hands-on CUDA programming experience: writing, profiling, and optimizing GPU kernels — not just calling into CUDA-accelerated libraries. Comfortable with tools such as Nsight Compute, Nsight Systems, or equivalent profiling workflows.
  • Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production — not just experience with off-the-shelf frameworks.
  • Strong software engineering skills in Python and one systems language (C++ preferred).
  • Proficient in tool use, RAG pipelines, and model adaptation techniques for building agentic systems.
  • Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.
  • Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.
  • Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.

Ways To Stand Out From The Crowd

  • Familiarity with the PyTorch compilation and lowering stack (torch.compile, TorchDynamo, TorchInductor, Triton, down to PTX), and with GPU graph compilers, kernel fusion strategies, or auto-tuning frameworks.
  • Background in performance engineering for HPC or GPU-accelerated workloads, including experience with performance modeling or hardware simulators.
  • Familiarity with distributed processing, multi-GPU workloads, and networking (e.g., NVLink, InfiniBand).
  • Familiarity with frontier agentic coding tools (e.g., Claude Code, Codex, Cursor) — understanding their underlying architecture: tool orchestration, context management, and autonomous task execution patterns.
  • Hands-on experience building a domain-specific coding agent — whether on top of frontier agentic harnesses (e.g., Claude Code, Codex SDK) or lower-level agent frameworks (e.g., LangChain/LangGraph deep agents, CrewAI). Comfortable with the design choices that make a coding agent useful in practice: task scoping, tool and context curation, evaluation, and failure recovery.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer you and your family at www.nvidiabenefits.com.

JR2005216

NVIDIA