Senior LLM Agents Architect

No longer accepting applications

Overview

Job Type: Hybrid

Experience: 7 years

Job Position: AI/ML

Updated: May 21, 2026

Location: Yokneam Ilit

Salary: N/A

Skills

C++
Python
PyTorch
CUDA
RAG
Nsight Compute
Nsight Systems
Triton
TorchInductor
TorchDynamo
torch.compile
PTX
NVLink
LangGraph
LangChain
InfiniBand
HPC
CrewAI
Codex SDK
Codex
Claude Code

We don't just build the hardware and software that power the AI revolution; we are building the AI that designs the next generation of both. Our team sits at the intersection of inference software and GPU architecture, creating autonomous LLM-driven systems that reason about hardware, write high-performance CUDA, and automate the complex loops of architectural simulation, analysis, and optimization.

We are looking for a Senior LLM Agents Architect to work hands-on with hardware architects, verification engineers, GPU performance experts, and software developers to build end-to-end agent flows that drive significant improvements in kernel optimization, architectural exploration, and developer efficiency.

What You'll Be Doing

Design and build agentic AI systems that generate, analyze, and optimize GPU compute kernels — targeting speed-of-light performance on NVIDIA hardware.
Collaborate with GPU architects and performance engineers to encode domain expertise — memory hierarchy trade-offs, occupancy tuning, instruction-level reasoning — into agent workflows that rival hand-tuned optimization.
Build automated performance forensics agents capable of ingesting large-scale simulation traces and Nsight profiler data to identify bottlenecks and propose architectural or software mitigations.
Partner with HW architects to develop agentic flows for GPU architectural studies — enabling rapid what-if analysis across micro-architecture configurations such as cache sizing, memory controller design, and compute unit scaling.
Explore agentic approaches to HW/SW co-design challenges, including replacing or augmenting graph-compiler functionality (e.g., TorchInductor) with LLM-driven optimization and code-generation pipelines.
Prototype rapidly and productize thoughtfully: integrate with internal services, make full use of GPU capabilities, remove bottlenecks, and deliver solutions that fit the problem.
Set up an evaluation backbone using offline golden sets and online telemetry to support confident iteration, cost control, and safe improvements.
Mentor teams and raise their capabilities by sharing insights on agent orchestration, prompting, RAG, and observability, and by crafting documentation and playbooks for NVIDIA's teams.

What We Need To See

7+ years in applied ML/AI or large-scale systems, with 2+ years crafting agentic or LLM-powered applications in production environments.
B.Sc. in Computer Science or Electrical Engineering.
Solid grounding in computer architecture: memory hierarchies, parallelism models, pipelining, and cache behavior. Specific familiarity with NVIDIA GPU architecture — streaming multiprocessors, warp scheduling, shared/global memory model, and occupancy reasoning — is essential.
Hands-on CUDA programming experience: writing, profiling, and optimizing GPU kernels — not just calling into CUDA-accelerated libraries. Comfortable with tools such as Nsight Compute, Nsight Systems, or equivalent profiling workflows.
Proven ownership of at least one end-to-end agentic system or LLM application: requirements, architecture, implementation, evaluation, and incremental hardening in production — not just experience with off-the-shelf frameworks.
Strong software engineering skills in Python and one systems language (C++ preferred).
Proficiency in tool use, RAG pipelines, and model adaptation techniques for building agentic systems.
Demonstrated ability to collaborate with HW/SW domain experts and translate their heuristics into deterministic tools, constraints, and evaluation metrics.
Excellence in communication and facilitation: aligning diverse collaborators, documenting decisions/assumptions, and influencing without authority.
Track record of building observability for AI systems: dataset/version management, offline test suites, online telemetry, guardrails/safety checks, and rollback plans.

Ways To Stand Out From The Crowd

Familiarity with the PyTorch compilation and lowering stack (torch.compile, TorchDynamo, TorchInductor, Triton, down to PTX), and with GPU graph compilers, kernel fusion strategies, or auto-tuning frameworks.
Background in performance engineering for HPC or GPU-accelerated workloads, including experience with performance modeling or hardware simulators.
Familiarity with distributed processing, multi-GPU workloads, and networking (e.g., NVLink, InfiniBand).
Familiarity with frontier agentic coding tools (e.g., Claude Code, Codex, Cursor) — understanding their underlying architecture: tool orchestration, context management, and autonomous task execution patterns.
Hands-on experience building a domain-specific coding agent — whether on top of frontier agentic harnesses (e.g., Claude Code, Codex SDK) or lower-level agent frameworks (e.g., LangChain/LangGraph deep agents, CrewAI). Comfortable with the design choices that make a coding agent useful in practice: task scoping, tool and context curation, evaluation, and failure recovery.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer you and your family at www.nvidiabenefits.com.

JR2005216

Nvidia
