
AI Accelerator Software Engineer – Silicon Software & Low-Level AI
Most GPU engineers work within the limits of what NVIDIA decided.
Here, you decide the limits.
GSI Technology (NASDAQ: GSIT) is developing Gemini2 — an Associative Processing Unit built for ultra-low latency, high-parallelism AI execution. We're not building on top of someone else's stack. We're building the stack — and we need engineers who've been waiting for exactly this kind of problem.
🔬 The gap you'll close
Between modern AI models and novel compute-in-memory hardware lies a space that PyTorch can't see and CUDA can't reach — memory access patterns, DMA flows, instruction scheduling, and execution strategies that simply don't have a reference implementation yet.
That's your domain.
⚙️ What you'll build
Highly optimized compute kernels for Transformer inference, LLM/VLM execution, FFTs, OpenCV pipelines, and Edge AI workloads
Memory access patterns, DMA utilization, and instruction scheduling — tuned for silicon that didn't exist two years ago
Performance analysis pipelines using profilers, traces, and hardware analyzers, plus the fixes for whatever they uncover
Benchmarking infrastructure, internal tooling, and testing frameworks
Work directly with Architecture, Compiler, and AI teams — your kernel-level decisions shape how the next version of the chip gets designed
✅ What we need
B.Sc./M.Sc. in CS, EE, or equivalent
6+ years in low-level C/C++: embedded, firmware, accelerator, systems, or performance-critical software
Deep understanding of:
Memory hierarchies, caches, DMA, and bandwidth optimization
Parallel execution and performance-critical code
Hardware-aware algorithm optimization
Bit-level and systems-oriented reasoning
⭐ Strong bonus if you bring
GPU / NPU / DSP / FPGA or custom accelerator programming
Assembly or low-level programming experience
Compute kernel, firmware, or driver development
AI inference optimization or deep learning infrastructure
Profiling, tracing, and performance-debug experience
🎯 You're likely a strong fit if you've ever...
Written CUDA or HIP kernels — and wanted to go deeper than the driver allows
Spent days hunting a 3% latency regression in embedded firmware and felt satisfied when you found it
Looked at a DMA controller spec and felt curious, not scared
Worked on DSP algorithms and wondered what it'd feel like to do it for AI workloads
Had opinions about both sides of a hardware/software interface
📍 Tel Aviv, Ramat Hahayal | Full-Time | Hybrid
💰 Competitive compensation + (NASDAQ: GSIT)
Not sure if your background is the right fit? Reach out. We'd rather have the conversation.