
AI Accelerator Software Engineer
Location: Ramat Hahayal, Tel Aviv
Company: GSI Technology Israel
Employment Type: Full-time
About GSI Technology
GSI Technology is building Gemini2, an advanced Associative Processing Unit designed for efficient AI execution with ultra-low latency, high parallelism, and low power consumption.
The Role
We are looking for exceptional engineers who enjoy working close to the hardware and extracting maximum performance from novel compute architectures.
As an AI Accelerator Software Engineer, you will develop highly optimized low-level software, mathematical kernels, and execution flows for GSI’s Associative Processing Unit.
You will work on performance-critical code involving parallel execution, memory movement, DMA behavior, instruction scheduling, synchronization, and throughput optimization.
This role combines systems programming, accelerator kernel development, hardware-aware algorithm design, and AI inference optimization.
You will collaborate closely with architecture, compiler, and AI teams to shape how next-generation AI workloads execute on custom hardware.
What You’ll Work On
Design and optimize low-level compute kernels for AI and signal-processing workloads
Develop optimized implementations for Transformer inference, LLM/VLM execution flows, OpenCV pipelines, FFTs, and edge AI workloads
Optimize memory access patterns, DMA utilization, execution flow, and scheduling
Analyze bottlenecks using profilers, traces, and hardware analyzers
Prototype and benchmark new execution strategies
Build internal tooling, testing infrastructure, and performance analysis frameworks
Use modern AI-assisted engineering tools to accelerate development and exploration
Required Qualifications
B.Sc. or M.Sc. in Computer Science, Electrical Engineering, Software Engineering, or a related field
6+ years of professional C/C++ development in low-level, embedded, systems, firmware, accelerator, or other performance-critical software
Strong background in:
Computer architecture
Memory hierarchies, caches, DMA, and bandwidth optimization
Parallel systems and performance-critical software
Hardware-aware algorithm optimization
Bit-level and systems-oriented reasoning
Preferred Experience
Accelerator programming (GPU, NPU, DSP, FPGA, or custom processors)
Assembly or other low-level programming
Development of compute kernels, firmware, or drivers
AI systems, deep learning infrastructure, or inference optimization
Profiling, tracing, benchmarking, and performance-debug tools
Effective use of modern AI development tools and coding assistants
Strong Fit Backgrounds
GPU kernels
DSP algorithms
Embedded high-performance C/C++
Computer architecture or RTL
AI inference optimization
Signal-processing or vision pipelines
Hardware/software co-design