Senior ML Embedded Engineer
Location: Ramat Hahayal, Tel Aviv
Employment Type: Full-time
Company: GSI Technology – A publicly traded, international high-tech company (NASDAQ: GSIT) developing the cutting-edge Gemini® Associative Processing Unit (APU) for compute-in-memory acceleration.
GSI is pioneering the Gemini APU, a game-changing processor designed to accelerate compute-intensive tasks like large language models, machine learning, advanced image processing, and radar imaging.
If you're passionate about architecting high-performance software systems, implementing advanced algorithms, and drilling into low-level technical details, this is the role for you.
We’re seeking a dynamic and fast-learning engineer with a passion for diving deep into large language model implementations, and a keen focus on performance optimization and efficient execution.
Position Overview
We are seeking a highly skilled and motivated Senior ML Embedded Software Engineer to lead the development and optimization of AI models, including Large Language Models (LLMs) and Vision Language Models (VLMs), on GSI’s proprietary APU. This role bridges high-level machine learning understanding with low-level system and performance engineering, primarily in Python, C, and C++. You will be responsible for architecting, implementing, and optimizing AI pipelines under hardware constraints, with a strong emphasis on computer vision and transformer architectures.
Key Responsibilities
- Develop and optimize software libraries for CNN, LLM, and VLM implementations on embedded hardware.
- Design end-to-end system flows integrating AI models, especially in computer vision domains.
- Lead performance tuning efforts under constraints such as memory, compute, and latency.
- Work closely with hardware teams to co-design software optimized for GSI’s APU.
- Debug and optimize AI inference pipelines, including Python-based pre/post-processing where applicable.
- Team up across disciplines to turn wild ideas into reliable, high-performance code.
- Architect and develop a high-performance AI compiler framework for deploying quantized neural networks on the GSI Gemini edge platform, enabling advanced edge AI workloads and optimizing for low-latency inference, efficient hardware utilization, and seamless integration with hardware acceleration pipelines.
Required Qualifications
- B.Sc. in Computer Science or Electrical Engineering from a leading university.
- 5+ years of experience in embedded software development using C++ and C.
- Solid experience in one or more of the following: Computer Vision, RT-Embedded, DSP.
- Proven experience in developing and optimizing AI pipelines under performance, memory, and latency constraints.
- Proven track record in performance/memory-constrained programming.
- Strong communication skills, analytical mindset, and attention to detail.
- Independent, solution-oriented, and highly motivated to make things happen.
- Proven track record developing and optimizing software algorithms with deep consideration for hardware architecture, memory bandwidth, and system constraints.
- Strong understanding of processor architecture fundamentals: caches, pipeline stages, execution units, and memory hierarchies.
- Ability to interpret detailed hardware specifications and translate them into robust, efficient software solutions.
Preferred Qualifications
- Practical experience with transformer architectures and/or vision-language models (VLMs).
- Deep knowledge of computer vision pipelines and multimodal systems.
- Experience designing complex software systems from concept to deployment.
- Familiarity with hardware-aware optimization techniques such as:
  - Quantization
  - Pruning
  - Kernel fusion
- Experience with performance profiling tools (e.g., PyTorch Profiler, NVIDIA Nsight).
- Low-level optimization experience with CUDA, OpenCL, or hardware-specific SDKs.
Privacy Statement
All applications will be handled with strict confidentiality. Your information will not be shared without your consent.