Microcode SW Engineer for LLM / VLM applications
Location: Ramat Hahayal, Tel-Aviv
Employer: GSI Israel
Term: Full-time Position
GSI is pioneering the Gemini APU—a cutting-edge, game-changing processor designed to accelerate compute-intensive tasks like large language models, machine learning, advanced image processing, and radar imaging.
If you're passionate about architecting high-performance software systems, implementing advanced algorithms, and drilling into low-level technical details, this is the role for you.
We’re seeking a dynamic, fast-learning engineer with a passion for diving deep into large language model implementations and a keen focus on performance optimization and efficient execution.
What you’ll be owning
- Deep dive into our cutting-edge associative HW processing unit
- Design, build, and optimize low-level microcode—including instruction scheduling, memory access patterns, and control flow—for our custom Associative Processing architecture. You'll be working directly with a novel instruction set and hardware behaviors to craft routines that unlock parallelism and maximize throughput. This involves writing cycle-aware logic for compute units, managing hardware state transitions, and tuning for ultra-low latency across deeply pipelined data paths
- Prototype and iterate on diverse workloads, including transformer-based LLM inference, OpenCV pipelines, FFTs, and edge ML use cases, pushing the boundaries of distributed compute and memory co-location
- Team up across disciplines to turn wild ideas into reliable, high-performance code
- Squash bugs and bottlenecks using analyzers, profilers, and trace tools, always backed by data
- Level up our CI, testing, and docs, keeping dev velocity high and friction low
- Adapt fast, dive into unfamiliar tech, and thrive in the pivot-friendly chaos of startup life
Qualifications
- B.Sc. or M.Sc. in Computer Science, Electrical Engineering, or Software Engineering
- Experience Path 1: 5+ years of professional C/C++ development focused on low-level programming or microcode for hardware processing units (e.g., CPU, GPU)
- Experience Path 2: 5+ years in RTL design/verification plus 2+ years of hands-on C/C++ development
Required Technical Expertise
- Proven track record developing and optimizing software algorithms with deep consideration for hardware architecture, memory bandwidth, and system constraints
- Strong understanding of processor architecture fundamentals—caches, pipeline stages, execution units, and memory hierarchies
- Ability to interpret detailed hardware specifications and translate them into robust, efficient software solutions
Preferred Qualifications / Additional Skills
- Practical experience with microcode development and optimization
- Proficiency in assembly language programming
- Strong understanding of deep learning or computer vision algorithms, architectures, and frameworks
- Demonstrated ability to port and refine complex algorithms in performance-sensitive, low-level environments
- Experience with Python scripting for tool creation, data analysis, and automated testing workflows
- Solid foundation in compiler theory, including design principles and code generation techniques
- Practical experience writing performance-critical code, including firmware, compute kernels, and device drivers
Our Privacy Policy: Your resume and information will be kept confidential.