We’re looking for a deep learning software engineer who is versatile, curious, independent, and keen to take on real challenges. As part of the Deci AI Inference Team, you will play a crucial role in optimizing and deploying deep learning models for real-time inference in a variety of applications. You will work closely with our research scientists, software engineers, and product team to ensure efficient and accurate model deployment - from microprocessors to multi-accelerator cloud instances. You will work with cutting-edge hardware and state-of-the-art models, all while practicing software development best practices.
To succeed in this role, you must possess a thorough understanding of both software engineering principles and deep learning theory. Your contributions will encompass the development of Deci’s core products - enabling graph compilation, runtime optimization, model deployment, and more - all aimed at squeezing the most out of Deci’s customers’ hardware.
Requirements:
- Familiarity with highly concurrent systems and their software stacks - GPUs, DL accelerators, CUDA, Triton, etc.
- Extensive experience deploying deep neural models to production settings - on cloud or edge devices
- Track record of profile-based performance analysis, methodical discovery of bottlenecks, and maximizing hardware utilization
- Knowledge of common SOTA deep learning architectures, their pre- and post-processing transformations, and the relevance of these transformations to different deep learning tasks
- Deep understanding of the transformer architecture, the latest attention mechanisms (FlashAttention, PagedAttention, …), and the LLM optimization and serving space is a significant bonus
- Familiarity with cloud computing platforms (AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes)
Preferred qualifications:
- Familiarity with GPU programming, CUDA, and the CUDA Toolkit
- Familiarity with stream processing frameworks (GStreamer, ROS, …) and the efficient data-management techniques they leverage
- Deep understanding of Unix-based OS internals - from process and thread management to virtual memory and file systems
Responsibilities:
- Develop the core logic behind Deci’s DL development platform. Contribute to inference libraries and internal model optimization tools used by Deci’s customers, researchers, and algorithm teams
- Develop strategies and infrastructure aimed at improving the reliability of Deci’s deep learning systems
- Develop and deliver production-grade, high-throughput, real-time inference-enabling frameworks
- Adopt and integrate cutting-edge research - in the fields of deep learning model optimization and deployment - into Deci products and tools