
AI Research Engineer - Maternity Leave Replacement

Overview
Skills
  • Python
  • C++
  • Deep learning ꞏ 5y
  • PyTorch
  • Diffusion
  • GANs
  • Transformers
  • CUDA
  • ONNX
  • Quantization
  • TensorRT
  • WebRTC

The role

We are looking for a brilliant AI Research Engineer to build the brain and body of our Real-Time Avatar & Conversational Stack. This is a hands-on, deep-tech role where you will design, train, and optimize the next generation of Multimodal AI Models. You will join an elite R&D unit, working at the bleeding edge of Generative Video, Speech Synthesis, and Large Language Models. Your mission is to solve one of the hardest problems in AI: creating a unified, ultra-low-latency agent that can see, hear, and speak with human-level fidelity. You won't just implement papers; you will architect the systems that define the state-of-the-art for enterprise video.


The day-to-day

  • Collaborate with Technical Leadership: Partner directly with the Head of AI to architect the long-term research roadmap. You will work shoulder-to-shoulder with other AI Research Engineers, brainstorming novel architectures and conducting peer reviews to push the collective intelligence of the team.
  • Master Multimodal Architectures: Research and train large-scale models that fuse Video Generation (pixels), Audio (speech/prosody), and Text (semantics) into a cohesive experience.
  • Next-Gen Video Synthesis: Develop and optimize advanced architectures—specifically Diffusion Transformers (DiT) and modern GANs—for photorealistic avatar synthesis, focusing on lip-sync accuracy and temporal consistency.
  • Conquer Real-Time Constraints: Tackle the challenge of "in-the-wild" inference. You will optimize heavy foundation models to run within strict millisecond latency budgets, ensuring fluid, uninterrupted conversation.
  • Advance the Speech Stack: Enhance our proprietary Streaming ASR and Neural TTS architectures to handle interruptions, emotional intonation, and multi-speaker dynamics seamlessly.


Ideally, we’re looking for:

  • 5+ years of experience in Deep Learning research and engineering, with a strong track record of bringing research concepts to production.
  • Advanced Academic Background: M.Sc. or Ph.D. in Computer Science, AI, or a related field, with a focus on Generative Models or Computer Vision.
  • Generative Media Expertise: Deep understanding of modern architectures (Transformers, Diffusion, GANs) applied to video synthesis, neural rendering, or audio generation.
  • Strong Engineering Skills: Proficiency in Python and deep learning frameworks (PyTorch is preferred), with the ability to write clean, modular, and scalable code.
  • Inference Optimization: Experience optimizing models for low-latency real-time inference (e.g., Quantization, TensorRT, ONNX).


These would also be nice:

  • Top-Tier Publications: A record of published papers in major AI conferences (CVPR, NeurIPS, ICCV, etc.).
  • Low-Level Optimization: Experience with CUDA or C++ for maximizing GPU performance.
  • Streaming Knowledge: Familiarity with real-time media protocols like WebRTC.


The perks:

  1. Hybrid, flexible work environment
  2. Extended private health insurance (including mental health coverage)
  3. Personal and professional development programs
  4. Occasional cross-company long weekends
Kaltura