About the Job
Innodata's Frontier AI teams are pushing the boundaries of reinforcement learning applications, particularly RLVR (Reinforcement Learning with Verifiable Rewards) and RL Gyms, to train, evaluate, and stress-test the world's most advanced AI models and agents. We're hiring an Applied RL Scientist to join our leading researchers, chief scientist, and VP for AI to design the algorithmic core of these systems and the implementation frameworks of RL environments, and to turn cutting-edge research ideas into shipped pipelines on short timescales.
You will work side by side with our research team to design reward models, training objectives, data-generation strategies, and evaluation methodologies. You'll prototype them in code, run rigorous experiments, and collaborate with engineers to deploy what works into production. This is an applied, research-heavy role for someone who can read a paper on Thursday and have a working implementation by Sunday.
What You'll Do
- Help steer the algorithmic direction of our RL training environments, evaluation, and data-generation workflows.
- Translate research ideas into working code—both internal prototypes and production-grade pipelines.
- Design reward models, verifiers, and evaluation harnesses with defensible properties.
- Run experiments, rigorously analyze results, and use findings to drive the next iteration.
- Partner with engineers to operationalize the right algorithms at scale.
- Stay current on the RL, post-training, and evaluation literature, and bring the most useful ideas into production quickly.
What You'll Bring
- PhD (preferred) or MSc in Computer Science, Mathematics, Statistics, Machine Learning, or a related field.
- Strong research background in reinforcement learning, ideally including exposure to RLHF, RLVR, DPO, or other post-training methods.
- Hands-on experience implementing RL algorithms from scratch (PPO, GRPO, DPO, or similar).
- Strong Python and PyTorch skills—comfortable writing custom training loops, not just using high-level wrappers.
- Solid mathematical foundations: probability, statistics, optimization, linear algebra.
- A track record of taking research from ideas to working code quickly.
- Excellent English communication—you can explain a method clearly to engineers and a result clearly to partners.
- Creativity and strong problem-solving skills.
Bonus Points
- Publications at top ML venues (NeurIPS, ICML, ICLR, ACL, EMNLP).
- Experience designing reward models, verifiers, or evaluation methodologies for LLMs.
- Familiarity with distributed training infrastructure and large-scale experiments.
- Open-source contributions to RL or LLM post-training libraries (TRL, OpenRLHF, verl, etc.).
- Experience working closely with engineering teams to ship research into production.