DevJobs

Machine Learning Scientist II - GenAI Evaluation

Skills
  • Python
  • Java
  • SQL
  • Kafka
  • Spark
  • Hadoop

About Us:

At Booking.com, data drives our decisions. Technology is at our core. And innovation is everywhere. But our company is more than datasets, lines of code or A/B tests. We’re the thrill of the first night in a new place. The excitement of the next morning. The friends you encounter. The journeys you take. The sights you see. And the memories you make. Through our products, partners and people, we make it easier for everyone to experience the world.

Leadership/Team Quote:

The Content Intelligence team is responsible for developing machine learning solutions based on computer vision, natural language processing, and generative AI to drive supply enrichment and power a wide range of applications at Booking.com. This includes building GenAI agents for tasks such as dialog systems, Q&A, and trip planning, as well as developing foundational capabilities like agent infrastructure and evaluation. A key focus of the team is also the development of in-house travel-specific LLMs to support core use cases across the platform.

Role Description:

As a Machine Learning Scientist, your work will focus on the evaluation and optimization of generative AI systems. You will develop and fine-tune Judge LLMs to assess model outputs across a variety of tasks, design robust evaluation frameworks for agentic workflows, and build scalable pipelines for synthetic data generation. The team also plays a critical role in multilingual evaluation, enabling GenAI applications to support market expansion across all supported languages.

Key Job Responsibilities and Duties:

  • Develop and apply state-of-the-art techniques for evaluating generative AI systems, with a focus on agent workflows, multilingual output, and task-specific Judge LLMs.
  • Design and implement scalable evaluation pipelines, including synthetic data generation and benchmarking for model quality, relevance, and consistency.
  • Optimize and maintain Judge LLMs to assess outputs across dialog systems, Q&A, and trip planning use cases.
  • Conduct in-depth data analysis to define and track evaluation metrics, validate label quality, and explore performance across different languages and user scenarios.
  • Ensure the reliability, efficiency, and scalability of evaluation tools and frameworks in both offline and online environments.
  • Collaborate closely with ML engineers to integrate evaluation components into production pipelines, supporting continuous improvement of GenAI applications.
  • Work cross-functionally with product, research, and analytics teams to align evaluation strategies with business goals and user impact.


Qualifications & Skills:

  • Advanced knowledge of and experience in Computer Vision and Natural Language Processing, and in the engineering aspects of developing ML and Generative AI models at scale.
  • Experience designing and executing end-to-end research and development plans and generating impact through large-scale machine learning model development, preferably evidenced by peer-reviewed publications, patents, open-source code, or the like.
  • Relevant work or academic experience (MSc + 4 years of working experience, or PhD + 2 years of working experience) in applying Machine Learning to business problems.
  • Master's degree, PhD or equivalent experience in a quantitative field (e.g. Computer Science, Engineering Mathematics, Artificial Intelligence, Physics, etc.).
  • Experience with multiple facets of machine learning: working with large data sets, model development, statistics, experimentation, data visualization, optimization, software development.
  • Experience collaborating cross-functionally in the development of machine learning products (e.g. with Developers, UX specialists, Product Managers, etc.).
  • Strong working knowledge of Python, Java, Kafka, Hadoop, SQL, and Spark or similar technologies. Working experience with version control systems.
  • Excellent English communication skills, both written and verbal.
  • Successfully driving technical, business and people-related initiatives that improve productivity, performance and quality while communicating with stakeholders at all levels.
  • Leading by example, gaining respect through actions, not your title. Developing your team and motivating them to achieve their goals. Providing timely feedback and managing your key team performance indicators.


Benefits & Perks - Global Impact, Personal Relevance:

Booking.com’s Total Rewards Philosophy is not only about compensation but also about benefits. We offer a competitive compensation and benefits package, as well as unique-to-Booking.com benefits, which include:

  • Annual paid time off and a generous paid leave scheme, including parent, grandparent, bereavement, and care leave
  • Hybrid working including flexible working arrangements, and up to 20 days per year working from abroad (home country)
  • Industry-leading product discounts - up to 1400 per year - for yourself, including automatic Genius Level 3 status and Booking.com wallet credit


Diversity, Equity and Inclusion (DEI) at Booking.com:

Diversity, Equity & Inclusion have been a core part of our company culture since day one. This ongoing journey starts with our very own employees, who represent over 140 nationalities and a wide range of ethnic and social backgrounds, genders and sexual orientations.

Take it from our Chief People Officer, Paulo Pisano: “At Booking.com, the diversity of our people doesn’t just build an outstanding workplace, it also creates a better and more inclusive travel experience for everyone. Inclusion is at the heart of everything we do. It’s a place where you can make your mark and have a real impact in travel and tech.”

We ensure that colleagues with disabilities are provided the adjustments and tools they need to participate in the job application and interview process, to perform crucial job functions, and to receive other benefits and privileges of employment.

Application Process:

  • Let’s go places together: How we Hire
  • This role does not come with relocation assistance.


Booking.com is proud to be an equal opportunity workplace and is an affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. We strive to move well beyond traditional equal opportunity and work to create an environment that allows everyone to thrive.

Pre-Employment Screening

If your application is successful, your personal data may be used for a pre-employment screening check by a third party as permitted by applicable law. Depending on the vacancy and applicable law, a pre-employment screening may include employment history, education and other information (such as media information) that may be necessary for determining your qualifications and suitability for the position.