
Language Data Scientist

Overview
Skills
  • Python
  • Power BI
  • Tableau
  • Hugging Face
  • NLTK
  • SpaCy

Position Summary

Innodata is building a team of Language Data Scientists and Gen AI experts to help our customers advance AGI. As a subject matter expert in LLMs, you will work hands-on with multi-modal and multi-lingual datasets and collaborate with cross-functional partners. You will use your unique experience and skills to drive innovation and continuous improvement.

 

Who we’re looking for

You have over 5 years of experience in data science, language engineering, and AI, and you bring a wealth of expertise in language, culture, and multi-lingual projects. You’re a pro at designing complex human evaluation tasks, analysing data with advanced statistical tools, and leading teams to success.

Your skills in machine learning, Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) help you tackle challenges with a critical, innovative mindset. You’re also a strong communicator, excelling in cross-functional collaboration and understanding business needs. Plus, you’re experienced with powerful data tools like Tableau, Power BI, and various programming languages.

 

Tell me more

As a Language Data Scientist, you will manage, consult, and engage with customers on process improvements in LLM training data synthesis, validation, and annotation. You will also advise and support business unit heads as they engage with customers to understand the upstream activities that would be performed using Innodata Inc. services.

 

Responsibilities

·      Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers

·      Critically assess annotation tooling and workflows

·      Quantitatively analyse large datasets, perform statistical analysis, calculate metrics, and make recommendations to improve accuracy and performance

 

Requirements

·      MA in (computational) linguistics, data science, computer science (AI / ML / NLU), or a related scientific / quantitative field; PhD strongly preferred

·      Strong knowledge of data structures, algorithms, and data engineering principles

·      Experience with Natural Language Processing (NLP) techniques and tools, such as SpaCy, NLTK, or Hugging Face

·      Proficiency in Python to handle / transform large datasets (e.g. pre- and post-processing data), to perform quantitative analyses, and to visualize data

·      Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI propositions

·      Model Fine-Tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance

·      Data Engineering and Pipelines: Deep understanding of data pipelines that support ML and NLP workflows, and knowledge of efficient data collection, transformation, and storage

·      Continuous Improvement: Keeps up to date with the latest advancements in ML and NLP technologies

·      Strong analytical and problem-solving abilities

·      Excellent communication and collaboration skills

·      Ability to work independently and as part of a team

·      Adaptable to changing technologies and methodologies

 

 

Preferred skills

·      Understanding of generative modelling techniques such as GPT, VAEs, and GANs

·      Collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals

·      Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques

·      Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency

·      Experience developing and maintaining AI pipelines, including data preprocessing, feature extraction, model training, and evaluation

·      Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and non-technical stakeholders

·      Helping customers establish best practices and standards for generative AI development within their organizations

·      Providing technical mentorship and guidance to junior team members



Who We Are

With more than 2,000 customers and operations in 13 cities around the world, we are the AI technology solutions provider-of-choice for 7 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, law, and medicine.


Only relevant applicants will be contacted

We are an equal opportunity employer

Innodata