As a Junior Data Scientist, you’ll join our data science team and help build data- and model-driven features, with a strong focus on modern ML and LLMs. You’ll work closely with senior data scientists and engineers to analyze data, prototype and evaluate models, and support their integration into production workflows. This is a hands-on role with plenty of room to learn and grow.
What You’ll Do
- Explore and analyze large datasets using Python and SQL, and turn findings into clear, actionable insights.
- Help build and evaluate supervised ML models (e.g., classification, regression) using pandas, NumPy, scikit-learn, and PyTorch.
- Run LLM/SLM inference via APIs (e.g., OpenAI/Anthropic/Gemini) and, where relevant, internal or self-hosted models.
- Support prompt and context engineering: drafting and iterating prompt templates, few-shot examples, and simple retrieval-augmented flows.
- Assist with early-stage RAG and agentic workflows (e.g., basic retrieval pipelines, tool-using agents).
- Help define and track model and LLM quality metrics (accuracy, precision/recall, latency, cost) and help apply basic guardrails and safety checks.
What You Bring (Required)
- Bachelor's degree in Data Science, Computer Science, Statistics, Mathematics, Engineering or a related quantitative field.
- Some hands-on experience with ML or data projects (coursework, internships, personal projects, or up to 2 years of industry experience).
- Solid understanding of statistics and core ML concepts (regression, classification, clustering, NLP).
- Proficiency in Python (pandas, NumPy, scikit-learn, PyTorch) and SQL for data analysis and modeling.
- Exposure to LLM APIs and basic prompt engineering.
- Familiarity with Git and collaborative coding.
- Strong analytical skills and clear communication, especially when explaining technical work to non-technical stakeholders.
Nice To Have (Preferred)
- Master’s degree in a relevant quantitative field.
- 1–3 years of experience in a Data Scientist or ML/AI Engineer role.
- Experience with open-source or self-hosted LLMs (Hugging Face), agent frameworks (LangChain/LlamaIndex/LangGraph/CrewAI), observability tools (Langfuse/LangSmith), and vector databases for RAG (FAISS/Pinecone/Chroma).