About the Position
Lead and build the Data Intelligence function, which ingests, validates, harmonizes, and serves collaborator and internal data (including data for a production clinical component) to power ML/LLM research and product workflows. This is a hands-on technical leadership role that owns the data platform, LLM and tooling orchestration, clinical/biological/chemical data fidelity, and regulatory- and QMS-ready governance.
Core responsibilities
- Own the end-to-end internal multimodal data platform (collaborator EHR, imaging, omics, and assay data).
- Design, build, and validate the clinical data environment (data generation, augmentation, fidelity metrics, and privacy safeguards) for model training and experimentation.
- Architect, deploy, and operate robust LLM systems and tools that structure data and help meet the company's KPIs. Run rigorous evaluation pipelines (hallucination guardrails, accuracy tracking, and reliability tracking).
- Hire and lead a cross-functional team (data engineers, clinical annotators, bioinformaticians) and own engineering standards and technology choices.
- Partner with clinical, product, compliance, and bizdev teams to translate domain requirements into well-specified data products.
- Lead secure, auditable onboarding of collaborator data: schema mapping, transfer, de-identification, provenance (data lineage), and monitoring.
Must-have qualifications
- Proven track record leading cross-functional technical teams.
- 5+ years building and operating large data platforms and pipelines; demonstrable experience with clinical or biomedical datasets and data lakehouses.
- 3+ years building multimodal data pipelines, designing schemas, and harmonizing data.
- MS/PhD in CS, Bioinformatics, Computational Biology, Biomedical Engineering, or a clinical or chemical discipline.
- Strong software and data engineering skills: Python, SQL, workflow orchestration, Docker, data modeling, modern data lakehouse architectures, and large-scale data processing.
- Experience with Linux environments and CLI tools for data transfer.
- Familiarity with object storage and major cloud providers (AWS, GCP, Azure, OCI).
Significant advantages
- Hands-on production experience with LLMs and LLM tooling (RAG, orchestration, tool integrations, evaluation, prompt engineering).
- Domain experience in oncology, immunology, assays, or pharma/CRO partnerships.
- Experience working with clinical data models and standards (e.g., FHIR/HL7, OMOP, ICD, SNOMED, LOINC, CDISC) in real-world datasets.
- Experience implementing data catalogs and lineage tools.