Data Engineer, Scientific Data Ingestion

Company: Mithrl
Location: San Francisco
Posted on: April 2, 2026

Job Description:

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives. Mithrl is building the world’s first commercially available AI Co-Scientist: a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports.

Our traction speaks for itself:
12X year-over-year revenue growth
Trusted by leading biotechs and big pharma across three continents
Driving real breakthroughs from target discovery to patient outcomes

WHAT YOU WILL DO

Build and own an AI-powered ingestion and normalization pipeline to import data from a wide variety of sources: unprocessed Excel/CSV uploads, lab and instrument exports, and processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: units normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).

Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data: extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion, so that downstream analytics and the AI “Co-Scientist” always work with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have:

5 years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.

Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).

Extensive experience dealing with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.

Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.

Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.

Strong desire and ability to own the ingestion and normalization layer end-to-end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.

Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have:

Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).

Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.

Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.

Past exposure to LLM-based data transformation or cleansing agents: building or integrating tools that clean or structure messy data automatically.

Any background in computational biology / lab-data / bioinformatics is a bonus, though not required.

WHAT YOU WILL LOVE AT MITHRL

Mission-driven impact: you’ll be the gatekeeper of data quality, ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data AI stack.

High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, and build the pipelines. You’ll work closely with our product, data science, and infrastructure teams, shaping how data is ingested, stored, and exposed to end users or AI agents.

Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders.

Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution.

Speed: We ship fast (2x/week) and improve continuously based on real user feedback.

Location: Beautiful SF office with a high-energy, in-person culture.

Benefits:
Comprehensive PPO health coverage through Anthem (medical, dental, and vision)
401(k) with top-tier plans

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification as listed. Research shows that people who identify as being from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy, so we urge you not to exclude yourself prematurely and to submit an application if you're interested in this work. We think AI systems like the ones we're building have enormous social and ethical implications. We think this makes representation even more important, and we strive to include a range of diverse perspectives on our team.

Keywords: Mithrl, Davis, Data Engineer, Scientific Data Ingestion, Science, Research & Development, San Francisco, California

