Joshua Cook

Data Scientist at Caltech CTME

Joshua Cook is an independent AI and machine-learning engineer specializing in structured extraction from unstructured text, with a focus on the unsolved problem of trusting these systems where no labeled ground truth exists. He has shipped production extraction pipelines processing tens of thousands of records, teaches applied AI at Caltech CTME, and writes on agentic AI and machine-learning foundations. This talk comes straight out of that work.

Talks

Data Con LA 2026

    Extraction Trees: Reliable Structured Data from Messy Text

    AI/ML & Data Science Advanced

    Single-prompt LLM extraction works about 70% of the time and breaks on documents with more than one event. This talk presents a layered extraction-tree pipeline, demonstrated on a biomedical benchmark with real ground truth.