Joshua Cook
Data Scientist at Caltech CTME
Joshua Cook is an independent AI and machine-learning engineer specializing in structured extraction from unstructured text, with a focus on the unsolved problem of trusting these systems where no labeled ground truth exists. He has shipped production extraction pipelines processing tens of thousands of records, teaches applied AI at Caltech CTME, and writes on agentic AI and machine-learning foundations. This talk comes straight out of that work.
Talks
Data Con LA 2026
Extraction Trees: Reliable Structured Data from Messy Text
AI/ML & Data Science | Advanced
Single-prompt LLM extraction works about 70% of the time and breaks on documents with more than one event. This talk presents a layered extraction-tree pipeline, demonstrated on a biomedical benchmark with real ground truth.