“Millions-to-One, Words-to-Terms” - Generative AI in Action for Rare Disease Diagnosis and Clinical Data Harmonization

AI/ML & Data Science Intermediate

We present two Generative AI (GenAI)-based frameworks and practices that address two data challenges in precision medicine. The first is the "millions-to-one" challenge in genomics: filtering millions of variants from a patient down to a single causative variant with a diagnosis report. The second is the "words-to-terms" challenge: transforming unstructured, jargon-laden clinical data into standardized terms encoded in ontologies.

Lishuang Shen

Clinical Staff Scientist at Children's Hospital Los Angeles

The promise of precision medicine is constrained by two major data interpretation bottlenecks. The first is the "millions-to-one" challenge in genomics: filtering millions of variants from a patient's whole-genome sequencing (WGS) down to a single causative variant for a diagnosis report. The second is the "words-to-terms" challenge: transforming unstructured, jargon-laden clinical data from the literature and electronic health records into standardized, interoperable terms encoded in ontologies. We present two novel Generative AI (GenAI)-based frameworks and practices, one for each challenge. Both systems integrate contextual information from the source data with knowledge from curated medical databases, the foundational knowledge of large language models (LLMs), and real-time web data. We demonstrate the frameworks on clinical sequencing data, carrying the analysis through to a molecular diagnosis interpretation report. In the first scenario, GenAI automates the application of ACMG/AMP guidelines to patient variant interpretation. In the second scenario, the framework maps obscure clinical symptom jargon to phenotype ontology terms. We rapidly processed 2,000 cases from the literature to build a large-scale virtual cohort for Leigh syndrome, a rare mitochondrial disorder.