Precision information extraction for rare disease epidemiology at scale.

William Z Kariampuzha, Gioconda Alyea, Sue Qu, Jaleal Sanjak, Ewy Mathé, Eric Sid, Haley Chatelaine, Arjun Yadaw, Yanji Xu, Qian Zhu

Journal of Translational Medicine 2023 February 29

BACKGROUND: The United Nations recently called for addressing the challenges faced by an estimated 300 million persons worldwide living with a rare disease through the collection, analysis, and dissemination of disaggregated data. Epidemiologic information (EI) on the prevalence and incidence of rare diseases is sparse, and current paradigms for identifying, extracting, and curating EI rely on time-intensive, error-prone manual processes. These limitations hamper a clear understanding of the variation in epidemiology and outcomes for rare disease patients. The resulting lack of information undermines the public health of rare disease patients, because that information is needed to prioritize research, policy decisions, therapeutic development, and health system allocations.

METHODS: In this study, we developed a newly curated epidemiology corpus for Named Entity Recognition (NER), a deep learning framework, and a novel rare disease epidemiologic information pipeline named EpiPipeline4RD, consisting of a web interface and a RESTful API. For the corpus creation, we programmatically gathered a representative sample of rare disease epidemiologic abstracts, used weakly supervised machine learning techniques to label the dataset, and manually validated the labeled dataset. For the deep learning framework development, we adapted the BioBERT model for NER by fine-tuning it on our dataset. We measured the performance of our BioBERT model for epidemiology entity recognition quantitatively with precision, recall, and F1, and qualitatively through a comparison with Orphanet. We demonstrated the ability of our pipeline to gather, identify, and extract epidemiology information from rare disease abstracts through three case studies.

RESULTS: We developed a deep learning model to extract EI that achieved overall F1 scores of 0.817 at the entity level and 0.878 at the token level, with qualitative results comparable to Orphanet's collection paradigm. Additionally, case studies of the rare diseases classic homocystinuria, GRACILE syndrome, and phenylketonuria demonstrated adequate recall of abstracts with epidemiology information, high precision of epidemiology information extraction through our deep learning model, and the increased efficiency of EpiPipeline4RD compared to a manual curation paradigm.
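
The gap between entity-level and token-level F1 reported above can be illustrated with a small sketch. It is not the authors' evaluation code: it assumes BIO-style tags, and the label names STAT and LOC and the helper names (`bio_to_spans`, `entity_f1`, `token_f1`) are hypothetical, chosen only for illustration. Entity-level scoring counts a prediction as correct only when type and boundaries both match exactly, whereas token-level scoring gives credit for each correctly typed token, so a partially recovered entity lowers the entity-level score more.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); names are illustrative only
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (lenient on stray I- tags)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match scoring: type and both boundaries must agree."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    return f1_from_counts(len(g & p), len(p - g), len(g - p))


def token_f1(gold: List[str], pred: List[str]) -> float:
    """Per-token scoring on entity type, ignoring the B-/I- prefix."""
    def kind(tag: str) -> str:
        return "O" if tag == "O" else tag[2:]

    tp = fp = fn = 0
    for g_tag, p_tag in zip(gold, pred):
        g, p = kind(g_tag), kind(p_tag)
        if g != "O" and g == p:
            tp += 1
        if p != "O" and p != g:
            fp += 1
        if g != "O" and p != g:
            fn += 1
    return f1_from_counts(tp, fp, fn)


# Toy example: the predicted STAT entity is one token too short.
gold = ["O", "B-STAT", "I-STAT", "I-STAT", "O", "B-LOC"]
pred = ["O", "B-STAT", "I-STAT", "O", "O", "B-LOC"]
print(round(entity_f1(gold, pred), 3))  # 0.5
print(round(token_f1(gold, pred), 3))   # 0.857
```

In this toy case the truncated entity counts as both a miss and a spurious prediction at the entity level, but costs only one missed token at the token level, which is one plausible reason a token-level score can exceed an entity-level score for the same model.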

CONCLUSIONS: EpiPipeline4RD demonstrated high performance of EI extraction from rare disease literature to augment manual curation processes. This automated information curation paradigm will not only effectively empower development of the NIH Genetic and Rare Diseases Information Center (GARD), but also support the public health of the rare disease community.
