Comparison of Large Language Models in Answering Immuno-Oncology Questions: A Cross-Sectional Study.

Giovanni Maria Iannantuono, Dara Bracken-Clarke, Fatima Karzai, Hyoyoung Choo-Wosoba, James L Gulley, Charalampos S Floudas

medRxiv 2023 October 31

BACKGROUND: The capability of large language models (LLMs) to understand and generate human-readable text has prompted the investigation of their potential as educational and management tools for cancer patients and healthcare providers.

MATERIALS AND METHODS: We conducted a cross-sectional study to evaluate the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to four domains of immuno-oncology (Mechanisms, Indications, Toxicities, and Prognosis). We generated 60 open-ended questions (15 for each domain). Questions were manually submitted to the LLMs, and responses were collected on June 30th, 2023. Two reviewers evaluated the answers independently.

RESULTS: ChatGPT-4 and ChatGPT-3.5 answered all questions, whereas Google Bard answered only 53.3% (p < 0.0001). The proportion of questions with reproducible answers was higher for ChatGPT-4 (95%) and ChatGPT-3.5 (88.3%) than for Google Bard (50%) (p < 0.0001). In terms of accuracy, the proportion of answers deemed fully correct was 75.4%, 58.5%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.03). Furthermore, the proportion of responses deemed highly relevant was 71.9%, 77.4%, and 43.8% for ChatGPT-4, ChatGPT-3.5, and Google Bard, respectively (p = 0.04). Regarding readability, the proportion of highly readable responses was higher for ChatGPT-4 (98.1%) and ChatGPT-3.5 (100%) than for Google Bard (87.5%) (p = 0.02).
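As a rough consistency check on the first comparison above, the reported response rates can be converted back into counts out of the 60 questions and compared across the three models. The abstract does not state which statistical test the authors used, so the Pearson chi-square test below is an assumption made purely for illustration, and the helper names (`pct_to_count`, `chi_square_2xk`) are hypothetical. The closed-form p-value used here applies only to 2 degrees of freedom, that is, to a comparison of three groups.

```python
import math

TOTAL_QUESTIONS = 60  # stated in the methods: 15 questions for each of 4 domains

# Percentage of questions answered by each model, as reported in the results.
answered_pct = {"ChatGPT-4": 100.0, "ChatGPT-3.5": 100.0, "Google Bard": 53.3}


def pct_to_count(pct, total=TOTAL_QUESTIONS):
    """Convert a reported percentage back to a whole number of questions."""
    return round(pct / 100 * total)


def chi_square_2xk(successes, totals):
    """Pearson chi-square statistic for a 2 x k table of successes/failures.

    Assumption: this is an illustrative test choice, not necessarily the
    one used in the study.
    """
    pooled_rate = sum(successes) / sum(totals)
    stat = 0.0
    for observed_yes, n in zip(successes, totals):
        expected_yes = n * pooled_rate
        expected_no = n * (1 - pooled_rate)
        observed_no = n - observed_yes
        stat += (observed_yes - expected_yes) ** 2 / expected_yes
        stat += (observed_no - expected_no) ** 2 / expected_no
    return stat


counts = {model: pct_to_count(pct) for model, pct in answered_pct.items()}
chi2 = chi_square_2xk(list(counts.values()), [TOTAL_QUESTIONS] * len(counts))

# With three groups there are 2 degrees of freedom, and the chi-square
# survival function for exactly 2 degrees of freedom is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(counts)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

Under these assumptions the p-value falls well below 0.0001, in line with the reported result for the proportion of questions answered. The denominators for the accuracy, relevance, and readability percentages are not given in the abstract, so those comparisons are not reproduced here.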

CONCLUSION: ChatGPT-4 and ChatGPT-3.5 are potentially powerful tools in immuno-oncology, whereas Google Bard demonstrated relatively poorer performance. However, the risk of inaccuracy or incompleteness in the responses was evident in all three LLMs, highlighting the importance of expert-driven verification of the outputs returned by these technologies.

IMPLICATIONS FOR PRACTICE: Several studies have recently evaluated whether large language models may be feasible tools for providing educational and management information for cancer patients and healthcare providers. In this cross-sectional study, we assessed the ability of ChatGPT-4, ChatGPT-3.5, and Google Bard to answer questions related to immuno-oncology. ChatGPT-4 and ChatGPT-3.5 answered a higher proportion of questions than Google Bard, and their responses were more accurate and comprehensive, as well as highly reproducible and readable. These data support ChatGPT-4 and ChatGPT-3.5 as powerful tools for providing information on immuno-oncology; however, accuracy remains a concern, and expert assessment of the output is still indicated.


