Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis.

Yazid K Ghanem, Armaun D Rouhi, Ammr Al-Houssan, Zena Saleh, Matthew C Moccia, Hansa Joshi, Kristoffel R Dumon, Young Hong, Francis Spitz, Amit R Joshi, Michael Kwiatt

Surgical Endoscopy 2024 March 6

INTRODUCTION: Generative artificial intelligence (AI) chatbots have recently been posited as potential sources of online medical information for patients making medical decisions. Existing online patient-oriented medical information has repeatedly been shown to be of variable quality and difficult to read. Therefore, we sought to evaluate the content and quality of AI-generated medical information on acute appendicitis.

METHODS: A modified DISCERN assessment tool, comprising 16 distinct criteria each scored on a 5-point Likert scale (score range 16-80), was used to assess AI-generated content. Readability was determined using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. Four popular chatbots (ChatGPT-3.5, ChatGPT-4, Bard, and Claude-2) were prompted to generate medical information about appendicitis. Three investigators, blinded to the identity of the AI platforms, independently scored the generated texts.
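The scoring described above can be sketched in a few lines. The abstract does not say which software was used, so this is only an illustration under stated assumptions: the readability functions use the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, the function names are hypothetical, and word, sentence, and syllable counts are passed in directly because syllable counting is tool-dependent.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard FRE formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard FKGL formula; the result approximates a US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def discern_total(item_scores: list[int]) -> int:
    """Sum 16 Likert items (1-5 each), giving a total in the range 16-80.

    Assumption: the total is a plain sum of item scores, which matches
    the 16-80 range stated in the abstract.
    """
    if len(item_scores) != 16:
        raise ValueError("expected exactly 16 item scores")
    if any(score < 1 or score > 5 for score in item_scores):
        raise ValueError("each item score must be between 1 and 5")
    return sum(item_scores)


if __name__ == "__main__":
    # Hypothetical counts for a short passage, not data from the study.
    print(round(flesch_reading_ease(100, 5, 150), 3))
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
    print(discern_total([4] * 16))
```

Lower FRE and higher FKGL both indicate harder text, which is why the two scores move in opposite directions in the results.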

RESULTS: ChatGPT-3.5, ChatGPT-4, Bard, and Claude-2 had overall mean (SD) quality scores of 60.7 (1.2), 62.0 (1.0), 62.3 (1.2), and 51.3 (2.3), respectively, on a scale of 16-80. Inter-rater reliability was 0.81, 0.75, 0.81, and 0.72, respectively, indicating substantial agreement. Claude-2 demonstrated a significantly lower mean quality score compared to ChatGPT-4 (p = 0.001), ChatGPT-3.5 (p = 0.005), and Bard (p = 0.001). Bard was the only AI platform that listed verifiable sources, while Claude-2 provided fabricated sources. All chatbots except for Claude-2 advised readers to consult a physician if experiencing symptoms. Regarding readability, the FKGL and FRE scores were 14.6 and 23.8 for ChatGPT-3.5, 11.9 and 33.9 for ChatGPT-4, 8.6 and 52.8 for Bard, and 11.0 and 36.6 for Claude-2, indicating difficult readability at a college reading level.

CONCLUSION: AI-generated medical information on appendicitis scored favorably upon quality assessment, but most either fabricated sources or did not provide any altogether. Additionally, overall readability far exceeded recommended levels for the public. Generative AI platforms demonstrate measured potential for patient education and engagement about appendicitis.


