Assessing the Accuracy and Reliability of AI-Generated Responses to Patient Questions Regarding Spine Surgery.

BACKGROUND: In today's digital age, patients increasingly rely on online search engines for medical information. The integration of large language models such as GPT-4 into search engines such as Bing raises concerns about the potential transmission of misinformation to patients searching online for information regarding spine surgery.

METHODS: SearchResponse.io, a database that archives People Also Ask (PAA) data from Google, was utilized to determine the most popular patient questions regarding 4 specific spine surgery topics: anterior cervical discectomy and fusion, lumbar fusion, laminectomy, and spinal deformity. Bing's responses to these questions, along with the cited sources, were recorded for analysis. Two fellowship-trained spine surgeons assessed the accuracy of the answers on a 6-point scale and the completeness of the answers on a 3-point scale. Inaccurate answers were re-queried 2 weeks later. Cited sources were categorized and evaluated against Journal of the American Medical Association (JAMA) benchmark criteria. Interrater reliability was measured with use of the kappa statistic. A linear regression analysis was utilized to explore the relationship between answer accuracy and the type of source, number of sources, and mean JAMA benchmark score.
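The two statistical analyses described above can be illustrated with a minimal Python sketch (this is not the authors' code, and all data below are hypothetical placeholders): Cohen's kappa for interrater agreement between the two surgeons' ratings, and an ordinary least squares regression of answer accuracy on source features.

```python
# Minimal sketch of the abstract's two analyses, on hypothetical data:
# (1) Cohen's kappa for interrater agreement, (2) OLS regression of
# answer accuracy on source features.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import cohen_kappa_score

# Hypothetical accuracy ratings (6-point scale) from two raters
rater_a = np.array([5, 4, 6, 3, 5, 4, 2, 5])
rater_b = np.array([5, 4, 5, 3, 5, 4, 3, 5])
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Interrater kappa: {kappa:.2f}")

# Hypothetical per-question predictors: number of cited sources and mean
# JAMA benchmark score (source type, which would enter as dummy-coded
# categories, is omitted here for brevity).
n_sources = np.array([3, 5, 2, 4, 3, 6, 2, 4])
jama_mean = np.array([2.5, 3.0, 1.8, 2.7, 3.3, 2.1, 1.5, 2.9])
accuracy = rater_a.astype(float)  # stand-in for the accuracy score

# OLS with an intercept; coefficient p-values indicate whether any
# predictor is significantly associated with accuracy.
X = sm.add_constant(np.column_stack([n_sources, jama_mean]))
model = sm.OLS(accuracy, X).fit()
print(model.summary())
```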

RESULTS: Bing's responses to 71 PAA questions were analyzed. The average completeness score was 2.03 (standard deviation [SD], 0.36), and the average accuracy score was 4.49 (SD, 1.10). Among the question topics, spinal deformity had the lowest mean completeness score. Re-querying the questions that initially had answers with low accuracy scores resulted in responses with improved accuracy. Among the cited sources, commercial sources were the most prevalent. The JAMA benchmark score across all sources averaged 2.63. Government sources had the highest mean benchmark score (3.30), whereas social media had the lowest (1.75).

CONCLUSIONS: Bing's answers were generally accurate and adequately complete, with incorrect responses rectified upon re-querying. A plurality of the cited information was sourced from commercial websites. The type of source, number of sources, and mean JAMA benchmark score were not significantly correlated with answer accuracy. These findings underscore the importance of ongoing evaluation and improvement of large language models to ensure reliable and informative results for patients seeking information regarding spine surgery online as these models are integrated into the search experience.
