Sentiment Analysis Machine Learning Model Congruence: A Case Study Using Neuroscience Module Evaluations.

Jeffrey Plochocki, Jonathan Kibble

FASEB Journal : Official Publication of the Federation of American Societies for Experimental Biology 2022 May

INTRODUCTION: Medical school faculty and administrators regularly assess sentiment in student-generated textual data, such as instructor and course evaluations. Machine learning models (MLMs) that automate and systematize sentiment analysis are commercially available. However, the congruence of MLMs has not been extensively tested. We compare sentiment polarity derived from human analysis and five MLMs to test the hypothesis that they yield significantly correlated output.

METHODS: Student evaluations (n=116) of the neuroscience module at the UCF College of Medicine were collected and anonymized. Students were asked to evaluate the strengths of the module (n=108) and provide suggestions for improvement (n=102). Responses were subjected to sentiment analysis by five commercially available MLMs and two module faculty reviewers. Sentiment was classified as positive (1), neutral (0), or negative (-1).
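
As an illustration of the coding scheme above, the following is a minimal Python sketch of one way congruence could be computed, assuming a response counts as congruent only when every rater assigns it the same polarity code. The abstract does not state the exact definition used in the study, so that rule, the function name `congruence`, and the example scores are all hypothetical.

```python
from typing import Sequence

# Polarity codes from the abstract: positive (1), neutral (0), negative (-1).
POLARITY = {"positive": 1, "neutral": 0, "negative": -1}


def congruence(ratings: Sequence[Sequence[int]]) -> tuple[int, float]:
    """Return (count, percent) of responses on which every rater agrees.

    `ratings` holds one sequence of polarity codes per rater, aligned by
    response. Assumption: "congruent" means all raters gave the same code.
    """
    lengths = {len(r) for r in ratings}
    if len(lengths) != 1:
        raise ValueError("all raters must score the same set of responses")
    n_responses = lengths.pop()
    if n_responses == 0:
        raise ValueError("at least one response is required")
    # zip(*ratings) yields, for each response, the tuple of all raters' codes.
    agree = sum(1 for scores in zip(*ratings) if len(set(scores)) == 1)
    return agree, 100.0 * agree / n_responses


# Made-up scores for two hypothetical raters on five responses.
rater_a = [1, 1, 0, -1, -1]
rater_b = [1, 1, -1, -1, 0]
count, percent = congruence([rater_a, rater_b])
print(f"{count} of {len(rater_a)} responses congruent ({percent:.1f}%)")
```

The same function accepts any number of raters, so it could be applied to the two reviewers alone, the five MLMs alone, or all seven together.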

RESULTS: Sentiment polarity as assessed by the two reviewers was significantly correlated (r=0.66, p<0.05). Reviewer assessments were congruent for 73.8% (n=155) of responses. Congruence was greater for responses on strengths of the module (92.6%, n=100) than for suggestions for improvement (54.4%, n=103). Congruence among the MLMs was 38.1% (n=80) for all responses, 60.1% (n=65) for module strengths, and 14.7% (n=15) for suggestions for improvement. A correlation matrix showed moderate correlations among the reviewers and MLMs (range of r=0.41-0.62, p<0.05). Congruence among all reviewers and MLMs occurred for 34.3% (n=72) of responses, with maximal incongruence occurring for only 2.4% (n=5) of responses. Again, congruence was greater for responses on module strengths (58.3%, n=63) than for suggestions for improvement (8.8%, n=9). All methods assessed the responses on strengths of the module to be more positive than the suggestions for improvement. With all methods combined, responses on strengths of the module scored an average of 0.79, compared with -0.34 for suggestions for improvement.
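
The pairwise correlations reported above could be computed along the following lines. This is a sketch only: it assumes the reported r is a Pearson coefficient calculated on the -1/0/1 polarity codes, which the abstract does not confirm, and the rater names and scores are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical polarity codes (1, 0, -1) for the same six responses.
scores = {
    "reviewer_1": [1, 1, 0, -1, -1, 1],
    "reviewer_2": [1, 1, -1, -1, 0, 1],
    "model_a": [1, 0, 0, -1, 1, 1],
}

# One coefficient per unordered pair of raters (the off-diagonal entries
# of a symmetric correlation matrix).
pairwise = {
    (a, b): pearson_r(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

Because the codes take only three values, a rank-based or categorical agreement measure might be an equally reasonable choice; the Pearson version is shown only because the abstract reports r.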

CONCLUSION: Sentiment polarity derived from human analysis and MLMs is significantly correlated, although the coefficients reflect only modest linear relationships. MLMs are less congruent than the human observers. All methods demonstrate greater congruence when assessing responses that are more positive (i.e., module strengths) than those that are more negative (i.e., suggestions for improvement). Additional refinement of MLMs may be necessary before they can be applied with consistency in medical education settings.
