Evaluation of a Generative Language Model Tool for Writing Examination Questions.
American Journal of Pharmaceutical Education 2024 March 12
OBJECTIVE: To evaluate a generative language model tool for writing examination questions for a new elective course on the interpretation of common clinical laboratory results, developed for students in a Bachelor of Science in Pharmaceutical Sciences program.
METHODS: One hundred multiple choice questions were generated using a publicly available large language model for a course dealing with common laboratory values. Two independent evaluators with extensive training and experience in writing multiple choice questions evaluated each question for appropriate formatting, clarity, correctness, relevancy, and difficulty. For each question, each reviewer assigned a final dichotomous judgement: usable as written or not usable as written.
RESULTS: The major finding of this study was that a generative language model (ChatGPT 3.5) could generate multiple choice questions for assessing common laboratory value information, but only about half the questions (50% and 57% for the two evaluators, P=0.321) were deemed usable without modification. General agreement between evaluator comments was common (62% of comments), with more than one correct answer being the most common reason a question was judged not usable (n=27).
CONCLUSION: The generally positive findings of this study suggest that the use of a generative language model tool for developing examination questions warrants further investigation.
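The analysis described above, comparing two raters' paired usable/not-usable judgements, can be sketched as follows. This is a minimal illustration, not the study's actual analysis code or data: the ratings below are hypothetical, and McNemar's exact test is assumed as the paired comparison behind the reported P-value.

```python
from math import comb

# Hypothetical dichotomous judgements (1 = usable as written, 0 = not usable)
# from two evaluators; the study's individual ratings are not published here.
rater_a = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Percent agreement: proportion of questions on which both raters concur.
agreement = sum(a == r for a, r in zip(rater_a, rater_b)) / len(rater_a)

# McNemar's exact test uses only the discordant pairs:
# b = A says usable / B says not, c = A says not / B says usable.
b = sum(1 for a, r in zip(rater_a, rater_b) if a == 1 and r == 0)
c = sum(1 for a, r in zip(rater_a, rater_b) if a == 0 and r == 1)
n = b + c

# Two-sided exact p-value from Binomial(n, 0.5) on the smaller count.
k = min(b, c)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)

print(f"agreement = {agreement:.0%}, discordant pairs = {n}, p = {p_value:.3f}")
```

A non-significant p-value here, as in the study's comparison (P=0.321), indicates no detectable difference between the two evaluators' overall usable rates.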