Show simple item record
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study.
| dc.contributor.author | Torres-Zegarra, B.C. | es_PE |
| dc.contributor.author | Rios-Garcia, W. | es_PE |
| dc.contributor.author | Ñaña-Cordova, A.M. | es_PE |
| dc.contributor.author | Arteaga-Cisneros, K.F. | es_PE |
| dc.contributor.author | Benavente-Chalco, X.C. | es_PE |
| dc.contributor.author | Bustamante-Ordoñez, M.A. | es_PE |
| dc.contributor.author | Gutierrez-Rios, C.J. | es_PE |
| dc.contributor.author | Ramos-Godoy, C.A. | es_PE |
| dc.contributor.author | Teresa Panta Quezada, K.L. | es_PE |
| dc.contributor.author | Gutiérrez-Arratia, J.D. | es_PE |
| dc.contributor.author | Flores-Cohaila, J.A. | es_PE |
| dc.date.accessioned | 2026-03-11T17:32:14Z | |
| dc.date.available | 2026-03-11T17:32:14Z | |
| dc.date.issued | 2023 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.14074/10222 | |
| dc.description.abstract | Purpose We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME). Methods This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing). Results GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom. Conclusion Among chatbots, GPT-4 and Bing were the top performers, with Bing performing better at Peru-specific MCQs. Moreover, the educational value of justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations. | es_PE |
| dc.description.sponsorship | This work was funded by UK Research and Innovation, UKRI (105173). | es_PE |
| dc.format | application/pdf | es_PE |
| dc.language.iso | eng | es_PE |
| dc.publisher | Korea Health Personnel Licensing Examination Institute. | es_PE |
| dc.relation.ispartof | https://www.scopus.com/pages/publications/85177454993 | es_PE |
| dc.relation.ispartof | urn:issn:19755937 | es_PE |
| dc.relation.ispartof | J. Educ. Eval. Health Prof. 2023; 20: 30 | es_PE |
| dc.rights | info:eu-repo/semantics/openAccess | es_PE |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | es_PE |
| dc.subject | Medical education | es_PE |
| dc.subject | Educational measurement | es_PE |
| dc.subject | Artificial intelligence | es_PE |
| dc.subject | Peru | es_PE |
| dc.title | Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study. | es_PE |
| dc.type | info:eu-repo/semantics/article | es_PE |
| dc.type.version | info:eu-repo/semantics/publishedVersion | es_PE |
| dc.subject.ocde | https://purl.org/pe-repo/ocde/ford#5.03.01 | es_PE |
| dc.identifier.doi | https://doi.org/10.3352/jeehp.2023.20.30 | es_PE |