Show simple item record
Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study.
| dc.contributor.author | Torres-Zegarra, B.C. | es_PE |
| dc.contributor.author | Rios-Garcia, W. | es_PE |
| dc.contributor.author | Ñaña-Cordova, A.M. | es_PE |
| dc.contributor.author | Arteaga-Cisneros, K.F. | es_PE |
| dc.contributor.author | Benavente-Chalco, X.C. | es_PE |
| dc.contributor.author | Bustamante-Ordoñez, M.A. | es_PE |
| dc.contributor.author | Gutierrez-Rios, C.J. | es_PE |
| dc.contributor.author | Ramos-Godoy, C.A. | es_PE |
| dc.contributor.author | Teresa Panta Quezada, K.L. | es_PE |
| dc.contributor.author | Gutiérrez-Arratia, J.D. | es_PE |
| dc.contributor.author | Flores-Cohaila, J.A. | es_PE |
| dc.date.accessioned | 2026-03-11T17:32:14Z | |
| dc.date.available | 2026-03-11T17:32:14Z | |
| dc.date.issued | 2023 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.14074/10222 | |
| dc.description.abstract | Purpose We aimed to describe the performance and evaluate the educational value of justifications provided by artificial intelligence chatbots, including GPT-3.5, GPT-4, Bard, Claude, and Bing, on the Peruvian National Medical Licensing Examination (P-NLME). Methods This was a cross-sectional analytical study. On July 25, 2023, each multiple-choice question (MCQ) from the P-NLME was entered into each chatbot (GPT-3.5, GPT-4, Bing, Bard, and Claude) 3 times. Then, 4 medical educators categorized the MCQs in terms of medical area, item type, and whether the MCQ required Peru-specific knowledge. They assessed the educational value of the justifications from the 2 top performers (GPT-4 and Bing). Results GPT-4 scored 86.7% and Bing scored 82.2%, followed by Bard and Claude, and the historical performance of Peruvian examinees was 55%. Among the factors associated with correct answers, only MCQs that required Peru-specific knowledge had lower odds (odds ratio, 0.23; 95% confidence interval, 0.09–0.61), whereas the remaining factors showed no associations. In assessing the educational value of justifications provided by GPT-4 and Bing, neither showed any significant differences in certainty, usefulness, or potential use in the classroom. Conclusion Among chatbots, GPT-4 and Bing were the top performers, with Bing performing better at Peru-specific MCQs. Moreover, the educational value of justifications provided by GPT-4 and Bing could be deemed appropriate. However, it is essential to start addressing the educational value of these chatbots, rather than merely their performance on examinations. | es_PE |
| dc.description.sponsorship | This work was funded by UK Research and Innovation, UKRI (105173). | es_PE |
| dc.format | application/pdf | es_PE |
| dc.language.iso | eng | es_PE |
| dc.publisher | Korea Health Personnel Licensing Examination Institute. | es_PE |
| dc.relation.ispartof | https://www.scopus.com/pages/publications/85177454993 | es_PE |
| dc.relation.ispartof | urn:issn:19755937 | es_PE |
| dc.relation.ispartof | J. Educ. Eval. Health Prof. 2023; 20: 30 | es_PE |
| dc.rights | info:eu-repo/semantics/openAccess | es_PE |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | es_PE |
| dc.subject | Medical education | es_PE |
| dc.subject | Educational measurement | es_PE |
| dc.subject | Artificial intelligence | es_PE |
| dc.subject | Peru | es_PE |
| dc.title | Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study. | es_PE |
| dc.type | info:eu-repo/semantics/article | es_PE |
| dc.type.version | info:eu-repo/semantics/publishedVersion | es_PE |
| dc.subject.ocde | https://purl.org/pe-repo/ocde/ford#5.03.01 | es_PE |
| dc.identifier.doi | https://doi.org/10.3352/jeehp.2023.20.30 | es_PE |