Diagnostic accuracy of a large language model in pediatric case studies.
Clinicians and the public are increasingly interested in using chatbots such as ChatGPT to learn about medical care, particularly diagnosis. This study asked ChatGPT to provide a differential diagnosis list and a final diagnosis for 100 pediatric case studies. ChatGPT had an overall error rate of 83%. Many of the incorrect diagnoses were clinically related to the correct final diagnosis but too broad to be scored as correct, and just over half involved the same organ system. Despite the error rate, the authors still considered large language models (LLMs) potentially useful as a clinical tool and suggested that further training of chatbots may improve diagnostic accuracy.
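
As a rough illustration of the kind of evaluation the study describes, the sketch below sends one case description to a chat model and asks for a differential list plus a single final diagnosis. It is a minimal sketch assuming the OpenAI Python SDK; the model name, prompt wording, and the load_cases helper are illustrative assumptions, since the summary above does not specify the study's exact prompts, model version, or interface.

```python
# Minimal sketch: query a chat model for a differential and final diagnosis.
# Assumes the OpenAI Python SDK (openai>=1.0) and an OPENAI_API_KEY in the
# environment. Prompt text and model name are illustrative, not the study's.
from openai import OpenAI

client = OpenAI()

def diagnose(case_text: str) -> str:
    """Return the model's differential list and final diagnosis for one case."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumption: the study's model version is not stated above
        messages=[
            {
                "role": "user",
                "content": (
                    "Read the following pediatric case study. "
                    "List a differential diagnosis, then state one final diagnosis.\n\n"
                    + case_text
                ),
            }
        ],
    )
    return response.choices[0].message.content

# Hypothetical usage over a set of cases; load_cases is a placeholder, and
# scoring against physician-assigned diagnoses would be done separately.
# cases = load_cases("pediatric_cases.json")
# answers = [diagnose(case) for case in cases]
```

In the actual study, scoring was done by comparing the model's answer with the physician-assigned diagnosis, which a script like this would leave as a separate, human-reviewed step.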