Combining multiple large language models improves diagnostic accuracy.
Collective intelligence (e.g., collaboration of multiple providers to come to a final diagnosis) has been shown to produce a more accurate diagnosis than that of even the group’s most senior member. This study applied methods of collective intelligence to four large language models (LLMs). The collective diagnosis was more accurate than those of the individual LLMs, even when the highest-performing LLM was removed. The authors suggest that aggregating diagnoses from multiple LLMs may increase clinician trust in the response and mitigate reliance on a single LLM or vendor.
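To make the idea of aggregating diagnoses concrete, here is a minimal sketch of one possible approach. It is not the study's method, which this summary does not describe. It assumes each model returns a ranked list of candidate diagnoses and combines the lists with reciprocal-rank weighting. The function name, the weighting scheme, the model labels, and the example diagnoses are all illustrative assumptions.

```python
from collections import defaultdict


def aggregate_diagnoses(ranked_lists, top_k=5):
    """Combine ranked diagnosis lists into one collective ranking.

    Assumption: a diagnosis at rank r in one list contributes 1/r to its
    total score. This weighting is illustrative, not taken from the study.
    """
    scores = defaultdict(float)
    for diagnoses in ranked_lists:
        seen = set()
        for rank, name in enumerate(diagnoses, start=1):
            key = name.strip().lower()  # naive normalization of labels
            if key in seen:
                continue  # count each diagnosis once per model
            seen.add(key)
            scores[key] += 1.0 / rank
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ordered[:top_k]]


# Hypothetical outputs from four models for a single case.
model_outputs = {
    "model_a": ["pulmonary embolism", "pneumonia", "pericarditis"],
    "model_b": ["pneumonia", "pulmonary embolism", "pleurisy"],
    "model_c": ["pulmonary embolism", "pericarditis", "pneumonia"],
    "model_d": ["myocardial infarction", "pulmonary embolism", "pneumonia"],
}

collective = aggregate_diagnoses(list(model_outputs.values()), top_k=3)
print(collective)
```

In practice, the same condition is often worded differently by different models, so the lowercase-and-strip normalization here would likely need to be replaced with mapping to a shared vocabulary before scores are combined.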