ChatGPT, powered by OpenAI’s advanced GPT-4 large language model (LLM), has scored 85% on a clinical neurology exam, highlighting the technology’s potential for significant future roles in clinical settings.
The result comes from a study by researchers at University Hospital Heidelberg and the German Cancer Research Center in Heidelberg, published on Dec. 7. The test, administered on May 31, compared the performance of two versions of the model: the older GPT-3.5 and the newer GPT-4.
The exam used questions from the American Board of Psychiatry and Neurology along with a selection from the European Board for Neurology. GPT-4 answered 1,662 of 1,956 questions correctly (85%), a significant leap from GPT-3.5’s 66.8% (1,306 correct responses). GPT-4’s score also exceeded the average human score of 73.8% and cleared the 70% passing threshold typically set by educational institutions, with the model performing especially well on behavioral, cognitive and psychological questions.
The research also highlighted a distinction in the models’ abilities: both performed better on questions requiring “lower-order thinking” than on those demanding “higher-order thinking.”
The study’s authors see these results as encouraging for future applications of LLMs in clinical neurology, after appropriate refinement. They believe LLMs can be effectively integrated into documentation and decision-support systems in the field, though they caution that the models’ limitations on higher-order cognitive tasks call for careful oversight, a view conveyed by study co-author Dr. Varun Venkataramani.
In the broader context, AI is increasingly finding its place in healthcare, taking on significant tasks such as aiding cancer treatment research at AstraZeneca and tackling antibiotic overprescription in Hong Kong.