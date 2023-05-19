

    ChatGPT Passes, Keeps Getting Better Scores On Radiology Board Exam

    The AI language model did well on questions requiring basic knowledge, but struggled with analytical questions

    Monique Merrill
    ChatGPT could be a radiologist, and it never even went to school.

    Two new research studies published Tuesday in the journal Radiology shine a light on the strengths and weaknesses of ChatGPT and other artificial intelligence language models. 

    ChatGPT-3.5 was given 150 multiple-choice questions based on questions seen on the Canadian Royal College and American Board of Radiology exams.

    The language model got 69% of the questions correct and performed better on the questions that required “lower-order” thinking than the ones involving “higher-order” thinking. None of the questions had images, and they ranged from basic understanding to requiring analysis. A 70% correct response rate is needed to pass.

    The version researchers tested, ChatGPT-3.5, is the most commonly used and widely available one. ChatGPT-4 was released in March, and only to paid users. The new version is said to have higher reasoning capabilities.

    When researchers tested ChatGPT-4 with the same set of questions, the language model scored much higher than its predecessor, answering 81% of the questions correctly.

    “Our study demonstrates an impressive improvement in performance of ChatGPT in radiology over a short time period, highlighting the growing potential of large language models in this context,” lead author Dr. Rajesh Bhayana said in a statement.

    While the second test saw improved scores, the language model’s confident responses, even when it answered incorrectly, are cause for concern, according to Bhayana.

    “To me, this is its biggest limitation. At present, ChatGPT is best used to spark ideas, help start the medical writing process and in data summarization. If used for quick information recall, it always needs to be fact-checked,” Bhayana said in the statement.

