Assessing the Triaging and Diagnostic Accuracy of ChatGPT in Neurosurgical Cases

Author ORCID Identifier

0000-0001-8480-9584

Location

Caesars Windsor Convention Centre, Room: LUNA

Event Website

https://wesparkconference.com/

Start Date

22-3-2025 3:15 PM

End Date

22-3-2025 4:15 PM

Description

Background: ChatGPT’s role in healthcare is expanding, particularly in medical triage. It has passed the American Board of Neurological Surgery exam and shown promise in interpreting radiological reports and supporting adjuvant therapy decisions. In resource-limited settings, ChatGPT may serve as a valuable decision-support tool for neurosurgical triage and diagnosis. However, research on its effectiveness in general neurosurgical triage remains limited. This study aims to assess ChatGPT’s triaging and diagnostic accuracy by comparing its output to expert neurosurgical opinions at Windsor Regional Hospital (WRH).

Objectives: The primary objective is to assess ChatGPT’s diagnostic and triaging accuracy using a representative sample of neurological cases at WRH. The secondary objective is to identify factors influencing its performance, including case and patient characteristics, available tests, and hallucinations. Given that neurosurgical triage follows a standardized system with clear urgency ratings, we hypothesize that ChatGPT will perform comparably to surgeons in straightforward cases but may deviate more in complex cases with multiple comorbidities.

Methods: We will present ChatGPT with 50 anonymized clinical vignettes, each including the history of presenting illness, past medical history, and family history. We will evaluate ChatGPT’s accuracy in differential diagnosis, triage urgency, red flag identification, and test selection, and note any hallucinations in its output. Agreement with neurosurgeons will be assessed using Cohen’s kappa. Diagnostic performance will be measured using sensitivity, specificity, and overall accuracy. Chi-square tests will examine performance variations based on patient and case characteristics.

Implications: Automated decision-support tools like ChatGPT may improve triage efficiency and reduce administrative burden, benefiting both patients and hospitals.
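As an illustration of the agreement and accuracy statistics named in the Methods, the following is a minimal Python sketch, not the study's analysis code. The function names (cohens_kappa, binary_metrics) and the four toy triage labels are hypothetical and exist only to show the arithmetic; they are not study data or results.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (observed - expected) / (1.0 - expected)


def binary_metrics(predicted, reference, positive):
    """Sensitivity, specificity, and accuracy, treating `positive` as the target class."""
    pairs = list(zip(predicted, reference))
    tp = sum(p == positive and r == positive for p, r in pairs)
    tn = sum(p != positive and r != positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy labels (NOT study data): model output vs. neurosurgeon reference.
model_labels = ["urgent", "urgent", "routine", "routine"]
surgeon_labels = ["urgent", "routine", "routine", "routine"]

kappa = cohens_kappa(model_labels, surgeon_labels)
metrics = binary_metrics(model_labels, surgeon_labels, positive="urgent")
print(f"kappa={kappa:.2f}", metrics)
```

In this sketch the neurosurgeon labels are treated as the reference standard, which matches the comparison described in the Methods; with more than two urgency levels, sensitivity and specificity would need to be computed per level or after collapsing levels into two groups.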



https://scholar.uwindsor.ca/we-spark-conference/2025/oralpresentations/11