
GPT-4 may help with diagnoses missed by clinicians

August 17, 2023

Artificial intelligence (AI), especially machine learning, has been increasingly used in diagnosing conditions such as skin or breast cancer and Alzheimer disease. However, AI relies on clinical imaging.1 In low-income countries, where specialist care may be lacking, AI may be useful for making clinical diagnoses. The GPT-4 (Generative Pre-trained Transformer 4) program allows analysis of clinical history in daily practice.2 We hypothesized that GPT-4 could improve the diagnostic accuracy of clinicians by supplying the most probable diagnosis or suggesting differential diagnoses in complex cases.

Methods
The medical histories of 6 patients from the Division of Geriatrics in the Department of Medicine at Queen Mary Hospital who were aged 65 years or older and had a delay in definitive diagnosis of longer than 1 month in 2022 were retrieved after resolution.3-5 The full medical histories were entered chronologically on April 16, 2023 (at admission, 1 week after admission, and before final diagnosis) into GPT-4 (powered by OpenAI via Platform for Open Exploration) without information about the definitive diagnosis. The GPT-4 responses were copied out and further analyzed (eMethods in Supplement 1). One patient has been described previously.6 Responses by GPT-4 and clinicians were collected and compared. Differential diagnoses were also generated using a medical diagnostic decision support system (Isabel DDx Companion; Isabel Healthcare). The study was approved by the Institutional Review Board of the University of Hong Kong and Hospital Authority Hong Kong West Cluster. Written consent was provided for all patients. This report followed the reporting guideline for case series studies.

Results
Six patients 65 years or older (2 women and 4 men) were included in the analysis. The accuracy of the primary diagnoses made by GPT-4, clinicians, and Isabel DDx Companion was 4 of 6 patients (66.7%), 2 of 6 patients (33.3%), and 0 patients, respectively. If including differential diagnoses, the accuracy was 5 of 6 (83.3%) for GPT-4, 3 of 6 (50.0%) for clinicians, and 2 of 6 (33.3%) for Isabel DDx Companion (Table). By studying the changes in GPT-4’s responses, we determined that certain key words were required to make an appropriate clinical response, including abdominal aortic aneurysm (patient 1), proximal stiffness (patient 2), acid-fast bacilli in urine (patient 3), metronidazole (patient 4), and retroperitoneal lymphadenopathy (patient 6). GPT-4 could suggest diagnoses not considered by clinicians before definitive investigations: mycotic aneurysm for patient 1 after computed tomography showing an abdominal aortic aneurysm; a drug cause of seizure in patient 5; and the presence of necrotic lymph nodes from a previous computed tomographic scan, which should have led to the diagnosis of lymphoma, in patient 6.
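The percentages above follow directly from the reported counts. As a minimal sketch for checking the arithmetic (the variable and function names are illustrative assumptions, not part of the study), the counts can be converted to accuracy figures as follows:

```python
# Counts of correct diagnoses out of 6 patients, as reported in the Results.
# All names here are illustrative; they do not come from the study itself.
N_PATIENTS = 6

primary_correct = {
    "GPT-4": 4,
    "Clinicians": 2,
    "Isabel DDx Companion": 0,
}

with_differentials_correct = {
    "GPT-4": 5,
    "Clinicians": 3,
    "Isabel DDx Companion": 2,
}


def accuracy_percent(correct, total=N_PATIENTS):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


for source in primary_correct:
    primary = accuracy_percent(primary_correct[source])
    combined = accuracy_percent(with_differentials_correct[source])
    print(f"{source}: primary {primary}%, with differentials {combined}%")
```

With only 6 patients, each additional correct diagnosis shifts the accuracy by about 16.7 percentage points, so the differences between GPT-4, clinicians, and Isabel DDx Companion rest on very small counts.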

Discussion
Overall, GPT-4 has potential clinical use in older patients without a definitive clinical diagnosis after 1 month but requires comprehensive entry of demographic and clinical (including radiological and pharmacological) information. GPT-4 may increase confidence in diagnosis, support earlier commencement of appropriate treatment, alert clinicians to important missed diagnoses, and offer suggestions similar to those of specialists to achieve the correct clinical diagnosis, which has potential value in low-income countries that lack specialist care. Clinicians need to be aware that GPT-4 is limited in multifocal infection, and the suggested management plan should be correlated with clinical context, as suggestions may be redundant. Clinicians should consider a drug review and review the possible diagnosis of malignant disease if suggested.

This study has several limitations. First, GPT-4 may not detect 2 focuses of infection or pinpoint the source of recurrent infection. Second, GPT-4 did not suggest the use of gallium scan or 18-fluorodeoxyglucose positron emission tomography to look for infections or malignant neoplasms in all but 1 patient. Third, some investigations may not be appropriate (eg, temporal artery biopsy in the absence of typical symptoms of giant cell arteritis). Overall, our findings suggest that the use of AI in diagnosis is both promising and challenging.

JAMA Network

Medical Buyer
