Detecting Racial/Ethnic Health Disparities Using Deep Learning From Frontal Chest Radiography

J Am Coll Radiol. 2022 Jan;19(1 Pt B):184-191. doi: 10.1016/j.jacr.2021.09.010.


PURPOSE: The aim of this study was to assess racial/ethnic and socioeconomic disparities in the difference between atherosclerotic vascular disease prevalence measured by a multitask convolutional neural network (CNN) deep learning model using frontal chest radiographs (CXRs) and the prevalence reflected by administrative hierarchical condition category codes in two cohorts of patients with coronavirus disease 2019 (COVID-19).

METHODS: A previously published CNN model was trained to predict atherosclerotic disease from ambulatory frontal CXRs. The model was then validated on two cohorts of patients with COVID-19: 814 ambulatory patients from a suburban location (presenting from March 14, 2020, to October 24, 2020; the internal ambulatory cohort) and 485 hospitalized patients from an inner-city location (hospitalized from March 14, 2020, to August 12, 2020; the external hospitalized cohort). The CNN model predictions were validated against electronic health record administrative codes in both cohorts and assessed using the area under the receiver operating characteristic curve (AUC). The CXRs from the ambulatory cohort were also reviewed by two board-certified radiologists, and their readings were compared with the CNN-predicted values for the same cohort to produce a receiver operating characteristic curve and AUC. The atherosclerosis diagnosis discrepancy, Δvasc, defined as the difference between the predicted value and the presence or absence of the vascular disease hierarchical condition category (HCC) code, was calculated. Linear regression was performed to determine the association of Δvasc with the covariates of age, sex, race/ethnicity, language preference, and social deprivation index. Logistic regression was used to test for an association between the presence of any HCC code and Δvasc and the other covariates.
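
As a minimal sketch of the two quantities described above, the Python below computes a per-patient discrepancy and an AUC on toy data. It is an illustration and not the authors' code. The function names, the toy values, and the sign convention (predicted value minus the 0/1 code indicator) are assumptions; the abstract does not state the direction of the subtraction. The AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
def delta_vasc(predicted, code_present):
    """Per-patient discrepancy: predicted value minus the 0/1 code indicator.

    The sign convention is an assumption made for illustration.
    """
    return [p - float(c) for p, c in zip(predicted, code_present)]


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: model outputs in [0, 1] and administrative-code flags.
toy_predicted = [0.9, 0.2, 0.7, 0.4]
toy_code_present = [1, 0, 0, 1]

toy_delta = delta_vasc(toy_predicted, toy_code_present)
toy_auc = auc(toy_predicted, toy_code_present)
```

Under this convention, a positive value marks a patient whose radiograph suggests disease that the administrative record does not reflect; that interpretation depends on the assumed sign.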

RESULTS: The CNN prediction for vascular disease from frontal CXRs in the ambulatory cohort had an AUC of 0.85 (95% confidence interval, 0.82-0.89) and in the hospitalized cohort had an AUC of 0.69 (95% confidence interval, 0.64-0.75) against the electronic health record data. In the ambulatory cohort, the consensus radiologists’ reading had an AUC of 0.89 (95% confidence interval, 0.86-0.92) relative to the CNN. Multivariate linear regression of Δvasc in the ambulatory cohort demonstrated significant negative associations with non-English-language preference (β = -0.083, P < .05) and Black or Hispanic race/ethnicity (β = -0.048, P < .05) and positive associations with age (β = 0.005, P < .001) and sex (β = 0.044, P < .05). For the hospitalized cohort, age was also significant (β = 0.003, P < .01), as was social deprivation index (β = 0.002, P < .05). The Δvasc variable (odds ratio [OR], 0.34), Black or Hispanic race/ethnicity (OR, 1.58), non-English-language preference (OR, 1.74), and site (OR, 0.22) were independent predictors of having one or more hierarchical condition category codes (P < .01 for all) in the combined patient cohort.
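
To make the direction of the reported odds ratios easier to read, the sketch below converts them to logistic-regression coefficients via β = ln(OR). The odds ratios are copied from the results above; the dictionary keys are hypothetical labels, and the coding of each predictor (reference category, units) is not given in the abstract, so only the sign and relative size should be read from the output.

```python
import math

# Odds ratios for having one or more HCC codes, as reported for the
# combined cohort. Key names are illustrative labels, not the authors' names.
reported_odds_ratios = {
    "delta_vasc": 0.34,
    "black_or_hispanic": 1.58,
    "non_english_preference": 1.74,
    "site": 0.22,
}

# A logistic-regression coefficient is the natural log of the odds ratio.
log_odds_coefficients = {
    name: math.log(odds_ratio)
    for name, odds_ratio in reported_odds_ratios.items()
}

# Odds ratios below 1 give negative coefficients (lower odds of having a code).
negative_predictors = sorted(
    name for name, beta in log_odds_coefficients.items() if beta < 0
)
```

An odds ratio below 1 for Δvasc means that, within this model, larger Δvasc values go with lower odds of having any HCC code recorded.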

CONCLUSIONS: A CNN model was predictive of aortic atherosclerosis in two cohorts (one ambulatory and one hospitalized) with COVID-19. The discrepancy between the CNN model and the administrative code, Δvasc, was associated with language preference in the ambulatory cohort; in the hospitalized cohort, this discrepancy was associated with social deprivation index. The absence of administrative codes was associated with Δvasc in the combined cohorts, suggesting that Δvasc is an independent predictor of health disparities. Biomarkers extracted from routine imaging studies and compared with electronic health record data may therefore play a role in enhancing value-based health care for traditionally underserved or disadvantaged patients who face barriers to care.

PMID:35033309 | DOI:10.1016/j.jacr.2021.09.010