Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study

Non-contrast head CT scanning is the current standard for initial imaging of patients with head trauma or stroke symptoms. We aimed to develop and validate a set of deep learning algorithms for automated detection of the following key findings from these scans: intracranial hemorrhage and its types (ie, intraparenchymal, intraventricular, subdural, extradural, and subarachnoid); calvarial fractures; midline shift; and mass effect.

We retrospectively collected a dataset containing 313,318 head CT scans together with their clinical reports from around 20 centers in India between Jan 1, 2011, and June 1, 2017. A randomly selected part of this dataset (the Qure25k dataset) was used for validation and the rest was used to develop the algorithms. An additional validation dataset (the CQ500 dataset) was collected in two batches from centers that were different from those used for the development and Qure25k datasets. We excluded postoperative scans and scans of patients younger than 7 years. The original clinical radiology report served as the gold standard for the Qure25k dataset, and the consensus of three independent radiologists served as the gold standard for the CQ500 dataset. Areas under the receiver operating characteristic curves (AUCs) were the primary measure used to assess the algorithms.
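As an illustration of the primary metric, the following is a minimal sketch of how an AUC and an approximate 95% CI can be computed from per-scan algorithm scores and gold-standard labels. It assumes the Hanley-McNeil standard-error approximation for the interval, which is one common choice and is not stated in this abstract as the method used in the study. The function names and the toy labels and scores are hypothetical and are not study data.

```python
import math


def auc_mann_whitney(labels, scores):
    """AUC as the probability that a randomly chosen positive scan
    receives a higher score than a randomly chosen negative scan
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative scan")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil standard error.
    Assumption: this is an illustrative choice of interval method.
    The bounds are clipped to the valid range [0, 1]."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(max(variance, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical toy data: 1 = finding present per the gold standard.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

toy_auc = auc_mann_whitney(toy_labels, toy_scores)
toy_ci = hanley_mcneil_ci(toy_auc, n_pos=3, n_neg=4)
print(f"AUC = {toy_auc:.3f}, 95% CI = ({toy_ci[0]:.3f}, {toy_ci[1]:.3f})")
```

With very few positive scans the interval is wide and its upper bound can be clipped at 1, which is one way an upper limit of exactly 1·00 can arise for rarer findings in a small validation set.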

The Qure25k dataset contained 21,095 scans (mean age 43 years; 9,030 [43%] female patients), and the CQ500 dataset consisted of 214 scans in the first batch (mean age 43 years; 94 [44%] female patients) and 277 scans in the second batch (mean age 52 years; 84 [30%] female patients). On the Qure25k dataset, the algorithms achieved an AUC of 0·92 (95% CI 0·91-0·93) for detecting intracranial hemorrhage (0·90 [0·89-0·91] for intraparenchymal, 0·96 [0·94-0·97] for intraventricular, 0·92 [0·90-0·93] for subdural, 0·93 [0·91-0·95] for extradural, and 0·90 [0·89-0·92] for subarachnoid). On the CQ500 dataset, the AUC was 0·94 (0·92-0·97) for intracranial hemorrhage (0·95 [0·93-0·98] for intraparenchymal, 0·93 [0·87-1·00] for intraventricular, 0·95 [0·91-0·99] for subdural, 0·97 [0·91-1·00] for extradural, and 0·96 [0·92-0·99] for subarachnoid). AUCs on the Qure25k dataset were 0·92 (0·91-0·94) for calvarial fractures, 0·93 (0·91-0·94) for midline shift, and 0·86 (0·85-0·87) for mass effect, while AUCs on the CQ500 dataset were 0·96 (0·92-1·00), 0·97 (0·94-1·00), and 0·92 (0·89-0·95), respectively.

Our results show that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention, opening up the possibility of using these algorithms to automate the triage process.

https://www.ncbi.nlm.nih.gov/pubmed/30318264