Deep Neural Networks Improve Radiologists’ Performance in Breast Cancer Screening

We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting the presence of cancer in the breast when tested on the screening population. We attribute the high accuracy to a few technical advances: (i) our network’s novel two-stage architecture and training procedure, which allows us to use a high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels; (ii) a custom ResNet-based network used as a building block of our model, whose balance of depth and width is optimized for high-resolution medical images; (iii) pretraining the network on screening BI-RADS classification, a related task with noisier labels; and (iv) combining multiple input views in the best-performing of several possible ways. To validate our model, we conduct a reader study with 14 readers, each reading 720 screening mammogram exams, and show that our model is as accurate as experienced radiologists when presented with the same data. We also show that a hybrid model, which averages the probability of malignancy predicted by a radiologist with the prediction of our neural network, is more accurate than either of the two separately. To further understand our results, we conduct a thorough analysis of our network’s performance on different subpopulations of the screening population, as well as of the model’s design, training procedure, errors, and the properties of its internal representations. Our best models are publicly available at