Development and Validation of a Deep Learning System to Detect Glaucomatous Optic Neuropathy Using Fundus Photographs

Importance  A deep learning system (DLS) that could automatically detect glaucomatous optic neuropathy (GON) with high sensitivity and specificity could expedite screening for GON.

Objective  To establish a DLS for detection of GON using retinal fundus images and glaucoma diagnosis with convoluted neural networks (GD-CNN) that has the ability to be generalized across populations.

Design, Setting, and Participants  In this cross-sectional study, a DLS for the classification of GON was developed for automated classification of GON using retinal fundus images obtained from the Chinese Glaucoma Study Alliance, the Handan Eye Study, and online databases. The researchers selected 241 032 images were selected as the training dataset. The images were entered into the databases on June 9, 2009, obtained on July 11, 2018, and analyses were performed on December 15, 2018. The generalization of the DLS was tested in several validation datasets, which allowed assessment of the DLS in a clinical setting without exclusions, testing against variable image quality based on fundus photographs obtained from websites, evaluation in a population-based study that reflects a natural distribution of patients with glaucoma within the cohort and an additive dataset that has a diverse ethnic distribution. An online learning system was established to transfer the trained and validated DLS to generalize the results with fundus images from new sources. To better understand the DLS decision-making process, a prediction visualization test was performed that identified regions of the fundus images utilized by the DLS for diagnosis.

Exposures  Use of a deep learning system.

Main Outcomes and Measures  Area under the receiver operating characteristics curve (AUC), sensitivity and specificity for DLS with reference to professional graders.

Results  From a total of 274 413 fundus images initially obtained from CGSA, 269 601 images passed initial image quality review and were graded for GON. A total of 241 032 images (definite GON 29 865 [12.4%], probable GON 11 046 [4.6%], unlikely GON 200 121 [83%]) from 68 013 patients were selected using random sampling to train the GD-CNN model. Validation and evaluation of the GD-CNN model was assessed using the remaining 28 569 images from CGSA. The AUC of the GD-CNN model in primary local validation datasets was 0.996 (95% CI, 0.995-0.998), with sensitivity of 96.2% and specificity of 97.7%. The most common reason for both false-negative and false-positive grading by GD-CNN (51 of 119 [46.3%] and 191 of 588 [32.3%]) and manual grading (50 of 113 [44.2%] and 183 of 538 [34.0%]) was pathologic or high myopia.

Conclusions and Relevance  Application of GD-CNN to fundus images from different settings and varying image quality demonstrated a high sensitivity, specificity, and generalizability for detecting GON. These findings suggest that automated DLS could enhance current screening programs in a cost-effective and time-efficient manner.