Assessment of Accuracy of an Artificial Intelligence Algorithm to Detect Melanoma in Images of Skin Lesions

Importance  A high proportion of suspicious pigmented skin lesions referred for investigation are benign. Techniques to improve the accuracy of melanoma diagnoses throughout the patient pathway are needed to reduce the pressure on secondary care and pathology services.

Objective  To determine the accuracy of an artificial intelligence algorithm in identifying melanoma in dermoscopic images of lesions taken with smartphone and digital single-lens reflex (DSLR) cameras.

Design, Setting, and Participants  This prospective, multicenter, single-arm, masked diagnostic trial took place in dermatology and plastic surgery clinics in 7 UK hospitals. Dermoscopic images of suspicious and control skin lesions from 514 patients with at least 1 suspicious pigmented skin lesion scheduled for biopsy were captured on 3 different cameras. Data were collected from January 2017 to July 2018. Clinicians and the Deep Ensemble for Recognition of Malignancy, a deterministic artificial intelligence algorithm trained to identify melanoma in dermoscopic images of pigmented skin lesions using deep learning techniques, assessed the likelihood of melanoma. Initial data analysis was conducted in September 2018; further analysis was conducted from February 2019 to August 2019.

Interventions  Clinician and algorithmic assessment of melanoma.

Main Outcomes and Measures  Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity of the algorithmic and specialist assessment, determined using histopathology diagnosis as the criterion standard.

Results  The study population of 514 patients included 279 women (55.7%) and 484 white patients (96.8%), with a mean (SD) age of 52.1 (18.6) years. A total of 1550 images of skin lesions were included in the analysis (551 [35.6%] biopsied lesions; 999 [64.4%] control lesions); 286 images (18.6%) were used to train the algorithm, and a further 849 (54.8%) images were missing or unsuitable for analysis. Of the biopsied lesions that were assessed by the algorithm and specialists, 125 (22.7%) were diagnosed as melanoma. Of these, 77 (16.7%) were used for the primary analysis. The algorithm achieved an AUROC of 90.1% (95% CI, 86.3%-94.0%) for biopsied lesions and 95.8% (95% CI, 94.1%-97.6%) for all lesions using iPhone 6s images; an AUROC of 85.8% (95% CI, 81.0%-90.7%) for biopsied lesions and 93.8% (95% CI, 91.4%-96.2%) for all lesions using Galaxy S6 images; and an AUROC of 86.9% (95% CI, 80.8%-93.0%) for biopsied lesions and 91.8% (95% CI, 87.5%-96.1%) for all lesions using DSLR camera images. At 100% sensitivity, the algorithm achieved a specificity of 64.8% with iPhone 6s images. Specialists achieved an AUROC of 77.8% (95% CI, 72.5%-81.9%) and a specificity of 69.9%.

Conclusions and Relevance  In this study, the algorithm demonstrated an ability to identify melanoma from dermoscopic images of selected lesions with an accuracy similar to that of specialists.