This article was originally published here
Comput Methods Programs Biomed. 2021 Apr 18;206:106111. doi: 10.1016/j.cmpb.2021.106111. Online ahead of print.
BACKGROUND AND OBJECTIVE: Lung cancer is the most common type of cancer with a high mortality rate. Early detection using medical imaging is critically important for the long-term survival of the patients. Computer-aided diagnosis (CAD) tools can potentially reduce the number of incorrect interpretations of medical image data by radiologists. Datasets with adequate sample size, annotation, and truth are the dominant factors in developing and training effective CAD algorithms. The objective of this study was to produce a practical approach and a tool for the creation of medical image datasets.
METHODS: The proposed model uses the modified maximum transverse diameter approach to mark a putative lung nodule. The modification involves the possibility to use a set of overlapping spheres of appropriate size to approximate the shape of the nodule. The algorithm embedded in the model also groups the marks made by different readers for the same lesion. We used the data of 536 randomly selected patients of Moscow outpatient clinics to create a dataset of standard-dose chest computed tomography (CT) scans utilizing the double-reading approach with arbitration. Six volunteer radiologists independently produced a report for each scan using the proposed model with the main focus on the detection of lesions with sizes ranging from 3 to 30 mm. After this, an arbitrator reviewed their marks and annotations.
RESULTS: The maximum transverse diameter approach outperformed the alternative methods (3D box, ellipsoid, and complete outline construction) in a study of 10,000 computer-generated tumor models of different shapes in terms of accuracy and speed of nodule shape approximation. The markup and annotation of the CTLungCa-500 dataset revealed 72 studies containing no lung nodules. The remaining 464 CT scans contained 3151 lesions marked by at least one radiologist: 56%, 14%, and 29% of the lesions were malignant, benign, and non-nodular, respectively. 2887 lesions have the target size of 3-30 mm. Only 70 nodules were uniformly identified by all the six readers. An increase in the number of independent readers providing CT scans interpretations led to an accuracy increase associated with a decrease in agreement. The dataset markup process took three working weeks.
CONCLUSIONS: The developed cluster model simplifies the collaborative and crowdsourced creation of image repositories and makes it time-efficient. Our proof-of-concept dataset provides a valuable source of annotated medical imaging data for training CAD algorithms aimed at early detection of lung nodules. The tool and the dataset are publicly available at https://github.com/Center-of-Diagnostics-and-Telemedicine/FAnTom.git and https://mosmed.ai/en/datasets/ct_lungcancer_500/, respectively.