A CNN-based unified framework utilizing projection loss in unison with label noise handling for multiple Myeloma cancer diagnosis

This article was originally published here

Med Image Anal. 2021 May 13;72:102099. doi: 10.1016/j.media.2021.102099. Online ahead of print.


Multiple Myeloma (MM) is a malignancy of plasma cells. Similar to other forms of cancer, it demands prompt diagnosis for reducing the risk of mortality. The conventional diagnostic tools are resource-intense and hence, these solutions are not easily scalable for extending their reach to the masses. Advancements in deep learning have led to rapid developments in affordable, resource optimized, easily deployable computer-assisted solutions. This work proposes a unified framework for MM diagnosis using microscopic blood cell imaging data that addresses the key challenges of inter-class visual similarity of healthy versus cancer cells and that of the label noise of the dataset. To extract class distinctive features, we propose projection loss to maximize the projection of a sample’s activation on the respective class vector besides imposing orthogonality constraints on the class vectors. This projection loss is used along with the cross-entropy loss to design a dual branch architecture that helps achieve improved performance and provides scope for targeting the label noise problem. Based on this architecture, two methodologies have been proposed to correct the noisy labels. A coupling classifier has also been proposed to resolve the conflicts in the dual-branch architecture’s predictions. We have utilized a large dataset of 72 subjects (26 healthy and 46 MM cancer) containing a total of 74996 images (including 34555 training cell images and 40441 test cell images). This is so far the most extensive dataset on Multiple Myeloma cancer ever reported in the literature. An ablation study has also been carried out. The proposed architecture performs best with a balanced accuracy of 94.17% on binary cell classification of healthy versus cancer in the comparative performance with ten state-of-the-art architectures. Extensive experiments on two additional publicly available datasets of two different modalities have also been utilized for analyzing the label noise handling capability of the proposed methodology. The code will be available under https://github.com/shivgahlout/CAD-MM.

PMID:34098240 | DOI:10.1016/j.media.2021.102099