Just another “Clever Hans”? Neural networks and FDG PET-CT to predict the outcome of patients with breast cancer

Eur J Nucl Med Mol Imaging. 2021 Mar 5. doi: 10.1007/s00259-021-05270-x. Online ahead of print.

ABSTRACT

BACKGROUND: Manual quantification of the metabolic tumor volume (MTV) from whole-body 18F-FDG PET/CT is time-consuming and therefore usually not applied in clinical routine. It has been shown that neural networks might assist nuclear medicine physicians in such quantification tasks. However, it is unclear whether such neural networks have to be designed for a specific type of cancer or can be applied to various cancers. Therefore, the aim of this study was to evaluate the accuracy of a neural network in a cancer type that was not used for its training.

METHODS: Fifty consecutive breast cancer patients who underwent 18F-FDG PET/CT were included in this retrospective analysis. The PET-Assisted Reporting System (PARS) prototype, which uses a neural network trained on lymphoma and lung cancer 18F-FDG PET/CT data, was tasked with detecting pathological foci and determining their anatomical location. Consensus reads of two nuclear medicine physicians together with follow-up data served as the diagnostic reference standard; 1072 18F-FDG-avid foci were manually segmented. The accuracy of the neural network was evaluated with regard to lesion detection, anatomical position determination, and total tumor volume quantification.

RESULTS: When only PERCIST-measurable foci were considered, the neural network displayed high per-patient sensitivity and specificity in detecting suspicious 18F-FDG foci (sensitivity 92%, CI = 79-97%; specificity 98%, CI = 94-99%). When all FDG-avid foci were considered, the sensitivity dropped to 39% (CI = 30-50%). The localization accuracy was high for body part (98%; CI = 95-99%), region (88%; CI = 84-90%), and subregion (79%; CI = 74-84%). There was a high correlation between AI-derived and manually segmented MTV (R² = 0.91; p < 0.001). AI-derived whole-body MTV (HR = 1.275; CI = 1.208-1.713; p < 0.001) was a significant prognosticator of overall survival. AI-derived lymph node MTV (HR = 1.190; CI = 1.022-1.384; p = 0.025) and liver MTV (HR = 1.149; CI = 1.001-1.318; p = 0.048) were predictive of overall survival in a multivariate analysis.
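As a reading aid for the detection figures above, the following minimal sketch shows how sensitivity, specificity, and a confidence interval for a proportion can be computed from a confusion table. It is an illustration only: the abstract does not state which interval method the authors used, so the Wilson score interval is an assumption here, and the counts in the usage lines are hypothetical rather than taken from the study. The function names are likewise invented for this example.

```python
import math


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts."""
    sensitivity = tp / (tp + fn)  # detected positives / all true positives
    specificity = tn / (tn + fp)  # correct negatives / all true negatives
    return sensitivity, specificity


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval. Choosing the
    Wilson method is an assumption of this sketch, not the paper's
    documented procedure.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts, chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(tp=46, fn=4, tn=98, fp=2)
sens_low, sens_high = wilson_interval(46, 50)
print(f"sensitivity = {sens:.2f} (CI {sens_low:.2f}-{sens_high:.2f})")
print(f"specificity = {spec:.2f}")
```

The Wilson interval is asymmetric around the point estimate when the proportion is close to 0 or 1, which is the typical shape of intervals reported for high sensitivities and specificities.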

CONCLUSION: Although trained on lymphoma and lung cancer, PARS showed good accuracy in the detection of PERCIST-measurable lesions. The neural network therefore does not seem prone to the Clever Hans effect. However, the network showed poor accuracy when all manually segmented lesions were used as the reference standard. Both whole-body and organ-wise MTV were significant prognosticators of overall survival in advanced breast cancer.

PMID:33674891 | DOI:10.1007/s00259-021-05270-x