Background: The Cox proportional hazards model combined with neural networks is widely used to predict survival outcomes and to guide the choice of cancer treatment strategies. Although this approach performs well on many tasks, it struggles with high-dimensional datasets. In this study, we show that the Cox network suffers from estimation bias when such datasets contain a large number of censored samples. This bias has two components, censored estimation bias and variance estimation bias, both of which limit the prediction performance of the model. To correct it, we propose the Deep Bayesian Perturbation Cox Network (DBP), which introduces Bayesian prior knowledge about censored samples to optimize the training of the neural network. Specifically, the model approximates this prior knowledge with a sampling module called Bayesian Perturbation, and the module can also be used as a component in other Cox-based neural networks.
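For context, the role of censored samples in a Cox network can be illustrated with the standard negative Cox partial log-likelihood, which such networks commonly minimize. The following is a minimal sketch of that standard loss only, not of the DBP method or the Bayesian Perturbation module; the function name and arguments are illustrative assumptions, and tied event times are assumed absent.

```python
import numpy as np

def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (illustrative; assumes no tied times).

    risk  : predicted log-risk scores, e.g. the output of a network
    time  : observed follow-up times
    event : 1 if the event was observed, 0 if the sample is censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # Sort by descending time so the risk set of sample i is samples 0..i.
    order = np.argsort(-time, kind="stable")
    risk, event = risk[order], event[order]

    # log(sum_{j in risk set} exp(risk_j)), accumulated in a numerically stable way.
    log_risk_set = np.logaddexp.accumulate(risk)

    # Censored samples appear in the risk sets of others,
    # but contribute no term of their own to the sum.
    return -np.sum((risk - log_risk_set)[event])
```

In this loss, a censored sample only enters the denominators of other samples, so a dataset with many censored samples provides few direct terms for the network to fit.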
Results: Comparisons between DBP and previous models on several kinds of genomic datasets show that DBP significantly improves on previous state-of-the-art methods. In addition, simulation experiments illustrate how DBP addresses the bias introduced by the Cox network. In a case study based on the predicted risks for BRCA data from TCGA, we identify 400 differentially expressed genes and 20 KEGG pathways associated with breast cancer prognosis; 65% of the top 20 genes are supported by a literature review.
Conclusion: Overall, these results demonstrate that the proposed method performs well and remains robust on datasets with a large proportion of censored samples. It can also guide the discovery of disease-related genes.