Nat Commun. 2020 Oct 12;11(1):5131. doi: 10.1038/s41467-020-18918-3.
As artificial intelligence (AI) is increasingly applied to biomedical research and clinical decisions, developing unbiased AI models that work equally well for all ethnic groups is of crucial importance to health disparity prevention and reduction. However, the biomedical data inequality between different ethnic groups is set to generate new health care disparities through data-driven, algorithm-based biomedical research and clinical decisions. Using an extensive set of machine learning experiments on cancer omics data, we find that current prevalent schemes of multiethnic machine learning are prone to generating significant model performance disparities between ethnic groups. We show that these performance disparities are caused by data inequality and data distribution discrepancies between ethnic groups. We also find that transfer learning can improve machine learning model performance for data-disadvantaged ethnic groups, and thus provides an effective approach to reduce health care disparities arising from data inequality among ethnic groups.