Harnessing Population Pedigree Data and Machine Learning Methods to Identify Patterns of Familial Bladder Cancer Risk

Background: Relatives of bladder cancer (BCa) patients have been shown to be at increased risk for kidney, lung, thyroid, and cervical cancers after correcting for smoking-related behaviors that may concentrate in some families. We demonstrate a novel approach that simultaneously assesses risks for multiple cancers to identify distinct multi-cancer configurations (multiple different cancer types that cluster in relatives) surrounding familial BCa patients.

Methods: This study takes advantage of a unique population-level data resource, the Utah Population Database (UPDB), which contains extensive genealogy and statewide cancer data. Familial risk is measured using standardized incidence ratios (SIRs) that account for the sex, age, birth cohort, and person-years of the pedigree members.
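As an illustration of the SIR idea, the following minimal sketch divides the observed number of cases in a family by the number expected from population rates, summed over strata. The stratum keys (sex, age band, birth cohort), the person-years, and the reference rates are hypothetical values chosen for the example; they are not UPDB data, and the function names are assumptions rather than the study's implementation.

```python
def expected_cases(person_years, reference_rates):
    """Expected case count: stratum person-years times the population rate."""
    return sum(py * reference_rates[stratum] for stratum, py in person_years.items())


def sir(observed, person_years, reference_rates):
    """Standardized incidence ratio: observed cases over expected cases."""
    expected = expected_cases(person_years, reference_rates)
    if expected <= 0:
        raise ValueError("expected case count must be positive")
    return observed / expected


# Hypothetical family: person-years at risk per (sex, age band, birth cohort).
person_years = {
    ("M", "60-69", "1930-39"): 400.0,
    ("F", "60-69", "1930-39"): 350.0,
}
# Hypothetical population incidence rates (cases per person-year) per stratum.
reference_rates = {
    ("M", "60-69", "1930-39"): 0.002,
    ("F", "60-69", "1930-39"): 0.0005,
}

# Three observed cases against 0.975 expected gives an SIR a little above 3.
family_sir = sir(3, person_years, reference_rates)
print(round(family_sir, 2))
```

An SIR above 1 indicates more cases than expected; whether the excess is statistically significant requires a separate test, which this sketch does not include.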

Results: We identify 1,023 families with significantly higher BCa rates than population controls (fBCa). Familial SIRs are then calculated across 25 cancer types, and a weighted Gower distance with K-medoids clustering is used to identify Familial Multi-Cancer Configurations (FMCs). We find five FMCs, each exhibiting a different pattern of cancer aggregation. Of the 25 cancer types studied, kidney and prostate cancers were most commonly enriched in the familial BCa clusters. Laryngeal, lung, stomach, acute lymphocytic leukemia, Hodgkin’s disease, soft tissue carcinoma, esophageal, breast, uterine, thyroid, and melanoma cancers were the other cancer types with increased incidence in familial BCa families.
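The clustering step can be sketched roughly as follows. The abstract does not state the feature weights, the variable types, or the K-medoids variant used, so this example makes several assumptions: every feature is a numeric SIR, the weights are hypothetical, the toy family vectors are invented, and the clustering is a simple alternating (assign, then update) K-medoids rather than any specific published algorithm.

```python
def gower_matrix(rows, weights):
    """Weighted Gower distance for numeric features (range-normalized)."""
    n, n_feat = len(rows), len(weights)
    ranges = [max(r[k] for r in rows) - min(r[k] for r in rows) for k in range(n_feat)]
    total_w = sum(weights)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.0
            for k in range(n_feat):
                if ranges[k] > 0:  # constant features contribute nothing
                    d += weights[k] * abs(rows[i][k] - rows[j][k]) / ranges[k]
            dist[i][j] = dist[j][i] = d / total_w
    return dist


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternate between assigning points and re-choosing each cluster's medoid."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(old)
                continue
            # The medoid is the member with the smallest total distance to the rest.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical per-family SIRs for three cancer types (kidney, prostate, lung).
families = [
    [3.0, 1.0, 1.1],
    [2.8, 1.2, 0.9],
    [1.0, 2.9, 1.0],
    [0.9, 3.1, 1.2],
    [1.1, 1.0, 3.0],
    [1.0, 0.8, 3.2],
]
weights = [2.0, 1.0, 1.0]  # hypothetical feature weights

dist = gower_matrix(families, weights)
medoids, labels = k_medoids(dist, initial_medoids=[0, 2, 4])
print(medoids, labels)
```

Using medoids rather than centroids keeps each cluster center an actual family, which works with any precomputed distance matrix such as Gower's. In practice the number of clusters and the starting medoids would be chosen by a criterion not described here.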

Conclusions: This study identified five familial BCa FMCs showing unique risk patterns for cancers of other organs, suggesting phenotypic heterogeneity in familial BCa.