Is there a role for artificial intelligence (AI) in guiding surgical options? According to a recent publication by Heller et al in The Journal of Urology, AI has potential for use in the treatment of patients with kidney cancer.1
This study was based on the R.E.N.A.L. and PADUA nephrometry scores.2 The R.E.N.A.L. nephrometry scoring system was developed in 2009 by researchers at Fox Chase Cancer Center in Philadelphia.3 R.E.N.A.L. is an acronym for the components from which the score is derived: tumor radius (R); whether the mass is exophytic or endophytic (E); nearness to the collecting system (N); anterior or posterior location (A); and location relative to the polar lines (L). It is used to inform surgical decision-making about whether a patient is a candidate for partial, instead of radical, nephrectomy. Since its inception, R.E.N.A.L. has been found to be predictive of perioperative outcomes and oncologically prognostic in terms of tumor grade, stage, and patient survival.4 Its use can therefore assist treating clinicians.5
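The component scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by Heller et al. The point thresholds are the ones commonly summarized from Kutikov and Uzzo's original description (reference 3) and should be checked against that publication before any real use. All function and field names are hypothetical, and the L component is passed in directly as 1 to 3 points because deriving it requires the polar-line geometry.

```python
from dataclasses import dataclass


def radius_points(max_diameter_cm: float) -> int:
    """R: <=4 cm scores 1, >4 to <7 cm scores 2, >=7 cm scores 3."""
    if max_diameter_cm <= 4:
        return 1
    if max_diameter_cm < 7:
        return 2
    return 3


def exophytic_points(exophytic_fraction: float) -> int:
    """E: >=50% exophytic scores 1, <50% scores 2, entirely endophytic scores 3."""
    if exophytic_fraction >= 0.5:
        return 1
    if exophytic_fraction > 0:
        return 2
    return 3


def nearness_points(distance_mm: float) -> int:
    """N: >=7 mm from the collecting system scores 1, >4 to <7 mm scores 2, <=4 mm scores 3."""
    if distance_mm >= 7:
        return 1
    if distance_mm > 4:
        return 2
    return 3


@dataclass
class RenalScore:
    radius: int
    exophytic: int
    nearness: int
    location: int
    face: str = "x"  # A component: "a", "p", or "x"; a descriptor that adds no points

    @property
    def total(self) -> int:
        return self.radius + self.exophytic + self.nearness + self.location

    @property
    def complexity(self) -> str:
        # Commonly used grouping: 4-6 low, 7-9 moderate, 10-12 high
        if self.total <= 6:
            return "low"
        if self.total <= 9:
            return "moderate"
        return "high"

    def __str__(self) -> str:
        return f"{self.total}{self.face}"


def score_tumor(max_diameter_cm: float, exophytic_fraction: float,
                nearness_mm: float, polar_points: int, face: str = "x") -> RenalScore:
    """Combine the components; polar_points (L) is supplied by the caller as 1, 2, or 3."""
    if polar_points not in (1, 2, 3):
        raise ValueError("polar_points must be 1, 2, or 3")
    if face not in ("a", "p", "x"):
        raise ValueError("face must be 'a', 'p', or 'x'")
    return RenalScore(
        radius=radius_points(max_diameter_cm),
        exophytic=exophytic_points(exophytic_fraction),
        nearness=nearness_points(nearness_mm),
        location=polar_points,
        face=face,
    )


# Made-up example values, not taken from any patient in the study
example = score_tumor(4.2, 0.6, 3.0, polar_points=2)
print(example, example.complexity)
```

Treating the anterior/posterior descriptor as a suffix rather than a numeric term reflects that it does not contribute points to the total.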
The authors note that despite its potential, “widespread adoption outside of academic centers, has been modest.” Factors that may contribute to this lack of adoption include unreimbursed time, score ambiguity, and variability among observers. Machine learning, specifically deep learning, is a growing field. It has been applied to radiographic imaging and shown to be noninferior to human experts in several studies. Heller et al hypothesized that a fully automated deep learning approach could generate R.E.N.A.L. nephrometry scores that are similar to human-generated scores and have similar predictive ability.
Assessing Patients: Human Versus AI
From 2010 to 2018, 300 consecutive patients with preoperative arterial phase imaging were identified from more than 70 external hospital systems. All surgeries were performed at a single institution. Exclusion criteria included not having arterial phase computed tomography (CT) preoperatively, nephrectomy for benign disease, or presence of a tumor thrombus.
The deep learning model was trained against a “ground truth” semantic segmentation in which each individual voxel was manually labeled as kidney, kidney tumor, or background. This was done using expert input and Hounsfield unit (HU) thresholds to create the AI-generated model. The urinary collecting system and polar lines were obtained using HU thresholds to complete the R.E.N.A.L. score. The authors note that assessment of anterior versus posterior location was not performed in this study.
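To make the idea of a voxel-level label map combined with an HU threshold concrete, here is a minimal sketch on a tiny, made-up 3 x 3 slice. The label values, the function names, and the threshold range of 100 to 300 HU are all assumptions chosen for illustration; the article does not report the thresholds or the implementation the authors used.

```python
from typing import Dict, List, Sequence

# Label values are an assumption for this sketch
BACKGROUND, KIDNEY, TUMOR = 0, 1, 2


def count_labels(label_slice: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Count how many voxels carry each of the three labels."""
    counts = {BACKGROUND: 0, KIDNEY: 0, TUMOR: 0}
    for row in label_slice:
        for label in row:
            counts[label] += 1
    return counts


def threshold_within_label(
    hu_slice: Sequence[Sequence[float]],
    label_slice: Sequence[Sequence[int]],
    label: int,
    hu_min: float,
    hu_max: float,
) -> List[List[bool]]:
    """Mark voxels that carry `label` and whose HU value lies in [hu_min, hu_max]."""
    return [
        [lab == label and hu_min <= hu <= hu_max for hu, lab in zip(hu_row, lab_row)]
        for hu_row, lab_row in zip(hu_slice, label_slice)
    ]


# Made-up HU values and labels for one small axial slice
hu = [
    [-100, 30, 40],
    [150, 160, 20],
    [-90, 170, 60],
]
labels = [
    [BACKGROUND, KIDNEY, KIDNEY],
    [KIDNEY, KIDNEY, TUMOR],
    [BACKGROUND, KIDNEY, TUMOR],
]

print(count_labels(labels))
# Hypothetical threshold range; the study's actual HU cutoffs are not given in the article
print(threshold_within_label(hu, labels, KIDNEY, hu_min=100, hu_max=300))
```

A real pipeline would operate on full 3D volumes with an array library, but the logic per voxel is the same: a class label restricts where the threshold is applied, and the HU range selects the structure of interest within it.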
The median age of the patients included in the study was 60 years, and the median tumor size was 4.2 cm. A total of 92% of resected masses were malignant; 27% were high stage, 37% were high grade, and 24% had tumor necrosis present. In all, 74% of patients underwent laparoscopic/robotic surgery and 63% underwent nephron-sparing surgery. All but 6 patients were able to be assessed by AI. The median R.E.N.A.L. score was 8 for both the AI and human assessments. There was a moderate, statistically significant correlation (0.6; P <0.001) between the AI and human total R.E.N.A.L. scores as well as between the individual components. However, although the agreement was substantial for the R component, the other individual components demonstrated only fair agreement.
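Terms such as “fair” and “substantial” agreement are conventionally attached to ranges of a chance-corrected agreement statistic. The article does not name the statistic used, so the sketch below assumes unweighted Cohen's kappa with the widely used Landis and Koch labels, purely for illustration. The two score lists are invented and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same score
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive label."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented component scores (1 to 3) for eight hypothetical cases
human_scores = [1, 1, 2, 2, 3, 3, 1, 2]
ai_scores = [1, 1, 2, 3, 3, 3, 1, 1]

kappa = cohens_kappa(human_scores, ai_scores)
print(round(kappa, 3), landis_koch_label(kappa))
```

For ordinal components such as R, E, N, and L, a weighted kappa that penalizes a 1-versus-3 disagreement more than a 1-versus-2 disagreement would be a reasonable alternative.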
The AI and human scores similarly predicted oncologic outcomes, including the presence of malignancy, high-stage disease (defined as >pT2), presence of high-grade disease, and presence of tumor necrosis. Moreover, the AI score was significantly associated with surgical approach, including the use of minimally invasive and nephron-sparing surgery. Perioperative outcomes, including estimated blood loss, perioperative blood transfusion requirements, and postoperative change in estimated glomerular filtration rate, were all predicted by both AI and human assessment.
The authors reinforced that nephrometry scoring has the ability to “systematically extract clinical information from preoperative images” and has been used in several studies involving renal masses, although its uptake has been modest. There is potential for deep learning to optimize the R.E.N.A.L. nephrometry score by reducing the time needed to calculate the score while providing an unambiguous assessment and actionable conclusions that may help guide patient care. This study has made several exciting contributions to the literature. The researchers were able to create an AI model using semantic segmentation that proved comparable to human-generated results. This points to the feasibility of creating, training, and testing an AI system. Limitations based on voxel ambiguity were overcome in this study by using HU thresholds.
Tumor radius had the highest concordance rates between the AI and human scores. A potential reason that the agreement rates for the other components were only fair may be the rule-based algorithm used in the study, which may have contributed to divergence from the true anatomy. Neither the AI nor the human scores were able to predict complications or readmission rates. The authors write that “the overseeing clinician is always encouraged to oversee such segmentations and decide if he/she agrees with the automated masks.”
The Future of AI Scoring
Although the study included only a single surgical site, imaging was collected from more than 70 external centers and AI was able to generate scores for 98% of patients. The researchers note that with continued experience, the number of patients with inconclusive AI scores should decrease. They also encouraged the development of future challenges, especially in diverse geographic and demographic populations, to help continue to improve the validation and generalizability of the system.
Notwithstanding the limitations described above, the automated AI scores correlated with human scores about as closely as human experts agree with one another (interobserver agreement), and they offered advisory information on similar oncologic and nononcologic prognostic factors. Whereas obtaining manual R.E.N.A.L. scores can be time-consuming, this automated approach can be easily implemented. AI had utility in providing information relevant to surgical decision-making (e.g., whether to pursue nephron-sparing surgery) and to potential perioperative obstacles, information that will be useful to clinicians in their efforts to ensure the best outcomes for their patients.
In summary, the AI-generated nephrometry score in this study performed similarly to human scores and offered similar advisory information on important prognostic factors.
David Ambinder, MD, is a urology resident at New York Medical College/Westchester Medical Center. His interests include surgical education, GU oncology, and advancements in technology in urology. A significant portion of his research has been focused on litigation in urology.
- Heller N, Tejpaul R, Isensee F, et al. Computer-generated R.E.N.A.L. nephrometry scores yield comparable predictive results to those of human-expert scores in predicting oncologic and perioperative outcomes. J Urol. 2022;207(5):1105-1115. doi: 10.1097/JU.0000000000002390
- Ficarra V, Novara G, Secco S, et al. Preoperative aspects and dimensions used for an anatomical (PADUA) classification of renal tumours in patients who are candidates for nephron-sparing surgery. Eur Urol. 2009;56(5):786-793. doi: 10.1016/j.eururo.2009.07.040
- Kutikov A, Uzzo RG. The R.E.N.A.L. nephrometry score: a comprehensive standardized system for quantitating renal tumor size, location, and depth. J Urol. 2009;182(3):844-853. doi: 10.1016/j.juro.2009.05.035
- Kutikov A, Smaldone MC, Egleston BL, et al. Anatomic features of enhancing renal masses predict malignant and high-grade pathology: a preoperative nomogram using the RENAL nephrometry score. Eur Urol. 2011;60(2):241-248. doi: 10.1016/j.eururo.2011.03.029
- Joshi SS, Uzzo RG. Renal tumor anatomic complexity: clinical implications for urologists. Urol Clin North Am. 2017;44(2):179-187. doi: 10.1016/j.ucl.2016.12.004