Development and Internal Validation of Supervised Machine Learning Algorithms for Predicting Clinically Significant Functional Improvement in a Mixed Population of Primary Hip Arthroscopy Patients

Purpose: To (1) develop and validate a machine learning algorithm to predict clinically significant functional improvement after hip arthroscopy for femoroacetabular impingement syndrome (FAIS) and (2) develop a digital application that provides patients with individual risk profiles indicating their propensity to gain clinically significant improvement in function.

Methods: A retrospective review was performed of consecutive patients who underwent hip arthroscopy with cam/pincer correction, labral preservation, and capsular closure between January 2012 and 2017 at one large academic and three community hospitals. All procedures were performed by a single high-volume hip arthroscopist. The primary outcome was achievement of the minimal clinically important difference (MCID) for the Hip Outcome Score (HOS)-Activities of Daily Living (ADL) at two years postoperatively; the MCID was calculated using a distribution-based method. A total of 21 demographic, radiographic, and patient-reported outcome measures were considered as potential covariates. An 80:20 random split was used to create training and testing sets from the patient cohort. Five supervised machine learning algorithms were developed using three iterations of ten-fold cross-validation on the training set and were assessed by discrimination, calibration, Brier score, and decision curve analysis on an independent testing set of patients.
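The evaluation quantities named in the Methods can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the study's code: the function names are hypothetical, the distribution-based MCID is taken as one half of the baseline standard deviation (a common convention; the abstract does not state which distribution-based rule was used), and the inputs are toy values, not study data.

```python
import random
import statistics


def distribution_mcid(baseline_scores, fraction=0.5):
    """Distribution-based MCID, ASSUMED here to be a fraction (0.5) of the
    sample standard deviation of the baseline scores."""
    return fraction * statistics.stdev(baseline_scores)


def split_80_20(records, seed=0):
    """Random 80:20 split into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def c_statistic(y_true, y_prob):
    """Discrimination: share of (event, non-event) pairs in which the event
    received the higher predicted probability; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return score / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one probability threshold, the quantity plotted
    across thresholds in a decision curve analysis."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy outcomes (1 = achieved MCID) and predicted probabilities.
    y = [1, 1, 0, 0]
    p = [0.9, 0.4, 0.5, 0.1]
    print("c-statistic:", c_statistic(y, p))
    print("Brier score:", brier_score(y, p))
    print("net benefit at 0.5:", net_benefit(y, p, 0.5))
```

The pairwise c-statistic is quadratic in sample size, which is acceptable for a testing set of a few hundred patients; a rank-based formula would be preferable for larger cohorts.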

Results: A total of 818 patients were included, with a median (interquartile range) age of 32.0 (22.0–42.0) years; 69.2% were female, and 74.3% achieved the MCID for the HOS-ADL. The best-performing algorithm was the stochastic gradient boosting model (c-statistic = 0.84, calibration intercept = 0.20, calibration slope = 0.83, Brier score = 0.13). Of the initial 21 candidate variables, the eight most important features for predicting the MCID for the HOS-ADL, which were included in model training, were body mass index, age, preoperative HOS-ADL score, preoperative pain level, sex, Tönnis grade, symptom duration, and drug allergies. The algorithm was subsequently transformed into a digital application that uses local explanations to provide customized risk assessment.

Conclusions: The stochastic gradient boosting model conferred excellent predictive ability for propensity to gain clinically significant improvement in function after hip arthroscopy. An open-access digital application was created, which may augment shared decision-making and allow for preoperative risk stratification. External validation is warranted to confirm the model's performance, as its generalizability is currently unknown.
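For reference, the Brier score and the calibration intercept and slope reported in the Results are commonly defined as below. The notation is an assumption introduced for illustration and does not come from the study: $\hat{p}_i$ is the predicted probability that patient $i$ achieves the MCID, $y_i \in \{0, 1\}$ is the observed outcome, $N$ is the number of patients in the testing set, and $\pi$ is the outcome prevalence.

```latex
\[
\text{Brier} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{p}_i - y_i \right)^2,
\qquad
\operatorname{logit} P(y_i = 1) = \alpha + \beta \, \operatorname{logit}(\hat{p}_i)
\]
% Perfect calibration corresponds to intercept \alpha = 0 and slope \beta = 1.
% A null model that predicts the prevalence \pi for every patient has
\[
\text{Brier}_{\text{null}} = \pi (1 - \pi)
\]
```

Under these definitions, a slope below 1 suggests predictions that are somewhat too extreme, and a positive intercept suggests that predicted probabilities are on average too low. If the testing-set prevalence were close to the overall 74.3% (an assumption; the abstract does not report it), the null Brier score would be about 0.743 × 0.257 ≈ 0.19.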

Level of evidence: IV, Case series.

Keywords: Predictive analytics; artificial intelligence; femoroacetabular impingement; function; hip arthroscopy.