Tuberculosis is a major cause of morbidity and mortality in the developing world. Drug resistance, which is predicted to rise in many countries worldwide, threatens tuberculosis treatment and control.
We aimed to identify features associated with treatment failure and to predict which patients are at highest risk of treatment failure.
We applied several machine learning techniques to a multi-country dataset managed by the National Institute of Allergy and Infectious Diseases, both to identify factors statistically associated with treatment failure and to predict treatment failure from baseline demographic and clinical characteristics alone.
The complete-case analysis database consisted of 587 patients (68% male) with a median (p25-p75) age of 40 (30-51) years. Treatment failure occurred in approximately one quarter of patients. The features most strongly associated with treatment failure were drug sensitivity patterns, imaging findings, Ziehl-Neelsen smear microscopy findings, education status, and employment status. The most predictive model was forward stepwise selection (AUC: 0.74), although most models achieved an AUC of 0.7 or higher. A sensitivity analysis of all 643 original patients, with missing values filled by multiple imputation, identified similar predictive features and generally showed higher predictive performance.
Machine learning can help identify patients at higher risk of treatment failure. Closer monitoring of these patients may decrease treatment failure rates and prevent the emergence of antibiotic resistance. Because it relies on inexpensive, basic demographic and clinical features, this approach is attractive in low- and middle-income countries.
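For readers less familiar with the AUC figures reported above, the following minimal sketch shows one way the metric can be computed: as the proportion of failure/success patient pairs in which the patient who failed treatment received the higher predicted risk. The function name `auc`, the 0/1 outcome coding, and the toy values are assumptions made for illustration only; they are not taken from the study data or its analysis code.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 = treatment failure, 0 = treatment success (hypothetical coding).
    scores: predicted risk of failure; higher means higher risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one patient in each outcome class")
    # Count pairs where the failure case is ranked higher; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auc(toy_labels, toy_scores))  # 0.75
```

Under this reading, an AUC of 0.74 means that a randomly chosen patient who failed treatment would be assigned a higher risk than a randomly chosen patient who did not in roughly 74% of such pairs, while 0.5 corresponds to chance-level ranking.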