Background: Chronic lymphocytic leukaemia (CLL) is the most common cancer of the lymphatic system in Western countries. Several clinical and biological prognostic factors for CLL have been identified. However, it remains unclear which of the available prognostic models combining those factors can be used in clinical practice to predict long-term outcome in people newly-diagnosed with CLL.
Objectives: To identify, describe and appraise all prognostic models developed to predict overall survival (OS), progression-free survival (PFS) or treatment-free survival (TFS) in newly-diagnosed (previously untreated) adults with CLL, and meta-analyse their predictive performances.
Search methods: We searched MEDLINE (from January 1950 to June 2019 via Ovid), Embase (from 1974 to June 2019) and registries of ongoing trials (to 5 March 2020) for development and validation studies of prognostic models for untreated adults with CLL. In addition, we screened the reference lists and citation indices of included studies.
Selection criteria: We included all prognostic models developed for CLL which predict OS, PFS, or TFS, provided they combined prognostic factors known before treatment initiation, and any studies that tested the performance of these models in individuals other than the ones included in model development (i.e. ‘external model validation studies’). We included studies of adults with confirmed B-cell CLL who had not received treatment prior to the start of the study. We did not restrict the search based on study design.
Data collection and analysis: We developed a data extraction form to collect information based on the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS). Independent pairs of review authors screened references, extracted data and assessed risk of bias according to the Prediction model Risk Of Bias ASsessment Tool (PROBAST). For models that were externally validated at least three times, we aimed to perform a quantitative meta-analysis of their predictive performance, notably their calibration (proportion of people predicted to experience the outcome who do so) and discrimination (ability to differentiate between people with and without the event) using a random-effects model. When a model categorised individuals into risk categories, we pooled outcome frequencies per risk group (low, intermediate, high and very high). We did not apply GRADE as guidance is not yet available for reviews of prognostic models.
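The random-effects pooling of discrimination described above can be illustrated with a minimal sketch. It assumes c-statistics are pooled on the logit scale, uses the DerSimonian-Laird estimator for between-study variance (a simple stand-in, not necessarily the estimator used in this review), and derives the prediction interval from a t-distribution with k - 2 degrees of freedom. The function name, its arguments and the input values in the usage note are hypothetical and are not data from the included studies.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_c_statistics(c_stats, std_errs, t_crit):
    """Random-effects pooling of c-statistics on the logit scale.

    c_stats  : c-statistic reported by each validation study
    std_errs : standard error of each c-statistic (original scale)
    t_crit   : 97.5% quantile of the t-distribution with k - 2 degrees
               of freedom, supplied by the caller (e.g. 2.571 for k = 7)
    """
    k = len(c_stats)
    y = [logit(c) for c in c_stats]
    # Delta method: SE(logit c) is approximately SE(c) / (c * (1 - c)).
    v = [(se / (c * (1.0 - c))) ** 2 for c, se in zip(c_stats, std_errs)]

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird between-study variance, truncated at zero.
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / scale)

    # Random-effects summary estimate and its standard error.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))

    # 95% confidence interval for the mean c-statistic.
    ci = (inv_logit(mu - 1.96 * se_mu), inv_logit(mu + 1.96 * se_mu))
    # 95% prediction interval for the c-statistic in a new validation study.
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    pi = (inv_logit(mu - half_width), inv_logit(mu + half_width))

    return {"pooled": inv_logit(mu), "ci": ci, "pi": pi, "tau2": tau2}
```

For example, `pool_c_statistics([0.60, 0.72, 0.80], [0.02, 0.02, 0.02], t_crit=12.706)` pools three made-up studies (k - 2 = 1 degree of freedom). Because the prediction interval adds the between-study variance to the uncertainty of the mean, it is wider than the confidence interval whenever the studies disagree.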
Main results: From 52 eligible studies, we identified 12 externally validated models: six were developed for OS, one for PFS and five for TFS. In general, reporting was poor, especially for the predictive performance measures of calibration and discrimination; basic information, such as eligibility criteria and the recruitment period of participants, was also often missing. We rated almost all studies at high or unclear risk of bias according to PROBAST. Overall, the applicability of the models and their validation studies was low or unclear; the most common reasons were inappropriate handling of missing data and serious reporting deficiencies concerning eligibility criteria, recruitment period, observation time and prediction performance measures. We report the results for three models predicting OS, which had available data from more than three external validation studies.
CLL International Prognostic Index (CLL-IPI): This score includes five prognostic factors: age, clinical stage, IgHV mutational status, B2-microglobulin and TP53 status. Calibration: for the low-, intermediate- and high-risk groups, the pooled five-year survival per risk group from validation studies corresponded to the frequencies observed in the model development study. In the very high-risk group, predicted survival from CLL-IPI was lower than that observed in external validation studies. Discrimination: the pooled c-statistic of seven external validation studies (3307 participants, 917 events) was 0.72 (95% confidence interval (CI) 0.67 to 0.77). The 95% prediction interval (PI) for the c-statistic, which describes the expected range of the model’s discriminative ability in a new external validation study, was 0.59 to 0.83.
Barcelona-Brno score: Aimed at simplifying the CLL-IPI, this score includes three prognostic factors: IgHV mutational status, del(17p) and del(11q). Calibration: for the low- and intermediate-risk groups, the pooled survival per risk group corresponded to the frequencies observed in the model development study, although the score seems to overestimate survival for the high-risk group. Discrimination: the pooled c-statistic of four external validation studies (1755 participants, 416 events) was 0.64 (95% CI 0.60 to 0.67); 95% PI 0.59 to 0.68.
MDACC 2007 index score: The authors presented two versions of this model, which includes six prognostic factors to predict OS: age, B2-microglobulin, absolute lymphocyte count, gender, clinical stage and number of nodal groups. Only one validation study was available for the more comprehensive version of the model, a formula with a nomogram, while seven studies (5127 participants, 994 events) validated the simplified version of the model, the index score. Calibration: for the low- and intermediate-risk groups, the pooled survival per risk group corresponded to the frequencies observed in the model development study, although the score seems to overestimate survival for the high-risk group. Discrimination: the pooled c-statistic of the seven external validation studies for the index score was 0.65 (95% CI 0.60 to 0.70); 95% PI 0.51 to 0.77.
Authors’ conclusions: Despite the large number of published studies of prognostic models for OS, PFS or TFS for newly-diagnosed, untreated adults with CLL, only a minority of these models (N = 12) have been externally validated for their respective primary outcome. Three models have undergone sufficient external validation to enable meta-analysis of their ability to predict survival outcomes. Insufficient reporting prevented us from summarising calibration as recommended. Of the three models, the CLL-IPI shows the best discrimination, despite overestimation. However, performance of the models may change for individuals with CLL who receive improved treatment options, as the models included in this review were tested mostly on retrospective cohorts receiving a traditional treatment regimen. In conclusion, this review shows a clear need to improve the conduct and reporting of both prognostic model development and external validation studies. For prognostic models to be used as tools in clinical practice, the development of the models (and their subsequent validation studies) should adapt to include the latest therapy options to accurately predict performance. Adaptations should be timely.