A population-based study to develop juvenile arthritis case definitions for administrative health data using model-based dynamic classification

Background: Previous research has shown that chronic disease case definitions constructed from population-based administrative health data may have low accuracy for ascertaining cases of episodic diseases, such as rheumatoid arthritis, which are characterized by periods of good health followed by periods of illness. No previous study has considered an alternative to deterministic (i.e., non-probability) methods, which use summary data for case ascertainment: a dynamic approach that applies statistical (i.e., probability) models for repeated measures data to classify individuals into disease, non-disease, and indeterminate categories. The research objectives were to validate a model-based dynamic classification approach for ascertaining cases of juvenile arthritis (JA) from administrative data and to compare its performance with that of a deterministic approach.
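To make the idea of dynamic classification concrete, here is a minimal Python sketch under strong simplifying assumptions. It does not reproduce the study's longitudinal discriminant analysis (LoDA) models: each year's 0/1 indicator of any JA-related healthcare contact is treated as an independent Bernoulli outcome with a group-specific probability, so, unlike a longitudinal model, it ignores within-person correlation between years. The posterior probability of JA is updated with Bayes' rule after each year, and the individual is classified as soon as that probability crosses an upper or lower cutoff; otherwise the individual stays indeterminate. The function names, the parameters `p_case`, `p_control`, and `prior`, and the 0.9/0.1 cutoffs are all hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def posterior_ja(history: Sequence[int], p_case: float,
                 p_control: float, prior: float) -> float:
    """Posterior P(JA | annual contact indicators).

    Simplifying assumption (not the study's LoDA model): each year's 0/1
    indicator is an independent Bernoulli draw with probability p_case for
    JA cases or p_control for non-JA controls.
    """
    log_odds = math.log(prior / (1.0 - prior))  # prior log odds of JA
    for y in history:
        if y:  # year with a JA-related contact
            log_odds += math.log(p_case / p_control)
        else:  # year without a JA-related contact
            log_odds += math.log((1.0 - p_case) / (1.0 - p_control))
    return 1.0 / (1.0 + math.exp(-log_odds))


def dynamic_classify(annual_contacts: Sequence[int], p_case: float,
                     p_control: float, prior: float = 0.5,
                     upper: float = 0.9,
                     lower: float = 0.1) -> Tuple[str, Optional[int]]:
    """Re-evaluate the posterior after each year of follow-up.

    Returns ("case", t) or ("non-case", t) for the first year t (1-based)
    in which the posterior crosses a cutoff; if no cutoff is crossed, the
    individual is still ("indeterminate", None) at the end of the data.
    """
    for t in range(1, len(annual_contacts) + 1):
        p = posterior_ja(annual_contacts[:t], p_case, p_control, prior)
        if p >= upper:
            return "case", t
        if p <= lower:
            return "non-case", t
    return "indeterminate", None
```

For example, with the made-up values `p_case=0.6` and `p_control=0.05`, the history `[0, 1, 1, 0, 1]` is classified as a case in year 3, `[0, 0, 0]` as a non-case in year 3, and `[0]` remains indeterminate. Averaging the returned year over classified individuals gives a quantity analogous to a mean time to classification.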

Methods: The study cohort comprised JA cases and non-JA controls aged 16 years or younger who were identified from a pediatric clinical registry in the Canadian province of Manitoba and born between 1980 and 2002. Registry data were linked to hospital records and physician billing claims up to 2018. Longitudinal discriminant analysis (LoDA) models and dynamic classification were applied to annual healthcare utilization measures. The deterministic case definition was based on JA diagnoses recorded in healthcare use data at any time between birth and age 16 years; it required at least one hospitalization or at least two physician visits. Case definitions based on the model-based dynamic classification and deterministic approaches were assessed on sensitivity, specificity, and positive and negative predictive values (PPV, NPV). Mean time to classification was also measured for the dynamic approach.
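The deterministic case definition and the four accuracy measures can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: the record format (`Contact` as an age/source pair for each JA-coded contact), the function names, the inclusive age-16 boundary, and the reading of "one hospitalization or two physician visits" as minimum counts are all hypothetical choices. The registry status plays the role of the reference standard (`truth`).

```python
from typing import Dict, Sequence, Tuple

# Hypothetical record format: one (age_in_years, source) tuple per healthcare
# contact carrying a JA diagnosis; source is "hospital" or "physician".
Contact = Tuple[float, str]


def deterministic_case(contacts: Sequence[Contact], max_age: float = 16.0) -> bool:
    """At least 1 JA hospitalization or at least 2 JA physician visits,
    counting only contacts between birth and max_age (inclusive)."""
    sources = [src for age, src in contacts if 0.0 <= age <= max_age]
    return sources.count("hospital") >= 1 or sources.count("physician") >= 2


def _ratio(num: int, den: int) -> float:
    # A metric with an empty denominator is undefined; report it as NaN.
    return num / den if den else float("nan")


def accuracy(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare predicted case status against the registry reference standard."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

For instance, `accuracy([True, True, True, False, False], [True, True, False, True, False])` has two true positives, one false negative, one false positive, and one true negative, giving sensitivity and PPV of 2/3 and specificity and NPV of 1/2.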

Results: The cohort included 797 individuals, of whom 386 (48.4%) were JA cases. A model-based dynamic classification approach using an annual measure of any JA-related healthcare contact had sensitivity = 0.70 and PPV = 0.82. Mean time to classification was 9.21 years. The deterministic case definition had sensitivity = 0.91 and PPV = 0.92.

Conclusions: A model-based dynamic classification approach had lower accuracy for ascertaining JA cases than a deterministic approach. However, the dynamic approach required less time to produce a case definition with acceptable PPV. The choice of methods for constructing case definitions, and their performance, may depend on the characteristics of the chronic disease under investigation.

Keywords: Administrative data; Classification; Discriminant analysis; Juvenile arthritis; Longitudinal analyses.