Classifying Pseudogout using Machine Learning Approaches with Electronic Health Record Data

OBJECTIVE:

Identifying pseudogout in large datasets is difficult due to its episodic nature and lack of billing codes specific to this acute subtype of calcium pyrophosphate (CPP) deposition disease. We evaluated a novel machine learning approach for classifying pseudogout using electronic health record (EHR) data.

METHODS:

We created an EHR data mart of patients with ≥1 relevant billing code or ≥2 natural language processing (NLP) mentions of pseudogout or chondrocalcinosis, 1991-2017. We selected 900 subjects for gold standard chart review and classified each as: (1) definite pseudogout (synovitis plus synovial fluid CPP crystals); (2) probable pseudogout (synovitis plus chondrocalcinosis); or (3) not pseudogout. We applied a topic modeling approach to identify definite/probable pseudogout. A combined algorithm included topic modeling plus manually reviewed CPP crystal results. We compared algorithm performance and the cohorts identified by: (1) billing codes, (2) presence of CPP crystals, (3) topic modeling, and (4) the combined algorithm.
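
As a rough illustration of the inclusion rule and the combined algorithm described above, the following Python sketch encodes them as boolean checks. The `Patient` record and all of its field names are hypothetical. The use of a logical OR to combine the topic modeling flag with the CPP crystal result is also an assumption made for this sketch; the abstract does not state how the two signals were combined.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical, chosen for this sketch only.
    billing_codes: int          # count of relevant billing codes
    nlp_mentions: int           # NLP mentions of pseudogout or chondrocalcinosis
    topic_model_positive: bool  # flag produced by the topic modeling step
    cpp_crystals: bool          # manually reviewed synovial fluid CPP crystal result


def in_data_mart(p: Patient) -> bool:
    """Inclusion rule from the Methods: >=1 billing code or >=2 NLP mentions."""
    return p.billing_codes >= 1 or p.nlp_mentions >= 2


def combined_flag(p: Patient) -> bool:
    """Assumed combination rule (logical OR); not confirmed by the source."""
    return p.topic_model_positive or p.cpp_crystals
```

Under this assumed rule, a patient with a single NLP mention and no billing code would be excluded from the data mart, while a patient with CPP crystals but a negative topic modeling flag would still be counted by the combined algorithm.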

RESULTS:

Among 900 subjects, 123 (13.7%) had pseudogout by chart review (68 definite, 55 probable). Billing codes had sensitivity 65% and PPV 22% for pseudogout. Presence of CPP crystals had sensitivity 29% and PPV 92%. Without using CPP crystal results, topic modeling had sensitivity 29% and PPV 79%. The combined algorithm yielded sensitivity 42% and PPV 81%. The combined algorithm identified 50% more patients than presence of CPP crystals alone; crystal presence captured only a portion of definite pseudogout cases and missed probable pseudogout.
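
For readers less familiar with these metrics, a minimal Python sketch of how sensitivity and positive predictive value (PPV) are computed from chart review counts follows. The counts used in the example (`tp`, `fp`, `fn`) are illustrative values chosen to be consistent with the reported billing code figures and the 123 chart-confirmed cases; they are not reported in the abstract.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the method flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of flagged patients who are true cases: TP / (TP + FP)."""
    return tp / (tp + fp)


# Illustrative counts only (assumed, not taken from the source):
# 80 true positives and 43 missed cases sum to the 123 chart-confirmed cases.
tp, fp, fn = 80, 284, 43

billing_sensitivity = round(sensitivity(tp, fn), 2)  # 80 / 123
billing_ppv = round(ppv(tp, fp), 2)                  # 80 / 364
```

With these assumed counts, the two values round to 0.65 and 0.22, matching the billing code row of the results. This shows how a method can find most true cases while still flagging many patients who do not have pseudogout.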

CONCLUSION:

For pseudogout, an episodic disease with no specific billing code, combining NLP, machine learning methods, and synovial fluid lab results yielded an algorithm with substantially higher PPV than billing codes (81% vs. 22%), at lower sensitivity (42% vs. 65%).

2020 Jan 7. doi: 10.1002/acr.24132. [Epub ahead of print]