A machine learning approach to identifying patients at risk of developing incident CKD
Authors: Tia Y. Yu, Lauren A. Wiener, Xiaoyan Wang, Oliver L. Fielding, Jung H. Son, Praveen K. Potukuchi, Csaba P. Kovesdy. pulseData, Inc., New York, NY; University of Tennessee Health Science Center, Memphis, TN.
Chronic kidney disease (CKD) is under-identified, and current methods for identifying patients at risk of developing incident CKD are limited. Identifying patients at high risk for CKD could improve awareness and help delay the onset and progression of CKD. Machine learning algorithms can be used to stratify patients by their risk of developing incident CKD. Previous work has defined CKD using ICD codes or a limited number of eGFR readings.
We analyzed data from 1,780,262 patients without CKD at baseline in the Veterans Affairs (VA) healthcare system. We used a random forest classifier to 1) predict incident CKD (eGFR >90 progressing to eGFR <60 mL/min/1.73 m²) and 2) predict the development of advanced CKD (eGFR >60 progressing to eGFR …).
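Both outcomes above are defined as a patient crossing from one eGFR range to another. The sketch below shows one way such labels could be derived from a patient's eGFR history; it is an illustration under stated assumptions, not the authors' pipeline. The function name `label_progression`, the use of the first reading as baseline, the fixed prediction window, and counting a single low reading as progression are all hypothetical simplifications (clinical CKD definitions usually require persistence over time, and a real study would also handle loss to follow-up). Labels like these would then be paired with patient features to train a classifier such as a random forest.

```python
from typing import List, Optional, Tuple

# A reading is (days since first observation, eGFR in mL/min/1.73 m^2).
Reading = Tuple[int, float]


def label_progression(
    readings: List[Reading],
    baseline_min: float,
    target_max: float,
    horizon_days: int,
) -> Optional[int]:
    """Hypothetical outcome labeler.

    Returns 1 if the baseline eGFR is above baseline_min and a later reading
    falls below target_max within horizon_days, 0 if no such drop occurs in
    the window, and None if the patient is not eligible for this outcome.

    Assumptions (not from the source): the earliest reading is the baseline,
    and one reading below target_max counts as progression.
    """
    if not readings:
        return None
    ordered = sorted(readings)  # order readings by day
    baseline_day, baseline_egfr = ordered[0]
    if baseline_egfr <= baseline_min:
        return None  # baseline not in the starting eGFR range
    for day, egfr in ordered[1:]:
        if day - baseline_day > horizon_days:
            break  # readings after the prediction window are ignored
        if egfr < target_max:
            return 1
    return 0


if __name__ == "__main__":
    # Incident CKD as stated in the abstract: eGFR >90 progressing to <60,
    # here with an assumed one-year (365-day) window.
    patient_a = [(0, 95.0), (120, 82.0), (300, 58.0)]
    patient_b = [(0, 95.0), (400, 55.0)]  # drop occurs after the window
    patient_c = [(0, 75.0), (200, 50.0)]  # baseline not above 90
    print(label_progression(patient_a, 90, 60, 365))  # 1
    print(label_progression(patient_b, 90, 60, 365))  # 0
    print(label_progression(patient_c, 90, 60, 365))  # None
```

The same function could express the advanced CKD outcome by changing the thresholds; the lower threshold for that outcome is not preserved in this text, so it is not filled in here.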
Table 1 summarizes the performance of the prediction models. As the models predict outcomes over longer time horizons, laboratory values become less important predictors while comorbidities become more important. The one-year incident CKD model achieved an AUC of 0.839 and, with patients in the top risk quartile classified as high risk, a sensitivity of 0.754 and a specificity of 0.751. The one-year advanced CKD model achieved an AUC of 0.871 and, at the same top-quartile threshold, a sensitivity of 0.825 and a specificity of 0.751.
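The results above combine a threshold-free metric (AUC) with sensitivity and specificity at a top-risk-quartile cutoff. The sketch below shows how these quantities relate, assuming each patient has a model risk score and a 0/1 outcome label. The function names and the toy data are hypothetical; AUC is computed with the pairwise (Mann-Whitney) definition, which is quadratic in the number of patients and meant only for illustration, and tied scores at the cutoff are broken by input order.

```python
from typing import List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC: probability that a randomly chosen positive patient has a
    higher score than a randomly chosen negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_quartile_sens_spec(
    scores: List[float], labels: List[int]
) -> Tuple[float, float]:
    """Flag the highest-scoring 25% of patients as high risk and return the
    (sensitivity, specificity) of that flag."""
    n_flag = max(1, len(scores) // 4)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    tp = sum(1 for i in flagged if labels[i] == 1)  # flagged and had outcome
    fp = n_flag - tp                                 # flagged, no outcome
    positives = sum(labels)
    negatives = len(labels) - positives
    sensitivity = tp / positives
    specificity = (negatives - fp) / negatives
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data: 8 patients, 3 of whom develop the outcome.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    labels = [1, 0, 1, 0, 0, 0, 1, 0]
    print(round(auc(scores, labels), 3))  # 0.667
    sens, spec = top_quartile_sens_spec(scores, labels)
    print(round(sens, 3), round(spec, 3))  # 0.333 0.8
```

Because only a quarter of patients are flagged, specificity at this cutoff stays near 0.75 whenever the outcome is uncommon, which is consistent with the similar specificities reported for both models.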
We demonstrate that machine learning models can predict incident CKD using longitudinal data commonly available in electronic health record (EHR) systems. Future studies should validate these models in a clinical setting.