A machine learning approach to predicting renal failure

Abstract submitted for ASN Kidney Week 2017

November 9, 2017

Author Block: Girish N Nadkarni, Edward Lee, Oliver Fielding, Teddy Cha, Hai Po Sun, Chris Kipers, William Paiva, Elvena Fong, and Steven G Coca


Risk prediction of end stage renal disease (ESRD) for population management and care intervention is both a research priority and an unmet public health need. Electronic medical records (EMR) can be leveraged to improve assessment of ESRD onset. However, traditional risk scoring may not provide accurate risk prediction or complete population coverage when EMR data are incomplete. To handle missing data, we developed a machine learning (ML) approach and compared it to traditional risk scoring in two EMR cohorts.


We used longitudinal data from the Mount Sinai Chronic Kidney Disease registry and a data set from the Center for Health Systems Innovation at Oklahoma State University provided by the Cerner Corporation. Using a random forest ML technique with imputation, we predicted risk of ESRD (defined as administrative codes for dialysis or transplant). We then compared this approach to the Tangri 4-Variable kidney failure risk equation (KFRE) on two measures: area under the curve (AUC) and the percentage of the population for which each risk score could be calculated.
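The two comparison measures can be illustrated with a minimal sketch. It is not the study's pipeline: the toy records, the field names (age, male, egfr, uacr, esrd), the choice of median imputation, and the stand-in risk score are all assumptions made for illustration, and neither the KFRE nor a random forest is implemented here. The sketch only shows how coverage changes once missing inputs are imputed and how a rank-based AUC is computed.

```python
from statistics import median

# Hypothetical toy records; None marks a value missing from the EMR.
patients = [
    {"age": 71, "male": 1, "egfr": 22.0, "uacr": 850.0, "esrd": 1},
    {"age": 64, "male": 0, "egfr": 35.0, "uacr": None, "esrd": 1},
    {"age": 58, "male": 1, "egfr": 55.0, "uacr": 40.0, "esrd": 0},
    {"age": 80, "male": 0, "egfr": None, "uacr": None, "esrd": 0},
    {"age": 47, "male": 0, "egfr": 78.0, "uacr": None, "esrd": 0},
    {"age": 69, "male": 1, "egfr": 18.0, "uacr": None, "esrd": 1},
]

# Inputs a four-variable score needs before it can be calculated.
REQUIRED = ("age", "male", "egfr", "uacr")


def coverage(records, required):
    """Fraction of records in which every required field is observed."""
    scorable = [r for r in records if all(r[f] is not None for f in required)]
    return len(scorable) / len(records)


def median_impute(records, fields):
    """Copy the records, filling missing fields with the observed median.

    Median imputation is an assumption; the abstract does not name a method.
    """
    fill = {f: median(r[f] for r in records if r[f] is not None) for f in fields}
    return [{**r, **{f: fill[f] for f in fields if r[f] is None}} for r in records]


def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random non-case.

    Tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


complete_case_cov = coverage(patients, REQUIRED)
imputed = median_impute(patients, ("egfr", "uacr"))
imputed_cov = coverage(imputed, REQUIRED)

# Stand-in risk score (lower eGFR means higher risk); NOT the KFRE or the ML model.
scores = [-r["egfr"] for r in imputed]
labels = [r["esrd"] for r in imputed]

print(f"coverage without imputation: {complete_case_cov:.2f}")
print(f"coverage with imputation:    {imputed_cov:.2f}")
print(f"AUC of stand-in score:       {auc(scores, labels):.3f}")
```

In this toy data only two of six patients have all four inputs, so a score that requires complete inputs covers a third of the cohort, whereas the imputed data can be scored for everyone.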


We analyzed data from 318,292 patients. The median age was 65 years, 54% were female, and 20% were African American. Sixty percent of the cohort had at least one estimated glomerular filtration rate (eGFR) measurement before ESRD onset; however, only 6% had both an eGFR measurement and a urine albumin-to-creatinine ratio (UACR) value before failure. The AUC of the 4-Variable KFRE was 0.89 (95% CI [0.88, 0.91]), while the ML approach had an AUC of 0.94 (95% CI [0.94, 0.95]). Importantly, the improvement in AUC was achieved while risk-scoring 10 times more of the population.


The ML approach outperformed traditional risk scoring such as the 4-Variable KFRE both in risk discrimination and in population coverage. Therefore, future efforts to risk stratify for population management and care intervention will benefit from utilizing ML approaches.