Copyright: ©Author(s) 2026.
World J Diabetes. Mar 15, 2026; 17(3): 115097
Published online Mar 15, 2026. doi: 10.4239/wjd.v17.i3.115097
Table 1 Characteristics of included studies
| Ref. | Country | Data source | Recruitment period | Single-center/multi-center | Sample size | Inclusion criteria | Definition of positive outcome |
| Cho et al[22], 2008 | South Korea | EHRs | 1996-2005 | Single-center | 292 | Confirmed T2DM; age ≥ 18 years; patients underwent up to 20 clinical tests | 20 μg/minute to 200 μg/minute in urinary albumin; no microalbumin or renal failure at diabetes diagnosis; prior evidence of diabetic retinopathy |
| Dagliati et al[23], 2018 | Italy | EHRs | 2012-2016 | Multi-center | 943 | Confirmed T2DM; no pre-existing complications; sufficient follow-up records | eGFR < 60 mL/minute/1.73 m²; UACR 30 mg/gram to 299 mg/gram (≥ 2 morning samples) |
| Rodriguez-Romero et al[24], 2019 | United States | ACCORD dataset | 2001.01-2001.06; 2003.02-2005.10 | Multi-center | 10251 | Confirmed T2DM by ADA criteria; age ≥ 40 with CVD history; age ≥ 55 with high CVD risk; HbA1c 7.5%-9% (multiple drugs); 7.5%-11% (fewer drugs) | Baseline SCr doubling; eGFR < 60 mL/minute/1.73 m²; UACR ≥ 30 mg/gram or 3.4 mg/mmol; UACR ≥ 300 mg/gram or 33.9 mg/mmol; renal failure |
| Allen et al[25], 2022 | United States | EHRs | 2007-2020 | Multi-center | 111046 | Confirmed T2DM by ICD-9/10 codes; age ≥ 18 years; ≥ 5 years follow-up records; required baseline tests (1-year pre-T2DM); albuminuria/reduced eGFR allowed at baseline; no pre-existing CKD or renal transplant | DKD confirmed by ICD-9/10 codes |
| Dong et al[26], 2022 | China | EMRs | 2008.10-2019.12 | Single-center | 2809 | Confirmed T2DM by ADA criteria; age ≥ 18 years; 3 years follow-up records | UACR > 30 mg/gram; protein excretion rate > 150 mg/24 hours; urine dipstick test ≥ 1+; eGFR < 60 mL/minute/1.73 m² |
| Nicolucci et al[27], 2022 | Italy | EMRs | NA | Multi-center | 147664 | Confirmed T2DM by ICD-9 CM codes | Confirmed DKD by ICD-9 CM codes; no pre-existing complication at baseline |
| Sabanayagam et al[28], 2023 | Singapore | SEED study | 2004-2011 | Multi-center | 1365 | Random glucose ≥ 11.1 mmol/L; HbA1c ≥ 6.5% (48 mmol/mol); self-reported antidiabetic medication use; diagnosed with diabetes by a physician | eGFR < 60 mL/minute/1.73 m²; ≥ 25% eGFR decline during follow-up |
| Hosseini Sarkhosh et al[29], 2023 | Iran | EHRs | 2012-2021 | Single-center | 1907 | Confirmed T2DM by ADA criteria; 5 years follow-up records; no pre-existing DKD at baseline | UACR ≥ 30 mg/gram; eGFR ≤ 60 mL/minute/1.73 m² |
| Yun et al[30], 2024 | China | EHRs | NA | Multi-center | 6040 | Confirmed T2DM; age ≥ 18 years; ≥ 7 years follow-up records; ≥ 2 hospital visits per year; no pre-existing DKD at baseline or within 2 years | UACR ≥ 30 mg/gram (≥ 3 months); eGFR < 60 mL/minute/1.73 m²; other kidney diseases excluded |
| Lin et al[31], 2025 | China | EHRs | 2011-2023 | Single-center | 3291 | Confirmed T2DM by ICD-9/10 codes; age ≥ 18 years; at least one hospital visit following the end of the baseline time window | eGFR < 60 mL/minute/1.73 m²; UACR ≥ 30 mg/gram over 3 months; DKD confirmed by ICD-9/10 codes; protein in urine dipstick test ≥ 1+ |
| Zou et al[32], 2025 | China | EHRs | 2014.01-2022.12 | Single-center | 9572 | Confirmed T2DM; no pre-existing DKD; ≥ 3 years follow-up records; glomerular/systemic diseases excluded | UACR ≥ 30 mg/gram in 2/3 tests within 3 months to 6 months; eGFR < 60 mL/minute/1.73 m² for more than 3 months; renal biopsy consistent with DKD pathological changes |
| Dei Cas et al[33], 2025 | Italy | DARWIN-Renal | 2015.01-2021.09 | Multi-center | 22379 | Confirmed T2DM; age ≥ 18 years; ≥ 1 year follow-up records | 5 renal disease severity thresholds established by the KDOQI scale based on the eGFR |
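
Most outcome definitions in Table 1 combine two laboratory thresholds: eGFR < 60 mL/minute/1.73 m² and UACR ≥ 30 mg/gram. The following is a minimal Python sketch of that shared core only. The function name `meets_dkd_threshold` and the constants are hypothetical, and the sketch deliberately ignores study-specific requirements listed in the table (persistence over 3 months, dipstick results, ICD codes, biopsy, and the slightly different cutoffs such as UACR > 30 or eGFR ≤ 60 used by some studies).

```python
from typing import Optional

# Thresholds that recur across Table 1 (assumed here as a common core;
# individual studies add further conditions or use slightly different cutoffs).
EGFR_CUTOFF = 60.0  # mL/minute/1.73 m²
UACR_CUTOFF = 30.0  # mg/gram


def meets_dkd_threshold(egfr: Optional[float], uacr: Optional[float]) -> bool:
    """Return True if either laboratory value crosses its cutoff.

    Missing values (None) are treated as "criterion not met", which is a
    simplifying assumption rather than a rule taken from any included study.
    """
    low_egfr = egfr is not None and egfr < EGFR_CUTOFF
    high_uacr = uacr is not None and uacr >= UACR_CUTOFF
    return low_egfr or high_uacr


if __name__ == "__main__":
    print(meets_dkd_threshold(egfr=55.0, uacr=12.0))
    print(meets_dkd_threshold(egfr=92.0, uacr=18.0))
```

A real labeling pipeline would also need to encode repeat measurements over time, since several studies require the abnormality to persist for at least 3 months.
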
Table 2 Characteristics of the best-performing model from each included study
| Ref. | Number | Algorithm type | Missing data imputation | Oversample | Train-test split | Validation type | Internal validation method | Best model | Prediction horizon, year | AUC (95%CI) |
| Cho et al[22], 2008 | 292 | ML | NA | NA | NA | Internal | LOOCV | SVM | 1 | 0.969 (0.941-0.997) |
| Dagliati et al[23], 2018 | 943 | ML | missForest | Oversampling | NA | Internal | LOOCV | LR | 3 | 0.808 (0.772-0.845) |
| Rodriguez-Romero et al[24], 2019 | 10251 | ML | NA | SMOTE | 66:34 | Internal | CV-10 | RF | 1-1.9 | 0.730 (0.715-0.745) |
| Allen et al[25], 2022 | 111046 | ML | missForest | NA | 7:1 | External | Holdout | XGBoost | 5 | 0.750 (0.734-0.766) |
| Dong et al[26], 2022 | 861 | ML | missForest | NA | 8:2 | Internal | CV-5 | LightGBM | 3 | 0.815 (0.747-0.882) |
| Nicolucci et al[27], 2022 | 147664 | ML | Extra-values imputation | SMOTE | Yes | External | CV-10 | XGBoost | 2 | 0.970 (0.968-0.972) |
| Sabanayagam et al[28], 2023 | 1365 | ML | Mean/mode | NA | 8:2 | Internal | CV-5 | Elastic Net | 6 | 0.851 (0.847-0.856) |
| Hosseini Sarkhosh et al[29], 2023 | 3444 | ML | Mean/mode | NA | 8:2 | External | RFECV | RF | 5 | 0.790 (0.770-0.820) |
| Yun et al[30], 2024 | 6040 | DL | NA | NA | 7:3 | Internal | Holdout | LSTM | 5 | 0.830 (0.807-0.853) |
| Lin et al[31], 2025 | 3291 | ML | MICE | NA | 7:3 | Internal | Holdout | SuperLearner | 2.53 | 0.714 (0.673-0.755) |
| Zou et al[32], 2025 | 12190 | ML | MICE | SMOTE | 8:2 | Internal | CV-5 | LightGBM | 3 | 0.918 (0.906-0.930) |
| Dei Cas et al[33], 2025 | 32379 | DL | MICE | NA | 8:2 | External | CV-5 | RNN | 2 | 0.887 (0.869-0.904) |
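
Table 2 reports each AUC with a 95%CI but no standard error, which a meta-analysis needs for weighting. One common approach, assumed here and not confirmed as the authors' method, is to move to the logit scale and back-calculate the standard error from the interval width. The helper name `logit_auc_with_se` is hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds transform; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logit_auc_with_se(auc: float, lower: float, upper: float, z: float = 1.96):
    """Return (logit(AUC), SE on the logit scale).

    Assumes the reported interval is a two-sided 95% interval that is
    approximately symmetric on the logit scale.
    """
    estimate = logit(auc)
    se = (logit(upper) - logit(lower)) / (2.0 * z)
    return estimate, se


if __name__ == "__main__":
    # Dagliati et al, Table 2: 0.808 (0.772-0.845)
    est, se = logit_auc_with_se(0.808, 0.772, 0.845)
    print(f"logit AUC = {est:.3f}, SE = {se:.3f}")
```

Intervals published on the raw AUC scale are not exactly symmetric after the logit transform, so the recovered standard error is an approximation.
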
Table 3 Subgroup analysis of pooled area under the receiver operating characteristic curve values in internal validation
| Subgroup variable | Level | Studies | AUC (95%CI) | I² | t₁₁ | P value | χ² | P value |
| Region | Overall | | | | 7.34 | < 0.0001 | 0.01 | 0.9422 |
| | Asian | 7 | 0.86 (0.74-0.92) | 96.7% | | | | |
| | Western | 5 | 0.86 (0.78-0.91) | 99.7% | | | | |
| Study type | Overall | 12 | 0.86 (0.78-0.91) | 85.8% | 7.34 | < 0.0001 | 0.47 | 0.4951 |
| | Prospective study | 3 | 0.90 (0.40-0.99) | 99.7% | | | | |
| | Retrospective study | 9 | 0.85 (0.74-0.91) | 99.7% | | | | |
| Algorithm type | Overall | | | | 7.34 | < 0.0001 | 0.00 | 0.9563 |
| | ML | 10 | 0.86 (0.76-0.92) | 99.7% | | | | |
| | DL | 2 | 0.86 (0.23-0.99) | 99.7% | | | | |
| Center | Overall | 12 | 0.86 (0.78-0.91) | 99.7% | 7.34 | < 0.0001 | 0.02 | 0.8770 |
| | Single-center | 5 | 0.86 (0.66-0.96) | 97.7% | | | | |
| | Multi-center | 7 | 0.85 (0.78-0.91) | 99.8% | | | | |
| Validation | Overall | 12 | 0.86 (0.78-0.91) | 99.7% | 7.34 | < 0.0001 | 0.26 | 0.6107 |
| | Internal validation | 8 | 0.84 (0.74-0.91) | 98.5% | | | | |
| | External validation | 4 | 0.86 (0.78-0.91) | 99.9% | | | | |
| Prediction horizon | Overall | 12 | 0.86 (0.78-0.91) | 99.7% | 7.34 | < 0.0001 | 0.92 | 0.3362 |
| | < 3 years | 5 | 0.90 (0.64-0.98) | 99.8% | | | | |
| | ≥ 3 years | 7 | 0.83 (0.77-0.88) | 98.0% | | | | |
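
The pooled AUCs and I² values in Table 3 come from a random-effects meta-analysis. The sketch below illustrates one standard way to obtain such numbers: DerSimonian-Laird pooling on the logit scale, using the study-level AUCs and 95%CIs from Table 2. The choice of estimator, the logit transform, and the names `TABLE2_AUCS` and `pool_random_effects` are assumptions for illustration. The source does not state which estimator or software was used, so this is not expected to reproduce Table 3 exactly.

```python
import math

# (AUC, lower, upper) copied from Table 2, in table order.
TABLE2_AUCS = [
    (0.969, 0.941, 0.997), (0.808, 0.772, 0.845), (0.730, 0.715, 0.745),
    (0.750, 0.734, 0.766), (0.815, 0.747, 0.882), (0.970, 0.968, 0.972),
    (0.851, 0.847, 0.856), (0.790, 0.770, 0.820), (0.830, 0.807, 0.853),
    (0.714, 0.673, 0.755), (0.918, 0.906, 0.930), (0.887, 0.869, 0.904),
]


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_random_effects(rows, z: float = 1.96) -> dict:
    """DerSimonian-Laird random-effects pooling of AUCs on the logit scale."""
    y = [logit(auc) for auc, _, _ in rows]
    # Within-study variance back-calculated from the 95%CI width.
    v = [((logit(hi) - logit(lo)) / (2.0 * z)) ** 2 for _, lo, hi in rows]

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(rows) - 1

    # Between-study variance (tau^2), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, pooled estimate, and its standard error.
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled_auc": inv_logit(pooled),
        "ci_lower": inv_logit(pooled - z * se),
        "ci_upper": inv_logit(pooled + z * se),
        "i2_percent": i2,
        "tau2": tau2,
    }


if __name__ == "__main__":
    result = pool_random_effects(TABLE2_AUCS)
    for key, value in result.items():
        print(f"{key}: {value:.3f}")
```

Pooling on the logit scale keeps the back-transformed interval inside (0, 1) and makes it asymmetric around the point estimate, which matches the shape of the intervals reported in Table 3. A subgroup row can be approximated by passing only the relevant subset of rows to `pool_random_effects`.
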
Table 4 Predictors in diabetic kidney disease prediction models
| Ref. | Number | Demographic | Past history | Laboratory |
| Cho et al[22], 2008 | 39 | BMI (mean, minimum) | NA | SBP (mean, EST, and initial), DBP (slope and maximum), WBC (initial), WBC (latest), hemoglobin, platelet (slope), platelet (var), platelet (K), platelet (EST), cholesterol (K and EST), AST (K and initial), ALT (initial, minimum, and K), ALP (minimum and latest), creatinine (mean and var), uric acid (var), Na (EST), K (slope and K), triglyceride (EST), HDL-C (var and initial), LDL-C (EST and initial), microalbumin (mean, var, maximum, and minimum) |
| Dagliati et al[23], 2018 | 4 | BMI | Smoking, hypertension | HbA1c |
| Rodriguez-Romero et al[24], 2019 | 18 | Age | NA | UAlb at baseline, eGFR at baseline, UCr at baseline, eGFR from baseline to year 1, trig at baseline, FPG at baseline, CPK at baseline, age, eGFR at month 4, K at baseline, LDL at baseline, Chol at baseline, eGFR at month 12, FPG at month 4, FPG from baseline to year 1, trig from baseline to year 1, UACR at baseline, FPG at month 8 |
| Allen et al[25], 2022 | 15 | Age, gender, BMI | Acute kidney injury, chronic heart failure, smoking, drinking | SBP, DBP, blood urea nitrogen, creatinine, eGFR, cholesterol (HDL and LDL), white cell count |
| Dong et al[26], 2022 | 8 | Age, BMI | NA | Hcy, HbA1c, ALB, eGFR, bicarbonate, LDL |
| Nicolucci et al[27], 2022 | 46 | Gender, age, height, weight, BMI, waist circumference, diabetes duration | NA | SBP, DBP, ankle/brachial index DX, ankle/brachial index SX, fasting blood glucose, blood glucose after breakfast, blood glucose before dinner, blood glucose before lunch, blood glucose at 11:00 pm, blood glucose after lunch, blood glucose after dinner, pre-prandial blood glucose, post-prandial blood glucose, HbA1c, albuminuria, serum creatinine, creatinine clearance, total cholesterol, LDL cholesterol, HDL cholesterol, triglycerides, fibrinogen, GGT, ALT, AST, alkaline phosphatase, amylase, CPK, hemoglobin, platelets, BUN, uric acid, glycosuria, urinary amylase, urinary ketones, urinary potassium, urinary sodium, urine creatinine, urine culture |
| Sabanayagam et al[28], 2023 | 15 | Ethnicity (Malay/Chinese) | Diabetic retinopathy, hypertension | Acetate, diabetic retinopathy, SBP, DHA, GFR-EPI, HbA1c, IDL-CE%, M-HDL-PL%, M-VLDL-PL%, S-HDL-FC%, XL-HDL-CE%, anti-DM meds |
| Hosseini Sarkhosh et al[29], 2023 | 6 | Diabetes duration | Hypertension, CVD | HbA1c, eGFR, ACR |
| Yun et al[30], 2024 | 10 | Age | NA | PP, SBP, variabilities of PP and SBP, Scr, HDL-C, HbA1c, TG, variabilities of HbA1c |
| Lin et al[31], 2025 | 46 | Gender, age | Smoking, drinking | WBC count, neutrophil count, lymph count, mono count, eosinophil count, basophil count, red blood cell count, hemoglobin, hematocrit, mean corpuscular volume, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, red cell distribution width-coefficient of variation, platelet, mean platelet volume, platelet distribution width, lymph, mono, eosinophil, basophil, ALT, AST, AST/ALT (AST ALT), total bilirubin, creatinine, Ca, total cholesterol, triglyceride, HDL, LDL, prealbumin, HbA1c, D-Dimer, pH, glucose, billing, ketone, obstetric, nitrogen, urology |
| Zou et al[32], 2025 | 5 | Age | NA | UACR, Cystatin C, eGFR, Neutrophil |
| Dei Cas et al[33], 2025 | 34 | Gender, age, weight, weight-past, BMI, diabetes duration | CKD | SBP, DBP, SBP in the past, DBP in the past, HbA1c, HbA1c in the past, eGFR, eGFR in the past, AER, ACR > 30, AER in the past, anti-DM meds |
- Citation: Chen Q, Peng HW, Fu CX, Meng KK, Zhang JB. Machine learning and deep learning in predicting the risk of diabetic kidney disease: A systematic review and meta-analysis. World J Diabetes 2026; 17(3): 115097
- URL: https://www.wjgnet.com/1948-9358/full/v17/i3/115097.htm
- DOI: https://dx.doi.org/10.4239/wjd.v17.i3.115097
