ADVANCING PUBLIC HEALTH WITH CAUSAL MACHINE LEARNING: ESTIMATING THE CAUSAL EFFECT OF SMOKING ON DIABETES RISK USING BRFSS 2015 DATA

  • DAVID AKANJI, DEPARTMENT OF STATISTICS, UNIVERSITY OF LAGOS, AKOKA, LAGOS STATE, NIGERIA.
  • MUMINU OSUMAH ADAMU, DEPARTMENT OF STATISTICS, UNIVERSITY OF LAGOS, NIGERIA.
Keywords: Causal inference, machine learning, propensity score matching, smoking, type 2 diabetes, public health, DoWhy

Abstract

Distinguishing correlation from causation is critical for effective public health interventions. This study applies a causal machine learning framework to estimate the causal effect of smoking on diabetes risk using the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset (N = 253,680 U.S. adults). We constructed a directed acyclic graph (DAG) encoding the assumed confounders, identified the average treatment effect (ATE) via the backdoor criterion, and estimated it using propensity score matching (PSM) in the DoWhy library. Smoking was estimated to increase the probability of diabetes by a statistically significant 1.6 percentage points (ATE = 0.016, 95% CI: 0.010–0.022). Robustness was supported by multiple refutation tests, including a data subset refuter (new ATE = 0.0086, p = 0.52) and a random common cause refuter (new ATE = 0.0071, p = 0.20); in neither test did the refuted estimate differ significantly from the original estimate (p > 0.05). These findings provide policy-relevant causal evidence supporting intensified anti-smoking interventions as part of diabetes prevention strategies and demonstrate the value of integrating causal inference with machine learning in observational public health research.
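The estimation and refutation steps summarized above can be illustrated with a minimal, self-contained Python sketch on synthetic data. It is not the authors' code and does not use DoWhy or the BRFSS file: the confounders `age` and `low_income`, the coefficients of the data-generating process, and the true effect `tau = 0.05` are hypothetical assumptions chosen only so the recovered estimate can be checked. Under the assumed DAG (each confounder affects both smoking and diabetes, and smoking affects diabetes), the two confounders form the backdoor adjustment set. The sketch fits a logistic propensity model by Newton-Raphson, matches every unit to its nearest neighbor in the opposite arm on the estimated score, and averages the imputed individual differences to obtain an ATE. It then mimics a random-common-cause refutation by adding an independent noise covariate and re-estimating.

```python
import bisect
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def simulate(n, tau, seed=0):
    """Hypothetical data-generating process (NOT BRFSS): two confounders
    drive both smoking and diabetes; tau is the true causal effect."""
    rng = random.Random(seed)
    rows, t, y = [], [], []
    for _ in range(n):
        age = rng.random()                               # hypothetical scaled age in [0, 1)
        low_income = 1.0 if rng.random() < 0.4 else 0.0  # hypothetical binary confounder
        smoker = 1 if rng.random() < sigmoid(-1.0 + 2.0 * age + 0.5 * low_income) else 0
        p_diabetes = 0.05 + 0.25 * age + 0.05 * low_income + tau * smoker
        rows.append((age, low_income))
        t.append(smoker)
        y.append(1 if rng.random() < p_diabetes else 0)
    return rows, t, y

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def propensity_scores(rows, t, iters=8):
    """Fit logit P(T=1 | x) = beta . [1, x] by Newton-Raphson; return fitted scores."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, ti in zip(X, t):
            p = sigmoid(sum(bj * xj for bj, xj in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (ti - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)  # Newton step: (X'WX)^-1 X'(t - p)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return [sigmoid(sum(bj * xj for bj, xj in zip(beta, xi))) for xi in X]

def nearest_outcome(ps_sorted, y_sorted, p):
    """Outcome of the opposite-arm unit whose propensity score is closest to p."""
    i = bisect.bisect_left(ps_sorted, p)
    cands = [c for c in (i - 1, i) if 0 <= c < len(ps_sorted)]
    j = min(cands, key=lambda c: abs(ps_sorted[c] - p))
    return y_sorted[j]

def psm_ate(ps, t, y):
    """1-nearest-neighbor matching with replacement: impute each unit's missing
    potential outcome from its match, then average the differences (ATE)."""
    arms = {}
    for arm in (0, 1):
        pairs = sorted((p, yi) for p, ti, yi in zip(ps, t, y) if ti == arm)
        arms[arm] = ([p for p, _ in pairs], [yi for _, yi in pairs])
    diffs = []
    for p, ti, yi in zip(ps, t, y):
        matched = nearest_outcome(*arms[1 - ti], p)
        diffs.append(yi - matched if ti == 1 else matched - yi)
    return sum(diffs) / len(diffs)

rows, t, y = simulate(10_000, tau=0.05)
n_treated = sum(t)
naive = (sum(yi for yi, ti in zip(y, t) if ti) / n_treated
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - n_treated))
ate = psm_ate(propensity_scores(rows, t), t, y)

# Random-common-cause style check: add an independent noise covariate and re-estimate.
noise = random.Random(1)
rows_rcc = [(*r, noise.gauss(0.0, 1.0)) for r in rows]
ate_rcc = psm_ate(propensity_scores(rows_rcc, t), t, y)

print(f"naive difference:          {naive:.4f}")
print(f"PSM ATE:                   {ate:.4f}  (true effect 0.05)")
print(f"after random common cause: {ate_rcc:.4f}")
```

Because the added noise covariate is independent of smoking and diabetes, a well-specified estimate should barely move when it is included; a large shift would signal sensitivity to the model specification. In DoWhy itself the corresponding pipeline runs through `CausalModel`, `identify_effect`, `estimate_effect` with `method_name="backdoor.propensity_score_matching"`, and `refute_estimate` with refuters such as `"random_common_cause"` and `"data_subset_refuter"`; the sketch only approximates that pipeline in spirit.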

Published: 2025-12-29
How to Cite
AKANJI, D., & ADAMU, M. O. (2025). ADVANCING PUBLIC HEALTH WITH CAUSAL MACHINE LEARNING: ESTIMATING THE CAUSAL EFFECT OF SMOKING ON DIABETES RISK USING BRFSS 2015 DATA. Unilag Journal of Mathematics and Applications, 5(2), 58–64. Retrieved from https://lagjma.unilag.edu.ng/article/view/2782