ADVANCING PUBLIC HEALTH WITH CAUSAL MACHINE LEARNING: ESTIMATING THE CAUSAL EFFECT OF SMOKING ON DIABETES RISK USING BRFSS 2015 DATA
Abstract
Distinguishing correlation from causation is critical for designing effective public health interventions. This study applies a rigorous causal machine learning framework to estimate the causal effect of smoking on diabetes risk using the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset (N = 253,680 U.S. adults). We constructed a directed acyclic graph (DAG), identified the average treatment effect (ATE) via the backdoor criterion, and estimated it with propensity score matching (PSM) in the DoWhy library. Smoking was associated with a statistically significant increase of 1.6 percentage points in the probability of diabetes (ATE = 0.016, 95% CI: 0.010–0.022). Robustness was confirmed through multiple refutation tests, including re-estimation on a random data subset (new ATE = 0.0086, p = 0.52) and the addition of a random common cause (new ATE = 0.0071, p = 0.20); neither test showed a statistically significant departure from the original estimate. These findings provide policy-relevant causal evidence for intensifying anti-smoking interventions as part of diabetes prevention strategies, and they demonstrate the value of integrating causal inference with machine learning in observational public health research.
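To make the estimation step concrete, the following is a minimal sketch of propensity score matching on synthetic data. It is a hand-rolled illustration in standard-library Python, not DoWhy's implementation, and it uses neither the study's DAG nor the BRFSS data. The single standardized confounder, the data-generating coefficients (including the assumed true effect of 0.05), and the function names `simulate`, `fit_propensity`, `psm_ate`, and `naive_difference` are all hypothetical.

```python
import bisect
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0):
    """Synthetic (confounder, smoker, diabetes) rows; every number is made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # one standardized confounder (assumption)
        t = 1 if rng.random() < sigmoid(1.0 * x) else 0  # x raises treatment odds
        # Assumed true effect of treatment on outcome probability: +0.05.
        p = min(1.0, max(0.0, 0.30 + 0.05 * t + 0.10 * x))
        y = 1 if rng.random() < p else 0
        rows.append((x, t, y))
    return rows


def fit_propensity(rows, iters=25):
    """Logistic regression for P(t=1 | x) with intercept, via Newton's method."""
    b0 = b1 = 0.0
    for _step in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t, _y in rows:
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += t - p
            g1 += (t - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return [sigmoid(b0 + b1 * x) for x, _t, _y in rows]


def nearest_outcome(score, pool_scores, pool_outcomes):
    """Outcome of the pool unit whose propensity score is closest to `score`."""
    i = bisect.bisect_left(pool_scores, score)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(pool_scores)]
    best = min(candidates, key=lambda j: abs(pool_scores[j] - score))
    return pool_outcomes[best]


def psm_ate(rows):
    """ATE by 1-nearest-neighbor matching on the propensity score, with replacement."""
    scores = fit_propensity(rows)
    treated = sorted((s, y) for s, (_x, t, y) in zip(scores, rows) if t == 1)
    control = sorted((s, y) for s, (_x, t, y) in zip(scores, rows) if t == 0)
    t_scores, t_out = [s for s, _ in treated], [y for _, y in treated]
    c_scores, c_out = [s for s, _ in control], [y for _, y in control]
    total = 0.0
    for s, (_x, t, y) in zip(scores, rows):
        if t == 1:  # impute the missing untreated outcome from a matched control
            total += y - nearest_outcome(s, c_scores, c_out)
        else:       # impute the missing treated outcome from a matched treated unit
            total += nearest_outcome(s, t_scores, t_out) - y
    return total / len(rows)


def naive_difference(rows):
    """Unadjusted difference in outcome rates (confounded by x)."""
    t_out = [y for _x, t, y in rows if t == 1]
    c_out = [y for _x, t, y in rows if t == 0]
    return sum(t_out) / len(t_out) - sum(c_out) / len(c_out)


rows = simulate(6000, seed=0)
naive = naive_difference(rows)
matched = psm_ate(rows)
print(f"naive difference: {naive:.3f}")
print(f"PSM estimate of ATE: {matched:.3f} (simulated true effect: 0.05)")
```

Because the confounder raises both the chance of treatment and the chance of the outcome, the unadjusted difference overstates the simulated effect, while matching on the estimated propensity score compares units with similar treatment probabilities and should land closer to it. Refutation tests of the kind named above would then re-run the estimate under perturbations such as a random subsample or an added random covariate.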
Copyright (c) 2025 DAVID AKANJI, MUMINU OSUMAH ADAMU

This work is licensed under a Creative Commons Attribution 4.0 International License.