A COMPARATIVE ANALYSIS OF SOME SELECTED MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION AND PREDICTION OF DIABETES TYPES

  • ROTIMI OGUNDEJI, DEPARTMENT OF STATISTICS, FACULTY OF SCIENCE, UNIVERSITY OF LAGOS, AKOKA, NIGERIA.
  • HOPE ADEGOKE, DEPARTMENT OF STATISTICS, FACULTY OF SCIENCE, UNIVERSITY OF LAGOS, AKOKA, NIGERIA.
Keywords: Algorithm Accuracy, Classification Models, Diabetes Types, Neural Network

Abstract

Diabetes mellitus is one of the most common human diseases worldwide and may cause several health-related complications. It is responsible for considerable morbidity, mortality and economic loss. Timely diagnosis and prediction of the disease could give patients the opportunity to adopt appropriate preventive and treatment strategies. To gain an understanding of risk factors, this study compares six classification algorithms, along with an artificial neural network algorithm, in the prediction of diabetes types. Analysis of data from patient records identifies four main predictors of type 1 and type 2 diabetes: Age, Ulcer Duration, White Blood Cell (WBC) and Erythrocyte Sedimentation Rate (ESR). Evaluated on accuracy, precision, recall and F-measure, all the classification algorithms perform well, with accuracy greater than 95%. The random forest and k-nearest neighbor algorithms, in particular, reached 99% accuracy on the train/test split. These models can therefore be applied to make a reasonable classification of type 1 and type 2 diabetes, supporting timely diagnosis and prediction of the disease and the choice of appropriate preventive and treatment strategies.
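
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary setting in which type 2 diabetes is treated as the positive class. The labels below are invented for illustration and are not the study's data, and the helper names (`confusion_counts`, `classification_metrics`) are hypothetical rather than taken from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a chosen positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F-measure from the counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Made-up held-out labels (illustrative only, not from the study).
y_true = ["type2", "type2", "type2", "type1", "type1", "type2"]
y_pred = ["type2", "type2", "type1", "type1", "type2", "type2"]

counts = confusion_counts(y_true, y_pred, positive="type2")
metrics = classification_metrics(*counts)
print(counts, metrics)
```

In practice the same quantities would be computed on the held-out portion of a train/test split for each candidate classifier, so that the models can be compared on equal terms.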

Published
2024-07-14
How to Cite
OGUNDEJI, R., & ADEGOKE, H. (2024). A COMPARATIVE ANALYSIS OF SOME SELECTED MACHINE LEARNING ALGORITHMS FOR CLASSIFICATION AND PREDICTION OF DIABETES TYPES. Unilag Journal of Mathematics and Applications, 3, 53-70. Retrieved from http://lagjma.unilag.edu.ng/article/view/2144