Classification of Diabetes Mellitus Using the Stacking Ensemble Method
Abstract
Early detection of diabetes risk is an important challenge in modern medicine. This study aims to improve the accuracy of diabetes patient classification using a stacking ensemble method that combines three machine learning models: K-Nearest Neighbors (KNN), Random Forest, and XGBoost. The dataset used is the Pima Indians Diabetes dataset, which comprises 768 patient records. After preprocessing, class balancing, and feature selection, the stacking model was built with Logistic Regression as the meta-learner. Evaluation results show that the stacking ensemble achieved an accuracy of 77.27% and a ROC AUC of 82.91%. The method shows strong potential for developing more reliable automated diagnosis systems for diabetes.
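As a rough illustration of the architecture the abstract describes (three base learners feeding a Logistic Regression meta-learner), the following is a minimal sketch using scikit-learn's `StackingClassifier`. It is not the authors' implementation. Several things here are assumptions made for illustration only: the data are synthetic (generated to match only the 768-row, 8-feature shape of the Pima Indians Diabetes dataset), `GradientBoostingClassifier` is swapped in for XGBoost to avoid an extra dependency, the 80/20 split and all hyperparameters are arbitrary, and the balancing and feature selection steps are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data with the same shape as the Pima
# Indians Diabetes dataset (768 rows, 8 features); not the real records.
X, y = make_classification(
    n_samples=768,
    n_features=8,
    n_informative=5,
    weights=[0.65, 0.35],  # mildly imbalanced classes, chosen arbitrarily
    random_state=42,
)

# ASSUMPTION: an 80/20 stratified split; the abstract does not state one.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Level-0 (base) learners. KNN is distance-based, so it gets a scaler.
base_learners = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    # ASSUMPTION: gradient boosting from scikit-learn stands in for XGBoost.
    ("gb", GradientBoostingClassifier(random_state=42)),
]

# Level-1 meta-learner: Logistic Regression trained on the base learners'
# out-of-fold predicted probabilities (5-fold cross-validation).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

accuracy = accuracy_score(y_test, stack.predict(X_test))
roc_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"accuracy={accuracy:.4f} roc_auc={roc_auc:.4f}")
```

Because the data are synthetic, the printed scores say nothing about the reported 77.27% accuracy or 82.91% ROC AUC. The sketch only shows how the base learners' out-of-fold predictions become the meta-learner's input features.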

This work is licensed under a Creative Commons Attribution 4.0 International License.