Name: Optimalisasi Machine Learning dengan Hyperparameter Tuning untuk Prediksi Cardiovascular Disease (Machine Learning Optimization with Hyperparameter Tuning for Cardiovascular Disease Prediction)
Author: FADLAN HAMID ALFEBI

ABSTRACT

Name : Fadlan Hamid Alfebi

Student ID (NIM) : 14210242

Study Program : Computer Science

Faculty : Information Technology

Degree Level : Master's (S2)

Concentration : Data Mining

Title : “Machine Learning Optimization with Hyperparameter Tuning for Cardiovascular Disease Prediction”

Cardiovascular disease (CVD) is the leading cause of death worldwide, and primary prevention depends on predicting disease onset early. Using laboratory data from the National Health and Nutrition Examination Survey (NHANES) for the 2017-2020 period (N = 8,544), we optimized machine learning (ML) algorithms with hyperparameter tuning to classify at-risk individuals. The ML models were evaluated on their classification performance after several data preprocessing techniques were applied, including feature selection, missing-value imputation, and resampling. Among the baseline ML models, Logistic Regression (LR) performed best, with an accuracy of 91.46% and an area under the receiver operating characteristic curve (AUROC) of 92.22%. After hyperparameter tuning with HyperOpt, accuracy increased to 92.98% and AUROC to 93.90%. The final CVD prediction performance exceeds that reported in previous studies.
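The abstract describes a pipeline of missing-value imputation, resampling, a Logistic Regression classifier, and hyperparameter tuning, evaluated by accuracy and AUROC. The dependency-free Python sketch below shows one way such a pipeline could be wired together; it is not the thesis code. Assumptions: synthetic toy data (`make_toy_data`) stands in for the NHANES laboratory data; imputation uses column means; resampling is plain random oversampling (the thesis may have used SMOTE or another method); feature selection is omitted; and plain random search stands in for HyperOpt, whose default search algorithm is TPE. All function names, the 60/20/20 split, and the search ranges are hypothetical.

```python
import math
import random

def make_toy_data(n=400, seed=0):
    # Hypothetical synthetic data standing in for the NHANES laboratory features.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0  # imbalanced: roughly 20% positives
        shift = 1.5 if label else 0.0
        row = [rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0), rng.gauss(0.0, 1.0)]
        if rng.random() < 0.1:
            row[rng.randrange(3)] = None  # inject a missing value
        X.append(row)
        y.append(label)
    return X, y

def fit_means(X):
    # Per-column means that ignore missing (None) entries.
    vals = [[r[j] for r in X if r[j] is not None] for j in range(len(X[0]))]
    return [sum(v) / len(v) for v in vals]

def impute(X, means):
    # Mean imputation: replace each None with its column mean.
    return [[m if v is None else v for v, m in zip(r, means)] for r in X]

def oversample(X, y, rng):
    # Random oversampling: duplicate minority rows until both classes are equal.
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    small, large = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    idx = pos + neg + [rng.choice(small) for _ in range(len(large) - len(small))]
    return [X[i] for i in idx], [y[i] for i in idx]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(X, y, lr=0.1, l2=0.0, epochs=200):
    # L2-regularised logistic regression fitted by batch gradient descent.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - t
            for j, xj in enumerate(row):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, r)) + b) for r in X]

def accuracy(y, probs, threshold=0.5):
    return sum((p >= threshold) == (t == 1) for p, t in zip(probs, y)) / len(y)

def auroc(y, scores):
    # Probability that a random positive outranks a random negative (ties count 0.5).
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def random_search(Xtr, ytr, Xva, yva, trials=8, seed=0):
    # Random search over hypothetical ranges, keeping the best validation AUROC.
    rng = random.Random(seed)
    best_score, best_params = -1.0, None
    for _ in range(trials):
        params = {"lr": 10 ** rng.uniform(-2, 0), "l2": 10 ** rng.uniform(-4, -1),
                  "epochs": rng.choice([50, 100, 200])}
        score = auroc(yva, predict_proba(train_lr(Xtr, ytr, **params), Xva))
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

def run(seed=0):
    rng = random.Random(seed)
    X, y = make_toy_data(seed=seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut1, cut2 = int(0.6 * len(idx)), int(0.8 * len(idx))  # assumed 60/20/20 split
    (Xtr, ytr), (Xva, yva), (Xte, yte) = [
        ([X[i] for i in ids], [y[i] for i in ids])
        for ids in (idx[:cut1], idx[cut1:cut2], idx[cut2:])]
    means = fit_means(Xtr)  # imputation statistics come from the training split only
    Xtr, Xva, Xte = impute(Xtr, means), impute(Xva, means), impute(Xte, means)
    Xtr, ytr = oversample(Xtr, ytr, rng)  # resample the training split only
    _, params = random_search(Xtr, ytr, Xva, yva, seed=seed)
    report = {}
    for name, model in (("baseline", train_lr(Xtr, ytr)),
                        ("tuned", train_lr(Xtr, ytr, **params))):
        probs = predict_proba(model, Xte)
        report[name] = (accuracy(yte, probs), auroc(yte, probs))
    return report
```

`run()` returns a dictionary mapping `"baseline"` and `"tuned"` to `(accuracy, AUROC)` on the held-out test split. Imputation statistics and oversampling are fitted on the training split only, so the validation data used for tuning and the test data used for reporting stay untouched. A production version would more likely use scikit-learn's `LogisticRegression` and `roc_auc_score`, with HyperOpt's `fmin(..., algo=tpe.suggest)` minimizing a loss such as `1 - AUROC`.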

KEYWORDS

Classification, Cardiovascular Disease, Machine Learning, Logistic Regression, Hyperparameter Tuning

Details

This thesis was written by:

Name : FADLAN HAMID ALFEBI
Student ID (NIM) : 14210242
Study Program : Computer Science
Campus : Margonda
Year : 2023
Period : II
Supervisor : Dr. Muhammad Haris, S.Kom, M.Eng
Assistant :
Code : 0056.S2.IK.TESIS.II.2023
Entered by : NZH
Last updated : 22 July 2024

ABOUT THE LIBRARY

The Universitas Nusa Mandiri E-Library is a digital platform that provides access to information within the Universitas Nusa Mandiri campus, such as its collections of books, journals, e-books, and more.