IMPROVING CLASSIFICATION PERFORMANCE ON IMBALANCED DATA USING GENERATIVE ADVERSARIAL NETWORKS

  • SISKA RAHMADANI
  • 14002456

ABSTRACT

  • Name : Siska Rahmadani
  • Student ID (NIM) : 14002456
  • Study Program : Computer Science
  • Faculty : Information Technology
  • Degree Level : Master's (S2)
  • Concentration : Data Mining
  • Title : Improving the Performance of Classification Algorithms on Imbalanced Data Using Generative Adversarial Networks

Data imbalance is a common real-world problem that significantly degrades the performance of machine learning algorithms. A dataset is imbalanced when its target classes are not equally represented, a situation often found in medical data. This study explores oversampling based on Generative Adversarial Networks (GAN) to improve the performance of classification algorithms on imbalanced datasets, on the expectation that a GAN can learn the true data distribution. The proposed method is evaluated on several metrics: recall, precision, F1-score, AUC score, and FP-rate. The experimental results show that GAN-based oversampling outperforms the other methods on several metrics across all datasets, and it is expected to serve as an alternative method for improving the performance of classification models on imbalanced medical data.

Keywords: GAN, Imbalance, Machine learning, Medical data, Oversampling
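The evaluation metrics named in the abstract (recall, precision, F1-score, AUC score, and FP-rate) can be illustrated with a minimal Python sketch. This is not the thesis's evaluation code: the function names (`confusion_counts`, `classification_metrics`, `auc_score`) are hypothetical, the minority class is assumed to be labeled 1, and the labels and scores in the usage example are invented purely for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and FP-rate from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    fp_rate = _safe_div(fp, fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "fp_rate": fp_rate}


def auc_score(y_true, scores, positive=1):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 2 minority (positive) samples and 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.05]

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

In this invented example the classifier is right on 8 of 10 samples, yet recall on the minority class is only 0.5. That gap is why imbalanced-data studies usually report these class-sensitive metrics rather than accuracy alone.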
 

KEYWORDS

Imbalanced Data, Generative Adversarial Network



Detail Information

This thesis was written by:

  • Name : SISKA RAHMADANI
  • Student ID (NIM) : 14002456
  • Study Program : Computer Science
  • Campus : Margonda
  • Year : 2022
  • Period : I
  • Supervisor : Dr. Agus Subekti, M.T
  • Assistant : Dr. Muhammad Haris, M. Eng
  • Code : 0003.S2.IK.TESIS.I.2022
  • Entered by : RKY
  • Last updated : 16 May 2023
