OVERSAMPLING AND ENSEMBLE LEARNING FOR PREDICTING STUDENT STUDY SUCCESS
- LILI DWI YULIANTO
- 14210161
ABSTRACT
Name : Lili Dwi Yulianto
Student ID (NIM) : 14210161
Study Program : Computer Science
Faculty : Information Technology
Degree Level : Master's (S2)
Concentration : Data Mining
Type of Work : Thesis
The quality of education at a higher-education institution can be measured by its students' study success: whether they graduate within the permitted study period or fail to complete their studies. The number of students who graduate far exceeds the number whose study period expires, so the data are imbalanced. Imbalanced data is a condition in which the minority class has far fewer samples than the majority class. To address the imbalance, this study uses the Synthetic Minority Oversampling Technique (SMOTE), Random Undersampling, and Random Oversampling. The algorithms used are Decision Tree (DT), Random Forest (RF), AdaBoost Classifier (AB), Gradient Boosting (GB), and Extra Trees (ET). The dimensionality reduction methods used are Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Locally Linear Embedding (LLE). The feature selection methods used are Random Forest feature importance, Analysis of Variance (ANOVA), and Recursive Feature Elimination (RFE). In the results, Random Forest with ANOVA and Random Oversampling achieved a precision of 85.15% and a recall of 84.00%, with an F1 score of 84.31% obtained using SMOTE. A ROC AUC score of 88.23% was obtained with Extra Trees combined with Random Oversampling. It can be concluded that feature selection and oversampling perform reasonably well on this dataset.
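To make the two oversampling ideas named in the abstract concrete, the following is a minimal standard-library sketch. It is not the thesis's implementation, and the abstract does not say which library was used. The function names `random_oversample` and `smote_like`, the parameter `k`, and the toy data are all assumptions chosen for illustration. Random oversampling duplicates existing minority rows, while the SMOTE-style function creates new rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(list(rng.choice(rows)))  # exact copy of an existing row
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority_label, k=2, seed=0):
    """Add synthetic minority rows interpolated toward one of the k nearest minority neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    n_new = max(counts.values()) - counts[minority_label]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out


# Hypothetical toy data with two numeric features per student:
# six rows of class 0 (graduated) and two rows of class 1 (study period expired).
X = [[3.5, 140], [3.2, 144], [3.8, 146], [3.0, 138],
     [3.6, 150], [3.4, 142], [2.1, 90], [1.9, 80]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y, minority_label=1, k=1)
print(Counter(y_ros), Counter(y_sm))
```

After either call, both classes have six rows. Resampling of this kind is normally applied to the training split only, so that duplicated or synthetic rows do not leak into the evaluation data. In practice, a library such as imbalanced-learn provides `RandomOverSampler` and `SMOTE` classes that would replace these hand-written functions.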
KEYWORDS
Oversampling
Detailed Information
This thesis was written by:
- Name : LILI DWI YULIANTO
- Student ID (NIM) : 14210161
- Study Program : Computer Science
- Campus : Margonda
- Year : 2023
- Period : I
- Supervisor : Dr. Hilman Ferdinandus Pardede, S.T, M.EICT
- Assistant :
- Code : 0035.S2.IK.TESIS.I.2023
- Entered by : NZH
- Last updated : 24 June 2024