Classification of English Speaker Dialects Using Random Forest
- Muhamad Azhar
- 14002310
ABSTRACT
Name : Muhamad Azhar
Student ID (NIM) : 14002310
Study Program : Computer Science
Level : Master's (S2)
Concentration : Data Mining
Thesis Title : “Classification of English Speakers Dialect Using Random Forest”
Speech recognition is an important research field that is now widely used in many applications [1]. Recognizing speech, and dialects in particular, is not easy because of the way speakers pronounce a dialect. At the same time, there is growing interest among researchers and the machine learning community in assessing how class imbalance and class overlap affect a wide range of classification techniques. This study uses the dataset available in the repository at http://accent.gmu.edu/, grouped by the speaker's native language (native speaker or language accent). The proposed model, MFCC + RF + ROS + GridSearchCV, achieved the best performance, with an accuracy of 0.91 and an AUC of 0.95.
Keywords:
Speech Recognition, Imbalance, Classification, MFCC
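The ROS step in the model name is read here as random oversampling, which matches the imbalance keyword and the cited literature. The following is a minimal sketch of that idea only, not the thesis's implementation: the function name `random_oversample`, the class labels, and the two-dimensional feature values are hypothetical stand-ins for real MFCC feature vectors.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the class itself.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_x.append(row)
            out_y.append(label)
    return out_x, out_y


# Hypothetical toy data: made-up numbers standing in for MFCC features,
# with made-up class labels (4 rows of one class, 2 of the other).
X = [[0.1, 1.2], [0.3, 0.9], [0.2, 1.1], [0.4, 1.0], [2.0, -0.5], [1.8, -0.7]]
y = ["english", "english", "english", "english", "arabic", "arabic"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y))      # class counts before oversampling
print(Counter(y_bal))  # class counts after oversampling
```

In a full pipeline, the balanced rows would then be passed to the Random Forest classifier. Oversampling is usually applied to the training portion only, so that duplicated rows do not leak into the data used for evaluation.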
KEYWORDS
Data Mining
REFERENCES
[1] R. B. Handoko and S. Suyanto, “Klasifikasi Gender Berdasarkan Suara Menggunakan Support Vector Machine,” Indones. J. Comput., vol. 4, no. 1, p. 9, Mar. 2019, doi: 10.21108/INDOJC.2019.4.1.244.
[2] I. S. Permana, Y. Indrawaty, and A. Zulkarnain, “IMPLEMENTASI METODE MFCC DAN DTW UNTUK PENGENALAN JENIS SUARA PRIA DAN WANITA,” MIND J., vol. 3, no. 1, pp. 61–76, Jan. 2019, doi: 10.26760/mindjournal.v3i1.61-76.
[3] N. Nurhamidah, E. C. Djamal, and R. Ilyas, “Perintah Menggunakan Sinyal Suara dengan Mel-Frequency Cepstrum Coefficients dan Learning Vector Quantization,” Semin. Nas. Apl. Teknol. Inf. 2017, 2017.
[4] T. Bent, E. Atagi, A. Akbik, and E. Bonifield, “Classification of regional dialects, international dialects, and nonnative accents,” J. Phon., 2016, doi: 10.1016/j.wocn.2016.08.004.
[5] B. S. Raghuwanshi and S. Shukla, “SMOTE based class-specific extreme learning machine for imbalanced learning,” Knowledge-Based Syst., vol. 187, p. 104814, Jan. 2020, doi: 10.1016/j.knosys.2019.06.022.
[6] S. H. Weinberger, “Speech Accent Archive,” Speech Accent Archive. https://accent.gmu.edu/.
[7] Y. Singh, A. Pillay, and E. Jembere, “Features of Speech Audio for Accent Recognition,” in 2020 International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems (icABCD), Aug. 2020, pp. 1–6, doi: 10.1109/icABCD49160.2020.9183893.
[8] G. Danao, J. Torres, J. V. Tubio, and L. Vea, “Tagalog regional accent classification in the Philippines,” in HNICEM 2017 - 9th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management, 2017, doi: 10.1109/HNICEM.2017.8269545.
[9] S. Vluymans, “Learning from Imbalanced Data,” in Studies in Computational Intelligence, 2019, pp. 81–110.
[10] A. Setiawan, A. Hidayatno, and R. R. Isnanto, “Aplikasi Pengenalan Ucapan dengan Ekstraksi Mel-Frequency Cepstrum Coefficients (MFCC) Melalui Jaringan Syaraf Tiruan (JST) Learning Vector Quantization (LVQ) untuk Mengoperasikan Kursor Komputer,” Transmisi, 2011, doi: 10.12777/transmisi.13.3.82-86.
[11] A. Lukman and W. T. Saputro, “IDENTIFIKASI NYAMUK CULEX DAN AEDES AEGYPTI BETINA MENGGUNAKAN LINIER PREDICTIVE CODING DAN JARINGAN SYARAF TIRUAN LEARNING VECTOR QUANTIZATION,” JIKO (Jurnal Inform. dan Komputer), vol. 1, no. 2, Sep. 2016, doi: 10.26798/jiko.2016.v1i2.33.
[12] D. Satria and M. Mushthofa, “Perbandingan Metode Ekstraksi Ciri Histogram dan PCA untuk Mendeteksi Stoma pada Citra Penampang Daun Freycinetia,” J. Ilmu Komput. dan Agri-Informatika, 2013, doi: 10.29244/jika.2.1.20-28.
[13] A. K. H. Al-Ali, D. Dean, B. Senadji, V. Chandran, and G. R. Naik, “Enhanced Forensic Speaker Verification Using a Combination of DWT and MFCC Feature Warping in the Presence of Noise and Reverberation Conditions,” IEEE Access, vol. 5, pp. 15400–15413, 2017, doi: 10.1109/ACCESS.2017.2728801.
[14] K. Wang, N. An, B. N. Li, Y. Zhang, and L. Li, “Speech Emotion Recognition Using Fourier Parameters,” IEEE Trans. Affect. Comput., vol. 6, no. 1, pp. 69–75, Jan. 2015, doi: 10.1109/TAFFC.2015.2392101.
[15] L. Juvela et al., “Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2018, doi: 10.1109/ICASSP.2018.8461852.
[16] M. Alsulaiman, G. Muhammad, and Z. Ali, “Comparison of voice features for Arabic speech recognition,” in 2011 Sixth International Conference on Digital Information Management, Sep. 2011, pp. 90–95, doi: 10.1109/ICDIM.2011.6093369.
[17] D. B. Manurung, B. Dirgantoro, and C. Setianingsih, “Speaker Recognition For Digital Forensic Audio Analysis Using Learning Vector Quantization Method,” in 2018 IEEE International Conference on Internet of Things and Intelligence System (IOTAIS), Nov. 2018, pp. 221–226, doi: 10.1109/IOTAIS.2018.8600852.
[18] D. K. Putra, I. Iwut, and R. D. Atmaja, “Simulasi Dan Analisis Speaker Recognition Menggunakan Metode Mel Frequency Cepstrum Coefficient (mfcc) Dan Gaussian Mixture Model (gmm),” eProceedings Eng., 2017.
[19] A. Sonak, R. Patankar, and N. Pise, “A new approach for handling imbalanced dataset using ANN and genetic algorithm,” in International Conference on Communication and Signal Processing, ICCSP 2016, 2016, doi: 10.1109/ICCSP.2016.7754521.
[20] S. Wang and X. Yao, “Using Class Imbalance Learning for Software Defect Prediction,” IEEE Trans. Reliab., vol. 62, no. 2, pp. 434–443, Jun. 2013, doi: 10.1109/TR.2013.2259203.
[21] S. Barua, M. M. Islam, X. Yao, and K. Murase, “MWMOTE--Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 2, pp. 405–425, Feb. 2014, doi: 10.1109/TKDE.2012.232.
[22] A. Syukron and A. Subekti, “Penerapan Metode Random Over-Under Sampling dan Random Forest Untuk Klasifikasi Penilaian Kredit,” J. Inform., vol. 5, no. 2, pp. 175–185, Sep. 2018, doi: 10.31311/ji.v5i2.4158.
[23] M. S. Santos, J. P. Soares, P. H. Abreu, H. Araujo, and J. Santos, “Cross-Validation for Imbalanced Datasets: Avoiding Overoptimistic and Overfitting Approaches [Research Frontier],” IEEE Comput. Intell. Mag., vol. 13, no. 4, pp. 59–76, Nov. 2018, doi: 10.1109/MCI.2018.2866730.
[24] H. Li, J. Li, P.-C. Chang, and J. Sun, “Parametric prediction on default risk of Chinese listed tourism companies by using random oversampling, isomap, and locally linear embeddings on imbalanced samples,” Int. J. Hosp. Manag., vol. 35, pp. 141–151, Dec. 2013, doi: 10.1016/j.ijhm.2013.06.006.
[25] R. Blagus and L. Lusa, “Evaluation of SMOTE for High-Dimensional Class-Imbalanced Microarray Data,” in 2012 11th International Conference on Machine Learning and Applications, Dec. 2012, pp. 89–94, doi: 10.1109/ICMLA.2012.183.
[26] J. A. Sáez, J. Luengo, J. Stefanowski, and F. Herrera, “SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering,” Inf. Sci. (Ny)., 2015, doi: 10.1016/j.ins.2014.08.051.
[27] W. Xie, G. Liang, Z. Dong, B. Tan, and B. Zhang, “An Improved Oversampling Algorithm Based on the Samples’ Selection Strategy for Classifying Imbalanced Data,” Math. Probl. Eng., vol. 2019, pp. 1–13, May 2019, doi: 10.1155/2019/3526539.
[28] A. Syukron and A. Subekti, “Penerapan Metode Random Over-Under Sampling dan Random Forest Untuk Klasifikasi Penilaian Kredit,” J. Inform., vol. 5, no. 2, pp. 175–185, Sep. 2018, doi: 10.31294/ji.v5i2.4158.
[29] K. R. Gray, P. Aljabar, R. A. Heckemann, A. Hammers, and D. Rueckert, “Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease,” Neuroimage, vol. 65, pp. 167–175, Jan. 2013, doi: 10.1016/j.neuroimage.2012.09.065.
[30] M. Belgiu and L. Drăguţ, “Random forest in remote sensing: A review of applications and future directions,” ISPRS J. Photogramm. Remote Sens., vol. 114, pp. 24–31, Apr. 2016, doi: 10.1016/j.isprsjprs.2016.01.011.
[31] T. M. Oshiro, P. S. Perez, and J. A. Baranauskas, “How Many Trees in a Random Forest?,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, pp. 154–168.
[32] W. W. Ariestya, Y. E. Praptiningsih, and W. Supriatin, “DECISION TREE LEARNING UNTUK PENENTUAN JALUR KELULUSAN MAHASISWA,” J. Ilm. FIFO, vol. 8, no. 1, p. 97, May 2016, doi: 10.22441/fifo.v8i1.1304.
[33] S. B. Kotsiantis, “Decision trees: a recent overview,” Artif. Intell. Rev., vol. 39, no. 4, pp. 261–283, Apr. 2013, doi: 10.1007/s10462-011-9272-4.
[34] J. L. Speiser, M. E. Miller, J. Tooze, and E. Ip, “A comparison of random forest variable selection methods for classification prediction modeling,” Expert Syst. Appl., vol. 134, pp. 93–101, Nov. 2019, doi: 10.1016/j.eswa.2019.05.028.
[35] Y. Shuai, Y. Zheng, and H. Huang, “Hybrid Software Obsolescence Evaluation Model Based on PCA-SVM-GridSearchCV,” in 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Nov. 2018, pp. 449–453, doi: 10.1109/ICSESS.2018.8663753.
[36] V. Podgorelec and M. Zorman, “Decision Tree Learning,” in Encyclopedia of Complexity and Systems Science, 2015.
[37] T. Setiadi and J. Jamaludin, “Penerapan Klasifikasi Bayes Untuk Memprediksi Jenis Latihan Siswa Pencak Silat (Studi Kasus Pencak Silat PSHT),” Teknika, 2018, doi: 10.34148/teknika.v7i1.69.
[38] M. Hossin and M. N. Sulaiman, “A Review on Evaluation Metrics for Data Classification Evaluations,” Int. J. Data Min. Knowl. Manag. Process, 2015, doi: 10.5121/ijdkp.2015.5201.
[39] A. Luque, A. Carrasco, A. Martín, and A. de las Heras, “The impact of class imbalance in classification performance metrics based on the binary confusion matrix,” Pattern Recognit., 2019, doi: 10.1016/j.patcog.2019.02.023.
[40] X.-Y. Liu and Z.-H. Zhou, “Ensemble Methods for Class Imbalance Learning,” in Imbalanced Learning, Hoboken, NJ, USA: John Wiley & Sons, Inc., 2013, pp. 61–82.
[41] R. Upadhyay and S. Lui, “Foreign English Accent Classification Using Deep Belief Networks,” in Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018, 2018, doi: 10.1109/ICSC.2018.00053.
[42] S. Helmiyah, I. Riadi, R. Umar, and A. Hanif, “Ekstraksi Fitur Pengenalan Emosi Berdasarkan Ucapan Menggunakan Linear Predictor Ceptral Coeffecient Dan Mel Frequency Cepstrum Coefficients,” Mob. Forensics, vol. 1, no. 2, p. 48, Dec. 2019, doi: 10.12928/mf.v1i2.1259.
[43] K. Mannepalli, P. N. Sastry, and M. Suman, “MFCC-GMM based accent recognition system for Telugu speech signals,” Int. J. Speech Technol., 2016, doi: 10.1007/s10772-015-9328-y.
[44] N. Kamarudin, S. A. R. Al-Haddad, S. J. Hashim, M. A. Nematollahi, and A. R. Bin Hassan, “Feature extraction using Spectral Centroid and Mel Frequency Cepstral Coefficient for Quranic Accent Automatic Identification,” in 2014 IEEE Student Conference on Research and Development, SCOReD 2014, 2014, doi: 10.1109/SCORED.2014.7072945.
[45] H. Hairani, A. S. Suweleh, and D. Susilowaty, “Penanganan Ketidak Seimbangan Kelas Menggunakan Pendekatan Level Data,” MATRIK J. Manajemen, Tek. Inform. dan Rekayasa Komput., vol. 20, no. 1, pp. 109–116, Sep. 2020, doi: 10.30812/matrik.v20i1.846.
[46] A. Y. Triyanto and R. Kusumaningrum, “Implementasi Teknik Sampling untuk Mengatasi Imbalanced Data pada Penentuan Status Gizi Balita dengan Menggunakan Learning Vector Quantization,” J. IPTEKKOM J. Ilmu Pengetah. Teknol. Inf., vol. 19, no. 1, p. 39, Jul. 2017, doi: 10.33164/iptekkom.19.1.2017.39-50.
Detailed Information
This thesis was written by:
- Name : Muhamad Azhar
- Student ID (NIM) : 14002310
- Study Program : Computer Science
- Campus : Kramat Raya
- Year : 2020
- Period : II
- Supervisor : Dr. Hilman Ferdinandus Pardede, ST, M.EICT
- Assistant :
- Code : 0045.S2.IK.TESIS.II.2020
- Entered by : RKY
- Last updated : 25 July 2022