EXPLORATION OF HUMAN EMOTION RECOGNITION IN SPEECH DATA BASED ON A CONVOLUTIONAL NEURAL NETWORK WITH MEL-SCALE AND DELTA FEATURES

  • Dwi Krisnandi
  • 14207023

ABSTRACT

Name : Dwi Krisnandi
Student ID (NIM) : 14207023
Study Program : Computer Science
Faculty : Information Technology
Degree : Master's (S2)
Concentration : Data Mining
Title : "Exploration of Human Emotion Recognition in Speech Data Based on a
Convolutional Neural Network with Mel-Scale and Delta Features"
This exploration of human emotion recognition in speech data infers a person's
emotional state from the intonation of their voice. Speech data was chosen over
facial expressions because, during the pandemic, most people wear masks, so
facial expressions cannot be observed clearly. This study explores emotion
recognition using Mel-scale and delta features and measures the resulting
accuracy on the RAVDESS dataset using a two-dimensional CNN. Four labels are
predicted: Angry, Sad, Neutral, and Happy. A CNN can recognize and distinguish
complex combinations and patterns of audio frequencies, such as emotional
speech, which improves emotion-recognition accuracy. Its drawback is that the
model depends heavily on the chosen features, and a poor feature choice can
reduce accuracy.
Keywords: RAVDESS, Emotion, Mel Scale, CNN
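
As a concrete illustration of the pipeline the abstract describes, the sketch
below extracts a log-Mel spectrogram and its delta from one audio file and
feeds them, stacked as two channels, into a small two-dimensional CNN with a
four-way softmax (Angry, Sad, Neutral, Happy). This is a minimal sketch, not
the thesis's exact configuration: the librosa and Keras libraries, the sample
rate, the number of Mel bands, the layer sizes, and the file path are all
assumptions, since the abstract does not specify them.

import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

def mel_delta_features(path, sr=22050, n_mels=64):
    # The Mel scale warps frequency, in one common form, as
    # m = 2595 * log10(1 + f / 700), so the spectrogram's resolution
    # follows human pitch perception rather than raw Hertz.
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)          # shape: (n_mels, frames)
    delta = librosa.feature.delta(log_mel)      # frame-to-frame dynamics
    return np.stack([log_mel, delta], axis=-1)  # shape: (n_mels, frames, 2)

def build_cnn(input_shape, n_classes=4):
    # Treats the stacked features as a two-channel "image".
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Hypothetical usage on a single RAVDESS clip (path is illustrative):
# x = mel_delta_features("Actor_01/03-01-05-01-01-01-01.wav")
# model = build_cnn(input_shape=x.shape)

Stacking the delta as a second input channel lets the convolution see both the
spectral shape and how it changes over time, which is what the abstract credits
for distinguishing emotional speech.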
 

KEYWORDS

Exploration of Human Emotion Recognition, Convolutional Neural Network, Mel Scale and Delta



Record Details

This thesis was written by:

  • Name : Dwi Krisnandi
  • NIM : 14207023
  • Program : Computer Science
  • Campus : Margonda
  • Year : 2022
  • Period : II
  • Supervisor : Dr. Hilman Ferdinandus Pardede, ST, M.EICT
  • Assistant :
  • Code : 0039.S2.IK.TESIS.II.2022
  • Entered by : RKY
  • Last updated : 28 July 2023
  • Viewed : 160 times
