Name: Informal Text Classification in IndoBERT with the Bi-LSTM and CNN Algorithms (original title: Klasifikasi Teks Informal dalam IndoBERT dengan Algoritma Bi-LSTM dan CNN)
Author: VALIANDA FARRADILLAH HAKIM

ABSTRACT

Name : Valianda Farradillah Hakim

Student ID (NIM) : 14210191

Study Program : Computer Science (S2)

Faculty : Information Technology

Degree Level : Master's (S2)

Concentration : Data Mining

Title : “Informal Text Classification in IndoBERT with the Bi-LSTM and CNN Algorithms”

The most common activity among Twitter users is posting tweets directed at official accounts. These range from positive tweets that offer praise to negative ones such as criticism, and one official account that frequently receives tweets from users is Telkomsel. The main objective of this study is to determine the highest accuracy obtained when IndoBERT embeddings are combined with the deep learning algorithms Bi-LSTM and CNN. The results show fairly high accuracy, above 90%. The highest accuracy, 99%, was achieved by the CNN algorithm at a learning rate of 6 × 10⁻⁵, with precision, recall, and F1 scores of 98%, 97%, and 97%, respectively.
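The four scores reported above (accuracy, precision, recall, F1) can be illustrated with a minimal sketch. It assumes a binary labelling and a single positive class. The record does not state how the thesis averaged its scores. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the thesis data.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            # Predicted positive: true positive if the label agrees, else false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: false negative if the label was positive, else true negative.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels (1 = negative tweet, 0 = positive tweet); not thesis data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

Accuracy counts all correct predictions, whereas precision and recall look only at the positive class. This is why a model can report 99% accuracy alongside slightly lower precision, recall, and F1.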

KEYWORDS

IndoBERT, Twitter, Bi-LSTM, Python, Telkomsel

Detailed Information

This thesis was written by:

Name : VALIANDA FARRADILLAH HAKIM
Student ID (NIM) : 14210191
Study Program : Computer Science
Campus : Margonda
Year : 2023
Period : I
Supervisor : Prof. Ir. Dr. Dwiza Riana, S.Si, MM, M.Kom
Assistant :
Code : 0037.S2.IK.TESIS.I.2023
Entered by : NZH
Last updated : 24 June 2024
