Name: Hate Speech Detection on Social Media Using Machine Learning (original title: Deteksi Ujaran Kebencian di Media Sosial menggunakan Machine Learning)
Author: WURI TIRTAWATI

ABSTRACT

Name : Wuri Tirtawati

Student ID (NIM) : 14210164

Study Program : Computer Science

Faculty : Information Technology

Degree Level : Master's (S2)

Concentration : Data Mining

Title : “Hate Speech Detection on Social Media Using Machine Learning”

Social media has become a powerful tool for exchanging information, because it lets users not only consume information but also share and discuss whatever interests them. Behind this convenience, however, social media platforms frequently face the problem of hate speech: content that expresses hatred toward particular individuals or groups. Such content can cause fear, intimidate, or even incite other users to violence. One source of complexity in monitoring social media is that users are free to express their thoughts as text without following strict grammatical rules, which makes it difficult to identify and analyze hate speech accurately and efficiently. Although awareness of the problems caused by negative content on social media keeps growing, reliable solutions for detecting hate speech are still not fully adequate. The main goal of this research is therefore to develop a reliable tool for detecting tweets that contain hate speech. The research proposes an innovative approach for detecting and classifying hate speech content using data from communities on Twitter that explicitly identify themselves as groups that spread hate. This more careful and targeted approach is expected to contribute to addressing the increasingly entrenched problem of hate speech on social media. Experiments with Machine Learning classification algorithms show that the Logistic Regression algorithm matches the performance of the Voting Ensemble for hate speech detection, with an accuracy of 96.67%.

KEYWORDS

Hate speech, machine learning, text classification, sentiment analysis

Detailed Information

This thesis was written by:

Name : WURI TIRTAWATI
Student ID (NIM) : 14210164
Study Program : Computer Science
Campus : Margonda
Year : 2023
Period : I
Supervisor : Prof. Ir. Dr. Dwiza Riana, S.Si, MM, M.Kom
Assistant :
Code : 0013.S2.IK.TESIS.I.2023
Entered by : NZH
Last updated : 10 June 2024
Views : 334
