IMPROVING THE PREDICTION OF TAXPAYER SELECTION FOR AUDIT

TEGUH HERWANTO
14220001

ABSTRACT

Increasing tax revenue through audit activities is a key strategy for optimizing state income from the tax sector. The tax audit process, which aims to test taxpayer compliance, begins with selecting the taxpayers to be audited. This study develops a Transformer-based predictive model, using TabNet, to determine which taxpayers should be audited. The model was developed on a private dataset from the Directorate General of Taxes (Direktorat Jenderal Pajak, DJP). The methodology combines semi-supervised learning (SSL) with undersampling to address class imbalance in the dataset. In the performance evaluation, the TabNet-based model reached a recall of 0.96739, well above LightGBM (0.70190) and the Artificial Neural Network (0.73988). Applying SSL with undersampling proved highly effective in increasing model sensitivity. The results contribute to the development of AI-based prediction systems for optimizing the tax audit process. The findings have the potential to improve the efficiency and effectiveness of selecting taxpayers for audit, which in turn may have a positive effect on state revenue from the tax sector.

Keywords: Undersampling, Semi-Supervised Learning, Transformer Model, Tax Audit Selection
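
The abstract names two ideas that are easy to illustrate in isolation: undersampling the majority class to balance the data, and recall as the evaluation metric. The following is a minimal sketch in plain Python, not the thesis implementation. The function names `random_undersample` and `recall`, the toy labels, and the choice of simple random undersampling are all assumptions made for illustration; the abstract does not say which undersampling variant was used.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class has as many rows as the smallest class.

    Hypothetical helper for illustration; simple random undersampling is assumed.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): the share of actual positives that were found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data: 8 rows of class 0 and 2 rows of class 1 (imbalanced).
toy_rows = list(range(10))
toy_labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels)

# Toy predictions: 2 of the 3 actual positives are recovered.
toy_recall = recall([1, 1, 0, 1], [1, 0, 0, 1])
```

Recall is the natural metric when missing a positive case is the costly error, which is why it is often reported as model sensitivity.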

KEYWORDS

Undersampling, Transformation

Detailed Information

This thesis was written by:

Name : TEGUH HERWANTO
Student ID (NIM) : 14220001
Study program : Computer Science
Campus : Margonda
Year : 2024
Period : I
Supervisor : Dr. Muhammad Haris, S.Kom., M.Eng
Assistant :
Code : 0012.S2.IK.TESIS.I.2024
Entered by : SGM
Last updated : 17 February 2025
