Optimization of Sentiment Classification on Online Comments using Multinomial Naïve Bayes and TF-IDF Feature Extraction and N-grams

Alfin Gerliandeva
Yulison Chrisnanto
Herdi Ashaury

Abstract

The Naïve Bayes (NB) algorithm is a classifier based on simple probability calculations, which makes it well suited to text classification for sentiment analysis. Multinomial Naïve Bayes (MNB) is the classic variant of NB. The weakness of MNB is its assumption that features are independent. This research uses a dataset of comments and reviews from various online platforms. To address the independence assumption of MNB, the proposed method combines TF-IDF feature extraction over N-grams (1-gram to 5-gram), Chi-Square feature selection, and handling of dataset imbalance with SMOTE (oversampling and undersampling). The results show that the pentagram (5-gram) applied to data oversampled with SMOTE produces the highest accuracy, 94%, with an Area Under Curve (AUC) value of 100%.
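To make the core of the pipeline concrete, the following is a minimal sketch of word N-gram extraction, TF-IDF weighting, and a Multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function and class names, the smoothed IDF formula, the smoothing constant, and the four toy comments are all hypothetical. The N-gram range is limited to 1-2 to suit the tiny corpus (the study itself uses 1-gram to 5-gram), and the Chi-Square selection and SMOTE steps are left out.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs_grams):
    """Smoothed inverse document frequency (assumed form): ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs_grams)
    df = Counter()
    for grams in docs_grams:
        df.update(set(grams))  # count each n-gram once per document
    return {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}


def tfidf(grams, idf):
    """Raw term frequency times IDF; n-grams unseen at fit time are dropped."""
    tf = Counter(g for g in grams if g in idf)
    return {g: count * idf[g] for g, count in tf.items()}


class MultinomialNB:
    """Multinomial Naive Bayes over sparse, non-negative feature weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (additive) smoothing constant

    def fit(self, vectors, labels):
        self.vocab = sorted({g for vec in vectors for g in vec})
        class_counts = Counter(labels)
        weight = defaultdict(Counter)  # per-class summed feature weights
        for vec, label in zip(vectors, labels):
            weight[label].update(vec)
        self.log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
        self.log_lik = {}
        for c in class_counts:
            total = sum(weight[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                g: math.log((weight[c][g] + self.alpha) / total) for g in self.vocab
            }
        return self

    def predict(self, vec):
        # Features are treated as independent given the class, so the class
        # score is the log prior plus a weighted sum of per-feature log likelihoods.
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][g] for g, w in vec.items() if g in self.log_lik[c]
            )
        return max(self.log_prior, key=score)


# Hypothetical toy corpus, for illustration only.
train = [
    ("great product works well", "positive"),
    ("really great service", "positive"),
    ("terrible product broke fast", "negative"),
    ("really terrible service", "negative"),
]
train_grams = [word_ngrams(text) for text, _ in train]
idf = fit_idf(train_grams)
X = [tfidf(grams, idf) for grams in train_grams]
y = [label for _, label in train]
model = MultinomialNB().fit(X, y)
prediction = model.predict(tfidf(word_ngrams("great service"), idf))
print(prediction)
```

Adding higher-order N-grams as features is one way to let the model capture short word sequences even though each feature is still scored independently; raising `n_max` to 5 in `word_ngrams` would mirror the range named in the abstract.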

Article Details

Section
Informatics
