Optimization of Sentiment Classification on Online Comments using Multinomial Naïve Bayes and TF-IDF Feature Extraction and N-grams
Abstract
The Naïve Bayes (NB) algorithm is a simple probabilistic classifier that is well suited to text classification tasks such as sentiment analysis. Multinomial Naïve Bayes (MNB) is a classic variant of NB. Its main weakness is the assumption that features are independent of one another. This study uses a dataset of comments and reviews collected from various online platforms. To address the weakness of MNB, the proposed method combines TF-IDF feature extraction over N-grams (1-gram to 5-gram) with Chi-Square feature selection, and handles dataset imbalance with resampling (SMOTE oversampling, and undersampling). The results show that pentagrams (5-grams), applied to data oversampled with SMOTE, yield the highest accuracy, 94%, and an Area Under the Curve (AUC) of 100%.
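As a rough illustration of the core steps named in the abstract (word N-grams, TF-IDF weighting, and a Multinomial Naïve Bayes classifier), the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the smoothed IDF formula, the Laplace smoothing constant `alpha`, and the four toy comments are all assumptions made for this example. The N-gram range is limited to 1 and 2 so that the toy data stays readable, whereas the study itself evaluates 1-gram to 5-gram. Chi-Square feature selection and SMOTE resampling are left out.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs_grams):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs_grams)
    df = Counter()
    for grams in docs_grams:
        df.update(set(grams))  # count each n-gram once per document
    return {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}


def tfidf(grams, idf):
    """Raw n-gram count times IDF; n-grams unseen during fitting are dropped."""
    counts = Counter(grams)
    return {g: c * idf[g] for g, c in counts.items() if g in idf}


def train_mnb(vectors, labels, vocab_size, alpha=1.0):
    """Multinomial NB over TF-IDF weights with additive (Laplace) smoothing."""
    class_docs = Counter(labels)
    weight = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for vec, y in zip(vectors, labels):
        for g, w in vec.items():
            weight[y][g] += w
            total[y] += w
    model = {}
    for y, n_y in class_docs.items():
        log_prior = math.log(n_y / len(labels))
        denom = total[y] + alpha * vocab_size
        model[y] = (log_prior, dict(weight[y]), denom)
    return model


def predict(model, vec, alpha=1.0):
    """Return the class with the highest log posterior score."""
    best_label, best_score = None, -math.inf
    for y, (log_prior, weight, denom) in model.items():
        score = log_prior
        for g, w in vec.items():
            score += w * math.log((weight.get(g, 0.0) + alpha) / denom)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy comments, not taken from the study's dataset.
train_texts = [
    "great product works well",
    "really great service",
    "terrible product broke fast",
    "really terrible service",
]
train_labels = ["pos", "pos", "neg", "neg"]

docs = [word_ngrams(t) for t in train_texts]
idf = fit_idf(docs)
X = [tfidf(d, idf) for d in docs]
model = train_mnb(X, train_labels, vocab_size=len(idf))


def classify(text):
    """Vectorize a new comment with the fitted IDF table and classify it."""
    return predict(model, tfidf(word_ngrams(text), idf))


print(classify("great service"))
print(classify("terrible product"))
```

Widening the range, for example `word_ngrams(text, 1, 5)`, gives the 1-gram to 5-gram features described in the abstract. Longer N-grams capture some word order, which partly offsets the independence assumption, but they also enlarge the vocabulary considerably, which is one reason a feature selection step such as Chi-Square is typically applied afterwards.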
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.