Performance Comparison of Feature Selection Methods for Detecting Trojan Activity

Muhammad Rijal
Amil Ahmad Ilham
Ady Wahyudi Paundu

Abstract

Viruses are malicious programs that can cause harm. Among the most dangerous is the trojan, which hides on a user's device without the user being aware of it. Trojans are difficult to detect because they reside on network devices and disguise themselves as part of the device. However, when a network device is infected by a trojan, the activity on the network differs from normal activity. Network activity is described by many parameters, and this large number of features lengthens the time a classifier needs to make a prediction. This study compares four feature reduction algorithms (Correlation Coefficient, Information Gain, PCA, and LDA) in combination with five classification algorithms (Random Forest, Decision Tree, KNN, Naïve Bayes, and AdaBoost) to find the combination that detects trojan activity on the network most accurately and most quickly, in order to improve security against trojans. The results show that the highest accuracy with the best prediction time is obtained by combining Correlation Coefficient, Information Gain, and PCA with the Decision Tree classifier, which achieved 99% accuracy with a prediction time of 0.0033 seconds.
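
The abstract does not give the dataset or the implementation, so the following is only a minimal sketch of two of the feature-ranking ideas it names: the correlation coefficient and information gain. Everything specific here is an assumption made for illustration: the helper names, the three toy features (flow duration, bytes sent, and an uninformative column), the labels (0 for benign, 1 for trojan), and the choice to binarize each feature at its mean before computing information gain. It is not the authors' method or data.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(xs, ys):
    """Entropy reduction of ys after splitting xs at its mean (an assumption)."""
    threshold = sum(xs) / len(xs)
    low = [y for x, y in zip(xs, ys) if x <= threshold]
    high = [y for x, y in zip(xs, ys) if x > threshold]
    remainder = sum(len(part) / len(ys) * entropy(part)
                    for part in (low, high) if part)
    return entropy(ys) - remainder


def rank_features(rows, labels, score):
    """Return column indices ordered from highest to lowest absolute score."""
    columns = [list(col) for col in zip(*rows)]
    scores = [abs(score(col, labels)) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


def select_top_k(rows, ranking, k):
    """Keep only the k best-ranked columns of every row."""
    keep = sorted(ranking[:k])
    return [[row[i] for i in keep] for row in rows]


# Hypothetical toy flows: [duration, bytes_sent, uninformative]
rows = [
    [1.0, 200.0, 5.0], [1.2, 220.0, 1.0], [0.9, 210.0, 4.0], [1.1, 190.0, 2.0],
    [5.0, 900.0, 5.0], [5.5, 950.0, 1.0], [4.8, 880.0, 4.0], [5.2, 910.0, 2.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = trojan (illustrative)

corr_ranking = rank_features(rows, labels, pearson)
gain_ranking = rank_features(rows, labels, information_gain)
reduced_rows = select_top_k(rows, corr_ranking, 2)

print("ranking by correlation:", corr_ranking)
print("ranking by information gain:", gain_ranking)
print("reduced first row:", reduced_rows[0])
```

On this toy data both scores place the uninformative column last, so dropping it leaves the classifier fewer features to evaluate per prediction. That is the effect on prediction time the study measures, although the actual gain depends on the dataset and the classifier.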

Article Details

Section
Informatics
