Performance Comparison of Feature Selection Methods for Detecting Trojan Activity


Muhammad Rijal
Amil Ahmad Ilham
Ady Wahyudi Paundu


- Viruses are malicious programs that can cause serious harm. One of the most dangerous is the trojan virus, which hides on a user's device without the user being aware of its presence. Trojans can be very difficult to spot because they hide on network devices and disguise themselves as legitimate parts of the device. However, when a network device is infected by a trojan, the activity on the network differs from normal activity. Network activity is described by many parameters, and this large number of features makes classification slower at prediction time. In this study, the feature reduction methods Correlation Coefficient, Information Gain, PCA, and LDA were compared, each combined with several classification algorithms (Random Forest, Decision Tree, KNN, Naïve Bayes, and AdaBoost), to find the combination that detects trojan activity on an internet network most accurately and quickly, and thereby strengthens security against trojans. The results show that the best balance of accuracy and prediction time was obtained by combining Correlation Coefficient, Information Gain, and PCA with a Decision Tree classifier, which achieved 99% accuracy with a prediction time of 0.0033 seconds.
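
The abstract does not spell out the exact selection procedure, so the following is only a minimal Python sketch, under stated assumptions, of what the Correlation Coefficient and Information Gain filter stages could look like. A feature is dropped as redundant when its absolute Pearson correlation with an already-kept feature reaches a threshold, and the surviving features are ranked by information gain with respect to the normal/trojan label. The 0.9 threshold, the greedy keep-first order, the top_k cut-off, the helper names, and the toy flow features (duration, packets, bytes, syn_flag) are all hypothetical; the PCA step and the classifier are omitted.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sd_x * sd_y)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(Y) - H(Y | X) for a discrete (or pre-binned) feature X."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def correlation_filter(columns, threshold=0.9):
    """Keep a feature unless |r| >= threshold with a feature already kept
    (hypothetical greedy rule: earlier columns win)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def select_features(columns, labels, top_k, threshold=0.9):
    """Correlation filter first, then rank survivors by information gain."""
    survivors = correlation_filter(columns, threshold)
    ranked = sorted(survivors,
                    key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical toy flow records: 0 = normal traffic, 1 = trojan activity.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "duration": [1, 2, 3, 1, 2, 3, 1, 2],
    "packets":  [2, 2, 3, 3, 7, 7, 8, 8],
    "bytes":    [200, 200, 300, 300, 700, 700, 800, 800],  # = 100 * packets
    "syn_flag": [0, 1, 0, 1, 0, 1, 0, 1],
}

print(correlation_filter(columns))                 # ['duration', 'packets', 'syn_flag']
print(select_features(columns, labels, top_k=2))   # ['packets', 'duration']
```

In this toy data, bytes is removed because it is an exact multiple of packets (r = 1), and syn_flag ranks last because it separates the two classes no better than chance (information gain 0). Information gain tends to favor features with many distinct values, so continuous network features would normally be binned before this step.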



How to Cite
Rijal, M., Ilham, A. A., & Paundu, A. W. (2022). Performance Comparison of Feature Selection Methods for Detecting Trojan Activity. Jurnal Pekommas, 7(2). Retrieved from

