Performance Comparison of Feature Selection Methods for Detecting Trojan Activity
Abstract
Viruses are malicious programs that can harm a device. Among the most dangerous is the trojan, which hides on a device without the user being aware of it. Trojans are difficult to spot because they hide on network devices and disguise themselves as part of the device. However, when a network device is infected by a trojan, the activity on the network differs from normal activity. Network activity is described by many parameters, and this large number of features makes classification slower to produce a prediction. This study compares four feature reduction algorithms (Correlation Coefficient, Information Gain, PCA, and LDA) in combination with five classification algorithms (Random Forest, Decision Tree, KNN, Naïve Bayes, and AdaBoost) to find the combination that detects trojan activity on the network most accurately and quickly, in order to improve security against trojans. The results show that the highest accuracy with the best prediction time is obtained by combining Correlation Coefficient, Information Gain, and PCA with the Decision Tree classifier: this combination achieved 99% accuracy with a prediction time of 0.0033 seconds.
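As a rough illustration of two of the filter-style methods named in the abstract, the sketch below ranks features by absolute correlation with the label and by information gain. It is not the study's implementation or data: the helper names (`pearson`, `information_gain`, `rank_features`, `make_toy_flows`), the synthetic three-feature dataset, and the choice to compute information gain from a single median split are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(xs, ys):
    """Information gain of the label after splitting a feature at its median.

    Assumption: one median split stands in for a proper discretisation.
    """
    threshold = sorted(xs)[len(xs) // 2]
    left = [y for x, y in zip(xs, ys) if x < threshold]
    right = [y for x, y in zip(xs, ys) if x >= threshold]
    n = len(ys)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(ys) - remainder


def rank_features(rows, labels, score):
    """Return feature indices ordered from highest to lowest absolute score."""
    n_features = len(rows[0])
    scores = [abs(score([row[j] for row in rows], labels)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def make_toy_flows(n=200, seed=0):
    """Synthetic stand-in for network records: feature 1 tracks the label,
    features 0 and 2 are pure noise. Label 1 means 'trojan', 0 means 'benign'."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = y * 2.0 + rng.gauss(0, 0.5)
        rows.append([rng.gauss(0, 1), informative, rng.gauss(0, 1)])
        labels.append(y)
    return rows, labels


rows, labels = make_toy_flows()
top_by_corr = rank_features(rows, labels, pearson)
top_by_gain = rank_features(rows, labels, information_gain)
print("ranking by |correlation|:   ", top_by_corr)
print("ranking by information gain:", top_by_gain)
```

Keeping only the top-ranked columns before training a classifier reduces the number of values inspected per prediction, which is the general mechanism by which feature reduction can shorten prediction time.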
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.