Automatic Categorization of Scientific Articles in a Digital Library Database Using the Graph Kernel Method
Abstract
Scientific articles in a digital library database are grouped into specific categories. Grouping large numbers of scientific articles manually requires considerable human resources and time. This study aims to help the library-materials processing team assign scientific articles to their respective categories automatically. Automatic categorization is carried out using graph kernels applied to the bipartite graph between scientific article documents and their keywords. Five kernel functions are used to compute the kernel matrix: KEGauss, KELinear, KVGauss, KVLinear, and KRW. The kernel matrix is computed from the one-mode projection of the bipartite graph and is then used as input to an SVM (support vector machine) classifier to determine the appropriate category. Categorization performance is measured by accuracy, defined as the ratio of the number of articles categorized correctly to the total number of articles in the dataset. Applied to the ISJD (Indonesian Scientific Journal Database), the method achieves its best average accuracy, 87.43%, with the KVGauss kernel function. The other kernels yield, in order, 86.14% (KELinear), 85.86% (KEGauss), 42.23% (KVLinear), and 25.15% (KRW). These results indicate that the graph kernel method is effective for grouping scientific articles into predefined categories in a digital library database.
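The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not define KEGauss, KELinear, KVGauss, KVLinear, or KRW, so the code below does not reproduce any of them; it assumes a generic Gaussian kernel computed over the rows of the one-mode projection as a stand-in. The toy documents, keywords, the `sigma` value, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: each document mapped to its set of keywords.
# This is the document-keyword bipartite graph in adjacency form.
doc_keywords = {
    "d1": {"graph", "kernel", "svm"},
    "d2": {"graph", "kernel"},
    "d3": {"rice", "soil"},
    "d4": {"rice", "soil", "yield"},
}
docs = sorted(doc_keywords)


def one_mode_projection(doc_keywords, docs):
    """Project the bipartite graph onto documents.

    Entry (i, j) is the number of keywords shared by documents i and j;
    the diagonal holds each document's own keyword count.
    """
    return [[len(doc_keywords[a] & doc_keywords[b]) for b in docs] for a in docs]


def gaussian_kernel(rows, sigma=1.0):
    """Generic Gaussian kernel between rows of the projection matrix.

    Assumption: a stand-in for illustration, not the paper's KEGauss/KVGauss.
    """
    n = len(rows)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((x - y) ** 2 for x, y in zip(rows[i], rows[j]))
            K[i][j] = math.exp(-sq_dist / (2 * sigma ** 2))
    return K


def accuracy(predicted, actual):
    """Correctly categorized articles divided by all articles in the dataset."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


W = one_mode_projection(doc_keywords, docs)
K = gaussian_kernel(W, sigma=2.0)
```

A kernel matrix such as `K` could then be passed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, and the predicted categories scored with `accuracy`.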
Article Details
- The article and any associated published material are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The public is allowed to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) the content of this journal article.