COMBINING SUPERVISED AND UNSUPERVISED METHODS IN TOURISM VISITOR DATA
Keywords: Bootstrap samples, Cluster analysis, Poisson regression, Supervised method, Unsupervised method
Combining supervised and unsupervised methods can assist the data analysis process. This research applies a supervised method, Poisson regression, followed by an unsupervised method, cluster analysis, to the visitors in a tourism dataset. A purposive sample of 80 visitors was taken at Flower Garden X in Serang Regency, Banten Province. The dataset consists of the number of visits, travel cost, monthly income or stipend, gender, age, distance from the place of origin, and perception, which is measured by 11 questions on facilities and services. Poisson regression applied to 30, 40, and 50 bootstrap samples identified perception as the significant feature. Medoid-based cluster analysis, i.e. pam and simple k-medoids, was then applied to the perception dataset. The clusterings were compared under simple matching and co-occurrence distances and validated via the medoid-based shadow value. Five clusters were found to be the most suitable number for grouping the visitors. The combined supervised and unsupervised methods identified cleanliness as the important indicator. Improvement of the tourism object should therefore focus on the cleanliness aspect.
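As an illustration of the clustering step, the sketch below computes the simple matching distance between categorical answer vectors and runs a generic alternating k-medoids procedure on it. It is a minimal sketch under stated assumptions, not the pam or simple k-medoids implementations used in the study: the six respondents, the four questions, the helper names (`simple_matching`, `assign`, `k_medoids`), and the random initialization are all hypothetical.

```python
import random


def simple_matching(a, b):
    """Simple matching distance: share of positions where the answers differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign(dist, medoids):
    """Label every object with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(rows, k, max_iter=100, seed=0):
    """Generic alternating k-medoids on a precomputed distance matrix."""
    n = len(rows)
    dist = [[simple_matching(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    medoids = random.Random(seed).sample(range(n), k)  # assumed random start
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster happens to be empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical Likert-type answers (1-5) of six respondents to four questions.
rows = [
    (5, 4, 5, 4),
    (5, 4, 5, 5),
    (4, 4, 5, 4),
    (2, 1, 2, 2),
    (2, 2, 2, 2),
    (1, 1, 2, 2),
]

medoids, labels = k_medoids(rows, k=2)
print("medoid rows:", [rows[m] for m in medoids])
print("cluster labels:", labels)
```

In this toy data the first three respondents share no answer with the last three, so every cross-group distance is 1 and the two groups separate cleanly. With real perception data the choice of distance and of the number of clusters would need the kind of validation the abstract describes.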
Copyright (c) 2022 Journal of Information Technology and Its Utilization
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.