Bootstrap samples, Cluster analysis, Poisson regression, Supervised method, Unsupervised method


Combining supervised and unsupervised methods can assist the data analysis process. This research applies a supervised method, Poisson regression, followed by an unsupervised method, cluster analysis, to the visitors in a tourism dataset. A purposive sample of 80 visitors was taken at the Flower Garden X in Serang Regency, Banten Province. The dataset consists of the number of visits, travel cost, monthly income or stipend, gender, age, distance from the place of origin, and perception, which is formed by 11 questions on facilities and services. Poisson regression applied to 30, 40, and 50 bootstrap samples identified perception as the significant feature. Medoid-based cluster analysis, i.e. pam and simple k-medoids, was then applied to the perception dataset. The two algorithms were compared under simple matching and co-occurrence distances and validated via the medoid-based shadow value. Five clusters proved to be the most suitable number of clusters for grouping the visitors. The combined supervised and unsupervised methods identified cleanliness as the important indicator. Improvement of the tourism object should therefore focus on the cleanliness aspect.
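
The medoid-based step can be illustrated with a minimal Python sketch. It is not the pam or simple k-medoids implementation used in the study: it is a plain alternating k-medoids loop on a simple matching distance, and the toy answers, the function names (`simple_matching`, `distance_matrix`, `assign`, `k_medoids`), and the initial medoids are all assumptions made for illustration.

```python
from itertools import combinations


def simple_matching(a, b):
    """Simple matching distance: share of positions where two records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(rows):
    """Symmetric matrix of pairwise simple matching distances."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = simple_matching(rows[i], rows[j])
    return d


def assign(d, medoids):
    """Label each object with the index of its nearest medoid (first one wins ties)."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(d, init_medoids, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:
                # New medoid: the member with the smallest total distance to its cluster.
                best = min(members, key=lambda m: sum(d[m][j] for j in members))
            else:
                # Keep the old medoid if the cluster came out empty.
                best = old
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(d, medoids)


# Hypothetical answers of six visitors to three perception questions.
rows = [
    ("agree", "agree", "neutral"),
    ("agree", "agree", "agree"),
    ("agree", "neutral", "neutral"),
    ("disagree", "disagree", "neutral"),
    ("disagree", "disagree", "disagree"),
    ("disagree", "neutral", "disagree"),
]

d = distance_matrix(rows)
medoids, labels = k_medoids(d, init_medoids=[0, 3])
print("medoids:", medoids)
print("labels:", labels)
```

In this sketch the initial medoids are passed in by hand and the number of clusters is fixed in advance. Choosing the number of clusters, for instance with a validation index such as the medoid-based shadow value named above, is outside its scope.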






How to Cite

Budiaji, W., Vebriana, V., & Pancawati, J. (2022). COMBINING SUPERVISED AND UNSUPERVISED METHODS IN TOURISM VISITOR DATA. Journal of Information Technology and Its Utilization, 5(1), 14–17.