Analysis On Internet Pattern of Youtube Browsing in Indonesia Using Web Crawling and Unsupervised Learning (Analisis Pola Minat Tayangan Youtube DI Indonesia dengan Web Crawling dan Supervised Learning)
Main Article Content
Abstract
YouTube is a popular video-sharing website, particularly in Indonesia. Every day, YouTube’s Trending page is updated with a list of trending videos for each country. This data can be used for information exploration, such as analysing patterns of viewer interest on YouTube. This research aims to collect and analyse the metadata of trending videos in order to generate a classifier model and statistics on trending YouTube videos in Indonesia. The data was collected from YouTube’s Trending page every day for 10 consecutive days using the Scraper and Screaming Frog SEO Spider tools, and was then classified into video categories. The approach used for this purpose is rule-based classification using the J48 decision tree algorithm and a TF-IDF filter. The results show that videos about people, blogs, sports, news, politics, comedy, entertainment and music interest people in Indonesia the most.
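The TF-IDF weighting mentioned in the abstract can be illustrated with a minimal Python sketch. It uses the common textbook form (term frequency multiplied by the logarithm of the inverse document frequency); the filter settings used in the study are not stated here and may differ. The function name `tf_idf` and the sample titles are hypothetical and invented for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    # Naive whitespace tokenisation; a real pipeline would also remove stopwords.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Count each term once per document to get its document frequency.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical video titles, not taken from the study's data.
titles = [
    "official music video",
    "football match highlights",
    "music concert highlights",
]
weights = tf_idf(titles)
```

In this toy example, a term that appears in only one title, such as "official", receives a higher weight than a term shared by several titles, such as "music". Weights of this kind can then serve as input features for a decision tree classifier.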
Article Details
- The article and any associated published material are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The public is allowed to Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material) the content of this journal article.