WEBPAGE CLUSTERING – TAKING THE ZERO STEP: A CASE STUDY OF AN IRANIAN WEBSITE
The growth of websites and their ever-increasing number of pages has not only frustrated visitors but also made websites harder for their owners to manage and control. In recent years, data mining (clustering in particular) has helped website owners extract their visitors' preferences and understand their own websites. This paper contributes to that literature in several ways. First, because the self-organizing map (SOM) has been the most popular algorithm for webpage clustering, SOM is compared with K-means (another popular clustering algorithm) to show the superiority of SOM for this task. Second, because clustering results, unlike classification results, cannot be tested directly, this study proposes a mind-set in which, before taking any other action, one goes through a set of steps to choose the best set of data. Third, the literature has never raised the question of how suitable each type of data (content, structure and usage) is for the task it is used for. Using data from an Iranian website, a field study and the SOM algorithm, we show that the popular belief about which type of data is appropriate for which task should be open to doubt. We also show that, in the two chosen tasks (webpage profiling and extracting visitors' preferences), different sets of data can influence the results tremendously. Last but not least, apart from observing the influence of different sets of data, both data mining tasks are carried through to the end and their results are presented in the paper. Additionally, using the results of the second clustering task (the extraction of visitors' preferences), a novel recommendation system is presented. The recommendation system was installed on the website for more than a month, and its influence on the whole website was observed and analysed.