AN INTEGRATED TECHNIQUE FOR WEB SITE USAGE SEMANTIC ANALYSIS: THE ORGAN SYSTEM
Keywords:
Web Usage Mining, Web Traffic Analysis, Knowledge Acquisition, OWL Ontology
Abstract
In this work, we propose and implement a new log analysis system called ORGAN (Ontology-oRiented usaGe ANalysis system). ORGAN aims to enhance and simplify log analysis by exploiting semantic knowledge. It offers the typical statistical analyses of Web usage logs while also taking into account the site's underlying semantics. We evaluated ORGAN on Web site data for several cases to verify its behavior and demonstrate its potential. The experimental results were encouraging and led to valuable conclusions about the usage of the Web site under analysis. Consequently, we believe, and illustrate with examples, that ORGAN could become a useful tool for Web log analysts and could assist Web site managers in decision-making for site reorganization tasks. Finally, we discuss open problems to motivate further research on incorporating Semantic Web technologies into Web site log mining.
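The core idea of combining ordinary log statistics with site semantics can be illustrated with a small sketch. It assumes access logs in the NCSA Common Log Format and a hand-written mapping of pages to ontology concepts, with a simple child-to-parent dictionary standing in for an OWL ontology. The names `PAGE_CONCEPT`, `PARENT`, `ancestors`, and `concept_hits` are hypothetical, and this is not ORGAN's actual implementation. Successful page hits are aggregated per concept and rolled up to ancestor concepts, so an analyst can see, for example, total traffic to all product pages rather than to each page separately.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

# Hypothetical annotation of site pages with ontology concepts.
PAGE_CONCEPT = {
    "/products/drug-a.html": "Medication",
    "/products/drug-b.html": "Medication",
    "/health/diabetes.html": "Disease",
}

# Hypothetical concept hierarchy (child -> parent), standing in for an OWL ontology.
PARENT = {
    "Medication": "Product",
    "Disease": "HealthTopic",
}


def ancestors(concept):
    """Yield the concept itself followed by all of its ancestors."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def concept_hits(lines):
    """Count successful GET requests per concept, rolling counts up to ancestors."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed log lines
        _host, method, path, status = m.groups()
        if method != "GET" or not status.startswith("2"):
            continue  # keep only successful page retrievals
        concept = PAGE_CONCEPT.get(path)
        if concept is None:
            continue  # page has no semantic annotation
        for c in ancestors(concept):
            hits[c] += 1
    return hits


if __name__ == "__main__":
    log = [
        '10.0.0.1 - - [10/Oct/2006:13:55:36 +0200] "GET /products/drug-a.html HTTP/1.0" 200 2326',
        '10.0.0.2 - - [10/Oct/2006:13:56:01 +0200] "GET /products/drug-b.html HTTP/1.0" 200 1780',
        '10.0.0.1 - - [10/Oct/2006:13:57:12 +0200] "GET /health/diabetes.html HTTP/1.0" 404 0',
    ]
    print(concept_hits(log))
```

Rolling counts up the hierarchy is what lets a semantic report answer questions at different levels of abstraction (for instance "Medication" versus "Product"), which a plain page-level hit count cannot do directly.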