AN AUTOMATED WEB PAGE CLASSIFIER AND AN ALGORITHM FOR THE EXTRACTION OF NAVIGATIONAL PATTERN FROM THE WEB DATA

Authors

  • ABDUL RAHAMAN WAHAB SAIT Research Scholar Dept. of Computer Science and Engineering Alagappa University, India
  • T. MEYYAPPAN Professor Dept. of Computer Science and Engineering Alagappa University, India

Keywords:

Web page classification, Browsing pattern, Neural fitted Q-Iteration, Weblog, Web mining, Machine learning, Reinforcement learning

Abstract

There is a demand for web intelligence in e-business and internet-oriented markets. Many data-crunching tools are available for vendors to predict customer behaviour on their websites; still, a gap exists, and vendors fail to draw visitors' attention to their products. Internet crimes are increasing exponentially as the internet grows in popularity. Web page classification (WPC) is a technique that assigns a web page to a particular category using its content and attributes such as the URL, Meta, and Title tags. Classifying web pages gives an organization or university the option to either block a web page or allow its employees or students to access it. Weblog pattern (WLP) mining is a popular tool for extracting useful patterns and deducing knowledge for the development of a website. The proposed work provides solutions for both WLP extraction and WPC. It applies the neural fitted Q-Iteration (NFQ) [1] method to classify Tamil and English web pages and to extract, from a weblog, the types of visitors who visit a web page. The experimental results show that the proposed method is economical in time and memory usage and achieves higher accuracy than existing methods.
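
As an illustration of the page attributes named above (URL, Meta, and Title tags), the following is a minimal sketch of how such attributes might be turned into tokens for a classifier. It is not the authors' implementation: the function name `extract_features`, the helper class, the tokenizer, and the example URL are all assumptions made for this example, and it uses only the Python standard library.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TitleMetaParser(HTMLParser):
    """Collects the <title> text and the content of named <meta> tags."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def tokenize(text):
    # Simple lowercase word tokenizer (an assumption; a real system would
    # need language-aware tokenization, especially for non-English pages).
    return re.findall(r"\w+", text.lower())


def extract_features(url, html):
    """Hypothetical helper: token lists from the URL, Title, and Meta tags."""
    parser = _TitleMetaParser()
    parser.feed(html)
    parsed = urlparse(url)
    meta_text = " ".join(
        parser.meta.get(name, "") for name in ("keywords", "description")
    )
    return {
        "url_tokens": tokenize(parsed.netloc + " " + parsed.path),
        "title_tokens": tokenize(parser.title),
        "meta_tokens": tokenize(meta_text),
    }


page = (
    "<html><head><title>Cricket News</title>"
    '<meta name="keywords" content="cricket, sports"></head>'
    "<body></body></html>"
)
features = extract_features("http://example.com/sports/cricket-news.html", page)
print(features)
```

Token lists of this kind could then be encoded as a feature vector and passed to whichever classifier is used; the encoding and the classifier itself are outside the scope of this sketch.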

References

Martin Riedmiller, “Neural Fitted Q Iteration – First experiences with a data efficient neural reinforcement learning method”, ECML 2005, Lecture Notes in Computer Science, Volume 3720, pp. 317 – 328.

Ganesan S., Sivaneri A.I.U., and Selvaraju S., “Evolving interest-based user groups using PSO algorithm”, International Conference on Recent Trends in Information Technology, 2014, pp. 1 – 6.

Suhasini Parvatikar and Bharti Joshi, “Analysis of user behavior through the web usage mining”, IJCA Proceedings on International Conference on Advances in Science and Technology, IJCAST 2014 (3), Feb 2015, pp. 27 – 31.

E. Baykan, M. Henzinger, L. Marian, and I. Weber, “Purely URL-based topic classification”, in Proceedings of the 18th International Conference on World Wide Web, pp. 1109 – 1110, New York, USA, 2009, ACM.

Kobra Etminani, Mohammad-R. Akbarzadeh-T., and Noorali Raeejiyahehsari, “Web usage mining: users’ navigational patterns extraction from weblogs using Ant-based clusters method”, IFSA-EUSFLAT 2009, pp. 396 – 401.

C. Castillo and B.D. Davison, “Adversarial web search”, Foundations and Trends in Information Retrieval, 4(5), pp. 377 – 486, 2010.

P.N. Bennett and N. Nguyen, “Refined experts: improving classification in large taxonomies”, in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 11 – 18, ACM, 2009.

X.G. Qi, “Web page classification and hierarchy adaptation”, Ph.D. Thesis, Lehigh University, January 2012.

E. Baykan, M. Henzinger, L. Marian, and I. Weber, “A comprehensive study of features and algorithms for URL-based topic classification”, ACM Transactions on the Web, 5, pp. 15:1 – 15:29, July 2011.

Sael N., Mark A., and Behza H., “The Web usage mining data preprocessing and multi-level analysis on Moodle”, International Conference on Computer Systems and Applications (AICCSA), IEEE, ACS, 27 – 30 May 2013, pp. 1 – 7.

Abdul Rahaman and Dr. T. Meyyappan, “Data processing and transformation technique to generate pattern from the web log”, International Conference on Computer Science and Information Systems, Oct 17 – 18, 2014, Dubai (UAE), pp. 6 – 9.

R. Rajalakshmi and Chandrabose Aravindan, “Web page classification using n-gram based URL features”, 5th International Conference on Advance Computing, IEEE, 2013, pp. 15 – 21.

Chaker Jebari, “A pure URL-based genre classification of web pages”, 25th International Workshop on Database and Expert Systems Applications, 2014, pp. 233 – 237.

Win Thanda Aung and Khin Hay Mar Saw Hla, “Random forest classifier for multi-category classification of web pages”, IEEE Asia-Pacific Services Computing Conference, 7 – 11 Dec 2009, pp. 372 – 376.

Makoto Tsukada, Takashi Washio, and Hiroshi Motoda, “Automatic web-page classification by using machine learning methods”, Web Intelligence: Research and Development, Lecture Notes in Computer Science, Volume 2198, pp. 303 – 313, 2001.

Dou Shen, Zheng Chen, Qiang Yang, Hua-Jun Zeng, Benyu Zhang, Yuchang Lu, and Wei-Ying Ma, “Web-page classification through summarization”, ACM, 2004.

Fu Y., Sadhu K., and Shin M.Y., “Clustering of web users based on access patterns”, in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999, Springer, San Diego.

S. Haken Yilmaz and Pinar Senkul, “Using ontology and sequence information for extracting behavior patterns from web navigation log”, IEEE International Conference on Data Mining Workshops, 2010, pp. 549 – 556.

Sameendra Samarawickrama and Lakshman Jayaratne, “Effect of named entities in web page classification”, Fourth International Conference on Computational Intelligence, Modeling and Simulation, 2012, pp. 38 – 42.

Sudheer Reddy, Kantha Reddy M., and Sitaramulu V., “An effective data preprocessing method for web usage mining”, International Conference on Information Communication and Embedded Systems, 21 – 22 Feb 2013, pp. 7 – 10.

http://www.cs.cmu.edu/~webkb/

Lin Kewen, “Analysis of preprocessing methods for web usage data”, International Conference on Measurement, Information, and Control (MIC), 18 – 20 May 2012, pp. 383 – 386.

Goong Wei, “A new path filling method on data preprocessing in web mining”, International Conference on Control Engineering and Communication Technology (ICCECT), 7 – 9 Dec 2012, pp. 1033 – 1035.

ftp://ita.ee.lbl.gov/html/contrib/clarknet-http.html.

http://www.rahablog.com

http://www.urakkapesu.com

http://www.ijcsit.org

Igel C. and Hüsken M., “Empirical evaluation of the improved RPROP learning algorithms”, Neurocomputing, No. 50, pp. 105 – 123, 2003.

Riedmiller M., “RPROP – Description and implementation details”, Technical Report, Jan 1994, University of Karlsruhe.

Riedmiller M. and Braun H., “A direct adaptive method for faster back-propagation learning: The RPROP algorithm”, Proceedings of International Conference on Neural Networks, pp. 586 – 591.

Anastasiadis A.D., Magoulas G.D., and Vrahatis M.N., “An efficient improvement of the RPROP algorithm”, Proceedings of the First International Workshop on Artificial Neural Networks in Pattern Recognition, 2003.

http://www.cs.waikato.ac.nz/~ml/weka/

Published

2016-06-27

Section

Articles