THE MODIFIED CONCEPT BASED FOCUSED CRAWLING USING ONTOLOGY

Authors

  • S. THENMALAR Anna University, Chennai
  • T. V. GEETHA Anna University, Chennai

Keywords:

Concept Vector, Focused Crawling, Information Retrieval, Ontology

Abstract

The major goal of focused crawlers is to crawl web pages that are relevant to a specific topic. One of the important issues for focused crawlers is the difficulty of determining which web pages are relevant to the desired topic. The ontology based web crawler uses domain ontology to estimate the semantic content of a URL, and the relevancy of the URL is determined by an association metric. In concept based focused crawling, a topic is represented by an overall concept vector, determined by combining the concept vectors of the individual pages associated with the seed URLs. Pages are ranked by comparing concept vectors at each depth, across depths, and against the overall concept vector that represents the topic. In this work, however, we determine and rank the seed page set from the seed URLs, and we rank and filter the page sets at the succeeding depths of the crawl. We also propose a method to include relevant concepts from the ontology that were missed by the initial set of seed URLs. The performance of the proposed work is evaluated using two new evaluation metrics: convergence and density contour. The modified concept based focused crawling process achieves a convergence value of 0.82, and with the inclusion of missing concepts it achieves a density contour value of 0.58.
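The abstract does not say how concept vectors are weighted, combined, or compared, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that a concept vector is a dictionary mapping ontology concept names to non-negative weights, that the overall topic vector is the average of the seed-page vectors, that pages are compared with cosine similarity, and that pages below a relevance threshold are filtered out. The function names (`combine_vectors`, `cosine`, `rank_and_filter`, `missing_concepts`), the `threshold` value, and the example URLs are hypothetical. The convergence and density contour metrics are not modeled here.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: ontology concept name -> non-negative weight.
ConceptVector = Dict[str, float]


def combine_vectors(vectors: Iterable[ConceptVector]) -> ConceptVector:
    """Average per-page concept vectors into one overall topic vector.

    Averaging is an assumption; the paper may weight seed pages differently.
    """
    vectors = list(vectors)
    if not vectors:
        return {}
    total: ConceptVector = {}
    for vec in vectors:
        for concept, weight in vec.items():
            total[concept] = total.get(concept, 0.0) + weight
    return {c: w / len(vectors) for c, w in total.items()}


def cosine(a: ConceptVector, b: ConceptVector) -> float:
    """Cosine similarity of two sparse concept vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_and_filter(pages: Dict[str, ConceptVector], topic: ConceptVector,
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Rank pages by similarity to the topic vector, keeping those >= threshold.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    scored = [(url, cosine(vec, topic)) for url, vec in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(url, s) for url, s in scored if s >= threshold]


def missing_concepts(ontology: Iterable[str], topic: ConceptVector) -> List[str]:
    """List ontology concepts that the seed-derived topic vector does not cover."""
    return sorted(c for c in ontology if topic.get(c, 0.0) == 0.0)


# Illustrative usage with made-up seed pages and crawled pages.
seeds = [{"crawler": 2.0, "ontology": 1.0}, {"crawler": 1.0, "concept": 1.0}]
topic = combine_vectors(seeds)
pages = {
    "http://example.org/a": {"crawler": 3.0, "ontology": 1.0},
    "http://example.org/b": {"cooking": 2.0},
}
ranked = rank_and_filter(pages, topic)
missing = missing_concepts(["crawler", "ontology", "concept", "relevance"], topic)
```

In this toy run the topic vector is {"crawler": 1.5, "ontology": 0.5, "concept": 0.5}, page "a" scores about 0.95 and is kept, page "b" shares no concepts and is filtered out, and "relevance" is reported as an ontology concept the seeds missed. How such missing concepts would then be weighted and added to the topic vector is left open, since the abstract does not specify it.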

Published

2014-06-30

How to Cite

THENMALAR, S., & GEETHA, T. V. (2014). THE MODIFIED CONCEPT BASED FOCUSED CRAWLING USING ONTOLOGY. Journal of Web Engineering, 13(5-6), 525–538. Retrieved from https://journals.riverpublishers.com/index.php/JWE/article/view/3919

Section

Articles