WEB SITE METADATA

ERIK WILDE; ANURADHA ROY

Authors

ERIK WILDE, School of Information, UC Berkeley, Berkeley, CA 94720, USA
ANURADHA ROY, School of Information, UC Berkeley, Berkeley, CA 94720, USA

Keywords:

Sitemaps, Web Site Structure, Web Site Navigation

Abstract

The currently established formats for how a Web site can publish metadata about a site's pages, the robots.txt file and sitemaps, focus on how to provide information to crawlers about where to not go and where to go on a site. This is sufficient as input for crawlers, but does not allow Web sites to publish richer metadata about their site's structure, such as the navigational structure. This paper looks at the availability of Web site metadata on today's Web in terms of available information resources and quantitative aspects of their contents. Such an analysis of the available Web site metadata not only makes it easier to understand what data is available today; it also serves as the foundation for investigating what kind of information retrieval processes could be driven by that data, and what additional data could be provided by Web sites if they had richer data formats to publish metadata.
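As a rough illustration of the two formats named in the abstract, the sketch below reads a robots.txt file (where not to go) and a sitemap (where to go) using only the Python standard library. The host www.example.com, the sample file contents, and the helper names read_robots and sitemap_urls are assumptions made for this example; they are not taken from the paper, and the sketch is not the authors' analysis pipeline.

```python
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Hypothetical robots.txt content for an example site (not from the paper).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

# Hypothetical sitemap in the sitemaps.org 0.9 format (not from the paper).
SITEMAP_XML = """\
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/about</loc></url>
</urlset>
"""

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_robots(text):
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def sitemap_urls(xml_text):
    """Return the <loc> values of all <url> entries in a sitemap document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    robots = read_robots(ROBOTS_TXT)
    # robots.txt tells a crawler where not to go ...
    print(robots.can_fetch("*", "https://www.example.com/private/x"))
    # ... and may point to sitemaps, which list where to go.
    print(robots.site_maps())
    print(sitemap_urls(SITEMAP_XML))
```

Both files yield only flat lists of allowed, disallowed, or advertised URLs; neither carries the navigational structure of the site, which is the gap the abstract points to. Note that site_maps() requires Python 3.8 or later.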


