CYBERGENRE: AUTOMATIC IDENTIFICATION OF HOME PAGES ON THE WEB
Keywords:
Web systems, genre, cybergenre, web page genre
Abstract
The research reported in this paper is part of a larger project on the automatic classification of web pages by genre. The long-term goal is to incorporate web page genre into the search process to improve the quality of search results. In this phase, a neural net classifier was trained to distinguish home pages from non-home pages and to classify the home pages as personal, corporate, or organization home pages. To evaluate the importance of the functionality attribute of cybergenre in such classification, the web pages were characterized by the cybergenre attributes <content, form, functionality>, and the resulting classifications were compared with classifications in which the web pages were characterized by the genre attributes <content, form>. Results indicate that the classifier is able to distinguish home pages from non-home pages and, within the home page genre, to distinguish personal from corporate home pages. Organization home pages, however, were more difficult to distinguish from personal and corporate home pages. Including the functionality attribute produced a significant improvement in identifying personal and corporate home pages.
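The comparison described above, the same classifier trained once on <content, form> features and once on <content, form, functionality> features, can be illustrated with a small Python sketch. Everything concrete in it is an assumption for illustration only: the cue words (`PERSONAL_CUES`, `CORPORATE_CUES`), the tag counts chosen for form and functionality, the helper names (`PageStats`, `features`, `train`, `predict`), and the single-layer softmax network, which stands in for the paper's neural net classifier because the abstract does not specify the actual features or architecture. It is not the authors' implementation.

```python
import math
import random
import re
from html.parser import HTMLParser

# Hypothetical content cues; the paper's real feature set is not given here.
PERSONAL_CUES = {"my", "me", "hobbies", "resume"}
CORPORATE_CUES = {"products", "customers", "investors", "careers"}


class PageStats(HTMLParser):
    """Collects lowercase words and counts a few structural/interactive tags."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = self.images = self.forms = self.inputs = self.mailto = 0

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a":
            self.links += 1
            if href.startswith("mailto:"):
                self.mailto += 1
        elif tag == "img":
            self.images += 1
        elif tag == "form":
            self.forms += 1
        elif tag == "input":
            self.inputs += 1

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z]+", data.lower()))


def features(html, use_functionality=True):
    """Hypothetical <content, form[, functionality]> feature vector for a page."""
    p = PageStats()
    p.feed(html)
    p.close()  # flush any buffered text
    n = max(len(p.words), 1)
    content = [sum(w in PERSONAL_CUES for w in p.words) / n,
               sum(w in CORPORATE_CUES for w in p.words) / n]
    form = [math.log1p(p.links), math.log1p(p.images)]
    # Functionality: what the page lets a visitor *do* (submit, search, e-mail).
    functionality = [math.log1p(p.forms), math.log1p(p.inputs),
                     math.log1p(p.mailto)]
    return content + form + (functionality if use_functionality else [])


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(w, x):
    xb = x + [1.0]  # constant input that pairs with the bias weight
    return [sum(wi * xi for wi, xi in zip(row, xb)) for row in w]


def train(xs, ys, n_classes, epochs=500, lr=0.5, seed=0):
    """Single-layer softmax network trained by stochastic gradient descent."""
    rng = random.Random(seed)
    w = [[0.0] * (len(xs[0]) + 1) for _ in range(n_classes)]
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            probs = softmax(scores(w, x))
            xb = x + [1.0]
            for k in range(n_classes):
                g = probs[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                w[k] = [wi - lr * g * xi for wi, xi in zip(w[k], xb)]
    return w


def predict(w, x):
    s = scores(w, x)
    return s.index(max(s))


# Toy pages (hypothetical), labelled 0 = personal, 1 = corporate, 2 = non-home.
CLASSES = ["personal", "corporate", "non-home"]
PAGES = [
    "<html><body><p>My hobbies and my resume.</p>"
    "<a href='mailto:me@example.org'>mail me</a><img src='me.jpg'></body></html>",
    "<html><body><p>Products for customers and investors.</p>"
    "<form><input name='q'><input type='submit'></form></body></html>",
    "<html><body><p>Chapter three of the manual.</p>"
    "<a href='ch4.html'>next</a></body></html>",
]
LABELS = [0, 1, 2]

if __name__ == "__main__":
    for use_func in (False, True):
        X = [features(p, use_functionality=use_func) for p in PAGES]
        w = train(X, LABELS, len(CLASSES))
        print(use_func, [CLASSES[predict(w, x)] for x in X])
```

To mirror the evaluation in the abstract, one would train one model on `features(html, use_functionality=False)` and another on `features(html)` over a labelled corpus and compare accuracy on held-out pages; the three toy pages here only show that the pipeline runs and say nothing about the paper's results.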