IMPROVING SEARCH AND EXPLORATION IN TAG SPACES USING AUTOMATED TAG CLUSTERING

  • JONI RADELAAR Econometric Institute, Erasmus University Rotterdam Burgemeester Oudlaan 51, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
  • AART-JAN BOOR Econometric Institute, Erasmus University Rotterdam Burgemeester Oudlaan 51, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
  • DAMIR VANDIC Econometric Institute, Erasmus University Rotterdam Burgemeester Oudlaan 51, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
  • JAN-WILLEM VAN DAM Econometric Institute, Erasmus University Rotterdam Burgemeester Oudlaan 51, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
  • FLAVIUS FRASINCAR Econometric Institute, Erasmus University Rotterdam Burgemeester Oudlaan 51, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
Keywords: Tagging, syntactic clustering, semantic clustering, tag disambiguation

Abstract

In recent years we have experienced an increase in the usage of tags to describe resources. However, the free nature of tagging presents some challenges regarding the search and exploration of tag spaces. In order to deal with these challenges we propose the Semantic Tag Clustering Search (STCS) framework. The framework first groups syntactic variations using several measures based on the Levenshtein distance and the cosine similarity based on tag co-occurrences. We find that a measure that combines the newly introduced variable cost Levenshtein similarity measure with the cosine similarity significantly outperforms the other methods we evaluated in terms of precision. After grouping syntactic variations, the framework clusters semantically related tags using the cosine similarity based on tag co-occurrences. We compare the STCS framework to a state-of-the-art clustering technique and find that the STCS framework performs significantly better in terms of precision. For the evaluation we used a large data set gathered from Flickr, which contains all the pictures uploaded in the year 2009.
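
As a rough illustration of the two ingredients named in the abstract, the following Python sketch computes a normalized Levenshtein similarity between two tags and a cosine similarity between their co-occurrence vectors, then mixes the two. It is a minimal sketch under stated assumptions, not the STCS implementation: it uses the classic unit-cost Levenshtein distance rather than the variable cost variant (which the abstract does not define), and the function names, the `weight` parameter, and the linear combination are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit cost per insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cooccurrence_vectors(resources):
    """Map each tag to a Counter of the tags it shares a resource with."""
    vectors = {}
    for tags in resources:
        unique = set(tags)
        for tag in unique:
            vec = vectors.setdefault(tag, Counter())
            for other in unique:
                if other != tag:
                    vec[other] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def combined_similarity(a, b, vectors, weight=0.5):
    """Hypothetical linear mix of string and co-occurrence similarity (assumption)."""
    lev = levenshtein_similarity(a, b)
    cos = cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))
    return weight * lev + (1.0 - weight) * cos


if __name__ == "__main__":
    # Toy data: each inner list holds the tags of one picture (made up for this sketch).
    pictures = [
        ["newyork", "skyline", "night"],
        ["new-york", "skyline", "night"],
        ["cat", "pet"],
    ]
    vectors = cooccurrence_vectors(pictures)
    print(combined_similarity("newyork", "new-york", vectors))
    print(combined_similarity("newyork", "cat", vectors))
```

In this toy data, "newyork" and "new-york" differ by one character and appear with the same neighbouring tags, so both components score them as close; "newyork" and "cat" share neither spelling nor context. A pair would be grouped as syntactic variations only if the combined score exceeds some threshold, which would have to be tuned on real data.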

 

Published
2014-05-30
Section
Articles