IMPROVING SEARCH AND EXPLORATION IN TAG SPACES USING AUTOMATED TAG CLUSTERING
Keywords:
Tagging, syntactic clustering, semantic clustering, tag disambiguation
Abstract
In recent years we have experienced an increase in the usage of tags to describe resources. However, the free nature of tagging presents some challenges regarding the search and exploration of tag spaces. In order to deal with these challenges we propose the Semantic Tag Clustering Search (STCS) framework. The framework first groups syntactic variations using several measures based on the Levenshtein distance and the cosine similarity based on tag co-occurrences. We find that a measure that combines the newly introduced variable cost Levenshtein similarity measure with the cosine similarity significantly outperforms the other methods we evaluated in terms of precision. After grouping syntactic variations, the framework clusters semantically related tags using the cosine similarity based on tag co-occurrences. We compare the STCS framework to a state-of-the-art clustering technique and find that the STCS framework performs significantly better in terms of precision. For the evaluation we used a large data set gathered from Flickr, which contains all the pictures uploaded in the year 2009.
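As a rough illustration of the two ingredients named in the abstract, the sketch below combines a normalized Levenshtein similarity with the cosine similarity of tag co-occurrence vectors. It is not the STCS implementation: the abstract does not define the variable cost Levenshtein measure, so the standard unit-cost edit distance is used instead, and the function names, the linear combination, and the `weight` parameter are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (NOT the paper's variable cost variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1] by dividing by the longer tag's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cooccurrence_vectors(resources):
    """For each tag, count how often every other tag shares a resource with it."""
    vectors = {}
    for tags in resources:
        for t, u in combinations(sorted(set(tags)), 2):
            vectors.setdefault(t, Counter())[u] += 1
            vectors.setdefault(u, Counter())[t] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def combined_similarity(a, b, vectors, weight=0.5):
    """Hypothetical linear mix of the syntactic and co-occurrence scores."""
    lev = levenshtein_similarity(a, b)
    cos = cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))
    return weight * lev + (1.0 - weight) * cos


# Toy data: each inner list holds the tags of one picture.
resources = [
    ["newyork", "city", "skyline"],
    ["new-york", "city", "skyline"],
    ["cat", "pet"],
]
vectors = cooccurrence_vectors(resources)
score = combined_similarity("newyork", "new-york", vectors)
```

In this toy data the two spellings never appear on the same picture, yet they share the same neighbouring tags, so the co-occurrence term supports the syntactic term. Pairs scoring above some threshold could then be treated as candidate syntactic variations; the threshold itself would have to be tuned on real data.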