Tagging, syntactic clustering, semantic clustering, tag disambiguation

Abstract
In recent years we have experienced an increase in the usage of tags to describe resources. However, the free nature of tagging presents some challenges regarding the search and exploration of tag spaces. In order to deal with these challenges we propose the Semantic Tag Clustering Search (STCS) framework. The framework first groups syntactic variations using several measures based on the Levenshtein distance and the cosine similarity based on tag co-occurrences. We find that a measure that combines the newly introduced variable cost Levenshtein similarity measure with the cosine similarity significantly outperforms the other methods we evaluated in terms of precision. After grouping syntactic variations, the framework clusters semantically related tags using the cosine similarity based on tag co-occurrences. We compare the STCS framework to a state-of-the-art clustering technique and find that the STCS framework performs significantly better in terms of precision. For the evaluation we used a large data set gathered from Flickr, which contains all the pictures uploaded in the year 2009.
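
As a rough illustration of the two ingredients named above, the following Python sketch computes a normalized Levenshtein similarity between two tags and a cosine similarity between their co-occurrence vectors, then blends the two. It is a sketch under stated assumptions, not the STCS implementation: the variable cost Levenshtein measure is not specified in this abstract, so the classic unit-cost edit distance stands in for it, and the function names, the linear blend, and the weight `alpha` are hypothetical choices made only for this example.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit costs (a stand-in, not the variable cost variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1] by dividing by the longer tag length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def cooccurrence_vectors(tagged_resources):
    """For each tag, count how often every other tag appears on the same resource."""
    vectors = {}
    for tags in tagged_resources:
        unique = set(tags)
        for tag in unique:
            vector = vectors.setdefault(tag, Counter())
            for other in unique:
                if other != tag:
                    vector[other] += 1
    return vectors


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse co-occurrence vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(a: str, b: str, vectors, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of the two measures; alpha is an assumed weight."""
    lexical = levenshtein_similarity(a, b)
    contextual = cosine_similarity(vectors.get(a, Counter()), vectors.get(b, Counter()))
    return alpha * lexical + (1 - alpha) * contextual
```

Given a list of tag lists, one per picture, `cooccurrence_vectors` builds the context vectors once, after which `combined_similarity("newyork", "new-york", vectors)` scores a candidate pair of syntactic variations. Tags that differ by a single character and appear alongside the same other tags score close to 1, while unrelated tags score close to 0.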
