Abstract

Identifying topics associated with a set of documents is a common task for many applications and can be used to improve various tasks involving documents on the Web, such as search, retrieval, recommendation, and clustering. To address this problem, this paper introduces a tool, called TagTheWeb, as a proposition of a generic classification method, that relies on the knowledge expressed by the taxonomic structure of Wikipedia, based on the generation of a fingerprint through the semantic relation between nodes of the Wikipedia Category Graph. TagTheWeb can be used as a WEB interface or as an API to classify any text based resource.