We would like to announce a new research paper that uses Wikipedia
for computing semantic relatedness of natural language texts.
Evgeniy Gabrilovich and Shaul Markovitch (2007).
''Computing Semantic Relatedness using Wikipedia-based Explicit Semantic
Analysis''
Proceedings of The 20th International Joint Conference on Artificial
Intelligence (IJCAI)
Hyderabad, India, January 2007
Computing semantic relatedness of natural language texts requires
access to vast amounts of common-sense and domain-specific world
knowledge. We propose Explicit Semantic Analysis (ESA), a novel
method that represents the meaning of texts in a high-dimensional
space of concepts derived from Wikipedia. We use machine learning
techniques to explicitly represent the meaning of any text as a
weighted vector of Wikipedia-based concepts. Assessing the
relatedness of texts in this space amounts to comparing the
corresponding vectors using conventional metrics (e.g., cosine).
Compared with the previous state of the art, using ESA results in
substantial improvements in correlation of computed relatedness
scores with human judgments: from r=0.56 to 0.75 for individual
words and from r=0.60 to 0.72 for texts. Importantly, due to the use
of natural concepts, the ESA model is easy to explain to human users.
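To make the idea in the abstract concrete, here is a minimal sketch of comparing two texts as weighted concept vectors with the cosine metric. It is not the implementation from the paper: the three toy "articles", the TF-IDF weighting, and all function names (build_index, interpret, cosine, relatedness) are assumptions chosen for illustration, and a real system would build its concept space from the full Wikipedia corpus.

```python
import math
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles; each article is one concept.
CONCEPTS = {
    "Computer": "computer software hardware program keyboard mouse",
    "Animal": "animal cat dog mouse mammal",
    "Finance": "bank money loan interest stock",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def build_index(concepts):
    """Build an inverted index: word -> {concept name: TF-IDF weight}.

    TF-IDF is an assumed weighting scheme for this sketch.
    """
    n = len(concepts)
    tf = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    # Document frequency: in how many concepts does each word occur?
    df = Counter(word for counts in tf.values() for word in counts)
    index = {}
    for name, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n / df[word])
            index.setdefault(word, {})[name] = count * idf
    return index


def interpret(text, index):
    """Represent a text as a weighted vector of concepts."""
    vector = Counter()
    for word in tokenize(text):
        for concept, weight in index.get(word, {}).items():
            vector[concept] += weight
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dict-like objects."""
    dot = sum(u[c] * v[c] for c in u if c in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a text with no known words is related to nothing
    return dot / (norm_u * norm_v)


INDEX = build_index(CONCEPTS)


def relatedness(text_a, text_b, index=INDEX):
    """Relatedness of two texts = cosine of their concept vectors."""
    return cosine(interpret(text_a, index), interpret(text_b, index))


if __name__ == "__main__":
    for pair in [("keyboard", "software"), ("mouse", "keyboard"), ("cat", "bank")]:
        print(pair, round(relatedness(*pair), 3))
```

Because every dimension of the vector is a named article, the score can be explained by listing the concepts the two texts share. In this toy data "mouse" maps to both Computer and Animal, which is why it is only partly related to "keyboard".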
Ph.D. student in Computer Science
Department of Computer Science, Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
Email: gabr(a)cs.technion.ac.il WWW: http://www.cs.technion.ac.il/~gabr