Fwd: [Wiki-research-l] The use of Wiktionary in Natural Language Processing - Wiktionary-l

30 Apr 2008


Hoi,
I think this is of interest to us all.
Thanks,
GerardM
---------- Forwarded message ----------
From: Torsten Zesch zesch@tk.informatik.tu-darmstadt.de
Date: Tue, Apr 29, 2008 at 4:31 PM
Subject: [Wiki-research-l] The use of Wiktionary in Natural Language Processing
To: wiki-research-l@lists.wikimedia.org
In contrast to Wikipedia, Wiktionary has so far received little attention from
the NLP research community.
I know of its use for subjectivity and polarity classification (Chesley
et al., 2006), and for diachronic phonology (Bouchard et al., 2007).
Alexandre Bouchard, Percy Liang, Thomas Griffiths, and Dan Klein. 2007.
 A probabilistic approach to diachronic phonology. In Proceedings of
 EMNLP-CoNLL 2007, pages 887–896.
Paula Chesley, Bruce Vincent, Li Xu, and Rohini Srihari. 2006.
 Using verbs and adjectives to automatically classify blog sentiment.
 In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational
 Approaches to Analyzing Weblogs.
If anybody knows of other papers that describe work where Wiktionary has
been used in NLP, I would be happy to hear about it.
At UKP Lab, we have recently used Wiktionary as a lexical semantic resource
for computing semantic relatedness.
Our main findings are:
* Wiktionary offers an astonishing amount of lexical semantic
 information, but its collaborative construction also poses new
 challenges: individual entries are occasionally incomplete or
 inconsistent.
* Wiktionary can be used as a substitute for traditional semantic networks
 like Princeton WordNet for some tasks, for example computing semantic
 relatedness. Somewhat surprisingly, it outperforms traditional wordnets
 as well as Wikipedia on this task.
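Claims that one resource "outperforms" another on semantic relatedness are usually checked by correlating a measure's scores with human judgments for a list of word pairs (the AAAI abstract below mentions "correlation with human rankings"). A minimal sketch in Java, assuming Spearman's rank correlation (a common choice; this message does not say which coefficient was used). The class name and the ratings in `main` are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: Spearman rank correlation between a relatedness measure and human ratings. */
public final class SpearmanSketch {

    /** Assigns 1-based ranks; tied values receive the average of their positions. */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        // Stable sort of indices by value, ascending.
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over a run of tied values.
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0; // mean of 1-based positions i+1 .. j+1
            for (int k = i; k <= j; k++) ranks[order[k]] = avg;
            i = j + 1;
        }
        return ranks;
    }

    /** Pearson correlation of two equal-length arrays. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int k = 0; k < n; k++) { mx += x[k]; my += y[k]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - mx, dy = y[k] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman's rho: the Pearson correlation of the two rank vectors. */
    public static double spearman(double[] measure, double[] human) {
        if (measure.length != human.length || measure.length < 2)
            throw new IllegalArgumentException("need two equal-length arrays with >= 2 items");
        return pearson(ranks(measure), ranks(human));
    }

    public static void main(String[] args) {
        double[] human = {3.9, 3.5, 1.2, 0.4};        // hypothetical human ratings for 4 word pairs
        double[] measure = {0.61, 0.72, 0.10, 0.05};  // hypothetical scores from some measure
        System.out.println(spearman(measure, human));
    }
}
```

With the toy numbers in `main` the two rankings differ only by one swapped pair, so it prints 0.8. Ties get averaged ranks, which matters because human ratings often repeat.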
Some recent publications devoted to this issue are:
Zesch, T.; Mueller, C. & Gurevych, I.
 Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary.
 In Proceedings of the Conference on Language Resources and Evaluation
 (LREC), 2008
Abstract:
Recently, collaboratively constructed resources such as Wikipedia and
Wiktionary have been discovered as valuable lexical semantic knowledge
bases with a high potential in diverse Natural Language Processing (NLP)
tasks. Collaborative knowledge bases however significantly differ from
traditional linguistic knowledge bases in various respects, and this
constitutes both an asset and an impediment for research in NLP. This paper
addresses one such major impediment, namely the lack of suitable
programmatic access mechanisms to the knowledge stored in these large
semantic knowledge bases. We present two application programming interfaces
for Wikipedia and Wiktionary which are especially designed for mining the
rich lexical semantic information dispersed in the knowledge bases, and
provide efficient and structured access to the available knowledge. As we
believe them to be of general interest to the NLP community, we have made
them freely available for research purposes.
and
Zesch, T.; Mueller, C. & Gurevych, I.
 Using Wiktionary for Computing Semantic Relatedness.
 In Proceedings of AAAI, 2008
Abstract:
We introduce Wiktionary as an emerging lexical semantic resource that can be
used as a substitute for expert-made resources in AI applications. We evaluate
Wiktionary on the pervasive task of computing semantic relatedness for English
and German by means of correlation with human rankings and solving word choice
problems. For the first time, we apply a concept vector based measure to a set
of different concept representations like Wiktionary pseudo glosses, the first
paragraph of Wikipedia articles, English WordNet glosses, and GermaNet pseudo
glosses. We show that: (i) Wiktionary is the best lexical semantic resource in
the ranking task and performs comparably to other resources in the word choice
task, and (ii) the concept vector based approach yields the best results on all
datasets in both evaluations.
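The abstract does not spell out the concept vector measure. A minimal sketch in Java, assuming an ESA-style formulation: each concept is described by a text (for example a Wiktionary pseudo gloss), each word is represented by its tf-idf weights across those concept texts, and relatedness is the cosine of two such vectors. This is not UKP Lab's code; the class, the tokenizer, the word-choice rule (pick the candidate most related to the target), and the toy glosses are all hypothetical.

```java
import java.util.*;

/** Sketch of an ESA-style concept vector relatedness measure over (pseudo) glosses. */
public final class ConceptVectorSketch {
    private final List<String> conceptIds = new ArrayList<>();
    // word -> (concept index -> tf-idf weight of the word in that concept's gloss)
    private final Map<String, Map<Integer, Double>> wordVectors = new HashMap<>();

    /** glosses: concept id -> gloss text (hypothetical input format). */
    public ConceptVectorSketch(Map<String, String> glosses) {
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map.Entry<String, String> e : glosses.entrySet()) {
            conceptIds.add(e.getKey());
            Map<String, Integer> tf = new HashMap<>();
            for (String tok : tokenize(e.getValue())) tf.merge(tok, 1, Integer::sum);
            termFreqs.add(tf);
            for (String tok : tf.keySet()) docFreq.merge(tok, 1, Integer::sum);
        }
        int n = termFreqs.size();
        for (int c = 0; c < n; c++) {
            for (Map.Entry<String, Integer> t : termFreqs.get(c).entrySet()) {
                double idf = Math.log((double) n / docFreq.get(t.getKey()));
                double w = t.getValue() * idf;
                // Words occurring in every gloss get idf 0 and are left out.
                if (w > 0) wordVectors.computeIfAbsent(t.getKey(), k -> new HashMap<>()).put(c, w);
            }
        }
    }

    /** Hypothetical tokenizer: lowercase, split on anything that is not a letter. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
            if (!tok.isEmpty()) out.add(tok);
        return out;
    }

    /** Cosine similarity of the two words' concept vectors; 0 if either word is unknown. */
    public double relatedness(String w1, String w2) {
        Map<Integer, Double> a = wordVectors.getOrDefault(w1.toLowerCase(Locale.ROOT), Map.of());
        Map<Integer, Double> b = wordVectors.getOrDefault(w2.toLowerCase(Locale.ROOT), Map.of());
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        double dot = 0;
        for (Map.Entry<Integer, Double> e : a.entrySet())
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        return dot / (norm(a) * norm(b));
    }

    private static double norm(Map<Integer, Double> v) {
        double s = 0;
        for (double x : v.values()) s += x * x;
        return Math.sqrt(s);
    }

    /** Hypothetical word-choice rule: the candidate most related to the target wins (first on ties). */
    public String chooseWord(String target, List<String> candidates) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String cand : candidates) {
            double s = relatedness(target, cand);
            if (s > bestScore) { bestScore = s; best = cand; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical toy pseudo glosses, invented for illustration.
        Map<String, String> glosses = new LinkedHashMap<>();
        glosses.put("car", "a wheeled motor vehicle used for transport on roads");
        glosses.put("bus", "a large motor vehicle carrying passengers on roads");
        glosses.put("apple", "a round fruit of a tree with red or green skin");
        glosses.put("pear", "a sweet fruit of a tree with green skin");
        ConceptVectorSketch m = new ConceptVectorSketch(glosses);
        System.out.println(m.relatedness("vehicle", "roads"));
        System.out.println(m.chooseWord("motor", List.of("fruit", "passengers", "skin")));
    }
}
```

On these toy glosses, `chooseWord("motor", ...)` returns "passengers", because "motor" and "passengers" share the "bus" concept while "fruit" and "skin" occur only in the fruit glosses. A real setup would use thousands of concepts, so each word's vector spreads over many entries.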
-------------------------------------------------------------------------------
UKP Lab is working on the release of a freely available Java-based API to
access the lexical semantic information contained in Wiktionary.
The release is scheduled for June 2008 at
http://www.ukp.tu-darmstadt.de/software/.
There is also a new release of the Java-based API for Wikipedia.
It is much faster now and contains a MediaWiki markup parser that
can be used to analyze the contents of a Wikipedia page. The parser
can also be used stand-alone to analyze further web pages using
MediaWiki markup.
-Torsten
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l