Hi,

I think this is of interest to us all.

Thanks,
GerardM
---------- Forwarded message ----------
From: Torsten Zesch <zesch@tk.informatik.tu-darmstadt.de>
Date: Tue, Apr 29, 2008 at 4:31 PM
Subject: [Wiki-research-l] The use of Wiktionary in Natural Language Processing
To: wiki-research-l@lists.wikimedia.org
In contrast to Wikipedia, Wiktionary has so far received little attention from the NLP research community.
I know of its use for subjectivity and polarity classification (Chesley et al., 2006), and for diachronic phonology (Bouchard et al., 2007).
Alexandre Bouchard, Percy Liang, Thomas Griffiths, and Dan Klein. 2007. A probabilistic approach to diachronic phonology. In Proceedings of EMNLP-CoNLL 2007, pages 887–896.
Paula Chesley, Bruce Vincent, Li Xu, and Rohini Srihari. 2006. Using verbs and adjectives to automatically classify blog sentiment. In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs.
If anybody knows of other papers that describe work where Wiktionary has been used in NLP, I would be happy to hear about it.
At UKP Lab, we have recently used Wiktionary as a lexical semantic resource for computing semantic relatedness.
Our main findings are:
* Wiktionary offers an astonishing amount of lexical semantic information, but it also poses new challenges due to its collaborative construction approach and the resulting occasional incompleteness and inconsistency of its entries.
* Wiktionary can be used as a substitute for traditional semantic networks such as Princeton WordNet for some tasks, for example computing semantic relatedness. Somewhat surprisingly, it outperforms traditional wordnets as well as Wikipedia on this task, at least when relatedness is evaluated by correlation with human rankings.
Some recent publications on this topic are:
Zesch, T.; Mueller, C. & Gurevych, I. Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. In Proceedings of the Conference on Language Resources and Evaluation (LREC), 2008
Abstract:
Recently, collaboratively constructed resources such as Wikipedia and Wiktionary have been discovered as valuable lexical semantic knowledge bases with a high potential in diverse Natural Language Processing (NLP) tasks. Collaborative knowledge bases, however, differ significantly from traditional linguistic knowledge bases in various respects, and this constitutes both an asset and an impediment for research in NLP. This paper addresses one such major impediment, namely the lack of suitable programmatic access mechanisms to the knowledge stored in these large semantic knowledge bases. We present two application programming interfaces for Wikipedia and Wiktionary which are especially designed for mining the rich lexical semantic information dispersed in the knowledge bases, and provide efficient and structured access to the available knowledge. As we believe them to be of general interest to the NLP community, we have made them freely available for research purposes.
and
Zesch, T.; Mueller, C. & Gurevych, I. Using Wiktionary for Computing Semantic Relatedness. In Proceedings of AAAI, 2008
Abstract:
We introduce Wiktionary as an emerging lexical semantic resource that can be used as a substitute for expert-made resources in AI applications. We evaluate Wiktionary on the pervasive task of computing semantic relatedness for English and German by means of correlation with human rankings and solving word choice problems. For the first time, we apply a concept vector based measure to a set of different concept representations like Wiktionary pseudo glosses, the first paragraph of Wikipedia articles, English WordNet glosses, and GermaNet pseudo glosses. We show that: (i) Wiktionary is the best lexical semantic resource in the ranking task and performs comparably to other resources in the word choice task, and (ii) the concept vector based approach yields the best results on all datasets in both evaluations.
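The concept vector based measure mentioned in the abstract can be illustrated with a minimal sketch, assuming an ESA-style (Explicit Semantic Analysis) formulation: each word is represented by a vector of tf-idf weights over the concepts (here, pseudo glosses) whose text contains it, and relatedness is the cosine of two such vectors. This is not UKP Lab's implementation; the toy glosses, the whitespace tokenizer, and the function names (`build_index`, `cosine`, `relatedness`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy "pseudo glosses", one per concept. Invented for illustration,
# not taken from Wiktionary.
GLOSSES = {
    "car": "a wheeled motor vehicle used for transport on roads",
    "automobile": "a motor vehicle with four wheels used for transport",
    "bank": "an institution that keeps money and lends money",
    "money": "a medium of exchange such as coins and banknotes",
}


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also lemmatize and drop stopwords.
    return text.lower().split()


def build_index(glosses):
    """Map each term to its concept vector {concept: tf-idf weight}."""
    n = len(glosses)
    tf = {concept: Counter(tokenize(text)) for concept, text in glosses.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    index = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term])
            if idf > 0:  # terms occurring in every gloss carry no information
                index.setdefault(term, {})[concept] = freq * idf
    return index


def cosine(v1, v2):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v2.get(c, 0.0) for c, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)


def relatedness(word1, word2, index):
    """Cosine of the two words' concept vectors (0.0 if either word is unknown)."""
    return cosine(index.get(word1.lower(), {}), index.get(word2.lower(), {}))


if __name__ == "__main__":
    index = build_index(GLOSSES)
    print(relatedness("vehicle", "transport", index))
    print(relatedness("vehicle", "money", index))
```

With these toy glosses, "vehicle" and "transport" occur in exactly the same two glosses with equal weights, so their score is (up to rounding) 1.0, while "vehicle" and "money" share no concept and score 0.0. Swapping in a different concept representation (Wikipedia first paragraphs, WordNet glosses) only changes the `GLOSSES` input, which is the kind of comparison the abstract describes.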
-------------------------------------------------------------------------------
UKP Lab is working on the release of a freely available Java-based API to access the lexical semantic information contained in Wiktionary. The release is scheduled for June 2008 at http://www.ukp.tu-darmstadt.de/software/.
There is also a new release of the Java-based API for Wikipedia. It is much faster now and contains a MediaWiki markup parser that can be used to analyze the contents of a Wikipedia page. The parser can also be used stand-alone to analyze other web pages that use MediaWiki markup.
-Torsten
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
wiktionary-l@lists.wikimedia.org