Hi,
I think this is of interest to us all.
Thanks,
GerardM
---------- Forwarded message ----------
From: Torsten Zesch <zesch(a)tk.informatik.tu-darmstadt.de>
Date: Tue, Apr 29, 2008 at 4:31 PM
Subject: [Wiki-research-l] The use of Wiktionary in Natural Language Processing
To: wiki-research-l(a)lists.wikimedia.org
In contrast to Wikipedia, Wiktionary has received little attention from
the NLP research community so far.
I know of its use for subjectivity and polarity classification (Chesley
et al., 2006), and for diachronic phonology (Bouchard et al., 2007).
Alexandre Bouchard, Percy Liang, Thomas Griffiths, and Dan Klein. 2007.
A probabilistic approach to diachronic phonology. In Proceedings of
EMNLP-CoNLL, pages 887–896.
Paula Chesley, Bruce Vincent, Li Xu, and Rohini Srihari. 2006.
Using verbs and adjectives to automatically classify blog sentiment.
In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational
Approaches to Analyzing Weblogs.
If anybody knows of other papers that describe work where Wiktionary has
been used in NLP, I would be happy to hear about it.
At UKP Lab, we have recently used Wiktionary as a lexical semantic resource
for computing semantic relatedness.
Our main findings are:
* Wiktionary offers an astonishing amount of lexical semantic
information, but also poses new challenges due to its collaborative
construction approach, which occasionally leaves entries incomplete
or inconsistent.
* Wiktionary can be used as a substitute for traditional semantic networks
like Princeton WordNet for some tasks, for example computing semantic
relatedness. Somewhat surprisingly, it outperforms traditional wordnets
as well as Wikipedia on this task.
Some recent publications devoted to this issue are:
Zesch, T.; Mueller, C. & Gurevych, I.
Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary.
In Proceedings of the Conference on Language Resources and Evaluation
(LREC), 2008
Abstract:
Recently, collaboratively constructed resources such as Wikipedia and
Wiktionary have been discovered as valuable lexical semantic knowledge
bases with a high potential in diverse Natural Language Processing (NLP)
tasks. Collaborative knowledge bases however significantly differ from
traditional linguistic knowledge bases in various respects, and this
constitutes both an asset and an impediment for research in NLP. This paper
addresses one such major impediment, namely the lack of suitable
programmatic access mechanisms to the knowledge stored in these large
semantic knowledge bases. We present two application programming interfaces
for Wikipedia and Wiktionary which are especially designed for mining the
rich lexical semantic information dispersed in the knowledge bases, and
provide efficient and structured access to the available knowledge. As we
believe them to be of general interest to the NLP community, we have made
them freely available for research purposes.
and
Zesch, T.; Mueller, C. & Gurevych, I.
Using Wiktionary for Computing Semantic Relatedness.
In Proceedings of AAAI, 2008
Abstract:
We introduce Wiktionary as an emerging lexical semantic resource that can be
used as a substitute for expert-made resources in AI applications. We evaluate
Wiktionary on the pervasive task of computing semantic relatedness for English
and German by means of correlation with human rankings and solving word choice
problems. For the first time, we apply a concept vector based measure to a set
of different concept representations like Wiktionary pseudo glosses, the first
paragraph of Wikipedia articles, English WordNet glosses, and GermaNet pseudo
glosses. We show that: (i) Wiktionary is the best lexical semantic resource in
the ranking task and performs comparably to other resources in the word choice
task, and (ii) the concept vector based approach yields the best results on all
datasets in both evaluations.
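
As a rough illustration of what a concept vector based measure computes,
here is a minimal sketch. It is not the implementation from the paper: the
toy glosses, the function names, and the use of plain term counts as weights
are all assumptions made to keep the example short. Each word is mapped to a
vector over concepts (one dimension per entry), and relatedness is taken as
the cosine of the two vectors.

```python
import math
from collections import Counter

# Toy "concepts": entry name -> pseudo gloss. Invented for illustration only;
# a real setup would draw these texts from Wiktionary or Wikipedia.
CONCEPTS = {
    "car": "a wheeled motor vehicle used for transport on roads",
    "automobile": "a motor vehicle with four wheels used for transport",
    "road": "a paved way on which a vehicle or car can travel",
    "banana": "an elongated yellow fruit that grows on a tropical plant",
    "fruit": "the sweet edible product of a plant such as a banana or apple",
}


def concept_vector(word, concepts=CONCEPTS):
    """Map a word to a sparse vector over concepts.

    The weight for a concept is the number of times the word occurs in
    that concept's text (an assumed, simplified weighting).
    """
    word = word.lower()
    vector = {}
    for name, text in concepts.items():
        count = Counter(text.lower().split())[word]
        if count:
            vector[name] = float(count)
    return vector


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a word that occurs in no concept text is unrelated to everything
    return dot / (norm_u * norm_v)


def relatedness(word_a, word_b, concepts=CONCEPTS):
    """Semantic relatedness as the cosine of the two concept vectors."""
    return cosine(concept_vector(word_a, concepts), concept_vector(word_b, concepts))


if __name__ == "__main__":
    for pair in [("vehicle", "transport"), ("vehicle", "fruit")]:
        print(pair, round(relatedness(*pair), 3))
```

With these toy glosses, "vehicle" and "transport" share concepts and score
above zero, while "vehicle" and "fruit" share none and score zero.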
-------------------------------------------------------------------------------
UKP Lab is working on the release of a freely available Java-based API to
access the lexical semantic information contained in Wiktionary.
The release is scheduled for June 2008 at
http://www.ukp.tu-darmstadt.de/software/.
There is also a new release of the Java-based API for Wikipedia.
It is much faster now and contains a MediaWiki markup parser that
can be used to analyze the contents of a Wikipedia page. The parser
can also be used stand-alone to analyze other web pages that use
MediaWiki markup.
-Torsten
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
---------- Forwarded message ----------
From: Jay Walsh <jwalsh(a)wikimedia.org>
Date: Fri, Apr 11, 2008 at 8:56 PM
Subject: [Foundation-l] Wikimedia Blog is live
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>
Hi all,
We are pleased to notify you that today we flipped the switch on the Wikimedia
Foundation's official blog! Wikimedia Blog can be found at
http://blog.wikimedia.org
Some background info: this will be a space for WMF staff to post news and
information about the work we're engaged in. We'll also bring in guest
contributors, board members, etc. to post. Comments are pre-moderated for the
time being, and we're hoping for lots of civility. Comments will be moderated by
several volunteer moderators and staff as necessary.
We have some basic posting guidelines (generally short, fairly simple English,
conversational, etc.) so we can keep it reader-friendly and useful. We expect a
wide audience - media, public, users, you name it! And we'll work hard to keep
it interesting - and regular (hopefully posts every other business day).
Always welcome your views, and your understanding about our work-in-progress :)
Hope you enjoy! Looking forward to hearing your views.
Thanks,
--
Jay Walsh
Head of Communications
WikimediaFoundation.org
+1 (415) 839 6885 x 609
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
--
Casey Brown
Cbrown1023
---
Note: This e-mail address is used for mailing lists. Personal emails sent to
this address will probably get lost.