Really interesting! Can you post an HTML or text only version so I could read it using
Google Translate?
At WikiMania '07, I presented a paper that examined how useful wiki resources like
Wikipedia, Wiktionary, and OmegaWiki might be for the needs of translators.
One of the things we found was that, in isolation, each of those resources covered at
best ~30% of the translation difficulties typically encountered by professional
translators for the English-French pair. Combined, however, they covered ~50%. We also
found that the way information is presented on Wikipedia and Wiktionary was not suited
to the needs of translators.
Based on those two findings, we proposed the idea of a robot capable of pulling
cross-lingual information from those resources and presenting it in a way that is
better suited to the needs of translators. It sounds like you may have just done this!
Is there a web interface to this multilingual resource that I could try?
Alain Désilets
-----Original Message-----
From: wiki-research-l-bounces(a)lists.wikimedia.org [mailto:wiki-
research-l-bounces(a)lists.wikimedia.org] On Behalf Of Daniel Kinzler
Sent: May 30, 2008 5:54 AM
To: wiki-research-l(a)lists.wikimedia.org
Subject: [Wiki-research-l] thesis: automatically building a
multilingual thesaurus from wikipedia
My diploma thesis about a system to automatically build a multilingual
thesaurus from Wikipedia, "WikiWord", is finally done. I handed it in
yesterday. My research will hopefully help make Wikipedia more
accessible for automatic processing, especially for applications in
natural language processing, machine translation, and information
retrieval. For Wikipedia, this could mean better search and conceptual
navigation, tools for suggesting categories, and more.
Here's the thesis (in German, I'm afraid):
<http://brightbyte.de/DA/WikiWord.pdf>
Daniel Kinzler, "Automatischer Aufbau eines multilingualen Thesaurus
durch Extraktion semantischer und lexikalischer Relationen aus der
Wikipedia" (Automatic construction of a multilingual thesaurus by
extracting semantic and lexical relations from Wikipedia), diploma
thesis at the Department for Automatic Language Processing, Institute
of Computer Science, University of Leipzig, 2008.
For the curious, http://brightbyte.de/DA/ also contains source code
and data.
See <http://brightbyte.de/page/WikiWord> for more information.
Some more data is, for now, available at
<http://aspra27.informatik.uni-leipzig.de/~dkinzler/rdfdumps/>. This
includes full SKOS dumps for en, de, fr, nl, and no, covering about
six million concepts.
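For readers unfamiliar with SKOS: each concept in such a dump carries language-tagged preferred labels plus broader/narrower links, so cross-lingual lookup largely reduces to indexing concepts by label. A minimal sketch of that idea in plain Python follows; the concepts and labels below are invented for illustration and do not come from the actual WikiWord dumps, which are RDF rather than Python dicts.

```python
# Sketch of a SKOS-like concept store with invented example data.
# Mirrors the skos:prefLabel / skos:broader structure of a real dump.

concepts = {
    "ex:dog": {
        "prefLabel": {"en": "dog", "de": "Hund", "fr": "chien"},
        "broader": ["ex:animal"],
    },
    "ex:animal": {
        "prefLabel": {"en": "animal", "de": "Tier", "fr": "animal"},
        "broader": [],
    },
}

def translate(term: str, src: str, dst: str) -> list[str]:
    """Return dst-language labels of every concept whose src label matches."""
    return [
        c["prefLabel"][dst]
        for c in concepts.values()
        if c["prefLabel"].get(src) == term and dst in c["prefLabel"]
    ]

print(translate("dog", "en", "de"))  # ['Hund']
```

Because the lookup is concept-based rather than word-based, ambiguous terms would simply return one label per matching concept, which is exactly the kind of presentation a translator-oriented tool could build on.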
The thesis ended up being rather large: 220 pages and 30k lines of
code. I'm planning to write a research paper in English soon, which
will give an overview of WikiWord and what it can be used for.
The thesis is licensed under the GFDL, and WikiWord is GPL software.
All data taken or derived from Wikipedia is GFDL.
Enjoy,
Daniel
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l