Nice job! Please add it to WP:ACST so that it reaches a wider audience.
--
Prokonsul Piotrus aka Piotr Konieczny |||____ __ __
I=I====__|_|I__I=>
"Lennier, get us the hell out of here." |||
"Initiating 'getting the hell out of here' maneuver."
-- Ivanova and Lennier in Babylon 5:"The Hour of the Wolf"
Daniel Kinzler wrote:
> My diploma thesis about a system to automatically build a multilingual thesaurus
> from Wikipedia, "WikiWord", is finally done. I handed it in yesterday. My
> research will hopefully help to make Wikipedia more accessible for automatic
> processing, especially for applications in natural language processing, machine
> translation and information retrieval. What this could mean for Wikipedia is:
> better search and conceptual navigation, tools for suggesting categories, and more.
>
> Here's the thesis (in German, I'm afraid):
> http://brightbyte.de/DA/WikiWord.pdf
>
> Daniel Kinzler, "Automatischer Aufbau eines multilingualen Thesaurus durch
> Extraktion semantischer und lexikalischer Relationen aus der Wikipedia",
> Diplomarbeit an der Abteilung für Automatische Sprachverarbeitung, Institut
> für Informatik, Universität Leipzig, 2008.
>
> For the curious, http://brightbyte.de/DA/ also contains source code and data.
> See http://brightbyte.de/page/WikiWord for more information.
>
> Some more data is available for now at
> http://aspra27.informatik.uni-leipzig.de/~dkinzler/rdfdumps/. This includes
> full SKOS dumps for en, de, fr, nl, and no, covering about six million concepts.
>
> The thesis ended up being rather large... a 220-page thesis and 30k lines of
> code. I'm planning to write a research paper in English soon, which will give an
> overview of WikiWord and what it can be used for.
>
> The thesis is licensed under the GFDL; WikiWord is GPL software. All data taken
> or derived from Wikipedia is GFDL.