Thanks for the reminder! I guess the most relevant idea here is the following:
Wiktionary is a large-scale, multilingual, crowd-sourced dictionary. It
features 18,689,141 articles in 171 languages maintained by 4,184 active
users. Dictionary entries may contain definitions and examples, part of
speech, idioms and proverbs, synonyms, antonyms, hyperonyms and
hyponyms, related terms, phonological information in IPA notation or as a
sound file, word formation, inflection tables, etymology, images, as well as
translations into other languages. Wiktionary is an invaluable source of
To make further use of the data, it should be transferred from its
current semi-structured document format to a semantic data format like
RDF. This can be achieved with existing transformation software
maintained by the DBpedia project. However, each language edition of
Wiktionary is structured differently, and articles contain varying
degrees of information in varying forms. That's why the conversion
software allows the structure of Wiktionary articles to be mapped to the
final RDF structure via custom mappings. At the moment, these mappings
exist for six languages: English, German, French, Russian, Greek, and
Vietnamese. This means that mappings for 165 languages, representing
over 60% of the articles, are still missing.
Mappings are written in XML, using a simple regular expression syntax to
match the wiki markup. So far, they have been developed by native
speakers who are also versed in XML and programming.
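To give a rough feel for what such a mapping does, here is a minimal Python sketch of the underlying idea: a regular expression matches a piece of wiki markup and the captured groups are turned into RDF-style triples. This is not the DBpedia mapping format (which is XML); the pattern, the `example.org` namespace, the `hasTranslation` property and the `extract_triples` helper are all assumptions made up for illustration, and real markup differs from one language edition to the next.

```python
import re

# Hypothetical pattern for a translation line in English Wiktionary markup,
# e.g. "* German: {{t+|de|Haus|n}}". Other language editions use other templates.
TRANSLATION = re.compile(r"\{\{t\+?\|(?P<lang>[a-z-]+)\|(?P<word>[^|}]+)")

# Hypothetical namespace and property name, used only for this illustration.
BASE = "http://example.org/wiktionary/"
HAS_TRANSLATION = BASE + "hasTranslation"


def extract_triples(page_title, wikitext):
    """Return N-Triples-style lines for every translation template found."""
    triples = []
    for match in TRANSLATION.finditer(wikitext):
        subject = f"<{BASE}{page_title}>"
        predicate = f"<{HAS_TRANSLATION}>"
        # Language-tagged literal built from the captured word and language code.
        obj = f'"{match.group("word")}"@{match.group("lang")}'
        triples.append(f"{subject} {predicate} {obj} .")
    return triples


sample = """====Translations====
* German: {{t+|de|Haus|n}}
* French: {{t+|fr|maison|f}}
"""

for line in extract_triples("house", sample):
    print(line)
```

Every language edition would need its own set of such patterns, which is why writing mappings currently requires both language knowledge and programming skills.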
To make the mapping approach more scalable and allow for better
maintenance of existing mappings, the student responsible for this task
needs to develop a system that makes mapping easy and takes the
diversity of languages into account. This system might be a community project
like a mapping wiki, a mapping pipeline, a GUI or a combination thereof.
As a proof of concept, a few new mappings, especially for European
languages, should also be developed.
Mentors: Kyungtae Lim, Jim O’Regan (co-mentor)