Dear all,
The GSoC deadline is in three days (March 21st) [1], and there is still time to apply. The DBpedia GSoC students are quite active this year too [2], but we can certainly handle more :) Please forward our ideas page [3] to students (Bachelor's, Master's or PhD) working on Semantic Web & Linked Data.
Best regards, Sebastian and Dimitris
[1] [2] [3]
Thanks for the reminder! I guess the most relevant idea here is this one:
Wiktionary is a large-scale, multilingual, crowd-sourced dictionary. It features 18,689,141 articles in 171 languages, maintained by 4,184 active users. Dictionary entries may contain definitions and examples, parts of speech, idioms and proverbs, synonyms, antonyms, hypernyms and hyponyms, related terms, phonological information (in IPA notation or as sound files), word formation, inflection tables, etymology, images, and translations into other languages. Wiktionary is an invaluable source of dictionary data.
To make further use of the data, it should be transferred from its current semi-structured document format to a semantic data format like RDF. This can be done with existing transformation software [2] maintained by the DBpedia project. However, each language edition of Wiktionary is structured differently: articles contain varying amounts of information in varying forms. That is why the conversion software maps the structure of Wiktionary articles to the final RDF structure via custom, per-language mappings. At the moment, such mappings exist for English, German, French, Russian, Greek and Vietnamese. This means that mappings for the remaining 165 languages, covering over 60% of the articles, are still missing.
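To illustrate why each language needs its own mapping but can still feed one common RDF structure, here is a minimal Python sketch. It is not the DBpedia extractor. It assumes two simplified heading formats (an English-style `===Noun===` and a German-style `=== {{Wortart|Substantiv|Deutsch}} ===`), a hypothetical namespace `http://example.org/wiktionary/` and a hypothetical `hasPoS` property:

```python
import re

# Hypothetical namespace and property, for illustration only.
NS = "http://example.org/wiktionary/"

# One mapping per language edition: a regex capturing the part-of-speech
# label from a (simplified, assumed) section heading, plus a table that
# normalises the language-specific label to a shared vocabulary.
MAPPINGS = {
    "en": {
        "pos_heading": re.compile(
            r"^===\s*(Noun|Verb|Adjective)\s*===\s*$", re.M),
        "pos_labels": {"Noun": "Noun", "Verb": "Verb",
                       "Adjective": "Adjective"},
    },
    "de": {
        "pos_heading": re.compile(
            r"^===\s*\{\{Wortart\|(Substantiv|Verb|Adjektiv)\|Deutsch\}\}.*===\s*$",
            re.M),
        "pos_labels": {"Substantiv": "Noun", "Verb": "Verb",
                       "Adjektiv": "Adjective"},
    },
}


def extract_triples(lang, title, wikitext):
    """Return N-Triples lines for every part-of-speech heading matched.

    Titles are not URL-escaped in this sketch.
    """
    mapping = MAPPINGS[lang]
    subject = f"<{NS}{lang}/{title}>"
    triples = []
    for match in mapping["pos_heading"].finditer(wikitext):
        pos = mapping["pos_labels"][match.group(1)]
        triples.append(f"{subject} <{NS}hasPoS> <{NS}{pos}> .")
    return triples


if __name__ == "__main__":
    en = "==English==\n===Etymology===\nFrom Old English.\n===Noun===\n# A feline.\n"
    de = "=== {{Wortart|Substantiv|Deutsch}}, {{f}} ===\n"
    print(extract_triples("en", "cat", en))
    print(extract_triples("de", "Katze", de))
```

Both calls emit a `hasPoS` triple pointing at the same `Noun` resource, even though the markup differs. Under this design, supporting a new language means adding one more entry to `MAPPINGS`, which is exactly the step that currently requires someone fluent in both the language and regular expressions.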
Mappings are written in XML, using a simple regular-expression syntax to match the wiki markup. So far, they have been developed by native speakers who are also versed in XML and programming.
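The following Python sketch shows the general shape of an XML mapping that carries regular expressions, and how a converter might load and apply it. The element and attribute names (`mapping`, `section`, `pattern`, `property`), the property URIs and the simplified `{{IPA|en|...}}` pattern are assumptions made for this example. They do not reproduce the schema actually used by the DBpedia Wiktionary extractor:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping format: one <section> per kind of information,
# each holding a regex that matches the wiki markup and the RDF property
# under which the first captured group is emitted.
MAPPING_XML = r"""
<mapping lang="en">
  <section property="http://example.org/wiktionary/hasPoS">
    <pattern>^===\s*(Noun|Verb|Adjective)\s*===\s*$</pattern>
  </section>
  <section property="http://example.org/wiktionary/hasPronunciation">
    <pattern>\{\{IPA\|en\|(/[^/]+/)\}\}</pattern>
  </section>
</mapping>
"""


def load_mapping(xml_text):
    """Parse the XML into (language code, [(compiled regex, property URI)])."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for section in root.findall("section"):
        pattern = section.findtext("pattern")
        rules.append((re.compile(pattern, re.M), section.get("property")))
    return root.get("lang"), rules


def apply_mapping(rules, subject, wikitext):
    """Return (subject, property, literal) tuples, one per regex match."""
    results = []
    for regex, prop in rules:
        for match in regex.finditer(wikitext):
            results.append((subject, prop, match.group(1)))
    return results


if __name__ == "__main__":
    lang, rules = load_mapping(MAPPING_XML)
    text = "===Pronunciation===\n* {{IPA|en|/kæt/}}\n===Noun===\n# A feline.\n"
    for triple in apply_mapping(rules, "cat", text):
        print(triple)
```

Keeping the patterns in data rather than in code means a mapping can be edited without touching the converter, but each pattern still has to be written and debugged by someone who can read regular expressions.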
To make the mapping approach more scalable and to make existing mappings easier to maintain, the student taking on this task needs to develop a system that makes writing mappings easy while taking the diversity of languages into account. This system might be a community project such as a mapping wiki, a mapping pipeline, a GUI, or a combination thereof. As a proof of concept, a few new mappings, especially for European languages, should also be developed.
[1] [2]
Mentors: Kyungtae Lim, Jim O’Regan (co-mentor)