Abstract Wikipedia would bring two benefits:
- First, it would create a multilingual corpus for training machine translation. The big disadvantage of the existing multilingual corpora is that most of their data comes from movie subtitles, which are very inaccurate.
- Second, it would provide data for training Word Sense Disambiguation (WSD), and in many languages(!).
The abstract form should be a graph of senses. Will the senses be chosen from English WordNet/UNL or from English Wiktionary? UNL is a good piece of work, but it has been inactive for years and is not evolving. Wiktionary senses have the advantage of being grouped by etymology: quite different senses fall into different etymology groups. Will Abstract Wikipedia be linked with Wiktionary? If so, Wiktionary sense numbers should become persistent or, better, senses should have unique identifiers. Wiktionary has the advantage that senses are translated into other languages, with the disadvantage that the translations point to words, not to senses, in the other language. Alternatively, Abstract Wikipedia could have its own sense list with identifiers, but then how would it link with Wiktionary?
Graph: it should be possible to generate text in many or all languages. For example, English has “I saw”, while Polish has “widziałem” or “widziałam”, because Polish needs the gender. So the abstract form should carry the gender for the verb, even though some languages do not use it.
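To make the gender example concrete, here is a minimal sketch of what I mean. Everything in it is my assumption for illustration only: the class `AbstractClause`, the sense identifier `see.v.1`, the feature names, and the tiny hand-written form tables are made up and are not a proposed Abstract Wikipedia format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractClause:
    """Hypothetical abstract node: a verb sense plus grammatical features."""
    verb_sense: str      # made-up sense identifier, e.g. "see.v.1"
    person: int          # 1, 2 or 3
    number: str          # "sg" or "pl"
    tense: str           # e.g. "past"
    speaker_gender: str  # "m" or "f"; stored even if a language ignores it


# Toy form tables written by hand, not real lexicon data.
# The English key has no gender because English does not mark it here.
ENGLISH_FORMS = {
    ("see.v.1", 1, "sg", "past"): "I saw",
}
# The Polish key includes gender because the past tense needs it.
POLISH_FORMS = {
    ("see.v.1", 1, "sg", "past", "m"): "widziałem",
    ("see.v.1", 1, "sg", "past", "f"): "widziałam",
}


def render(clause: AbstractClause, language: str) -> str:
    """Look up the surface form of the clause in the given language."""
    key = (clause.verb_sense, clause.person, clause.number, clause.tense)
    if language == "en":
        return ENGLISH_FORMS[key]
    if language == "pl":
        return POLISH_FORMS[key + (clause.speaker_gender,)]
    raise ValueError(f"no form table for language {language!r}")


clause = AbstractClause("see.v.1", 1, "sg", "past", "f")
print(render(clause, "en"))
print(render(clause, "pl"))
```

The point is only that the abstract node must already contain the gender, because the Polish renderer cannot guess it, while the English renderer simply does not look at it.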
The sense dictionary can grow gradually along with the abstract text. When I edit abstract text, the editor should require me to add a word with its senses to the dictionary if the word is not there yet, and let me add a new sense if the one I need does not exist.
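A rough sketch of how such an editor check could work, under my own assumptions: the names `SenseDictionary` and `missing_senses` and the identifier format `lemma#number` are hypothetical. Senses are only ever appended, so an identifier, once given, keeps pointing to the same sense.

```python
class SenseDictionary:
    """Hypothetical append-only sense dictionary with stable identifiers."""

    def __init__(self):
        self._glosses = {}  # lemma -> list of glosses, in order of creation

    def add_sense(self, lemma: str, gloss: str) -> str:
        """Append a new sense and return its identifier, e.g. 'bank#1'."""
        glosses = self._glosses.setdefault(lemma, [])
        glosses.append(gloss)
        return f"{lemma}#{len(glosses)}"

    def has_sense(self, sense_id: str) -> bool:
        """Check whether an identifier refers to an existing sense."""
        lemma, _, index = sense_id.rpartition("#")
        if not index.isdigit():
            return False
        return 1 <= int(index) <= len(self._glosses.get(lemma, []))


def missing_senses(sense_ids, dictionary):
    """Return the identifiers the editor must ask the user to define."""
    return [s for s in sense_ids if not dictionary.has_sense(s)]


dictionary = SenseDictionary()
dictionary.add_sense("bank", "financial institution")

# An abstract text that uses one known sense and two unknown ones.
text_senses = ["bank#1", "bank#2", "river#1"]
print(missing_senses(text_senses, dictionary))
```

The editor would refuse to save the abstract text until the list of missing senses is empty, so the dictionary grows exactly as fast as the corpus needs it.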
What is needed:
- abstract text = corpus
- a growing dictionary of senses
- a growing dictionary mapping senses to national-language senses
- possibly links with Wiktionaries
Best regards,