[Wikimedia-l] The case for supporting open source machine translation

Mathieu Stumpf psychoslave at culture-libre.org
Wed Apr 24 13:37:39 UTC 2013


On 2013-04-24 12:35, Denny Vrandečić wrote:
> 3) Wiktionary could be an even more amazing resource if we would finally
> tackle the issue of structuring its content more appropriately. I think
> Wikidata opened a few venues to structure planning in this direction and
> provide some software, but this would have the potential to provide more
> support for any external project than many other things we could tackle

If you have any information or ideas related to structuring Wiktionary,
please share them on https://meta.wikimedia.org/wiki/Wiktionary_future


> One idea I have been mulling over for years is basically how can we use
> this advantage for the task of creating content available in many
> languages. Wikidata is an obvious attempt at that, but it really goes
> only so far. The system I am really aiming at is a different one, and
> there has been plenty of related work in this direction: imagine a wiki
> where you enter or edit content, sentence by sentence, but the natural
> language representation is just a surface syntax for an internal
> structure.
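
If I try to make that concrete, the closest I can picture is a sketch
like the one below, where a fact is stored once in a language-neutral
form and each language has its own serializer rules. The identifiers
mimic Wikidata items, but every name here is my own guess, not something
taken from your mail:

```python
# A fact stored once as structure; per-language rules render the surface
# text. All names and the rule format are my own illustrative guesses.

CONTENT = {"subject": "Q64", "predicate": "capital_of", "object": "Q183"}

LABELS = {
    "en": {"Q64": "Berlin", "Q183": "Germany"},
    "fr": {"Q64": "Berlin", "Q183": "l'Allemagne"},
}

TEMPLATES = {  # one surface-syntax rule per (predicate, language)
    ("capital_of", "en"): "{s} is the capital of {o}.",
    ("capital_of", "fr"): "{s} est la capitale de {o}.",
}

def serialize(fact, lang):
    """Render the internal structure as a sentence in language `lang`."""
    template = TEMPLATES[(fact["predicate"], lang)]
    labels = LABELS[lang]
    return template.format(s=labels[fact["subject"]], o=labels[fact["object"]])

print(serialize(CONTENT, "en"))  # Berlin is the capital of Germany.
print(serialize(CONTENT, "fr"))  # Berlin est la capitale de l'Allemagne.
```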

Even with that reading, I am not sure I understand what you mean. To
begin with, I doubt that the sentence is the right unit for translating
natural language discourse. Sometimes you can translate one word with
one word in another language. Sometimes you can translate a sentence
with one sentence. Sometimes you need to take in the whole paragraph, or
even more, and sometimes you need a whole cultural background to get the
meaning of a single word in its context. To my mind, natural languages
involve more than context-free languages can capture. Could a static
"internal structure" deal with such dynamics?
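
Here is a toy example of what I mean, with a deliberately naive
dictionary of my own making: the sense of a word, and so its
translation, can hinge on a clue that sits in another sentence entirely.

```python
# The same English word needs different French translations, and the
# clue that picks the sense may lie outside the sentence being
# translated. Dictionary and heuristics are deliberately naive.

DICTIONARY = {"bank": {"finance": "banque", "river": "rive"}}

def translate_word(word, sentence_words):
    """Pick a sense using only words from the same sentence."""
    senses = DICTIONARY[word]
    if sentence_words & {"money", "account"}:
        return senses["finance"]
    if sentence_words & {"water", "fishing"}:
        return senses["river"]
    return senses["finance"]  # arbitrary fallback: the failure mode

# Clue in the same sentence: fine.
print(translate_word("bank", {"my", "account"}))  # banque

# Clue in the previous sentence ("We went fishing."): wrong answer.
print(translate_word("bank", {"we", "sat", "on", "the"}))  # banque, should be rive
```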

> Your editing interface is a constrained, but natural language.

This is really where I don't see how you hope to manage that.
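
To be explicit about where I get stuck: the only reading I can come up
with is a fixed set of accepted sentence patterns, with everything else
rejected. The patterns below are entirely my invention:

```python
import re

# The editor accepts sentences matching community-defined patterns and
# stores each match as structure; anything else is refused. Patterns
# and field names are my own guesses at what "constrained" means.

PATTERNS = [
    (re.compile(r"^(?P<s>\w+) is the capital of (?P<o>\w+)\.$"), "capital_of"),
]

def parse(sentence):
    """Return internal structure, or None if the sentence falls outside
    the controlled language -- which is exactly my concern."""
    for pattern, predicate in PATTERNS:
        match = pattern.match(sentence)
        if match:
            return {"predicate": predicate,
                    "subject": match.group("s"),
                    "object": match.group("o")}
    return None

print(parse("Berlin is the capital of Germany."))         # parsed into structure
print(parse("Germany's capital is, of course, Berlin."))  # None: rejected
```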

> Now, in order to really make this fly, both the rules for the parsers
> (interpreting the input) and the serializer (creating the output) would
> need to be editable by the community - in addition to the content
> itself. There are a number of major challenges involved, but I have by
> now a fair idea of how to tackle most of them (and I don't have the
> time to detail them right now).
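
The editable rules, at least, I can picture: if serializer rules are
plain data, a community edit could boil down to something like the
sketch below. The add_rule helper and the rule format are hypothetical;
your mail gives no details.

```python
# Serializer rules as plain data: "editable by the community" could then
# amount to saving an edit to a rule page, with no code deployment.

RULES = {  # (predicate, language) -> surface template
    ("capital_of", "en"): "{s} is the capital of {o}.",
}

def add_rule(predicate, lang, template):
    """What saving an edit to a rule page might boil down to."""
    RULES[(predicate, lang)] = template

# A community member adds Esperanto without touching any code:
add_rule("capital_of", "eo", "{s} estas la ĉefurbo de {o}.")
print(RULES[("capital_of", "eo")].format(s="Berlino", o="Germanio"))
# Berlino estas la ĉefurbo de Germanio.
```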

Well, I would be curious to get more information, such as references I
should read. Otherwise, I am afraid that what you say sounds like
Fermat's Last Theorem[1], with its famous margin that was too small to
contain Fermat's alleged proof.


[1] https://en.wikipedia.org/wiki/Fermat%27s_Last_Theorem


-- 
Association Culture-Libre
http://www.culture-libre.org/


