[Wikimedia-l] The case for supporting open source machine translation

Brion Vibber bvibber at wikimedia.org
Thu Apr 25 15:07:05 UTC 2013


On Thu, Apr 25, 2013 at 7:26 AM, Denny Vrandečić <
denny.vrandecic at wikimedia.de> wrote:

> Not just bootstrapping the content. By having the primary content be saved
> in a language independent form, and always translating it on the fly, it
> would not merely bootstrap content in different languages, but it would
> mean that editors from different languages would be working on the same
> content. The texts in the different languages are not translations of each
> other, but they are all created from the same source. There would be no
> primacy of, say, English.
>

You are blowing my mind, dude. :)

I suspect this approach won't serve for everything, but it sounds
*awesome*. If we can tie natural-language statements directly to data nodes
(rather than merely annotating vague references like we do today), then
we'd be much better able to keep language versions in sync. How to make
them sane to edit... sounds harder. :)

> It would be foolish to create any such plan without reusing tools and
> concepts from the Translate extension, translation memories, etc. There is
> a lot of UI and conceptual goodness in these tools. The idea would be to
> make them user extensible with rules.
>
>
Heck yeah!

> If you want, examples of that are the bots working on some Wikipedias
> currently, creating text from structured input. They are partially reusing
> the same structured input, and need "merely" a translation in the way the
> bots create the text to save in the given Wikipedia. I have seen some
> research in the area; each approach has one drawback or another, but can
> and should be used as an inspiration and to inform the project (like
> Attempto Controlled English, or a chat program developed at the Open
> University in Milton Keynes to allow conducting business in different
> languages, etc.)
>

Yessss... make them real-time updatable instead of one-time bots producing
language which can't be maintained.
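
To make "render on the fly" concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration only: the FACT layout,
the LABELS and RULES tables, and the render() helper are hypothetical and
are not the API of the Translate extension or of any existing bot. The
point is just that one stored statement plus per-language rules yields
every language version on demand, so nothing goes stale:

```python
# Hypothetical language-independent statement: identifiers, not words.
FACT = {"subject": "Q183", "property": "capital", "value": "Q64"}

# Hypothetical per-language labels for the identifiers used above.
LABELS = {
    "en": {"Q183": "Germany", "Q64": "Berlin"},
    "de": {"Q183": "Deutschland", "Q64": "Berlin"},
    "fr": {"Q183": "Allemagne", "Q64": "Berlin"},
}

# Hypothetical user-extensible rules: one sentence pattern per property
# and language. Real grammar needs far more than string templates.
RULES = {
    "en": {"capital": "{value} is the capital of {subject}."},
    "de": {"capital": "{value} ist die Hauptstadt von {subject}."},
    "fr": {"capital": "{value} est la capitale de l'{subject}."},
}


def render(fact, lang):
    """Produce the sentence for `fact` in `lang` at read time."""
    labels = LABELS[lang]
    template = RULES[lang][fact["property"]]
    return template.format(
        subject=labels[fact["subject"]],
        value=labels[fact["value"]],
    )


if __name__ == "__main__":
    # Every language is generated from the same source statement.
    for lang in ("en", "de", "fr"):
        print(lang, render(FACT, lang))
```

Editing FACT or a rule changes what every language shows on the next
read, which is the difference from a one-time bot run that saves static
text into each wiki.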

-- brion

