[Wikimedia-l] The case for supporting open source machine translation

Denny Vrandečić denny.vrandecic at wikimedia.de
Thu Apr 25 14:26:22 UTC 2013


Erik,

2013/4/25 Erik Moeller <erik at wikimedia.org>

> > The system I am really aiming at is a different one, and there has
> > been plenty of related work in this direction: imagine a wiki where you
> > enter or edit content, sentence by sentence, but the natural language
> > representation is just a surface syntax for an internal structure. Your
> > editing interface is a constrained, but natural language. Now, in order
> > to really make this fly, both the rules for the parsers (interpreting
> > the input) and the serializer (creating the output) would need to be
> > editable by the community - in addition to the content itself. There are
> > a number of major challenges involved, but I have by now a fair idea of
> > how to tackle most of them (and I don't have the time to detail them
> > right now).
>
> So what would you want to enable with this? Faster bootstrapping of
> content? How would it work, and how would this be superior to an
> approach like the one taken in the Translate extension (basically,
> providing good interfaces for 1:1 translation, tracking differences
> between documents, and offering MT and translation memory based
> suggestions)? Are there examples of this approach being taken
> somewhere else?



Not just bootstrapping content. If the primary content is saved in a
language-independent form and always translated on the fly, this would
not merely bootstrap content in different languages; it would mean that
editors working in different languages are all working on the same
content. The texts in the different languages are not translations of
each other; they are all generated from the same source. There would be
no primacy of, say, English.
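
To make this a bit more concrete, here is a minimal sketch in Python of
what I mean. Everything in it is an assumption for illustration only: the
BornIn structure, the PARSERS and SERIALIZERS tables, and the two example
languages are hypothetical and not part of any existing tool. The point is
only that a sentence entered in one language is stored as a
language-independent structure, and every language is rendered from that
structure by rules that could themselves be edited like wiki content.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BornIn:
    """Hypothetical language-independent fact: a person born in a place in a year."""
    person: str
    place: str
    year: int


# Hypothetical community-editable rules, one entry per language.
# Serializers turn the internal structure into a sentence.
SERIALIZERS = {
    "en": "{person} was born in {place} in {year}.",
    "de": "{person} wurde {year} in {place} geboren.",
}

# Parsers turn a sentence of the constrained language back into the structure.
PARSERS = {
    "en": re.compile(r"^(?P<person>.+) was born in (?P<place>.+) in (?P<year>\d{4})\.$"),
    "de": re.compile(r"^(?P<person>.+) wurde (?P<year>\d{4}) in (?P<place>.+) geboren\.$"),
}


def parse(sentence: str, lang: str) -> BornIn:
    """Interpret a constrained-language sentence as a BornIn fact."""
    match = PARSERS[lang].match(sentence)
    if match is None:
        raise ValueError(f"sentence is not valid constrained {lang!r}: {sentence!r}")
    return BornIn(match["person"], match["place"], int(match["year"]))


def serialize(fact: BornIn, lang: str) -> str:
    """Render the stored fact in the requested language."""
    return SERIALIZERS[lang].format(
        person=fact.person, place=fact.place, year=fact.year
    )


# A German-speaking editor enters the sentence; only the structure is stored.
stored = parse("Ada Lovelace wurde 1815 in London geboren.", "de")

# An English-speaking reader sees text generated from the same structure.
print(serialize(stored, "en"))  # prints "Ada Lovelace was born in London in 1815."
```

Neither sentence is a translation of the other here; both come from the
stored structure. A real system would of course need far richer rules
(names of places differ between languages, word order depends on more
than one template, and so on), which is exactly why the rules would have
to be editable by the community.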

It would be foolish to create any such plan without reusing tools and
concepts from the Translate extension, translation memories, etc. There is
a lot of UI and conceptual goodness in these tools. The idea would be to
make them extensible by users through editable rules.

If you want examples, the closest ones are the bots currently working on
some Wikipedias, creating text from structured input. They partially reuse
the same structured input, and "merely" need a translation of the way the
bot generates the text to be saved in the given Wikipedia. I have also seen
some research in this area. Each approach has one drawback or another, but
they can and should be used as inspiration and to inform the project (for
example Attempto Controlled English, or a chat program developed at the
Open University in Milton Keynes that allows conducting business across
different languages).

I hope this helps a bit.

Cheers,
Denny

 --
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.

