[Wikimedia-l] The case for supporting open source machine translation

Erik Moeller erik at wikimedia.org
Thu Apr 25 10:35:06 UTC 2013


Denny,

Very good and compelling reasoning as always. I think the argument
that we can potentially do a lot for the MT space (including open
source efforts) in part by getting our own house in order on the
dictionary side of things makes a lot of sense. I don't think it
necessarily excludes investing in open source MT efforts, but Mark
makes a good point that there are already existing institutions
pouring money into promising initiatives. Let me try to understand
some of the more complex ideas outlined in your note a bit better.

> The system I am really aiming at is a different one, and there has
> been plenty of related work in this direction: imagine a wiki where you
> enter or edit content, sentence by sentence, but the natural language
> representation is just a surface syntax for an internal structure. Your
> editing interface is a constrained, but natural language. Now, in order to
> really make this fly, both the rules for the parsers (interpreting the
> input) and the serializer (creating the output) would need to be editable
> by the community - in addition to the content itself. There are a number of
> major challenges involved, but I have by now a fair idea of how to tackle
> most of them (and I don't have the time to detail them right now).
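To make the quoted idea concrete, here is a minimal, hypothetical sketch (all names invented, not anything proposed in the thread): a fact lives as an abstract internal structure, and a per-language lexicon (the "dictionary side") plus per-predicate serializer templates, both of which a community could edit, turn it back into natural-language surface text.

```python
# Abstract, language-independent representation of one sentence.
statement = {"predicate": "capital_of", "subject": "Berlin", "object": "Germany"}

# Community-editable lexicon: language-specific labels for entities.
lexicon = {
    ("Berlin", "en"): "Berlin", ("Berlin", "de"): "Berlin",
    ("Germany", "en"): "Germany", ("Germany", "de"): "Deutschland",
}

# Community-editable serializer rules: one template per predicate/language.
serializers = {
    ("capital_of", "en"): "{subject} is the capital of {object}.",
    ("capital_of", "de"): "{subject} ist die Hauptstadt von {object}.",
}

def render(stmt, lang):
    """Serialize the abstract statement into the requested language."""
    template = serializers[(stmt["predicate"], lang)]
    return template.format(
        subject=lexicon[(stmt["subject"], lang)],
        object=lexicon[(stmt["object"], lang)],
    )

print(render(statement, "en"))  # Berlin is the capital of Germany.
print(render(statement, "de"))  # Berlin ist die Hauptstadt von Deutschland.
```

A real system would of course need grammatical rules (agreement, case, word order) rather than flat templates, plus a parser mapping constrained input back into the structure; the point here is only that content, lexicon, and serialization rules are three separately editable layers.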

So what would you want to enable with this? Faster bootstrapping of
content? How would it work, and how would this be superior to an
approach like the one taken in the Translate extension (basically,
providing good interfaces for 1:1 translation, tracking differences
between documents, and offering MT and translation memory based
suggestions)? Are there examples of this approach being taken
somewhere else?

Thanks,
Erik


