[Wikimedia-l] Human-assisted machine translation (it was: "The case for supporting open source machine translation")

David Goodman dggenwp at gmail.com
Sat May 4 20:09:37 UTC 2013

For the purposes of Wikipedia, it would be very useful to have machine or
machine-assisted translation specialized for specific subject domains that
are widely represented in articles, especially those that rely
substantially on the standardized data in infoboxes, or subjects that use a
relatively standardized vocabulary and have international appeal.
(Obviously the Wikidata project is a first approximation to this in some
ways.) If we confined ourselves, for example, to articles on football or on
classical music, a very large number of the possible ambiguities would not
be present, there would be many fixed phrases that could be translated
intact, and even the possible sentence structures would be relatively
predictable. This should be true even for non-European languages, as long
as they have the vocabulary to discuss the subject and representative
texts.
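
As a rough illustration of the fixed-phrase idea (a sketch only, not an
existing tool): the phrase table, the glossary, the field names, the club
"Example FC" and the helpers `localize` and `render_sentence` are all
made up for this example. It assumes one narrow domain, where a sentence
can be produced by filling a per-language fixed phrase with
infobox-style values.

```python
# Hypothetical sketch: render a domain-specific sentence in several languages
# by combining infobox-style fields with fixed phrases per language.

# Fixed phrases for one narrow domain (football clubs), keyed by language.
# The wording is illustrative, not taken from any actual Wikipedia.
TEMPLATES = {
    "en": "{name} is a football club based in {city}, founded in {founded}.",
    "es": "{name} es un club de fútbol con sede en {city}, fundado en {founded}.",
    "de": "{name} ist ein Fußballverein aus {city}, gegründet {founded}.",
}

# Small domain glossary for values whose form differs between languages.
GLOSSARY = {
    ("city", "Munich"): {"de": "München", "es": "Múnich"},
}


def localize(field, value, lang):
    """Return the language-specific form of a value, or the value unchanged."""
    return GLOSSARY.get((field, value), {}).get(lang, value)


def render_sentence(infobox, lang):
    """Fill the fixed phrase for `lang` with localized infobox values."""
    fields = {key: localize(key, str(val), lang) for key, val in infobox.items()}
    return TEMPLATES[lang].format(**fields)


# "Example FC" is a made-up club used only to exercise the sketch.
infobox = {"name": "Example FC", "city": "Munich", "founded": 1900}

for lang in ("en", "es", "de"):
    print(render_sentence(infobox, lang))
```

Anything the phrase table does not cover (free-running prose, unusual
sentence structures) would still need a human or a general MT system;
the point is only that a restricted domain removes much of the ambiguity.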

Even where the actual subject terminology is country specific, this might
be possible on a limited basis. The vocabulary for discussing UK and US and
Japanese politicians is necessarily different, with the need for decisions
about what is and is not of the same meaning. But once this has been
determined, the actual articles should be relatively easy to convert.

The article formulations used in the different WPs are not quite
identical; some things, like references, are handled in different manners.
A set of MT tools specialized not just for the subject but for WP could
deal with this also.

On Wed, May 1, 2013 at 4:41 PM, Mark <delirium at hackish.org> wrote:

> On 5/1/13 3:39 PM, David Cuenca wrote:
>> In answer to Matthieu, I don't think perfection is something to aim for
>> during the first stage. Just a MT that gives a fairly good text about the
>> subject without big mistakes, would be already a big improvement. IMHO,
>> goals should be set step by step and at a reasonable height. The lowest
>> hanging fruit seems to be pairs of closely related languages, tolerating
>> instances like the one you pointed out.
>
> I agree with that, especially from the perspective of improving the
> state of free/libre tools. There are many things that are beyond the
> capabilities of current state-of-the-art MT, but current free/libre tools
> aren't even up to that level: proprietary cloud tools like Google Translate
> and Bing Translate appear to be the most advanced currently available, and
> certainly the only easy-for-end-users advanced tools available. If
> free/libre tools could get up to that level, that would be a big win in
> itself.
> -Mark
> _______________________________________________
> Wikimedia-l mailing list
> Wikimedia-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l

David Goodman

DGG at the enWP
