[Foundation-l] Push translation

Michael Galvez michaelcg at gmail.com
Fri Aug 13 20:40:02 UTC 2010


On Sat, Aug 7, 2010 at 11:30 PM, Lars Aronsson <lars at aronsson.se> wrote:

> On 08/06/2010 07:47 PM, Michael Galvez wrote:
> > 3. We acquire dictionaries on limited licenses from other parties.  In
> > general, while we can surface this content on our own sites (e.g., Google
> > Translate, Google Dictionary, Google Translator Toolkit), we don't have
> > permission to donate that data to other sites.
>
> Google, as any large company, uses many sources. For example,
> Google Maps used to buy all its maps, but later started to drive
> around to build its own maps (and street images). With time, I'm
> certain you will use Google Books as a parallel corpus and derive
> translations of words and phrases from translated books, and
> some day you might be able to build Google Translate without
> relying on external dictionary sources. I don't know if this is one
> month or one year away, but it should take less than one decade.
> Expecting this development, you could keep collaboration with open
> content movements, such as Wikipedia/Wiktionary, in mind.
>
> > For HTML files, both Translate and Translator Toolkit support the attribute
> >
> > class="notranslate"
> >
> > to exclude text from translation.
> > (http://translate.google.com/support/toolkit/bin/answer.py?hl=en&answer=147838)
> >
> > If you tell us what MediaWiki tags you'd like for us to treat the same way,
> > we can do the same for Wikipedia.
>
> There is no such tag, unfortunately. But in the GTTK user interface,
> it would be useful to have a way to mark where in the original text
> (left-hand side) those tags should have been. If it is any help to the
> pretranslator, other kinds of marks could also be manually added,
> such as whether a phrase is a figure of speech or should be read
> literally. If the text says "kill two birds with one stone", that should
> be translated into Swedish as "hit two flies with one swat". But if
> David slays Goliath with a stone, that should remain a stone.
>

Is there a way to introduce this type of tag into MediaWiki?  If we can come
up with a generic MediaWiki tag for this, we can interpret it as the
equivalent of "notranslate" in MediaWiki text.

When we "pretranslate" the document, we can indicate to Google Translate
that this text should not be translated.  In addition, we can also lock this
text in Translator Toolkit so that the translator cannot edit it during
translation.
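
To make the idea concrete, here is a rough sketch in Python.  It is not a
description of how Translate or Translator Toolkit work internally.  The tag
name <notranslate> and the functions split_segments and pretranslate are
hypothetical, chosen only for illustration; the translate argument stands in
for whatever translation step is used.

```python
import re

# Hypothetical tag name; no such MediaWiki tag exists yet.
NOTRANSLATE = re.compile(r"<notranslate>(.*?)</notranslate>", re.DOTALL)


def split_segments(wikitext):
    """Split wikitext into (text, locked) pairs.

    Locked text came from inside the hypothetical tag and should be
    neither machine-translated nor editable by the translator.
    """
    segments = []
    pos = 0
    for match in NOTRANSLATE.finditer(wikitext):
        if match.start() > pos:
            segments.append((wikitext[pos:match.start()], False))
        segments.append((match.group(1), True))
        pos = match.end()
    if pos < len(wikitext):
        segments.append((wikitext[pos:], False))
    return segments


def pretranslate(wikitext, translate):
    """Apply translate() to unlocked text only; pass locked text through."""
    return "".join(
        text if locked else translate(text)
        for text, locked in split_segments(wikitext)
    )
```

The locked flag is what an editor could use to make a span read-only, in
addition to skipping it during pretranslation.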



>
> >   a. If we find a translation for that segment in the TM, we will
> > "pre-translate" the segment with the highest-rated translation.
>
> But when you have two or more candidates, each with a reasonable
> probability, the choice could be presented to the human translator.
>

Yes.  This choice shows up as a translation memory entry when the translator
clicks "Show Toolkit".
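
As an illustration only, the behavior can be sketched like this in Python.
The TMEntry class, the rating field, and the exact-match lookup are
assumptions made for the example, not the actual Translator Toolkit data
model or matching logic.

```python
from dataclasses import dataclass


@dataclass
class TMEntry:
    source: str
    target: str
    rating: float  # hypothetical quality score; higher is better


def candidates(tm, segment):
    """Return all TM entries whose source equals the segment, best first."""
    hits = [entry for entry in tm if entry.source == segment]
    return sorted(hits, key=lambda entry: entry.rating, reverse=True)


def pretranslate_segment(tm, segment):
    """Return (pre-filled translation or None, remaining alternatives).

    The highest-rated match is used to pre-fill the segment; the other
    matches are kept so they can be offered to the human translator.
    """
    ranked = candidates(tm, segment)
    if not ranked:
        return None, []
    return ranked[0].target, [entry.target for entry in ranked[1:]]
```

A real translation memory would also need fuzzy matching; exact matching is
used here only to keep the sketch short.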


>
> > 1. When a translator uploads a Wikipedia article into Translator Toolkit, we
> > divide the article into segments (sentences, section headings, etc.).
>
> This means you do recognize some wiki markup, such as [[links]]
> and ==headings==. But recognition of that markup is apparently
> hard-wired and takes place before any learning. Now, consider
> the case when
>
> '''John Doe''' (May 1, 1733 - April 5, 1799) was a British colonel
>
> is translated, according to our manual of style, as:
>
> '''John Doe,''' född 1 maj 1733, död 5 april 1799, var en brittisk överste
>
> where the parentheses are replaced with commas and the words född
> (born) and död (died) have been added. It would be nice if the
> translation memory could learn not only the words (colonel = överste)
> but also to recognize this transformation of style. It is very
> context sensitive (this example only applies to the opening paragraph
> of biographic articles) and would need lots of translations to
> provide good results. And including dashes, commas and parentheses
> along with words as the elements of translated phrases is perhaps
> a major shift in what machine translation is supposed to do.
> (But it could open the door to translating template calls.)
>
> > Following interwiki links and suggesting parent categories is a bit of work
> > and unlikely to be implemented soon.  We can disable category translation if
> > that helps - can you confirm if that's OK?
>
> I think you should keep it as it is, until you get around to doing
> that "bit of work".
>
>
> --
>    Lars Aronsson (lars at aronsson.se)
>   Aronsson Datateknik - http://aronsson.se

