Re: [Foundation-l] Push translation

13 Aug 2010

On Sat, Aug 7, 2010 at 11:30 PM, Lars Aronsson <lars(a)aronsson.se> wrote:

...
  On 08/06/2010 07:47 PM, Michael Galvez wrote:
  3. We acquire dictionaries on limited licenses from other parties.  In
  general, while we can surface this content on our own sites (e.g., Google
  Translate, Google Dictionary, Google Translator Toolkit), we don't have
  permission to donate that data to other sites.

 Google, like any large company, uses many sources. For example,
 Google Maps used to buy all its maps, but later started to drive
 around to build its own maps (and street images). With time, I'm
 certain you will use Google Books as a parallel corpus and derive
 translations of words and phrases from translated books, and
 some day you might be able to build Google Translate without
 relying on external dictionary sources. I don't know if this is one
 month or one year away, but it should take less than one decade.
 Anticipating this development, you could keep collaboration with open
 content movements, such as Wikipedia and Wiktionary, in mind.

  For HTML files, both Translate and Translator Toolkit support the tag

  class="notranslate"

  to exclude text from translation.
  ( http://translate.google.com/support/toolkit/bin/answer.py?hl=en&answer=… )

  If you tell us what MediaWiki tags you'd like for us to treat the same way,
  we can do the same for Wikipedia.

 There is no such tag, unfortunately. But in the GTTK user interface,
 it would be useful to have a way to mark where in the original text
 (left-hand side) those tags should have been. If it is any help to the
 pretranslator, other kinds of marks could also be manually added,
 such as whether a phrase is a figure of speech or should be read
 literally. If the text says "kill two birds with one stone", that should
 be translated into Swedish as "hit two flies with one swat". But if
 David slays Goliath with a stone, that should remain a stone.

Is there a way to introduce this type of tag into MediaWiki?  If we can come
up with a generic MediaWiki tag for this, we can interpret it as the
equivalent of "notranslate" in MediaWiki text.

When we "pretranslate" the document, we can indicate to Google Translate
that this text should not be translated.  In addition, we can also lock this
text in Translator Toolkit so that the translator cannot edit it during
translation.
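
A minimal sketch of how such a tag could be honored during pretranslation, assuming a hypothetical <notranslate>...</notranslate> wiki tag (no tag name has been agreed on in this thread) and hypothetical helper names protect and restore. Protected spans are swapped for placeholders before the text is sent for translation and put back afterwards; it does not show how Translate or Translator Toolkit actually do this.

```python
import re

# Hypothetical tag name; the thread has not settled on one.
NOTRANSLATE = re.compile(r"<notranslate>(.*?)</notranslate>", re.DOTALL)
PLACEHOLDER = re.compile(r"__NT(\d+)__")


def protect(text):
    """Replace each protected span with a numbered placeholder.

    Returns the masked text and the list of protected spans.
    """
    spans = []

    def stash(match):
        spans.append(match.group(1))
        return f"__NT{len(spans) - 1}__"

    return NOTRANSLATE.sub(stash, text), spans


def restore(text, spans):
    """Put the protected spans back into the translated text."""
    return PLACEHOLDER.sub(lambda m: spans[int(m.group(1))], text)
```

The masked text is what a translation engine would see; because the placeholders carry an index, the protected spans can be restored even if the translation reorders them.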

...

  a. If we find a translation for that segment in the TM, we will
  "pre-translate" the segment with the highest-rated translation.

 But when you have two or more candidates, each with a reasonable
 probability, the choice could be presented to the human translator.

Yes.  This choice shows up as a translation memory entry when the translator
clicks "Show Toolkit".
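
As a rough illustration of the behavior described above, the lookup can be sketched as follows. The data layout (a dict mapping a source segment to a list of (translation, rating) pairs) and the function name pretranslate are assumptions made for this sketch, not the Toolkit's real data model.

```python
def pretranslate(segment, tm):
    """Pick the highest-rated TM translation for a segment.

    tm maps a source segment to a list of (translation, rating) pairs.
    Returns (best, alternatives): best is None when the TM has no entry,
    and alternatives holds the remaining candidates, best first.
    """
    candidates = sorted(tm.get(segment, []), key=lambda c: c[1], reverse=True)
    if not candidates:
        return None, []
    best = candidates[0][0]
    alternatives = [text for text, _rating in candidates[1:]]
    return best, alternatives
```

The segment is pre-filled with the best candidate, while the alternatives stay available for the translator to pick from.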

...

  1. When a translator uploads a Wikipedia article into Translator Toolkit, we
  divide the article into segments (sentences, section headings, etc.).

 This means you do recognize some wiki markup, such as [[links]]
 and ==headings==. But recognition of that markup is apparently
 hard-wired and takes place before any learning. Now, consider
 the case when

 '''John Doe''' (May 1, 1733 - April 5, 1799) was a British colonel

 is translated, according to our manual of style, as:

 '''John Doe,''' född 1 maj 1733, död 5 april 1799, var en brittisk överste

 where the parentheses are replaced with commas and the words född
 (born) and död (died) have been added. It would be nice if the
 translation memory could learn not only the words (colonel = överste)
 but also to recognize this transformation of style. It is very
 context sensitive (this example only applies to the opening paragraph
 of biographic articles) and would need lots of translations to
 provide good results. And including dashes, commas and parentheses
 along with words as the elements of translated phrases is perhaps
 a major shift in what machine translation is supposed to do.
 (But it could open the door to translating template calls.)
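
To make the style transformation concrete, here is a hand-written rule for the example above. It is only a sketch of the kind of mapping a translation memory would have to learn from data; the function name restyle_lifespan and the regular expression are illustrative assumptions, it handles only this one date format, and it leaves the remaining words untranslated.

```python
import re

# Swedish month names, keyed by the English names used in the source text.
MONTHS_SV = {
    "January": "januari", "February": "februari", "March": "mars",
    "April": "april", "May": "maj", "June": "juni", "July": "juli",
    "August": "augusti", "September": "september", "October": "oktober",
    "November": "november", "December": "december",
}

# Matches a biography opening: '''Name''' (Month D, YYYY - Month D, YYYY)
LIFESPAN = re.compile(
    r"'''(?P<name>[^']+)''' "
    r"\((?P<bm>[A-Za-z]+) (?P<bd>\d{1,2}), (?P<by>\d{4}) - "
    r"(?P<dm>[A-Za-z]+) (?P<dd>\d{1,2}), (?P<dy>\d{4})\)"
)


def restyle_lifespan(text):
    """Rewrite the parenthesized lifespan in the Swedish comma style."""

    def rewrite(m):
        born_month = MONTHS_SV.get(m.group("bm"))
        died_month = MONTHS_SV.get(m.group("dm"))
        if born_month is None or died_month is None:
            return m.group(0)  # unknown month name: leave the text untouched
        return (
            f"'''{m.group('name')},''' "
            f"född {m.group('bd')} {born_month} {m.group('by')}, "
            f"död {m.group('dd')} {died_month} {m.group('dy')},"
        )

    return LIFESPAN.sub(rewrite, text, count=1)
```

Applied to the English opening sentence above, this produces the Swedish punctuation and dates but still ends in "was a British colonel"; translating the words themselves remains the job of the translation step.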

  Following interwiki links and suggesting parent categories is a bit of work
  and unlikely to be implemented soon.  We can disable category translation if
  that helps - can you confirm if that's OK?

 I think you should keep it as it is, until you get around to doing
 that "bit of work".

 --
    Lars Aronsson (lars(a)aronsson.se)
   Aronsson Datateknik - http://aronsson.se

 _______________________________________________
 foundation-l mailing list
 foundation-l(a)lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
