On Sat, Aug 7, 2010 at 11:30 PM, Lars Aronsson <lars(a)aronsson.se> wrote:
On 08/06/2010 07:47 PM, Michael Galvez wrote:
3. We acquire dictionaries on limited licenses from other parties. In
general, while we can surface this content on our own sites (e.g., Google
Translate, Google Dictionary, Google Translator Toolkit), we don't have
permission to donate that data to other sites.
Google, like any large company, uses many sources. For example,
Google Maps used to buy all its maps, but later started to drive
around to build its own maps (and street images). With time, I'm
certain you will use Google Books as a parallel corpus and derive
translations of words and phrases from translated books, and
some day you might be able to build Google Translate without
relying on external dictionary sources. I don't know if this is one
month or one year away, but it should take less than a decade.
Anticipating this development, you could keep collaboration with
open-content movements, such as Wikipedia and Wiktionary, in mind.
For HTML files, both Translate and Translator Toolkit support the
"notranslate" tag to exclude text from translation.
If you tell us which MediaWiki tags you'd like us to treat the same way,
we can do the same for Wikipedia.
There is no such tag, unfortunately. But in the GTTK user interface,
it would be useful to have a way to mark where in the original text
(left-hand side) those tags should have been. If it is any help to the
pretranslator, other kinds of marks could also be manually added,
such as whether a phrase is a figure of speech or should be read
literally. If the text says "kill two birds with one stone", that should
be translated into Swedish as "hit two flies with one swat". But if
David slays Goliath with a stone, that should remain a stone.
Is there a way to introduce this type of tag into MediaWiki? If we can come
up with a generic MediaWiki tag for this, we can interpret it as the
equivalent of "notranslate" in MediaWiki text.
When we "pretranslate" the document, we can indicate to Google Translate
that this text should not be translated. In addition, we can also lock this
text in Translator Toolkit so that the translator cannot edit it during
translation.
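To make the idea concrete, here is a minimal Python sketch of how marked
spans could be shielded from a machine translator and restored afterwards.
It assumes a hypothetical <notranslate>...</notranslate> marker (MediaWiki
has no such tag, as noted above) and invented helper names protect and
restore; it is not how Google Translate or Translator Toolkit actually
work internally.

```python
import re

# HYPOTHETICAL marker: MediaWiki has no such tag; assumed for illustration.
PROTECTED = re.compile(r"<notranslate>(.*?)</notranslate>", re.DOTALL)


def protect(text: str) -> tuple[str, list[str]]:
    """Replace each protected span with a numbered placeholder token."""
    spans: list[str] = []

    def _stash(match: re.Match) -> str:
        spans.append(match.group(1))
        return f"__NT{len(spans) - 1}__"

    return PROTECTED.sub(_stash, text), spans


def restore(text: str, spans: list[str]) -> str:
    """Put the original, untranslated spans back in place of the placeholders."""
    for i, span in enumerate(spans):
        text = text.replace(f"__NT{i}__", span)
    return text


masked, spans = protect("Use <notranslate>{{Infobox person}}</notranslate> here.")
# `masked` is what a machine translator would see; the template call is hidden.
print(masked)                   # Use __NT0__ here.
print(restore(masked, spans))   # Use {{Infobox person}} here.
```

Swapping spans for opaque tokens is a common trick because many
translation engines pass unfamiliar tokens through unchanged, though that
is an assumption worth checking against the real engine.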
a. If we find a translation for that segment in the TM, we will
"pre-translate" the segment with the highest-rated translation.
But when you have two or more candidates, each with a reasonable
probability, the choice could be presented to the human translator.
Yes. This choice shows up as a translation memory entry when the translator
clicks "Show Toolkit".
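The pre-translation rule described above, together with the suggestion of
offering close alternatives to the human translator, could look roughly
like this Python sketch. TMEntry, the 0-to-1 score, and the margin
threshold are all assumptions for illustration; the thread does not say
how Translator Toolkit rates TM matches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TMEntry:
    """One translation-memory candidate for a source segment (hypothetical shape)."""
    translation: str
    score: float  # assumed rating in [0, 1]; the real scoring is not specified


def pretranslate(candidates: list[TMEntry],
                 margin: float = 0.1) -> tuple[Optional[str], list[TMEntry]]:
    """Return the top-rated translation plus any close rivals worth showing."""
    if not candidates:
        return None, []
    ranked = sorted(candidates, key=lambda e: e.score, reverse=True)
    best = ranked[0]
    # Candidates within `margin` of the best count as "reasonable"
    # alternatives that could be offered to the human translator.
    rivals = [e for e in ranked[1:] if best.score - e.score <= margin]
    return best.translation, rivals


# Source segment: "kill two birds with one stone"
tm = [
    TMEntry("slå två flugor i en smäll", 0.80),   # idiomatic
    TMEntry("döda två fåglar med en sten", 0.75),  # literal
    TMEntry("döda två fåglar", 0.30),
]
best, rivals = pretranslate(tm)
print(best)                             # slå två flugor i en smäll
print([e.translation for e in rivals])  # ['döda två fåglar med en sten']
```

The margin keeps the interface quiet when one candidate clearly dominates
and surfaces a choice only when several are close.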
1. When a translator uploads a Wikipedia article into Translator Toolkit,
we divide the article into segments (sentences, section headings, etc.).
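A deliberately naive Python sketch of the segmentation step, assuming only
two kinds of segments: ==heading== lines, and sentences split at ".", "!"
or "?" followed by whitespace. The function name and output format are
hypothetical; a real segmenter would also have to handle links, templates,
and abbreviations.

```python
import re

# A heading line such as "== Early life ==" (any matching number of '=').
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# Naive sentence boundary (an assumption): '.', '!' or '?' then whitespace.
# This will mis-split abbreviations such as "St. Paul".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def segment(wikitext: str) -> list[tuple[str, str]]:
    """Split wikitext into ("heading", text) and ("sentence", text) segments."""
    segments = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line:
            continue
        m = HEADING.match(line)
        if m:
            segments.append(("heading", m.group(2)))
            continue
        for sentence in SENTENCE_END.split(line):
            if sentence:
                segments.append(("sentence", sentence))
    return segments


for kind, text in segment("== Early life ==\nHe was born in London. He joined the army.\n"):
    print(kind, "|", text)
```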
This means you do recognize some wiki markup, such as [[links]]
and ==headings==. But recognition of that markup is apparently
hard-wired and takes place before any learning. Now, consider
the case where
'''John Doe''' (May 1, 1733 - April 5, 1799) was a British
is translated, according to our manual of style, as:
'''John Doe,''' född 1 maj 1733, död 5 april 1799, var en
where the parentheses are replaced with commas and the words född
(born) and död (died) have been added. It would be nice if the
translation memory could learn not only the words (colonel = överste)
but also to recognize this transformation of style. It is very
context-sensitive (this example only applies to the opening paragraph
of biographic articles) and would need lots of translations to
provide good results. And including dashes, commas and parentheses
along with words as the elements of translated phrases is perhaps
a major shift in what machine translation is supposed to do.
(But it could open the door to translating template calls.)
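For contrast with a learned transformation, here is the same style change
written by hand as a Python regular-expression rule. It is a hypothetical
sketch that covers only the exact '''Name''' (Month D, YYYY - Month D, YYYY)
pattern from the example and leaves the rest of the sentence untranslated.
A translation memory that learned this transformation would, in effect,
have to discover the mapping that is hard-coded here.

```python
import re

# English-to-Swedish month names (Swedish writes months in lower case).
MONTHS_SV = {
    "January": "januari", "February": "februari", "March": "mars",
    "April": "april", "May": "maj", "June": "juni", "July": "juli",
    "August": "augusti", "September": "september", "October": "oktober",
    "November": "november", "December": "december",
}

MONTH = "(" + "|".join(MONTHS_SV) + ")"
DATE = MONTH + r" (\d{1,2}), (\d{3,4})"
# '''Name''' (Month D, YYYY - Month D, YYYY); hyphen or en dash accepted.
LIFESPAN = re.compile(r"'''(.+?)''' \(" + DATE + r"\s*[-\u2013]\s*" + DATE + r"\)")


def swedish_lifespan(text: str) -> str:
    """Hand-written rule: parentheses become commas, 'född'/'död' are added."""
    def _rewrite(m: re.Match) -> str:
        name, bm, bd, by, dm, dd, dy = m.groups()
        return (f"'''{name},''' född {bd} {MONTHS_SV[bm]} {by}, "
                f"död {dd} {MONTHS_SV[dm]} {dy},")
    return LIFESPAN.sub(_rewrite, text)


print(swedish_lifespan("'''John Doe''' (May 1, 1733 - April 5, 1799) was a British"))
# '''John Doe,''' född 1 maj 1733, död 5 april 1799, was a British
```

Note that the rule only fires on an exact match; a learned model would
need many biographical openings to generalize the same change.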
Following interwiki links and suggesting parent categories is a bit of
work and unlikely to be implemented soon. We can disable category
translation if that helps; can you confirm that's OK?
I think you should keep it as it is until you get around to doing
that "bit of work".
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
foundation-l mailing list