Re: [Wikipedia-l] RFC: Principles of mass content adding on small Wikipedias

25 Feb 2006

      Hi Mark,
...
What I wonder about TM is, how does it work with languages with
different structures?
It's quite obvious TM works well for Russian, Italian, Spanish,
French, German, other languages of similar structure. I heard it also
works for Chinese, Japanese, Korean, Arabic, Farsi, Hebrew as well.
So my main questions are:

Can it handle languages which don't separate words in writing?

Examples are Thai, Lao, Japanese, Chinese, and a number of smaller
languages.
Yes - there are translators using Thai, Japanese and Chinese within 
OmegaT, and some people on the development team work with at least 
one of these languages.
...

Can it handle languages of all typological classifications? So far
I have seen it works well for isolating (such as Chinese, Vietnamese)
and inflecting languages (such as Russian, Polish, Latin), but what
about polysynthetic languages (such as Inuktitut, Turkish, Georgian,
Adyghe, Abkhaz, Mohawk)? I would imagine it would be more difficult
for these languages. For example, Western Greenlandic
"Aliikusersuillammassuaanerartassagaluarpaalli." means "However, they
will say that he is a great entertainer, but..." (for other long words
like this, just look at the Greenlandic Wikipedia, kl.wp).
Well, OmegaT uses UTF-8, so most languages are supported. Some we 
would have to try out, and others might require special solutions. 
Basically, anything that can be written in UTF-8 should not create 
problems.
...

Can it mass-process huge amounts of content quickly, to be reviewed
later by humans?
No - when talking about OmegaT we are not talking about machine 
translation, but about computer-assisted translation. That means a 
human translator re-uses translation memories from other projects, 
exchanged TMs etc. While you translate, the glossary entries are 
checked and OmegaT shows you the matching entries in a separate window. 
If a sentence is identical or similar to a previously translated one, 
then, depending on your settings, OmegaT either proposes the match in a 
separate window or overwrites the sentence to be translated with the 
full or partial match.
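To make the matching idea concrete, here is a minimal sketch in Python. It is not OmegaT's actual algorithm: the similarity score comes from Python's standard difflib module as a stand-in, and the names TM, best_match, prefill and the 0.7 threshold are all made up for illustration. Because the comparison runs character by character, it does not rely on spaces between words.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory: source segment -> earlier translation.
TM = {
    "He was born in Rome.": "Er wurde in Rom geboren.",
    "She died in Paris.": "Sie starb in Paris.",
}

def best_match(segment, memory, threshold=0.7):
    """Return (score, source, target) for the closest stored segment, or None.

    The score is difflib's character-level similarity ratio (0.0 to 1.0);
    the threshold plays the role of a user setting (assumed value).
    """
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None

def prefill(segment, memory, threshold=0.7):
    """Mimic an 'overwrite' setting: use the matched translation if one
    reaches the threshold, otherwise leave the source segment untouched."""
    match = best_match(segment, memory, threshold)
    return match[2] if match else segment
```

A full match scores 1.0, a partial match such as "He was born in Milan." scores lower but still brings up the earlier translation for the human translator to adapt, and an unrelated segment comes back unchanged.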
One feature I would very much like to see is assembling translations 
from portions, but this will only be up for discussion once OmegaT is 
connected to Wiktionaryz, that is, once there is TBX support - it does 
not make sense to talk about this very specific and helpful feature 
before then.
A translation memory is only as good as you make it. The more you work 
with it, the better it becomes. That's basically it.
One thing that I also find very helpful: people who speak a language 
but are not native speakers can easily check how a word was translated 
before, in which context etc. This helps a lot during the work and 
gives better results, so native speakers will need less effort for 
proofreading.
With properly set up segmentation rules, for example, you can go 
through the lists of people born and died on the calendar pages quite 
fast, since the descriptions are quite repetitive.
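As a rough illustration of what a segmentation rule does, here is a small Python sketch. It is not OmegaT's rule format: the break pattern, the abbreviation list and the function name segment are assumptions chosen for the example. The idea is a "break" rule (sentence-ending punctuation followed by whitespace) plus "no break" exceptions for abbreviations that are common in birth and death entries.

```python
import re

# Hypothetical break rule: split after '.', '!' or '?' followed by whitespace.
BREAK = re.compile(r"(?<=[.!?])\s+")
# Hypothetical exceptions: abbreviations after which no break should occur.
NO_BREAK_AFTER = {"b.", "d.", "c."}

def segment(text):
    """Split text into segments, rejoining pieces that were cut after an
    abbreviation. Rejoined pieces are separated by a single space."""
    segments, buffer = [], ""
    for piece in BREAK.split(text.strip()):
        if not piece:
            continue
        buffer = f"{buffer} {piece}".strip()
        # Look at the last token only, ignoring an opening parenthesis,
        # so that "(b." counts as an abbreviation but "died." does not.
        last_token = buffer.split()[-1].lstrip("(")
        if last_token in NO_BREAK_AFTER:
            continue
        segments.append(buffer)
        buffer = ""
    if buffer:
        segments.append(buffer)
    return segments
```

With rules like these, an entry such as "Albert Einstein (b. 1879) was a physicist. He died in 1955." yields two segments instead of three, so repetitive descriptions line up with earlier segments in the translation memory more often.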
Please note: I am having a meeting with a group of colleagues this 
week-end and next week I am at the university of Pisa to give a 
presentation and a workshop - so if you write and need an answer from 
me directly, please say so in the subject line, since it could well be 
that I will not be able to read all posts.
Have a great week-end!
Best, Sabine