[Wikipedia-l] OmegaT, WiktionaryZ, Betawiki (/me adding: Wikipedia) ... some questions that need an answer ...

Sun Aug 13 15:22:32 UTC 2006

I am crossposting this blog text and also sending it to some people in 
blind copy, since it involves various projects. As for Wikipedia: it is 
about content creation for small wikipedias (that's why I changed the 
original title above).

The original post is at: 
http://sabinecretella.blogspot.com/2006/08/omegat-wiktionaryz-betawiki-some.html

Your comments and thoughts are very much appreciated.

Best, Sabine

*****

      OmegaT, WiktionaryZ, Betawiki ... some questions that need an
      answer ...

In the Wiktionary IRC channel Connel asked the following questions: "... 
considers omegat.org. Is the intent for it to just auto-upload stuff to 
WZ? to/from ZW? Or betawiki, or both betawiki and WZ? Or is betawiki 
just for WikiMedia total localization?"

That is a lot ... so let me go step by step.

The intent of OmegaT <http://en.wikipedia.org/wiki/OmegaT> is not to 
auto-upload stuff to WiktionaryZ <http://wiktionaryz.org> or download it 
from there. Nor is it only there for Betawiki 
<http://nike.users.idler.fi/betawiki/Etusivu> and WiktionaryZ, even if 
it will probably be used for both sooner or later. OmegaT is a CAT 
<http://en.wikipedia.org/wiki/Computer-assisted_translation> tool that 
helps translators do their work.

What does this mean? Imagine that for all of your translations you use a 
tool that creates a translation memory: a file containing the 
translations you did, segmented into sentences, with each source 
sentence paired with its target sentence. Then you do further 
translations and let the CAT tool access these already translated files. 
If your new translation is on a subject you have already translated, 
chances are high that most of the terminology you need is already in 
there, and you can even see in which context it was used. So with OmegaT 
you search your project and the available translation memories to see 
whether and how a term was already translated. This can help a lot.
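
To make the idea of a translation memory more concrete, here is a minimal sketch in Python. It is only an illustration, not how OmegaT stores or searches its memories: the names Segment and search_memory and the sample sentences are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One translation memory entry: a source sentence and its translation."""
    source: str
    target: str


def search_memory(memory, term):
    """Return all segments whose source or target sentence contains the term.

    The comparison is case-insensitive, so the context of every earlier
    use of the term can be inspected.
    """
    needle = term.lower()
    return [
        seg for seg in memory
        if needle in seg.source.lower() or needle in seg.target.lower()
    ]


# Illustrative memory built from earlier (invented) translations.
memory = [
    Segment("Press the power button.",
            "Premere il pulsante di accensione."),
    Segment("Remove the cover.",
            "Rimuovere il coperchio."),
    Segment("Hold the power button for five seconds.",
            "Tenere premuto il pulsante di accensione per cinque secondi."),
]

hits = search_memory(memory, "power button")
for seg in hits:
    print(seg.source, "->", seg.target)
```

Each hit shows the whole sentence pair, so you see not only that the term was translated before but also in which context.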

Now consider a manual - of a machine, a computer, whatever. These 
manuals need updates once a new version of that machine or computer is 
produced. Normally companies then just update the description, and 
parts of it remain the same as before (simply because the functionality 
of these parts is still the same). When you then translate, you will find 
these unchanged parts in your translation memory and, depending on how 
you set your options, OmegaT either proposes the 100% match or overwrites 
the translation part of your project with the already existing 
translation. In this way you can save loads of time.
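
The difference between proposing a 100% match and overwriting with it can be sketched roughly as follows. This is an assumption-laden toy in Python, not OmegaT's implementation: the function pretranslate, the overwrite flag and the sample sentences are hypothetical, and only exact (100%) matches are handled, no fuzzy matching.

```python
def pretranslate(segments, memory, overwrite=False):
    """Look up each source segment in a translation memory.

    memory maps source sentences to their earlier translations.
    Returns a list of (source, target, proposal) tuples:
      - no exact match:              target and proposal are both None
      - exact match, overwrite off:  the match is only offered as a proposal
      - exact match, overwrite on:   the match is written into the target
    """
    result = []
    for source in segments:
        match = memory.get(source)  # exact (100%) match only
        if match is None:
            result.append((source, None, None))
        elif overwrite:
            result.append((source, match, None))
        else:
            result.append((source, None, match))
    return result


# Memory from the previous version of an (invented) manual.
memory = {"Remove the cover.": "Rimuovere il coperchio."}

# Segments of the updated manual: one unchanged, one new.
segments = ["Remove the cover.", "Insert the new battery."]

proposed = pretranslate(segments, memory)
overwritten = pretranslate(segments, memory, overwrite=True)
```

With overwrite switched off the translator still confirms each match by hand; with it switched on only the new sentence is left to translate.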

With the right parser, the MediaWiki <http://mediawiki.org> UI could 
also be translated in this way. We will always have people who 
translate things manually online and who will not use a CAT tool. This 
means that OmegaT should be able to access the individual pages 
containing the messages on Betawiki, so that you translate them on your 
computer and store them back to the page in the correct language 
version. This is feasible.

Another use will be content creation for small wikipedias. Once we get 
our wiki read/wiki write option within OmegaT, it will be possible to 
start a translation of an article, let's say from the English wikipedia 
<http://en.wikipedia.org>, and translate it into any language, let's say 
for the Neapolitan wikipedia <http://nap.wikipedia.org>. This means you 
tell OmegaT which page to get on en.wikipedia <http://en.wikipedia.org> 
and which page to write on nap.wikipedia <http://nap.wikipedia.org>. The 
same is valid for any African language. The advantage of this is that 
people can work on translations offline when there is no internet 
connection.

The translation memories resulting from these translations should be 
stored somewhere (WiktionaryZ already allows translation memories to be 
uploaded) so that others can access and use them to work faster and 
with higher quality on their own translations. Another aspect of doing 
things this way is that proofreading a translation is easier, since you 
see the source text above the translation for each sentence. This makes 
the job a lot easier and raises the quality of the translated article.

Now to WiktionaryZ and OmegaT: OmegaT for now has quite a simple 
glossary function - you create a tab-separated text file and put it into 
your glossary directory. While you translate, OmegaT shows you the 
translation proposals for the words that are present both in the 
current sentence and in the glossary. Now imagine what that means if you 
connect the glossary function to WiktionaryZ: the whole repository of 
data at your fingertips. Of course, considering the mass of data that is 
online in WiktionaryZ, it becomes very important to attribute domains to 
terminology. Often a word can be translated in 20 ways or even more into 
another language. It does not make sense, if you are doing a 
translation about medical equipment, to get proposals from another 
domain, let's say machinery - the possibilities from other domains 
should only be proposed (showing that other domain) when there is no 
entry for medical equipment.
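
The domain rule described above could look roughly like this in Python. Please read it as a sketch under assumptions: the third "domain" column in the tab-separated glossary, the function names load_glossary and propose, and the sample entries are all invented for illustration and are not part of OmegaT or WiktionaryZ. Only single-word terms are matched.

```python
import csv
import io

# Hypothetical glossary: source term, target term, domain (tab-separated).
GLOSSARY_TEXT = (
    "nut\tdado\tmachinery\n"
    "nut\tnoce\tbotany\n"
    "bolt\tbullone\tmachinery\n"
)


def load_glossary(text):
    """Parse tab-separated glossary text into (source, target, domain) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [tuple(row) for row in reader if len(row) == 3]


def propose(glossary, sentence, domain):
    """Return translation proposals for glossary words found in the sentence.

    Entries from the requested domain win. Entries from other domains are
    only returned (together with their domain) when the requested domain
    has no entry for that word.
    """
    words = {w.strip(".,;:!?").lower() for w in sentence.split()}
    found = {}
    for source, target, entry_domain in glossary:
        if source.lower() in words:
            found.setdefault(source, []).append((target, entry_domain))
    proposals = {}
    for source, entries in found.items():
        in_domain = [e for e in entries if e[1] == domain]
        proposals[source] = in_domain or entries  # fall back to other domains
    return proposals


glossary = load_glossary(GLOSSARY_TEXT)
machinery = propose(glossary, "Tighten the nut and the bolt.", "machinery")
botany = propose(glossary, "The nut fell next to the bolt.", "botany")
```

In the machinery text "nut" only yields the machinery entry. In the botany text "bolt" has no botany entry, so the machinery one is shown together with its domain.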

At this stage we don't have this domain structure for terminology on 
WiktionaryZ and therefore the data, once we have loads of it online, 
cannot be used - it would just create a huge mess and would be very time 
consuming. So one of the things we really need asap is a domain 
structure to which we can connect the individual terms - the sooner we 
have it the better. Otherwise we will have loads of double and triple 
work, or WiktionaryZ could become completely useless within OmegaT, and 
as such it would not be of any advantage for translators. Not even for 
scientists, really: imagine a biologist searching for terminology and 
getting just any result, including those from machinery or whatever 
other domain.

Back to the use within OmegaT:

The next step is then: what if the searched term is not in WiktionaryZ? 
I already noticed this during my last translation - for now it is too 
time consuming to add terms to WiktionaryZ and also Wiktionary while 
you are translating - but it would make so much sense. So what is 
planned in the reference implementation for a translation glossary 
<http://meta.wikimedia.org/wiki/Reference_implementation_for_a_translation_glossary> 
is that when working with OmegaT you get the possibility to add such a 
term directly from there. You simply tell OmegaT to add it to 
WiktionaryZ with your user ID, and you can attribute all the necessary 
domains etc. without problems as well as tag the term as "definition 
needs to be added". In that way WiktionaryZ will get quite a bunch of 
very specific terminology over time.
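
What such a term submission might carry can be sketched as a small data structure. This is purely hypothetical Python: the reference implementation is only planned, so TermSubmission, submit_missing_term and the local queue are invented names, and nothing here talks to WiktionaryZ.

```python
from dataclasses import dataclass, field

# Tag taken from the description above.
NEEDS_DEFINITION = "definition needs to be added"


@dataclass
class TermSubmission:
    """A term a translator wants to add while working (hypothetical shape)."""
    user_id: str
    source_term: str
    target_term: str
    domains: list
    tags: list = field(default_factory=list)


def submit_missing_term(queue, user_id, source_term, target_term, domains):
    """Queue a new term, tagged so that someone adds the definition later."""
    entry = TermSubmission(
        user_id=user_id,
        source_term=source_term,
        target_term=target_term,
        domains=list(domains),
        tags=[NEEDS_DEFINITION],
    )
    queue.append(entry)
    return entry


# A local queue stands in for whatever would be sent to WiktionaryZ.
pending = []
entry = submit_missing_term(
    pending, "ExampleUser", "bolt", "bullone", ["machinery"]
)
```

The point of the tag is that the translator loses no time writing a definition in the middle of a job, while the term and its domains are already recorded.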

Another use is OmegaT for language lessons - Connel, from en.wiktionary 
<http://en.wiktionary.org>, thought about it and he is right: OmegaT 
could be used for language learning as well. What if we have a huge 
sentence repository and people start to translate texts to study that 
language? They would not need a paper dictionary - OmegaT would help 
them to see the use of a word in various sentences and they would get 
the terminology proposals just like translators do. Once back at school 
or university (or maybe also online with a language teacher) they can 
understand their errors and update WiktionaryZ and the online sentence 
repository.

For exams teachers would have a mass of proposals and they could 
determine which glossary group shall be included in the exams ... that 
is to be thought about ... it was not considered up to now even if there 
are already thoughts on how to use WiktionaryZ for language learning.

Did I miss something? Hmmm ... not sure. Well if you have questions: 
just ask :-)