Well, we are building the NAP Wikipedia, and of course there are parts
where one can easily transfer data simply by translating it from one
Wikipedia to the other. In this case we have already uploaded the
calendar - and now it would make sense to transfer the contents of the
Italian Wikipedia there by translating it - people and events stay the
same. So what I would now like to achieve is:
1) getting a dump of the Italian Wikipedia
2) extracting all the calendar pages
3) translating them with the help of OmegaT
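For step 2, assuming the dump is the usual MediaWiki XML export, a rough
sketch in Python could look like this (the namespace URI, function name,
and page titles here are just illustrative - the exact namespace depends
on the export version of the dump you download):

```python
import xml.etree.ElementTree as ET

# Namespace of the <mediawiki> export element -- this URI is an
# assumption and varies with the dump's export version.
NS = '{http://www.mediawiki.org/xml/export-0.3/}'

def calendar_pages(dump_path, wanted_titles):
    """Yield (title, wikitext) for the wanted pages of a MediaWiki XML dump.

    iterparse streams the file, so even a full dump does not have to
    fit in memory at once.
    """
    for _event, elem in ET.iterparse(dump_path):
        if elem.tag == NS + 'page':
            title = elem.find(NS + 'title').text
            if title in wanted_titles:
                text_elem = elem.find(NS + 'revision/' + NS + 'text')
                yield title, (text_elem.text or '')
            elem.clear()  # drop the processed page to keep memory flat
```

The calendar page titles ("1 gennaio", "2 gennaio", ...) would be passed
in as wanted_titles, and the yielded wikitext written out to files for
OmegaT.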
Why OmegaT? Well, in the "Born" and "Died" sections, after the name of
the person you very often find just "actor, actress, writer,
politician" or the like - this means there would be quite a lot of
100% matches, and the translation would be much faster with the tool
than without.
These lines are all formatted like this:
*[[name of the person]] description
Now, to get 100% matches, I need at least a line break after
*[[name of the person]]
which has to be taken out again once the file has been translated.
Then, in a second step, a bot can transfer the translated parts into
the articles.
Well, what I need now is some advice on how to get this done - and
then what can be done for Neapolitan can easily be repeated for other
languages. This means I need some help with this regular expression
... I mean some code that runs through the data, inserts the line
breaks, and after translation takes them out again.
Who can help me with this? By the way, the TMX (translation memory)
is going to be available under the GFDL for anyone - well, this
should be obvious.
This text was originally posted here (in order to collect how-tos):
http://www.wesolveitnet.com/modules/newbb/viewtopic.php?topic_id=83&pos…
Thank you!!!
Ciao, Sabine
*****
Sabine Cretella
http://www.wordsandmore.it
s.cretella(a)wordsandmore.it
skype: sabinecretella
___________________________________
Yahoo! Messenger: free calls all over the world
http://it.messenger.yahoo.com