[Foundation-l] Swahili Machine Translation First Run Completed for enwiki-20060817
Aphaia
aphaia at gmail.com
Tue Aug 29 08:34:17 UTC 2006
Generally, machine translation creates nothing but a mess.
Swwiki has a good community, even if a small one: five active
contributors, and they are making progress. They also have a good
relationship with local media, I heard from one of them.
They know how to write what they want to describe. They may translate
some articles from elsewhere, but the idea that "English Wikipedia can
be transferred anywhere without the local community's consent" is
plainly nonsense.
If you disturb them without negotiating with them first, you will
learn how the global Wikimedia community can react to a public enemy,
or at least what the rage and fury of Britomartis will be.
Sincerely,
On 8/29/06, Jeffrey V. Merkey <jmerkey at wolfmountaingroup.com> wrote:
>
> The first pass machine translation run of the English Wikipedia into the
> Swahili Language has completed and is posted.
> The translated XML dumps are posting to:
>
> http://sw.wikigadugi.org
>
> They will post throughout the night.
>
> Lexicons can be downloaded from:
>
> ftp://www.wikigadugi.org/africa/lexicon/swlexicon.public.bz2 - public
> Swahili lexicon
> ftp://www.wikigadugi.org/africa/lexicon/swlexicon.kamusi.bz2 - Kamusi
> Project lexicon
> ftp://www.wikigadugi.org/africa/lexicon/sw.thesaurus.bz2 - Roget's
> Thesaurus in Swahili
>
> MediaWiki Messages Files:
>
> ftp://www.wikigadugi.org/africa/MediWiki/MessagesSW.php.bz2
>
> Machine-translated XML dumps against the enwiki-20060817 XML dumps from
> the English Wikipedia:
>
> ftp://www.wikigadugi.org/africa/xml/swphwiki-20060816-pages-articles.xml.bz2
>
> This first run does NOT employ the verb stem decomposer and conjugator,
> does NOT employ the grammar parser or sentence composer, does NOT
> employ the AI inference engine, and does not perform verb or noun
> disambiguation as the other machine translations do, because I have not
> constructed a decomposition rule set or grammar rule set for the
> translator. This first run uses simple word-by-word translation and
> phrase matching with hierarchical thesaurus lookups and substitution.
>
> This first pass is provided as an illustration of just how rapidly
> Wikipedia can be translated into a target language. A Swahili grammar
> manual has been overnighted to me, and later this week I will perform
> another run with grammar and sentence parsing rules. Since I am not a
> native speaker of Swahili, I request a native speaker to select 20 or
> more very long articles and correct them. When I have completed the
> disambiguator and grammar rule set for sentence construction, I will
> use the corrected articles to teach the AI engine how to reorder and
> retense the translations. This should get the translations over 90%
> accuracy. Unlike Cherokee, Swahili appears to be a much simpler
> language for this task.
>
> The machine translation of Swahili is a VERY early first run and is a
> work in progress.
>
> Jeffrey V. Merkey
>
>
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/foundation-l
>
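For readers unfamiliar with the approach, the "word-by-word translation and phrase matching with hierarchical thesaurus lookups and substitution" described in the quoted message can be sketched roughly as follows. This is only a minimal illustration under assumptions of my own, not the actual translator: the names PHRASES, WORDS, THESAURUS, lookup_word and translate are hypothetical, and the handful of entries are toy data, not taken from the posted lexicon files.

```python
# Toy data for illustration only; NOT taken from the posted lexicon files.
PHRASES = {
    ("thank", "you"): "asante",
    ("good", "morning"): "habari ya asubuhi",
}
WORDS = {"water": "maji", "book": "kitabu", "big": "kubwa", "house": "nyumba"}
# Hypothetical thesaurus: each word points to a synonym to retry with.
THESAURUS = {"large": "big", "huge": "large", "home": "house"}

MAX_PHRASE_LEN = max(len(key) for key in PHRASES)


def lookup_word(word, max_hops=5):
    """Translate one word, following thesaurus links when it is not in WORDS."""
    seen = set()
    current = word
    for _ in range(max_hops + 1):
        if current in WORDS:
            return WORDS[current]
        if current in seen or current not in THESAURUS:
            break  # dead end or cycle: give up
        seen.add(current)
        current = THESAURUS[current]
    return word  # no translation found: leave the source word in place


def translate(text):
    """Longest-match phrase lookup first, then word-by-word substitution."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, down to two tokens.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                matched = True
                break
        if not matched:
            out.append(lookup_word(tokens[i]))
            i += 1
    return " ".join(out)


print(translate("huge house"))  # prints "kubwa nyumba"
```

Note that the output keeps the English word order ("kubwa nyumba", where Swahili would put the adjective after the noun). Nothing in a scheme like this reorders or inflects words, which is the kind of gap the grammar and sentence-construction rules mentioned in the quoted message would have to fill.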
--
Kizu Naoko
Wikiquote: http://wikiquote.org
* vivemus, mea Lesbia, amemus *