On Fri, Jul 26, 2013 at 3:25 PM, David Cuenca <dacuetu(a)gmail.com> wrote:
The linked page says:

    For this kind of project it is preferred to use a rule-based machine
    translation <https://en.wikipedia.org/wiki/en:Rule-based_machine_translat…>
    system, because total control is wanted over the whole process and
    minority languages should be accounted for (not that easy with
    statistical-based <https://en.wikipedia.org/wiki/en:Statistical_machine_t…>
    MT, where parallel corpora may be non-existing).
This statement seems rather defeatist to me. Step one of a machine
translation effort should be to provide tools to annotate parallel texts in
the various wikis, and to edit and maintain their parallelism. Once this
is done, you have a substantial parallel corpus, which is then suitable to
grow the set of translated articles. That is, minority languages ought to
be accounted for by progressively expanding the number of translated
articles in their encyclopedia, as we do now. As this is done, machine
translation incrementally improves. If there is not enough of an editor
community to translate articles, I don't see how you will succeed in the
much more technically demanding task of creating rules for a rule-based
translation system. The beauty of the statistical approach is that little
special ability is needed.
--scott
--
(http://cscott.net)