Creating machine translations only in the draft space (or in the user space
on projects that do not have a draft space) could help.
Cheers
Yaroslav
On Tue, May 2, 2017 at 10:16 PM, Pharos <pharosofalexandria(a)gmail.com>
wrote:
I think it all depends on the level of engagement of the human translator.
When the tool is used in the right way, it is a fantastic tool.
Maybe we can find better methods to nudge people toward taking their time
and really doing work on their translations.
Thanks,
Pharos
On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal <
bodhisattwa.rgkmc(a)gmail.com> wrote:
Content translation with Yandex is also a problem in Bengali Wikipedia.
Some users have developed a habit of creating meaningless machine-translated
articles with this extension to increase their edit count and article count.
This has increased the workload of admins, who have to find and delete those
articles. Yandex is not ready for many languages, and it is better to shut it
off. We don't need it in Bengali.
Regards
On May 3, 2017 12:17 AM, "John Erling Blad" <jeblad(a)gmail.com> wrote:
> Actually this _is_ about turning ContentTranslation off; that is what
> several users in the community want. They block people using the extension
> and delete the translated articles. Use of ContentTranslation has become a
> rather contentious case.
> As a general translation engine for reading a foreign language, Yandex is
> quite good, but as an engine for producing written text it is not very
> good at all. In fact it often creates quite horrible Norwegian, even for
> closely related languages. One quite common problem is reordering of
> words into meaningless constructs; another problem is reordering lexical
> gender in weird ways. The English article "a" is often translated as "en"
> in a prepositional phrase, and then the gender is added to the following
> phrase. That gives a translation of "Oppland is a county in…" into
> something like "Oppland er en fylket i…". This should be "Oppland er et
> fylke i…".
> (I just checked and it seems like Yandex messes up a lot less now than
> previously, but it is still pretty bad.)
> Apertium works because the languages are closely related; Yandex does not
> work because it is used between very different languages. People try to
> use Yandex, get disappointed, and falsely conclude that all language
> translations are equally weird. They are not, but Yandex translations are
> weird.
> The numerical threshold does not work. The reason is simple: the number of
> fixes depends on which language constructs fail, and that is simply not a
> constant for small text fragments. Perhaps we could flag specific language
> constructs that are known to give a high percentage of failures, and
> require the translator to check those sentences. One such construct is a
> discrepancy between the article and the gender of the following term in a
> prepositional phrase. If they do not agree, then the sentence must be
> checked. It is not always wrong to write "en jenta" in Norwegian, but it
> is likely to be wrong.
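
The check described above can be sketched roughly as follows. This is only a
minimal illustration and not anything ContentTranslation is known to do: the
NOUN_GENDER lexicon and the function name flag_article_mismatches are
hypothetical, and a real check would need a full Norwegian lexicon. Flagged
pairs are candidates for human review, not confirmed errors.

```python
import re

# Norwegian indefinite articles and the grammatical gender they signal.
ARTICLE_GENDER = {"en": "m", "ei": "f", "et": "n"}

# Hypothetical toy lexicon (noun form -> gender); a real one would be
# derived from a dictionary or a tagged corpus.
NOUN_GENDER = {
    "fylke": "n", "fylket": "n",
    "jente": "f", "jenta": "f",
    "gutt": "m", "gutten": "m",
}


def flag_article_mismatches(sentence):
    """Return (article, noun) pairs whose genders differ.

    Only adjacent article + noun pairs found in the toy lexicon are checked.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    flagged = []
    for article, noun in zip(tokens, tokens[1:]):
        article_gender = ARTICLE_GENDER.get(article)
        noun_gender = NOUN_GENDER.get(noun)
        if article_gender and noun_gender and article_gender != noun_gender:
            flagged.append((article, noun))
    return flagged


print(flag_article_mismatches("Oppland er en fylket i"))
print(flag_article_mismatches("Oppland er et fylke i"))
```

A sentence with a flagged pair would be one the translator has to look at
before publishing; sentences without one pass through unchanged.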
> A language model could be a statistical model for the language itself, not
> for the translation into that language. We don't want a perfect language
> model, but a language model sufficient to mark weird constructs. A very
> simple solution could be to mark tri-grams that do not already exist in
> the text base for the destination language as possible errors. It is not
> necessary to do a live check, but the check should at least run before
> the page can be saved.
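
The tri-gram idea can be sketched in a few lines. This is a minimal
illustration under stated assumptions, not an existing ContentTranslation
feature: the function names are made up for the example, the tokenizer is a
plain word split, and the two "corpus" sentences are invented stand-ins for
the real text base of the destination wiki.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def trigrams(tokens):
    """Return all consecutive three-word sequences."""
    return list(zip(tokens, tokens[1:], tokens[2:]))


def build_known_trigrams(corpus_texts):
    """Collect every tri-gram that occurs in the existing text base."""
    known = set()
    for text in corpus_texts:
        known.update(trigrams(tokenize(text)))
    return known


def unseen_trigrams(draft, known):
    """Return the draft's tri-grams that never occur in the text base."""
    return [t for t in trigrams(tokenize(draft)) if t not in known]


# Invented stand-in for the destination-language text base.
corpus = ["Oppland er et fylke i Norge", "Hedmark er et fylke i Norge"]
known = build_known_trigrams(corpus)

for suspect in unseen_trigrams("Oppland er en fylket i Norge", known):
    print(" ".join(suspect))
```

With a realistically large text base the set of known tri-grams is big but
can be built once from a dump, so the check only needs to run when the
translator tries to save the page.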
>
> Note the difference between what Yandex does and what we want to achieve:
> Yandex translates a text between two different languages, without any
> clear reason why. It is not too important if there are weird constructs
> in the text, as long as it is usable in "some" context. We translate a
> text for the purpose of republishing it. The text should be usable and
> easily readable in that language.
>
>
>
> On Tue, May 2, 2017 at 7:07 PM, Amir E. Aharoni <
> amir.aharoni(a)mail.huji.ac.il> wrote:
>
> > 2017-05-02 18:20 GMT+03:00 John Erling Blad <jeblad(a)gmail.com>:
> >
> > > Brute force solution; turn the ContentTranslation off. Really stupid
> > > solution.
>
>
> > ... Then I guess you don't mind that I'm changing the thread name :)
>
>
> > > The next solution; turn the Yandex engine off. That would solve a
> > > part of the problem. Kind of a lousy solution though.
> >
>
> > > What about adding a language model that warns when the language
> > > constructs get too weird? It is like a "test" for the translation. The
> > > CT is used for creating a translation, but the language model is used
> > > for verifying whether the translation is good enough. If it does not
> > > validate against the language model, it should simply not be published
> > > to the main namespace. It will still be possible to create a draft, but
> > > then the user is completely aware that the translation isn't good
> > > enough.
> > >
> > > Such a language model should be available as a test for any article, as
> > > it can be used as a quality measure for the article. It is really a
> > > quantitative measure of the well-spokenness of the article, but that
> > > isn't quite so intuitive.
> >
>
> > So, I'll allow myself to guess that you are talking about one particular
> > language, probably Norwegian.
>
> > Several technical facts:
>
> > 1. In the past there were several cases in which translators into
> > different languages reported common translation mistakes to me. I passed
> > them on to Yandex developers, with whom I communicate quite regularly.
> > They acknowledged receiving all of them. I am aware of at least one such
> > common mistake that was fixed; possibly there were more. If you can give
> > me a list of such mistakes for Norwegian, I'll be very happy to pass
> > them on. I absolutely cannot promise that they will be fixed upstream,
> > but it's possible.
> >
> > 2. In Norwegian, Apertium is used for translating between the two
> > varieties of Norwegian itself (Bokmål and Nynorsk), and from other
> > Scandinavian languages. That's probably why it works so well: they are
> > similar in grammar, vocabulary, and narrative style (I'll pass it on to
> > Apertium developers; I'm sure they'll be happy to hear it).
> > Unfortunately, machine translation from English is not available in
> > Apertium. Apertium works best with very similar languages, and English
> > has two characteristics which are unfortunate when combined: it is both
> > the most popular source for translation into almost all other languages
> > (including Norwegian), and it is not _very_ similar to any other
> > language (except maybe Scots). Machine translation from English into
> > Norwegian is only possible with Yandex at the moment. More engines may
> > be added in the future, but at the moment that's all we have. That's why
> > disabling Yandex completely would indeed be a lousy solution: a lot of
> > people say that without machine translation integration Content
> > Translation is useless. Not all users think like that, but many do.
> >
> > 3. We can define a numerical threshold for the acceptable percentage of
> > machine translation post-editing. Currently it's 75%. It's a tad
> > embarrassing that it's hard-coded at the moment, but it can very easily
> > be made into a per-language variable. If the translator tries to publish
> > a page in which less than that is modified, a warning will be shown.
> >
> > 4. I'm not sure what you mean by "language model". If it's any kind of a
> > linguistic engine, then it's definitely not within the resources that
> > the Language team itself can currently dedicate. However, if somebody
> > who knows Norwegian and some programming writes a script that analyzes
> > common bad constructs in a Wikipedia dump, this will be very useful.
> > This would basically be an upgraded version of suggestion #1 above. (In
> > my spare time as a volunteer I'm doing something comparable for Hebrew,
> > although not for translation, but for improving how MediaWiki link
> > trails work.)
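
The post-editing threshold from point 3 can be sketched as below, with the
threshold looked up per language instead of hard-coded. This is an
illustration only and not the actual Content Translation code: the names
THRESHOLDS, modified_fraction and should_warn are hypothetical, only the 75%
figure comes from the thread, and measuring "modified" as the share of
machine-output words that no longer match the edited text is an assumption.

```python
from difflib import SequenceMatcher

# Only the 75% default is taken from the thread; the per-language table
# is a hypothetical example of making the value configurable.
DEFAULT_THRESHOLD = 0.75
THRESHOLDS = {"nb": 0.75}


def modified_fraction(machine_text, edited_text):
    """Share of machine-output words the translator changed (0.0 to 1.0)."""
    machine_tokens = machine_text.split()
    edited_tokens = edited_text.split()
    if not machine_tokens:
        return 1.0
    matcher = SequenceMatcher(None, machine_tokens, edited_tokens,
                              autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - unchanged / len(machine_tokens)


def should_warn(machine_text, edited_text, language):
    """Warn when less than the language's threshold has been modified."""
    threshold = THRESHOLDS.get(language, DEFAULT_THRESHOLD)
    return modified_fraction(machine_text, edited_text) < threshold


machine = "Oppland er en fylket i"
print(should_warn(machine, machine, "nb"))
```

A word-level diff is just one possible way to measure how much was changed;
whichever measure is used, the per-language lookup is the part that replaces
the hard-coded value.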
> > _______________________________________________
> > Wikimedia-l mailing list, guidelines at:
> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> > https://meta.wikimedia.org/wiki/Wikimedia-l
> > New messages to: Wikimedia-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>