Reading this, I get a strong impression the problem may very well be in
setting expectations for the users of this translation tool. If they expect
the automated translation to be rather good, they may get fed up more
easily than when they consider it primarily a glorified dictionary.
Lodewijk
On Wed, May 3, 2017 at 1:06 PM, David Cuenca Tudela <dacuetu(a)gmail.com>
wrote:
Perhaps it would be a good idea to compare the translated text to the text
that the user wants to save.
If they are more than 95% the same, that means that the user didn't take
the effort to correct the text.
Cheers,
Micru
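
A minimal sketch of the comparison suggested above, assuming the raw
machine output and the text about to be saved are both available as plain
strings. The function names and the word-level comparison are assumptions
made for illustration; this is not how ContentTranslation itself measures
edits.

```python
from difflib import SequenceMatcher


def similarity(machine_text: str, saved_text: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two texts."""
    # Compare word sequences rather than characters, so that replacing a
    # single long word counts as one changed word.
    return SequenceMatcher(None, machine_text.split(), saved_text.split()).ratio()


def looks_unedited(machine_text: str, saved_text: str, limit: float = 0.95) -> bool:
    """True when the saved text is more than `limit` similar to the machine output."""
    return similarity(machine_text, saved_text) > limit
```

A caller could run looks_unedited() just before saving and show a warning
when it returns True. The 0.95 default is the 95% figure from the message
above.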
On Wed, May 3, 2017 at 10:31 AM, Wojciech Pędzich <wpedzich(a)gmail.com>
wrote:
It does depend a lot on the engagement level of the human behind the
keyboard. When I deal with machine-translated text, I simply wonder whether
the person behind the keyboard made the effort to actually read the piece.
Now, whether this would work if limited to namespaces outside "main": I do
not want to demonise the issue, but if the person submitting the text for
machine translation does not read it, what will stop them from a quick
ctrl+c / ctrl+v? Just asking.
Wojciech
On 2017-05-03 at 09:33, Yaroslav Blanter wrote:
> Creating machine translations only in the draft space (or in the user
> space in the projects which do not have draft) could help.
>
> Cheers
> Yaroslav
>
> On Tue, May 2, 2017 at 10:16 PM, Pharos <pharosofalexandria(a)gmail.com>
> wrote:
>
>> I think it all depends on the level of engagement of the human
>> translator.
>>
>> When the tool is used in the right way, it is a fantastic tool.
>>
>> Maybe we can find better methods to nudge people toward taking their
>> time and really doing work on their translations.
>>
>> Thanks,
>> Pharos
>>
>> On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal <
>> bodhisattwa.rgkmc(a)gmail.com> wrote:
>>
>>> Content translation with Yandex is also a problem in Bengali Wikipedia.
>>> Some users have developed a tendency to create meaningless
>>> machine-translated articles with this extension to increase their edit
>>> count and article count. This has increased the workload of admins, who
>>> have to find and delete those articles.
>>>
>>> Yandex is not ready for many languages and it is better to shut it off.
>>> We don't need it in Bengali.
>>>
>>> Regards
>>> On May 3, 2017 12:17 AM, "John Erling Blad" <jeblad(a)gmail.com> wrote:
>>>
>>>> Actually this _is_ about turning ContentTranslation off; that is what
>>>> several users in the community want. They block people using the
>>>> extension and delete the translated articles. Use of ContentTranslation
>>>> has become a rather contentious case.
>>>>
>>>> Yandex as a general translation engine to be able to read some alien
>>>> language is quite good, but as an engine to produce written text it is
>>>> not very good at all. In fact it often creates quite horrible Norwegian,
>>>> even for closely related languages. One quite common problem is
>>>> reordering of words into meaningless constructs; another problem is
>>>> reordering lexical gender in weird ways. The English article "a" is
>>>> often translated as "en" in a prepositional phrase, and then the gender
>>>> is added to the following phrase. That turns "Oppland is a county
>>>> in…" into something like "Oppland er en fylket i…". This should be
>>>> "Oppland er et fylke i…".
>>>>
>>>> (I just checked and it seems like Yandex messes up a lot less now than
>>>> previously, but it is still pretty bad.)
>>>>
>>>> Apertium works because the languages are closely related; Yandex does
>>>> not work because it is used between very different languages. People
>>>> try to use Yandex and get disappointed, and falsely conclude that all
>>>> language translations are equally weird. They are not, but Yandex
>>>> translations are weird.
>>>>
>>>> The numerical threshold does not work. The reason is simple: the number
>>>> of fixes depends on which language constructs fail, and that is simply
>>>> not a constant for small text fragments. Perhaps we could flag specific
>>>> language constructs that are known to give a high percentage of
>>>> failures, and require the translator to check those sentences. One such
>>>> language construct is a discrepancy between the article and the gender
>>>> of the following term in a prepositional phrase. If they do not agree,
>>>> then the sentence must be checked. It is not always wrong to write "en
>>>> jenta" in Norwegian, but it is likely to be wrong.
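
As a rough illustration of flagging one such construct, the sketch below
marks an indefinite article followed by a noun that is either of a
different gender or already in its definite form. The tiny lexicon is
hypothetical and hand-written for this example; a usable checker would
need a full morphological dictionary, and the rule that "en" may precede
feminine nouns is a simplifying assumption about Bokmål usage. A flag only
means "a human should look at this", not "this is wrong".

```python
# Hypothetical toy lexicon, for illustration only.
# Article -> genders it may be used with (assumption: "en" also accepts
# feminine nouns, as is common in Bokmål).
ARTICLE_GENDERS = {"en": {"m", "f"}, "ei": {"f"}, "et": {"n"}}

# Noun form -> (gender, is_definite_form)
NOUN_FORMS = {
    "fylke": ("n", False),
    "fylket": ("n", True),
    "jente": ("f", False),
    "jenta": ("f", True),
    "bil": ("m", False),
    "bilen": ("m", True),
}


def suspicious_pairs(sentence: str) -> list:
    """Return (article, noun) pairs that a translator should double-check."""
    words = sentence.lower().replace(".", " ").replace(",", " ").split()
    flagged = []
    for article, noun in zip(words, words[1:]):
        if article in ARTICLE_GENDERS and noun in NOUN_FORMS:
            gender, definite = NOUN_FORMS[noun]
            # Flag a gender mismatch, or an indefinite article in front of
            # a noun that is already in its definite form.
            if gender not in ARTICLE_GENDERS[article] or definite:
                flagged.append((article, noun))
    return flagged
```

With this lexicon, "Oppland er en fylket i Norge" yields the pair
("en", "fylket"), while "Oppland er et fylke i Norge" yields nothing.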
>>>>
>>>> A language model could be a statistical model for the language itself,
>>>> not for the translation into that language. We don't want a perfect
>>>> language model, but a language model sufficient to mark weird
>>>> constructs. A very simple solution could be to mark trigrams that do
>>>> not already exist in the text base for the destination language as
>>>> possible errors. It is not necessary to do a live check, but the check
>>>> should at least run before the page can be saved.
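
The trigram idea can be sketched in a few lines, assuming the "text base"
is available as an iterable of plain-text sentences in the destination
language. The function names are invented for this example, and
tokenisation is deliberately naive (lower-casing and whitespace splitting,
no punctuation handling), so this is a starting point rather than a
working language model.

```python
def trigrams(text: str) -> list:
    """Return all consecutive word triples in the text, lower-cased."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def build_known_trigrams(corpus) -> set:
    """Collect every trigram seen in an iterable of reference sentences."""
    known = set()
    for sentence in corpus:
        known.update(trigrams(sentence))
    return known


def unseen_trigrams(text: str, known: set) -> list:
    """Return the trigrams of `text` that never occur in the reference set."""
    return [t for t in trigrams(text) if t not in known]
```

A save-time check could call unseen_trigrams() on the translated text and
highlight the returned word triples for the translator. With a small
reference set, many correct trigrams will also be unseen, so the size of
the text base matters a lot.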
>>>>
>>>> Note the difference between what Yandex does and what we want to
>>>> achieve. Yandex translates a text between two different languages,
>>>> without any clear reason why. It is not too important if there are
>>>> weird constructs in the text, as long as it is usable in "some"
>>>> context. We translate a text for the purpose of republishing it. The
>>>> text should be usable and easily readable in that language.
>>>>
>>>>
>>>>
>>>> On Tue, May 2, 2017 at 7:07 PM, Amir E. Aharoni <
>>>> amir.aharoni(a)mail.huji.ac.il> wrote:
>>>>
>>>> 2017-05-02 18:20 GMT+03:00 John Erling Blad <jeblad(a)gmail.com>om>:
>>>>>
>>>>>> Brute force solution: turn ContentTranslation off. Really stupid
>>>>>> solution.
>>>>>>
>>>>>
>>>>> ... Then I guess you don't mind that I'm changing the thread name :)
>>>>>
>>>>>
>>>>>> The next solution: turn the Yandex engine off. That would solve a
>>>>>> part of the problem. Kind of a lousy solution though.
>>>>>>
>>>>>> What about adding a language model that warns when the language
>>>>>> constructs get too weird? It is like a "test" for the translation.
>>>>>> The CT is used for creating a translation, but the language model is
>>>>>> used for verifying if the translation is good enough. If it does not
>>>>>> validate against the language model it should simply not be published
>>>>>> to the main namespace. It will still be possible to create a draft,
>>>>>> but then the user is completely aware that the translation isn't good
>>>>>> enough.
>>>>>>
>>>>>> Such a language model should be available as a test for any article,
>>>>>> as it can be used as a quality measure for the article. It is really
>>>>>> a quantitative measure for the well-spokenness of the article, but
>>>>>> that isn't quite so intuitive.
>>>>>>
>>>>> So, I'll allow myself to guess that you are talking about one
>>>>> particular language, probably Norwegian.
>>>>>
>>>>> Several technical facts:
>>>>>
>>>>> 1. In the past there were several cases in which translators to
>>>>> different languages reported common translation mistakes to me. I
>>>>> passed them on to Yandex developers, with whom I communicate quite
>>>>> regularly. They acknowledged receiving all of them. I am aware of at
>>>>> least one such common mistake that was fixed; possibly there were
>>>>> more. If you can give me a list of such mistakes for Norwegian, I'll
>>>>> be very happy to pass them on. I absolutely cannot promise that they
>>>>> will be fixed upstream, but it's possible.
>>>>>
>>>>> 2. In Norwegian, Apertium is used for translating between the two
>>>>> varieties of Norwegian itself (Bokmål and Nynorsk), and from other
>>>>> Scandinavian languages. That's probably why it works so well: they are
>>>>> similar in grammar, vocabulary, and narrative style (I'll pass it on
>>>>> to Apertium developers; I'm sure they'll be happy to hear it).
>>>>> Unfortunately, machine translation from English is not available in
>>>>> Apertium. Apertium works best with very similar languages, and English
>>>>> has two characteristics, which are unfortunate when combined: it is
>>>>> both the most popular source for translation into almost all other
>>>>> languages (including Norwegian), and it is not _very_ similar to any
>>>>> other languages (except maybe Scots). Machine translation from English
>>>>> into Norwegian is only possible with Yandex at the moment. More
>>>>> engines may be added in the future, but at the moment that's all we
>>>>> have. That's why disabling Yandex completely would indeed be a lousy
>>>>> solution: A lot of people say that without machine translation
>>>>> integration Content Translation is useless. Not all users think like
>>>>> that, but many do.
>>>>>
>>>>> 3. We can define a numerical threshold of acceptable percentage of
>>>>> machine translation post-editing. Currently it's 75%. It's a tad
>>>>> embarrassing that it's hard-coded at the moment, but it can very
>>>>> easily be made into a variable per language. If the translator tries
>>>>> to publish a page in which less than that is modified, a warning will
>>>>> be shown.
>>>>>
>>>>> 4. I'm not sure what you mean by "language model". If it's any kind
>>>>> of a linguistic engine, then it's definitely not within the resources
>>>>> that the Language team itself can currently dedicate. However, if
>>>>> somebody who knows Norwegian and some programming will write a script
>>>>> that analyzes common bad constructs in a Wikipedia dump, this will be
>>>>> very useful. This would basically be an upgraded version of suggestion
>>>>> #1 above. (In my spare time as a volunteer I'm doing something
>>>>> comparable for Hebrew, although not for translation, but for improving
>>>>> how MediaWiki link trails work.)
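
Point 3 above (a threshold that becomes a per-language variable) might
look roughly like the sketch below. Everything except the 75% default is
an assumption: the names, the language codes, and the override values are
invented for illustration, and the sketch follows the literal wording of
the message (warn when less than the threshold is modified) without
claiming that this is how ContentTranslation interprets the number.

```python
from typing import Optional

# Default taken from the message above; everything else is hypothetical.
DEFAULT_MIN_MODIFIED = 0.75

# Invented per-language overrides, purely for illustration.
MIN_MODIFIED_BY_LANGUAGE = {"nb": 0.85, "bn": 0.90}


def min_modified_for(language_code: str) -> float:
    """Look up the threshold for a language, falling back to the default."""
    return MIN_MODIFIED_BY_LANGUAGE.get(language_code, DEFAULT_MIN_MODIFIED)


def publish_warning(language_code: str, modified_fraction: float) -> Optional[str]:
    """Return a warning text when too little was modified, otherwise None."""
    threshold = min_modified_for(language_code)
    if modified_fraction < threshold:
        return (
            f"Only {modified_fraction:.0%} of the machine translation was "
            f"modified; at least {threshold:.0%} is expected for '{language_code}'."
        )
    return None
```

Keeping the lookup in one small function means a community could change
its own threshold without touching the publishing logic.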
>>>>> _______________________________________________
>>>>> Wikimedia-l mailing list, guidelines at:
>>>>> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>>>>> https://meta.wikimedia.org/wiki/Wikimedia-l
>>>>> New messages to: Wikimedia-l(a)lists.wikimedia.org
>>>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
>>>>> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
--
Etiamsi omnes, ego non