Note that some language pairs could easily be 100% correct.
On Wed, May 3, 2017 at 1:06 PM, David Cuenca Tudela <dacuetu(a)gmail.com>
wrote:
Perhaps it would be a good idea to compare the machine-translated text with
the text that the user wants to save. If they are more than 95% the same,
that means that the user did not make the effort to correct the text.
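
A minimal sketch of the comparison suggested above, assuming a word-level
similarity ratio from Python's standard difflib module is an acceptable
measure. The function names and the example sentences are illustrative only,
and nothing here reflects how ContentTranslation actually measures edits. As
noted at the top of the thread, some language pairs may legitimately need no
changes, so a high ratio is a signal to look closer rather than proof.

```python
from difflib import SequenceMatcher


def similarity(machine_text: str, saved_text: str) -> float:
    """Return a 0.0-1.0 similarity ratio, comparing the texts word by word."""
    matcher = SequenceMatcher(
        None, machine_text.split(), saved_text.split(), autojunk=False
    )
    return matcher.ratio()


def looks_unedited(machine_text: str, saved_text: str,
                   threshold: float = 0.95) -> bool:
    """True when the saved text is more than `threshold` similar to the
    machine output (0.95 is the figure proposed in the message above)."""
    return similarity(machine_text, saved_text) > threshold


if __name__ == "__main__":
    mt = "Oppland er en fylket i Norge"      # raw machine output (example)
    fixed = "Oppland er et fylke i Norge"    # after human correction
    print(looks_unedited(mt, mt))     # identical text is flagged
    print(looks_unedited(mt, fixed))  # corrected text is not flagged
```

On short fragments a single corrected word moves the ratio a lot, so a check
like this is probably more meaningful on whole articles than on sentences.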
Cheers,
Micru
On Wed, May 3, 2017 at 10:31 AM, Wojciech Pędzich <wpedzich(a)gmail.com>
wrote:
It does depend a lot on the engagement level of the human behind the
keyboard. When I deal with machine-translated text, I simply wonder whether
the person behind the keyboard made the effort to actually read the piece.
As for whether this would work if limited to namespaces outside "main": I do
not want to demonise the issue, but if the person submitting the text for
machine translation does not read it, what will stop them from a quick
ctrl+c / ctrl+v? Just asking.
Wojciech
On 2017-05-03 at 09:33, Yaroslav Blanter wrote:
> Creating machine translations only in the draft space (or in the user
> space in the projects which do not have a draft space) could help.
>
> Cheers
> Yaroslav
>
> On Tue, May 2, 2017 at 10:16 PM, Pharos <pharosofalexandria(a)gmail.com>
> wrote:
>
>> I think it all depends on the level of engagement of the human
>> translator.
>>
>> When the tool is used in the right way, it is a fantastic tool.
>>
>> Maybe we can find better methods to nudge people toward taking their
>> time and really doing work on their translations.
>>
>> Thanks,
>> Pharos
>>
>> On Tue, May 2, 2017 at 4:09 PM, Bodhisattwa Mandal <
>> bodhisattwa.rgkmc(a)gmail.com> wrote:
>>
>>> Content translation with Yandex is also a problem in Bengali Wikipedia.
>>> Some users have developed a tendency to create meaningless
>>> machine-translated articles with this extension to increase their edit
>>> count and article count. This has increased the workload of admins, who
>>> have to find and delete those articles.
>>>
>>> Yandex is not ready for many languages, and it is better to shut it off.
>>> We don't need it in Bengali.
>>>
>>> Regards
>>> On May 3, 2017 12:17 AM, "John Erling Blad" <jeblad(a)gmail.com> wrote:
>>>
>>>> Actually this _is_ about turning ContentTranslation off; that is what
>>>> several users in the community want. They block people using the
>>>> extension and delete the translated articles. Use of ContentTranslation
>>>> has become a rather contentious case.
>>>>
>>>> Yandex as a general translation engine for reading some alien language
>>>> is quite good, but as an engine for producing written text it is not
>>>> very good at all. In fact it often creates quite horrible Norwegian,
>>>> even for closely related languages. One quite common problem is
>>>> reordering of words into meaningless constructs; another is handling
>>>> lexical gender in weird ways. The English indefinite article "a" is
>>>> often translated as "en" in a prepositional phrase, and then the gender
>>>> is added to the following phrase. That gives a translation of "Oppland
>>>> is a county in…" into something like "Oppland er en fylket i…". This
>>>> should be "Oppland er et fylke i…".
>>>>
>>>> (I just checked and it seems like Yandex messes up a lot less now than
>>>> previously, but it is still pretty bad.)
>>>>
>>>> Apertium works because the languages are closely related; Yandex does
>>>> not work because it is used between very different languages. People
>>>> try to use Yandex, get disappointed, and falsely conclude that all
>>>> language translations are equally weird. They are not, but Yandex
>>>> translations are weird.
>>>>
>>>> The numerical threshold does not work. The reason is simple: the number
>>>> of fixes depends on which language constructs fail, and that is simply
>>>> not a constant for small text fragments. Perhaps we could flag specific
>>>> language constructs that are known to give a high percentage of
>>>> failures, and require the translator to check those sentences. One such
>>>> construct is a discrepancy between the article and the gender of the
>>>> following term in a prepositional phrase. If they do not agree, then
>>>> the sentence must be checked. It is not always wrong to write "en
>>>> jenta" in Norwegian, but it is likely to be wrong.
>>>>
>>>> A language model could be a statistical model for the language itself,
>>>> not for the translation into that language. We don't want a perfect
>>>> language model, but one sufficient to mark weird constructs. A very
>>>> simple solution could be to mark trigrams that do not already exist in
>>>> the text base for the destination language as possible errors. It is
>>>> not necessary to do a live check, but the check should at least run
>>>> before the page can be saved.
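
The trigram idea above can be sketched roughly as follows, assuming the "text
base" is simply a collection of existing articles in the destination language
and that words are split on non-word characters. All function names and the
two-sentence corpus are hypothetical; a real check would need a much larger
corpus and probably some smoothing to avoid flagging rare but valid phrases.

```python
import re


def trigrams(text: str):
    """Return the lowercased word trigrams of a text, in order."""
    words = re.findall(r"\w+", text.lower())
    return list(zip(words, words[1:], words[2:]))


def build_known_trigrams(corpus_texts):
    """Collect every trigram seen in the destination-language text base."""
    known = set()
    for text in corpus_texts:
        known.update(trigrams(text))
    return known


def unseen_trigrams(translation: str, known):
    """Return trigrams of the translation that never occur in the text base.
    These are only *possible* errors that a human should look at."""
    return [t for t in trigrams(translation) if t not in known]


if __name__ == "__main__":
    corpus = [
        "Oppland er et fylke i Norge",
        "Hedmark er et fylke i Norge",
    ]
    known = build_known_trigrams(corpus)
    for gram in unseen_trigrams("Oppland er en fylket i Norge", known):
        print("check:", " ".join(gram))
```

Because the set of known trigrams can be built offline from a dump, the check
itself is a set lookup and could run at save time rather than live.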
>>>>
>>>> Note the difference between what Yandex does and what we want to
>>>> achieve. Yandex translates a text between two different languages
>>>> without any clear reason why. It is not too important if there are
>>>> weird constructs in the text, as long as it is usable in "some"
>>>> context. We translate a text for the purpose of republishing it. The
>>>> text should be usable and easily readable in that language.
>>>>
>>>>
>>>>
>>>> On Tue, May 2, 2017 at 7:07 PM, Amir E. Aharoni <
>>>> amir.aharoni(a)mail.huji.ac.il> wrote:
>>>>
>>>>> 2017-05-02 18:20 GMT+03:00 John Erling Blad <jeblad(a)gmail.com>:
>>>>>
>>>>>> Brute force solution: turn ContentTranslation off. Really stupid
>>>>>> solution.
>>>>>>
>>>>>
>>>>> ... Then I guess you don't mind that I'm changing the thread name :)
>>>>>
>>>>>
>>>>>> The next solution: turn the Yandex engine off. That would solve a
>>>>>> part of the problem. Kind of a lousy solution, though.
>>>>>>
>>>>>> What about adding a language model that warns when the language
>>>>>> constructs get too weird? It is like a "test" for the translation. CT
>>>>>> is used for creating a translation, but the language model is used for
>>>>>> verifying whether the translation is good enough. If it does not
>>>>>> validate against the language model, it should simply not be published
>>>>>> to the main namespace. It will still be possible to create a draft,
>>>>>> but then the user is completely aware that the translation isn't good
>>>>>> enough.
>>>>>>
>>>>>> Such a language model should be available as a test for any article,
>>>>>> as it can be used as a quality measure for the article. It is really
>>>>>> a quantitative measure of the well-spokenness of the article, but that
>>>>>> isn't quite so intuitive.
>>>>>>
>>>>> So, I'll allow myself to guess that you are talking about one
>>>>> particular language, probably Norwegian.
>>>>>
>>>>> Several technical facts:
>>>>>
>>>>> 1. In the past there were several cases in which translators to
>>>>> different languages reported common translation mistakes to me. I
>>>>> passed them on to Yandex developers, with whom I communicate quite
>>>>> regularly. They acknowledged receiving all of them. I am aware of at
>>>>> least one such common mistake that was fixed; possibly there were more.
>>>>> If you can give me a list of such mistakes for Norwegian, I'll be very
>>>>> happy to pass them on. I absolutely cannot promise that they will be
>>>>> fixed upstream, but it's possible.
>>>>>
>>>>> 2. In Norwegian, Apertium is used for translating between the two
>>>>> varieties of Norwegian itself (Bokmål and Nynorsk), and from other
>>>>> Scandinavian languages. That's probably why it works so well: they are
>>>>> similar in grammar, vocabulary, and narrative style. (I'll pass it on
>>>>> to Apertium developers; I'm sure they'll be happy to hear it.)
>>>>> Unfortunately, machine translation from English is not available in
>>>>> Apertium. Apertium works best with very similar languages, and English
>>>>> has two characteristics which are unfortunate when combined: it is both
>>>>> the most popular source for translation into almost all other languages
>>>>> (including Norwegian), and it is not _very_ similar to any other
>>>>> language (except maybe Scots). Machine translation from English into
>>>>> Norwegian is only possible with Yandex at the moment. More engines may
>>>>> be added in the future, but at the moment that's all we have. That's
>>>>> why disabling Yandex completely would indeed be a lousy solution: a lot
>>>>> of people say that without machine translation integration Content
>>>>> Translation is useless. Not all users think like that, but many do.
>>>>
>>>>> 3. We can define a numerical threshold for the acceptable percentage
>>>>> of machine translation post-editing. Currently it's 75%. It's a tad
>>>>> embarrassing that it's hard-coded at the moment, but it can very
>>>>> easily be made into a variable per language. If the translator tries
>>>>> to publish a page in which less than that is modified, a warning will
>>>>> be shown.
>>>>>
>>>>> 4. I'm not sure what you mean by "language model". If it's any kind of
>>>>> linguistic engine, then it's definitely not within the resources that
>>>>> the Language team itself can currently dedicate. However, if somebody
>>>>> who knows Norwegian and some programming writes a script that analyzes
>>>>> common bad constructs in a Wikipedia dump, this will be very useful.
>>>>> This would basically be an upgraded version of suggestion #1 above. (In
>>>>> my spare time as a volunteer I'm doing something comparable for Hebrew,
>>>>> although not for translation, but for improving how MediaWiki link
>>>>> trails work.)
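
Point 3 above, turning the hard-coded 75% into a per-language variable, could
look something like the sketch below. This is not ContentTranslation's actual
configuration: the table, the function names and the per-language values are
all hypothetical, and only the 75% default comes from the message. How the
modified fraction is measured is not spelled out there, so the sketch takes it
as a precomputed input and reads the last sentence of point 3 literally (warn
when less than the threshold is modified).

```python
# Default taken from the message above; everything else is illustrative.
DEFAULT_THRESHOLD = 0.75

# Hypothetical per-language overrides, keyed by language code.
THRESHOLDS = {
    "nb": 0.85,  # made-up value for Norwegian Bokmål
    "bn": 0.90,  # made-up value for Bengali
}


def threshold_for(lang: str) -> float:
    """Return the threshold for a language, falling back to the default."""
    return THRESHOLDS.get(lang, DEFAULT_THRESHOLD)


def should_warn(lang: str, modified_fraction: float) -> bool:
    """True when less than the language's threshold has been modified.

    `modified_fraction` is expected to be between 0.0 and 1.0.
    """
    if not 0.0 <= modified_fraction <= 1.0:
        raise ValueError("modified_fraction must be between 0.0 and 1.0")
    return modified_fraction < threshold_for(lang)


if __name__ == "__main__":
    print(should_warn("he", 0.5))  # no override for "he", default applies
    print(should_warn("bn", 0.8))  # stricter hypothetical Bengali setting
```

Keeping the values in a plain lookup table means each community could tune
its own number without a code change.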
>>>>> _______________________________________________
>>>>> Wikimedia-l mailing list, guidelines at:
>>>>> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
>>>>> https://meta.wikimedia.org/wiki/Wikimedia-l
>>>>> New messages to: Wikimedia-l(a)lists.wikimedia.org
>>>>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
>>>>> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
--
Etiamsi omnes, ego non