[Wiktionary-l] Wiktionary quality issues

Minh Nguyen mxn at zoomtown.com
Sat Apr 21 19:40:35 UTC 2007


Thanks for bringing this up, Gerard. As I noted on your blog entry, 
we're aware of the problem and are working to correct it. Right now, 
we're about to blank the existing templates and import new ones directly 
from the Russian Wiktionary. [1] There aren't many of us working on the 
Vietnamese Wiktionary, as I've said before, and only one of us knows any 
Russian (and not that much). That's why it's taken so long for anyone to 
notice the mistakes. You're all welcome to join in on our discussion.

I would oppose delisting the Vietnamese Wiktionary on the grounds that 
our Vietnamese, English, and French entries -- which make up the vast 
majority of our site -- are rather good. In fact, the source that we 
used for all our imports is *the* Vietnamese translationary on the Web. 
It's just the conjugation tables that PiedBot created that are the 
problem. I think having your bot distinguish between the Russian and 
non-Russian entries would be more trouble than it's worth.

By the way, you might want to have a look at the Lombard Wikipedia 
sometime. They have thousands of articles in English that claim to be in 
a variety of Lombard. [2] By comparison, the Russian Wiktionary doesn't 
look that bad. :)

[1] 
<http://vi.wiktionary.org/wiki/Thảo_luận_Thành_viên:David#Re:_.5B.5BTh.E1.BA.A3o_lu.E1.BA.ADn_Th.C3.A0nh_vi.C3.AAn:Mxn.23Russian_conjugations.7CRussian_conjugations.5D.5D>
[2] 
<http://lmo.wikipedia.org/wiki/14th_Street_(IRT_Broadway-Seventh_Avenue_Line)>

GerardM wrote:
> On the Wiktionary <http://wiktionary.org/> project I run the interwiki bot.
> The process is simple; when an article exists in another language spelled
> exactly the same, I create an "interwiki" link. This allows you to see the
> information on another language Wiktionary. This process is automated and
> unattended, and it works on all Wiktionaries.
> 
> I have received a request from the Polish Wiktionary to stop adding
> interwiki links for the Russian and for the Vietnamese Wiktionary. The
> reason given is one of quality. On the Russian Wiktionary many of the
> articles are created by a bot and they do not provide good information. An
> example is dispersion <http://ru.wiktionary.org/wiki/dispersion>; there is
> nothing really in there. The Vietnamese Wiktionary is more problematic
> because a bot was used to generate declension and conjugation tables of
> Russian words and they got it wrong.
> 
> The Russian Wiktionary has some 81,000 empty shells and refuses to remove them.
> The Vietnamese are not willing to remove their incorrect data.
> 
> I have been asked to stop including the Russian Wiktionary and the
> Vietnamese Wiktionary when I run the interwiki process. To be honest, I run
> the bot as a service and I do not think it is the right thing to do. I think
> the Vietnamese are wrong not to correct the wrong data that they have. I am
> less sure about the Russian approach; in essence it is a stub. However,
> creating a Wiktionary in this way is like stamp collecting; you can look at
> it but there is no information about it.
> 
> Given how the process works, I am not sure that I can exclude either the
> Russian or the Vietnamese Wiktionary. The way it works is that I run
> explicitly on all Wiktionaries. When I exclude Russian or Vietnamese, I will
> probably end up removing all references to these projects. They are the
> third and fourth Wiktionaries in size.
> 
> When I do not exclude the Russian and the Vietnamese Wiktionary, the bot may
> end up being blocked on the Polish Wiktionary. This will also kill off the
> interwiki process.
> 
> From my point of view, using bots to generate content in a Wiktionary only
> makes sense when there is at least a link to the word in the base language.
> When the initial creation of stubs is followed by the enrichment of these
> stubs it is acceptable. For having information that is completely wrong,
> there is no excuse.
> 
> The question is, will there be a discussion about acceptable practices in
> Wiktionary? The questions are:
> 
>    - Can the Polish demand what they do?
>    - Is having a project that consists mainly of stubs acceptable?
>    - Is having incorrect data acceptable?
> 
> Thanks,
> GerardM
> 
> PS I copied this from my blog.

-- 
Minh Nguyen <mxn at zoomtown.com>
[[en:User:Mxn]] [[vi:User:Mxn]] [[m:User:Mxn]]
AIM: trycom2000; Jabber: mxn at myjabber.net; Blog: http://mxn.f2o.org/



