[Wiktionary-l] Wiktionary quality issues

21 Apr 2007


On the Wiktionary project (http://wiktionary.org/) I run the interwiki bot.
The process is simple: when an article with exactly the same spelling exists
on another language's Wiktionary, I create an "interwiki" link to it. This
lets you see the information on the other Wiktionary. The process is
automated and unattended, and it runs on all Wiktionaries.
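
The matching step described above can be sketched roughly as follows. This
is only an illustration in Python, not the bot's actual code: the function
name interwiki_links, the titles_by_wiki mapping and the sample titles are
all hypothetical, and a real bot would read page titles from each wiki
rather than from a dictionary.

```python
def interwiki_links(title, titles_by_wiki):
    """Return, for each wiki that has `title`, the other wikis to link to.

    titles_by_wiki maps a language code to the set of article titles on
    that Wiktionary (hypothetical in-memory stand-in for the real wikis).
    A match requires exactly the same spelling, including case.
    """
    # Every wiki that has an article with exactly this title.
    langs = sorted(
        lang for lang, titles in titles_by_wiki.items() if title in titles
    )
    # Each of those articles links to all the others, never to itself.
    return {lang: [other for other in langs if other != lang] for lang in langs}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {
        "en": {"dispersion", "water"},
        "pl": {"dispersion"},
        "ru": {"dispersion"},
        "vi": {"water"},
    }
    for lang, targets in interwiki_links("dispersion", sample).items():
        print(lang, "->", targets)
```

Note that the match is on spelling only. Nothing in this step looks at what
the target article actually contains, which is why an empty or incorrect
article is linked just like a good one.
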
I have received a request from the Polish Wiktionary to stop adding
interwiki links for the Russian and for the Vietnamese Wiktionary. The
reason given is one of quality. On the Russian Wiktionary many of the
articles are created by a bot and they do not provide good information. An
example is dispersion (http://ru.wiktionary.org/wiki/dispersion); there is
really nothing in there. The Vietnamese Wiktionary is more problematic
because a bot was used to generate declension and conjugation tables of
Russian words, and it got them wrong.
The Russian Wiktionary has some 81,000 empty shells and refuses to remove
them. The Vietnamese are not willing to remove their incorrect data.
I have been asked to stop including the Russian Wiktionary and the
Vietnamese Wiktionary when I run the interwiki process. To be honest, I run
the bot as a service and I do not think excluding them is the right thing to
do. I think the Vietnamese are wrong not to correct the wrong data that they
have. I am less sure about the Russian approach; in essence each of these
articles is a stub. However, creating a Wiktionary in this way is like stamp
collecting: you can look at it, but there is no information in it.
Given how the process works, I am not sure that I can exclude either the
Russian or the Vietnamese Wiktionary. The way it works is that I run
explicitly on all Wiktionaries. If I exclude Russian or Vietnamese, I will
probably end up removing all references to these projects. They are the
third and fourth largest Wiktionaries.
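
Why excluding a project would probably remove its links can be illustrated
with a small Python sketch. It rests on an assumption that is not confirmed
anywhere in this mail: that the bot replaces a page's whole interwiki list
with the list it has just computed. The names computed_links and
links_that_would_vanish and the sample data are hypothetical.

```python
def computed_links(title, titles_by_wiki, home, excluded=frozenset()):
    """Links the bot would write on `home` for `title`, skipping excluded wikis."""
    return sorted(
        lang
        for lang, titles in titles_by_wiki.items()
        if lang != home and lang not in excluded and title in titles
    )


def links_that_would_vanish(existing, title, titles_by_wiki, home, excluded):
    """Existing links that disappear if the page's list is simply overwritten."""
    new = set(computed_links(title, titles_by_wiki, home, excluded))
    return sorted(set(existing) - new)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {
        "en": {"dispersion"},
        "pl": {"dispersion"},
        "ru": {"dispersion"},
    }
    # The Polish page currently links to en and ru; ru and vi are excluded.
    print(links_that_would_vanish(
        ["en", "ru"], "dispersion", sample, home="pl", excluded={"ru", "vi"}
    ))
```

Under that assumption, an excluded wiki is indistinguishable from a wiki
that has no such article, so its existing links are dropped rather than left
alone.
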
If I do not exclude the Russian and the Vietnamese Wiktionary, the bot may
end up being blocked on the Polish Wiktionary. That would also kill off the
interwiki process.
...
From my point of view, using bots to generate content in a Wiktionary only
makes sense when there is at least a link to the word in the base language.
When the initial creation of stubs is followed by the enrichment of these
stubs, it is acceptable. There is no excuse for having information that is
completely wrong.
The question is whether there will be a discussion about acceptable
practices in Wiktionary. The questions are:
   - Can the Polish Wiktionary make the demand that it does?
   - Is having a project that consists mainly of stubs acceptable?
   - Is having incorrect data acceptable?
Thanks,
GerardM
PS I copied this from my blog.