I think that the problem is how much value is placed on article count.
Rather, we should place the value on size in bytes. Obviously some languages take up more or less space than others, but it does seem to work better: ranked by size, some wikis with high article counts but little content come out below, or level with, wikis that have lower article counts but relatively more content.
For example, compare br.wiki, scn.wiki and li.wiki with bn.wiki, sa.wiki and gd.wiki (much of the size of sa.wiki is artificial as well, because whole sections of the Rgveda were copied verbatim when they really belong on Wikisource).
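A minimal sketch of the proposed switch, assuming per-wiki totals of articles and bytes are available from somewhere (the `WikiStats` record and the wiki names below are hypothetical, not a Wikistats format, and the figures are toy values). It ranks the same wikis by article count and by total bytes, and reports average bytes per article. One caveat makes the "some languages take up more space" point concrete: in UTF-8, Devanagari or Bengali letters take three bytes each while plain Latin letters take one, so raw byte totals favour those scripts.

```python
from typing import NamedTuple


class WikiStats(NamedTuple):
    """Size figures for one wiki (hypothetical structure, not a Wikistats format)."""
    name: str
    articles: int
    total_bytes: int


def rank(wikis: list[WikiStats], field: str) -> list[str]:
    """Wiki names ordered from largest to smallest by `field`."""
    ordered = sorted(wikis, key=lambda w: getattr(w, field), reverse=True)
    return [w.name for w in ordered]


def bytes_per_article(w: WikiStats) -> float:
    """Average article size in bytes; a low value hints at many short stubs."""
    return w.total_bytes / w.articles if w.articles else 0.0


if __name__ == "__main__":
    # Toy figures for two made-up wikis, NOT real statistics.
    wikis = [
        WikiStats("wiki_a", articles=20_000, total_bytes=10_000_000),
        WikiStats("wiki_b", articles=5_000, total_bytes=25_000_000),
    ]
    print(rank(wikis, "articles"))     # ['wiki_a', 'wiki_b']
    print(rank(wikis, "total_bytes"))  # ['wiki_b', 'wiki_a']
    for w in wikis:
        print(w.name, bytes_per_article(w))  # wiki_a 500.0, then wiki_b 5000.0
```

The two orderings flip, which is exactly the effect described: a wiki with four times as many articles can still hold less than half the text.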
In fact, you can tell just how nasty so many of the articles on sa.wiki are by taking a look at this image: http://en.wikipedia.org/wikistats/EN/PlotDatabaseSize7.png
It's the only wiki of such a size to have the vast majority of its growth in giant leaps like that, which is indicative of a bot or some other fast, low-quality article-adding technique.
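The "giant leaps" argument can be turned into a rough check. The sketch below assumes a series of monthly database sizes (such as those behind the Wikistats plot, though no particular file format is assumed) and flags months whose growth exceeds `factor` times the median positive monthly growth; the factor of 5 is an arbitrary, hypothetical threshold. `leap_share` then reports what fraction of the total growth arrived in those leaps.

```python
from statistics import median


def find_leaps(sizes: list[int], factor: float = 5.0) -> list[int]:
    """Indices i where growth from sizes[i-1] to sizes[i] exceeds `factor`
    times the median positive month-to-month growth."""
    deltas = [b - a for a, b in zip(sizes, sizes[1:])]
    positive = [d for d in deltas if d > 0]
    if not positive:
        return []
    typical = median(positive)
    # deltas[k] is the growth into month k + 1, hence the offset.
    return [k + 1 for k, d in enumerate(deltas) if d > factor * typical]


def leap_share(sizes: list[int], factor: float = 5.0) -> float:
    """Fraction of total growth that arrived in the flagged leaps."""
    if len(sizes) < 2:
        return 0.0
    total = sizes[-1] - sizes[0]
    if total <= 0:
        return 0.0
    jumped = sum(sizes[i] - sizes[i - 1] for i in find_leaps(sizes, factor))
    return jumped / total


if __name__ == "__main__":
    # Toy monthly database sizes in KB for a made-up wiki, NOT real data.
    sizes = [100, 110, 120, 500, 510, 520, 900, 910]
    print(find_leaps(sizes))            # [3, 6]
    print(round(leap_share(sizes), 2))  # 0.94
```

A high share only says the growth was lumpy; whether the lumps came from a bot, a mass import, or copied source texts still has to be checked by hand.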
Mark
On 06/09/05, Tomasz Wegrzanowski <taw@users.sf.net> wrote:
On Tue, Sep 06, 2005 at 11:52:21PM +0200, Lars Aronsson wrote:
Paweł Dembowski wrote:
It seems to me that the Swedish Wikipedia is quite the opposite: they have over 100,000 articles mostly because of the huge number of substubs...
I agree that this is embarrassing and should be addressed. I think that the Danish Wikipedia, with 30,000 articles, has an even higher percentage of (sub-)stubs than the Swedish one, but this is just a feeling and I have no numbers to prove it. We need a statistic for the number of (sub-)stubs, so that we can talk about verifiable numbers (and set goals) instead of guesstimates. How do we define it? Is the ">200 ch" count (the "alternative" article count [1]) in Erik Zachte's Wikistats a good metric? Or the percentage of articles longer than 0.5 kilobytes [2]? I think 200 characters is an acceptable stub, but perhaps a substub is anything under 70 characters? A simple length cutoff like that leads us to the Special:Shortpages page, which has the advantage of being updated instantly, unlike Wikistats.
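The candidate definitions differ in two ways: the cutoff, and whether length is counted in characters or in bytes (which diverge for any non-ASCII text). A minimal sketch that makes both explicit, assuming the raw wikitext or a list of article sizes is at hand; the function names are illustrative, and treating 0.5 KB as 512 bytes is an assumption (it could equally mean 500).

```python
def article_size(wikitext: str, unit: str = "bytes") -> int:
    """Length of an article in UTF-8 bytes or in Unicode characters."""
    if unit == "bytes":
        return len(wikitext.encode("utf-8"))
    if unit == "chars":
        return len(wikitext)
    raise ValueError(f"unknown unit: {unit!r}")


def share_below(sizes: list[int], threshold: int) -> float:
    """Percentage of articles strictly shorter than `threshold`."""
    if not sizes:
        return 0.0
    return 100.0 * sum(1 for s in sizes if s < threshold) / len(sizes)


def sweep(sizes: list[int], thresholds=(50, 70, 100, 150, 200, 512)) -> dict:
    """Substub share at several candidate cutoffs; 512 stands in for
    0.5 KB, assuming 1 KB = 1024 bytes."""
    return {t: share_below(sizes, t) for t in thresholds}


if __name__ == "__main__":
    # "Łódź": 4 characters but 7 bytes in UTF-8.
    print(article_size("\u0141\u00f3d\u017a"), article_size("\u0141\u00f3d\u017a", "chars"))  # 7 4
    # Toy article sizes in bytes, NOT real data.
    sizes = [30, 60, 90, 120, 180, 250, 600, 1200, 3000, 8000]
    print(sweep(sizes))
    # {50: 10.0, 70: 20.0, 100: 30.0, 150: 40.0, 200: 50.0, 512: 60.0}
```

100 minus the 512 entry gives the share of articles at least 0.5 KB long, i.e. the complement of the shortest bucket, so one sweep covers both styles of metric mentioned above.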
The Swedish Wikipedia has 421 articles (0.4% of 102K) shorter than 70 bytes and the Danish has 351 (1.1% of 31K). For comparison, the Dutch Wikipedia has 79 (0.09% of 89K) and the Polish has 387 (0.4% of 93K). This makes the Polish look just as bad as the Swedish, since both have 0.4% of articles shorter than 70 bytes. But perhaps a substub should be defined at 50 bytes instead? Or at 100 or 150 bytes?
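The quoted shares can be recomputed directly from the counts in the paragraph above. This short sketch uses only those numbers; since the totals are the rounded "K" figures, each result is approximate.

```python
# (wiki, articles under 70 bytes, total articles) as quoted above; the totals
# are the rounded "K" figures, so each share is only approximate.
QUOTED = [
    ("Swedish", 421, 102_000),
    ("Danish", 351, 31_000),
    ("Dutch", 79, 89_000),
    ("Polish", 387, 93_000),
]


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


def shares(rows=QUOTED) -> dict[str, float]:
    """Percentage of sub-70-byte articles for each wiki."""
    return {name: percent(short, total) for name, short, total in rows}


if __name__ == "__main__":
    for name, pct in shares().items():
        print(f"{name}: {pct:.2f}%")
    # Swedish: 0.41%  Danish: 1.13%  Dutch: 0.09%  Polish: 0.42%
```

At two decimals the Swedish and Polish figures are still practically identical, so the extra precision does not change the comparison.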
Numbers like 0.4% of articles tell us more about the effectiveness of the wiki clean-up process than about the typical article. (And by the way, Special:Shortpages is not updated live on the Wikimedia servers.)
Just take a look at the list of shortest pages on the Polish Wikipedia. They are almost all:
- Redirects (what are they doing on the list?)
- Disambiguation pages without descriptions for the links. Sometimes the articles have titles so obvious that {{disambig}} plus a list of the links is enough.
- A few cases of things that look like leftovers from past technical problems
- A few cases of things that should be deleted immediately, but have been missed or are simply too recent and will be deleted soon
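The list above suggests that redirects and disambiguation pages should be excluded before short pages are counted as substubs. A minimal sketch, assuming the raw wikitext of each page is available (for example from a database dump); the helper names are hypothetical, redirects are recognised only by the standard `#REDIRECT` keyword unless localized aliases are passed in, and disambiguation pages only by the `{{disambig}}` template named above.

```python
import re

# Matches the {{disambig}} template, optionally with parameters;
# other disambiguation templates are not recognised (an assumption).
DISAMBIG = re.compile(r"\{\{\s*disambig\s*[|}]", re.IGNORECASE)


def is_redirect(wikitext: str, aliases=("#REDIRECT",)) -> bool:
    """True if the page starts with a redirect keyword. A wiki may accept a
    localized alias as well; pass it via `aliases` (none is assumed here)."""
    head = wikitext.lstrip().upper()
    return any(head.startswith(a.upper()) for a in aliases)


def is_disambig(wikitext: str) -> bool:
    """True if the page uses the {{disambig}} template."""
    return DISAMBIG.search(wikitext) is not None


def real_short_articles(pages, threshold=70, aliases=("#REDIRECT",)):
    """Titles of pages shorter than `threshold` UTF-8 bytes that are neither
    redirects nor disambiguation pages. `pages` yields (title, wikitext)."""
    result = []
    for title, text in pages:
        if is_redirect(text, aliases) or is_disambig(text):
            continue
        if len(text.encode("utf-8")) < threshold:
            result.append(title)
    return result


if __name__ == "__main__":
    pages = [
        ("A", "#REDIRECT [[B]]"),
        ("C", "{{disambig}}\n* [[C1]]\n* [[C2]]"),
        ("D", "Stub."),
        ("E", "x" * 100),
    ]
    print(real_short_articles(pages))  # ['D']
```

Leftovers from technical problems and pages awaiting deletion would still slip through; those need a human look rather than a length rule.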
Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l