On Tue, Sep 06, 2005 at 11:52:21PM +0200, Lars Aronsson wrote:
Paweł Dembowski wrote:
It seems to me that Swedish Wikipedia is quite the opposite - they
have over 100,000 articles mostly because of the huge amount of
substubs...
I agree that this is embarrassing and should be addressed. I think
that the Danish Wikipedia, with 30,000 articles, has an even
higher percentage of (sub-)stubs than the Swedish one, but this is
just a feeling and I have no numbers to prove this. We need a
statistic for the amount of (sub-)stubs, so we can talk verifiable
numbers (and set goals) instead of guesstimates. How do we define
that? Is the ">200 ch" count ("alternative" article count, [1])
in Erik Zachte's Wikistats a good metric? Or the percentage of
articles longer than 0.5 kilobytes [2]? I think 200 characters is
an OK stub, but perhaps a substub is less than 70 characters?
This leaves us with the Special:Shortpages page. That page has
the advantage of being instantly updated, which Wikistats is not.
The Swedish Wikipedia has 421 articles (0.4% of 102K) shorter than
70 bytes and the Danish has 351 (1.1% of 31K). As a comparison,
the Dutch Wikipedia has 79 (0.08% of 89K) and the Polish has 387
(0.4% of 93K). This makes the Polish look just as bad as the
Swedish, since both have 0.4% of articles shorter than 70 bytes.
But perhaps a substub should be defined at 50 bytes instead?
Or 100 bytes or 150?
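
The percentages quoted above can be rechecked with a few lines of
Python. This is only a sketch: the counts are the ones given in the
message, the totals are the rounded figures (102K, 31K, 89K, 93K), and
the names short_pages and substub_share are made up for illustration.

```python
# Counts quoted in the message: (pages shorter than 70 bytes, total articles).
# Totals are the rounded figures from the text, so results are approximate.
short_pages = {
    "Swedish": (421, 102_000),
    "Danish": (351, 31_000),
    "Dutch": (79, 89_000),
    "Polish": (387, 93_000),
}


def substub_share(short_count, total_count):
    """Return the percentage of articles below the size threshold."""
    return 100.0 * short_count / total_count


shares = {wiki: substub_share(s, t) for wiki, (s, t) in short_pages.items()}

# Print the wikis from highest to lowest share of very short pages.
for wiki, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{wiki:8s} {pct:.2f}%")
```

Rerunning the same calculation with counts taken at a 50, 100 or 150
byte cutoff would show how sensitive the ranking is to the threshold.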
Numbers like 0.4% of articles tell us more about the effectiveness
of the wiki cleanup process than about the typical article.
(By the way, Special:Shortpages is not updated live
on Wikimedia servers.)
Just take a look at the list of shortest pages on Polish
Wikipedia - they're almost all:
* Redirects (what are they doing on the list?)
* Disambiguation pages without descriptions for the links.
Sometimes articles have titles so obvious that {{disambig}} +
list of the links is enough.
* A few cases of things that look like leftovers of
past technical problems
* A few cases of things that should be deleted immediately,
but have been missed or are simply too recent and will
be deleted soon
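
If the short-page count is to say anything about substubs, the
categories above would have to be filtered out first. A rough sketch of
such a filter follows. It assumes the raw wikitext of each page is
available, that redirects begin with "#REDIRECT" (localized aliases
would need to be added), and that disambiguation pages carry the
{{disambig}} template mentioned above. The function names and the
70-byte default are illustrative, not part of any MediaWiki tool.

```python
import re

# Assumption: redirects start with "#REDIRECT"; localized keywords are not handled.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\b", re.IGNORECASE)
# Assumption: disambiguation pages are marked with a bare {{disambig}} template.
DISAMBIG_RE = re.compile(r"\{\{\s*disambig\s*\}\}", re.IGNORECASE)


def classify_short_page(wikitext):
    """Label a page as a redirect, a disambiguation page, or a possible substub."""
    if REDIRECT_RE.match(wikitext):
        return "redirect"
    if DISAMBIG_RE.search(wikitext):
        return "disambiguation"
    return "candidate substub"


def count_candidate_substubs(pages, threshold=70):
    """Count pages under `threshold` bytes that are neither redirects nor disambiguations.

    `pages` maps a title to its raw wikitext (hypothetical input format).
    """
    return sum(
        1
        for text in pages.values()
        if len(text.encode("utf-8")) < threshold
        and classify_short_page(text) == "candidate substub"
    )
```

Leftovers of technical problems and pages awaiting deletion cannot be
recognized from the text alone, so they would still show up as
candidates and need a manual look.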