I think the problem is how much value is placed on article count.
Rather, we should place the value on size in bytes. Obviously, some
languages take up more or less space than others, but it does seem to
work better: some wikis with high article counts but little content
rank lower than, or level with, wikis with low article counts but
relatively large amounts of content.
For example, see br.wiki, scn.wiki, and li.wiki, and compare them with
bn.wiki, sa.wiki, and gd.wiki. (Much of the size of sa.wiki is
artificial as well, since whole sections of the Rgveda were copied in
verbatim when they really belong in Wikisource.)
In fact, you can tell just how nasty so many of the articles on
sa.wiki are by taking a look at this image:
http://en.wikipedia.org/wikistats/EN/PlotDatabaseSize7.png
It's the only wiki of that size to have the vast majority of its
growth in giant leaps like that, which is indicative of a bot or some
other fast, low-quality way of adding articles.
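
A rough way to put a number on "growth in giant leaps" is sketched below in Python. The monthly sizes are invented for illustration (they are not read off the Wikistats plot), and both the function name `leap_share` and the "5x the median step" cutoff are arbitrary assumptions, not anything Wikistats does.

```python
from statistics import median

def leap_share(sizes, factor=5.0):
    """Fraction of total growth that comes from steps larger than
    `factor` times the median positive step (an arbitrary cutoff)."""
    steps = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
    growth = [s for s in steps if s > 0]
    if not growth:
        return 0.0
    cutoff = factor * median(growth)
    leaps = sum(s for s in growth if s > cutoff)
    return leaps / sum(growth)

# Hypothetical monthly database sizes in MB, invented for illustration only.
steady = [1.0, 1.2, 1.4, 1.7, 1.9, 2.2, 2.4]
jumpy = [1.0, 1.05, 1.1, 4.0, 4.05, 4.1, 8.0]

print(f"steady: {leap_share(steady):.2f}")
print(f"jumpy:  {leap_share(jumpy):.2f}")
```

A wiki that grows mostly by mass imports should score close to 1 and a steadily edited one close to 0. The median-based cutoff stops working if most months are leaps, so treat this as a first look rather than a test.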
Mark
On 06/09/05, Tomasz Wegrzanowski <taw(a)users.sf.net> wrote:
On Tue, Sep 06, 2005 at 11:52:21PM +0200, Lars Aronsson wrote:
Paweł Dembowski wrote:
It seems to me that Swedish Wikipedia is quite the opposite - they
have over 100,000 articles mostly because of the huge amount of
substubs...
I agree that this is embarrassing and should be addressed. I think
that the Danish Wikipedia, with 30,000 articles, has an even
higher percentage of (sub-)stubs than the Swedish one, but this is
just a feeling and I have no numbers to prove this. We need a
statistic for the amount of (sub-)stubs, so we can talk verifiable
numbers (and set goals) instead of guesstimates. How do we define
that? Is the ">200 ch" count ("alternative" article count, [1])
in Erik Zachte's Wikistats a good metric? Or the percentage of
articles longer than 0.5 kilobytes [2]? I think 200 characters is
an OK stub, but perhaps a substub is less than 70 characters?
This leaves us with the Special:Shortpages page. That page has
the advantage of being instantly updated, which Wikistats is not.
The Swedish Wikipedia has 421 articles (0.4% of 102K) shorter than
70 bytes and the Danish has 351 (1.1% of 31K). As a comparison,
the Dutch Wikipedia has 79 (0.08% of 89K) and the Polish has 387
(0.4% of 93K). This makes the Polish look just as bad as the
Swedish, since both have 0.4% of articles shorter than 70 bytes.
But perhaps a substub should be defined at 50 bytes instead?
Or 100 bytes or 150?
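
For anyone who wants to redo the arithmetic above, here is a small Python sketch. The short-page counts are the ones quoted in this message; the article totals are the rounded "K" figures, so the percentages are approximate. The name `substub_share` is made up for the example.

```python
# Short-page counts (< 70 bytes) and approximate article totals quoted above.
# Totals are the rounded "K" figures from the message, not exact counts.
short_pages = {
    "sv": (421, 102_000),
    "da": (351, 31_000),
    "nl": (79, 89_000),
    "pl": (387, 93_000),
}

def substub_share(short_count, total_articles):
    """Return the percentage of articles that fall under the length cutoff."""
    return 100.0 * short_count / total_articles

shares = {wiki: substub_share(n, total) for wiki, (n, total) in short_pages.items()}

# List the wikis from the highest share of very short pages to the lowest.
for wiki, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{wiki}: {pct:.2f}%")
```

Trying a cutoff of 50, 100 or 150 bytes instead would need the per-article lengths, which are not in this message.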
Numbers like 0.4% of articles tell us more about the effectiveness
of the wiki cleanup process than about the typical article.
(And by the way, Special:Shortpages is not updated live
on Wikimedia servers.)
Just take a look at the list of shortest pages on Polish
Wikipedia - they're almost all:
* Redirects (what are they doing on the list?)
* Disambiguation pages without descriptions for the links.
Sometimes articles have titles so obvious that {{disambig}} +
list of the links is enough.
* A few cases of things that look like leftovers from
past technical problems
* A few cases of things that should be deleted immediately,
but have been missed or are simply too recent and will
be deleted soon
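
To get a tally of those categories rather than an impression, something like the following Python sketch could be run over the wikitext of the shortest pages. The sample pages are invented, `classify` is a made-up helper, and the patterns are assumptions: they only match the English `#REDIRECT` keyword and a template literally named `{{disambig}}`, while a given wiki may use localized names.

```python
import re
from collections import Counter

# Assumed markers; localized redirect keywords and template names are not handled.
REDIRECT_RE = re.compile(r"^\s*#redirect\b", re.IGNORECASE)
DISAMBIG_RE = re.compile(r"\{\{\s*disambig\s*\}\}", re.IGNORECASE)

def classify(wikitext):
    """Sort a short page into one of three rough buckets."""
    if REDIRECT_RE.match(wikitext):
        return "redirect"
    if DISAMBIG_RE.search(wikitext):
        return "disambiguation"
    return "other"

# Hypothetical short pages (title -> wikitext), invented for illustration only.
pages = {
    "Foo": "#REDIRECT [[Bar]]",
    "Baz": "{{disambig}}\n* [[Baz (a)]]\n* [[Baz (b)]]",
    "Qux": "Qux is a town.",
}

counts = Counter(classify(text) for text in pages.values())
print(dict(counts))
```

Only the "other" bucket is a candidate for real substubs, and even that still includes the leftovers and soon-to-be-deleted pages mentioned above.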
_______________________________________________
Wikipedia-l mailing list
Wikipedia-l(a)Wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
--
SI HOC LEGERE SCIS NIMIVM ERVDITIONIS HABES
QVANTVM MATERIAE MATERIETVR MARMOTA MONAX SI MARMOTA MONAX MATERIAM
POSSIT MATERIARI
ESTNE VOLVMEN IN TOGA AN SOLVM TIBI LIBET ME VIDERE