[Wikipedia-l] Project: This wikipedia-related article is a stub...
Lars Aronsson
lars at aronsson.se
Tue Sep 6 21:52:21 UTC 2005
Paweł Dembowski wrote:
> It seems to me that Swedish Wikipedia is quite the opposite - they
> have over 100,000 articles mostly because of the huge amount of
> substubs...
I agree that this is embarrassing and should be addressed. I think
that the Danish Wikipedia, with 30,000 articles, has an even
higher percentage of (sub-)stubs than the Swedish one, but this is
just a feeling and I have no numbers to prove it. We need a
statistic for the number of (sub-)stubs, so we can talk about verifiable
numbers (and set goals) instead of guesstimates. How should we define
it? Is the ">200 ch" count (the "alternative" article count, [1])
in Erik Zachte's Wikistats a good metric? Or the percentage of
articles longer than 0.5 kilobytes [2]? I think 200 characters is
an OK stub, but perhaps a substub is less than 70 characters?
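One way to make these tentative definitions concrete is a small classifier over article lengths. The following is a minimal Python sketch, assuming the cut-offs floated above (under 70 bytes = substub, under 200 bytes = stub) and measuring length in bytes, the unit Special:Shortpages and Wikistats report. The function names and the sample lengths are made up for illustration.

```python
# Hypothetical cut-offs taken from the discussion, in bytes.
# Note: byte and character counts can differ for non-ASCII text.
SUBSTUB_MAX = 70
STUB_MAX = 200


def classify(length_bytes: int) -> str:
    """Bucket one article by its wikitext length in bytes."""
    if length_bytes < SUBSTUB_MAX:
        return "substub"
    if length_bytes < STUB_MAX:
        return "stub"
    return "article"


def stub_statistics(lengths: list[int]) -> dict[str, float]:
    """Return the percentage of articles falling into each bucket."""
    counts = {"substub": 0, "stub": 0, "article": 0}
    for n in lengths:
        counts[classify(n)] += 1
    total = len(lengths)
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: 100.0 * v / total for k, v in counts.items()}


# Made-up sample of article lengths, for illustration only.
sample = [45, 60, 120, 180, 250, 480, 900, 1500, 3000, 12000]
print(stub_statistics(sample))
# {'substub': 20.0, 'stub': 20.0, 'article': 60.0}
```

Whether a substub should also count as a stub is itself part of the definition question; this sketch keeps the buckets disjoint.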
Then there is the Special:Shortpages page, which has the advantage
of being updated instantly, unlike Wikistats.
The Swedish Wikipedia has 421 articles (0.4% of 102K) shorter than
70 bytes and the Danish has 351 (1.1% of 31K). As a comparison,
the Dutch Wikipedia has 79 (0.09% of 89K) and the Polish has 387
(0.4% of 93K). This makes the Polish look just as bad as the
Swedish, since both have 0.4% of articles shorter than 70 bytes.
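The Shortpages percentages are simple ratios of short articles to all articles. A short Python sketch reproducing them from the counts quoted above; the totals are the rounded 102K, 31K, 89K and 93K figures, so the results are only approximate, and the helper name is hypothetical.

```python
# Articles shorter than 70 bytes (Special:Shortpages) and rounded
# total article counts, as quoted in the message.
short_pages = {
    "Swedish": (421, 102_000),
    "Danish": (351, 31_000),
    "Dutch": (79, 89_000),
    "Polish": (387, 93_000),
}


def percent_short(short: int, total: int) -> float:
    """Share of all articles that are shorter than the cut-off, in percent."""
    return 100.0 * short / total


for wiki, (short, total) in short_pages.items():
    print(f"{wiki:8} {short:4d} / {total:7,d} = {percent_short(short, total):.2f}%")
```

To two decimals this gives 0.41% (Swedish), 1.13% (Danish), 0.09% (Dutch) and 0.42% (Polish).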
But perhaps a substub should be defined at 50 bytes instead?
Or 100 bytes or 150?
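Picking the cut-off is easier with a quick sweep: sort the article lengths once and count how many fall under each candidate threshold. A Python sketch, assuming a made-up list of byte lengths; a real run would need the actual byte length of every article in a wiki. The function name is hypothetical.

```python
import bisect

# Candidate substub cut-offs from the question above, in bytes.
THRESHOLDS = (50, 70, 100, 150)


def substub_counts(lengths, thresholds=THRESHOLDS):
    """Map each threshold to the number of articles strictly shorter than it."""
    ordered = sorted(lengths)
    # bisect_left returns the index of the first length >= t,
    # which equals the number of lengths < t.
    return {t: bisect.bisect_left(ordered, t) for t in thresholds}


# Hypothetical article lengths in bytes, for illustration only.
sample = [12, 35, 48, 55, 66, 72, 95, 130, 149, 160, 240, 800, 2500]
for t, n in substub_counts(sample).items():
    print(f"< {t:3d} bytes: {n:2d} articles ({100 * n / len(sample):.1f}%)")
```

Sorting once and bisecting keeps each extra threshold cheap, which matters when the list holds every article of a 100K-article wiki.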
[1] Article count (alternate): articles longer than 200 bytes,
http://en.wikipedia.org/wikistats/EN/TablesArticlesTotalAlt.htm
[2] Share of articles longer than 0.5 KB (500 bytes),
http://en.wikipedia.org/wikistats/EN/TablesArticlesGt500Bytes.htm
As of July 2005:

Language          [1]    [2]   Note
All languages   1.6 M    62%
English         595 K    73%
Japanese        129 K    52%
French          122 K    72%
Dutch            75 K    74%   75K is more than Swedish's 68K
Polish           68 K    65%
Italian          47 K    76%
Swedish          68 K    42%   Low percentage
Spanish          53 K    70%
Portuguese       48 K    52%
Chinese          33 K    38%   Even lower percentage
Hebrew           20 K    75%
Norwegian        25 K    52%
Finnish          24 K    64%
Russian          20 K    58%
Esperanto        22 K    51%
Danish           20 K    45%   Almost as low percentage
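The notes in the table can be reproduced mechanically by ranking wikis on column [2]. A Python sketch, assuming an arbitrary cut-off of 50% (chosen only for illustration; the all-languages figure is 62%). The data is column [2] as listed above; the function name is hypothetical.

```python
# Column [2] of the July 2005 table: percentage of articles over 500 bytes.
share_over_500 = {
    "English": 73, "Japanese": 52, "French": 72, "Dutch": 74,
    "Polish": 65, "Italian": 76, "Swedish": 42, "Spanish": 70,
    "Portuguese": 52, "Chinese": 38, "Hebrew": 75, "Norwegian": 52,
    "Finnish": 64, "Russian": 58, "Esperanto": 51, "Danish": 45,
}


def lowest_shares(shares: dict[str, int], cutoff: int) -> list[tuple[str, int]]:
    """Return (language, percent) pairs below cutoff, lowest percentage first."""
    flagged = [(lang, pct) for lang, pct in shares.items() if pct < cutoff]
    return sorted(flagged, key=lambda item: item[1])


# 50% is a hypothetical cut-off, not a Wikistats threshold.
for lang, pct in lowest_shares(share_over_500, 50):
    print(f"{lang:10} {pct}%")
```

With this cut-off the flagged wikis are Chinese (38%), Swedish (42%) and Danish (45%), the same three the notes single out.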
--
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se