Paweł Dembowski wrote:
It seems to me that Swedish Wikipedia is quite the opposite - they have over 100,000 articles mostly because of the huge amount of substubs...
I agree that this is embarrassing and should be addressed. I think that the Danish Wikipedia, with 30,000 articles, has an even higher percentage of (sub-)stubs than the Swedish one, but this is just a feeling and I have no numbers to prove it. We need a statistic for the number of (sub-)stubs, so we can talk about verifiable numbers (and set goals) instead of guesstimates.
How do we define that? Is the ">200 ch" count ("alternative" article count, [1]) in Erik Zachte's Wikistats a good metric? Or the percentage of articles longer than 0.5 kilobytes [2]? I think 200 characters is an OK stub, but perhaps a substub is less than 70 characters? This leaves us with the Special:Shortpages page. That page has the advantage of being instantly updated, which Wikistats is not.
The Swedish Wikipedia has 421 articles (0.4% of 102K) shorter than 70 bytes and the Danish has 351 (1.1% of 31K). For comparison, the Dutch Wikipedia has 79 (about 0.09% of 89K) and the Polish has 387 (0.4% of 93K). This makes the Polish look just as bad as the Swedish, since both have 0.4% of articles shorter than 70 bytes. But perhaps a substub should be defined at 50 bytes instead? Or 100 bytes or 150?
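To make the comparison repeatable, here is a rough Python sketch of the arithmetic. The counts are the ones quoted above, with totals taken as the rounded "K" figures, so the shares are approximate. The share_below helper and the sample_lengths list are my own illustration (the lengths are made up, not real data); they only show how the substub share moves as the cutoff goes from 50 to 150 bytes.

```python
# (articles shorter than 70 bytes, total articles), as quoted in the post.
# Totals are rounded "K" figures, so the resulting shares are approximate.
counts = {
    "Swedish": (421, 102_000),
    "Danish": (351, 31_000),
    "Dutch": (79, 89_000),
    "Polish": (387, 93_000),
}


def substub_share(short, total):
    """Percentage of articles that fall below the length cutoff."""
    return 100.0 * short / total


shares = {lang: substub_share(s, t) for lang, (s, t) in counts.items()}

# List the wikis from highest to lowest substub share.
for lang, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:8s} {pct:.2f}%")


def share_below(lengths, threshold):
    """Percentage of article lengths (in bytes) strictly below threshold."""
    if not lengths:
        return 0.0
    return 100.0 * sum(1 for n in lengths if n < threshold) / len(lengths)


# Hypothetical article lengths in bytes, invented purely for illustration.
sample_lengths = [12, 45, 60, 95, 140, 180, 400, 1200, 3500, 9000]

for cutoff in (50, 70, 100, 150):
    print(f"< {cutoff:3d} bytes: {share_below(sample_lengths, cutoff):.0f}%")
```

With real data, the lengths would have to come from something like Special:Shortpages or a database dump; how to fetch them is left open here.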
[1] Article count (alternate), longer than 200 bytes, http://en.wikipedia.org/wikistats/EN/TablesArticlesTotalAlt.htm
[2] Articles over 0.5 Kb or 500 bytes, http://en.wikipedia.org/wikistats/EN/TablesArticlesGt500Bytes.htm
As of July 2005:

Language         [1]      [2]   Note
All languages    1.6 M    62%
English          595 K    73%
Japanese         129 K    52%
French           122 K    72%
Dutch             75 K    74%   75K is more than Swedish's 68K
Polish            68 K    65%
Italian           47 K    76%
Swedish           68 K    42%   Low percentage
Spanish           53 K    70%
Portuguese        48 K    52%
Chinese           33 K    38%   Even lower percentage
Hebrew            20 K    75%
Norwegian         25 K    52%
Finnish           24 K    64%
Russian           20 K    58%
Esperanto         22 K    51%
Danish            20 K    45%   Almost as low percentage