[Wikipedia-l] Project: This wikipedia-related article is a stub...

Lars Aronsson lars at aronsson.se
Tue Sep 6 21:52:21 UTC 2005


Paweł Dembowski wrote:
> It seems to me that Swedish Wikipedia is quite the opposite - they
> have over 100,000 articles mostly because of the huge amount of
> substubs...

I agree that this is embarrassing and should be addressed. I think
that the Danish Wikipedia, with 30,000 articles, has an even
higher percentage of (sub-)stubs than the Swedish one, but this is
just a feeling and I have no numbers to prove it.  We need a
statistic for the number of (sub-)stubs, so we can talk about
verifiable numbers (and set goals) instead of guesstimates.  How do we define
that?  Is the ">200 ch" count ("alternative" article count, [1]) 
in Erik Zachte's Wikistats a good metric?  Or the percentage of 
articles longer than 0.5 kilobytes [2]?  I think 200 characters is 
an OK stub, but perhaps a substub is less than 70 characters?
This leaves us with the Special:Shortpages page.  That page has 
the advantage of being instantly updated, which Wikistats is not.
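
The cut-offs floated above can be written down as a small classifier.
This is only a sketch in Python, assuming that length is measured in
characters and that anything up to 200 characters still counts as a
stub.  The names SUBSTUB_LIMIT, STUB_LIMIT, classify and tally are
made up for illustration, and the sample texts are placeholders, not
real articles.

```python
from collections import Counter

# Assumed cut-offs, taken from the numbers floated in this post.
SUBSTUB_LIMIT = 70   # shorter than this: substub
STUB_LIMIT = 200     # up to this length: stub; longer: ordinary article


def classify(text):
    """Label an article text by its length in characters."""
    n = len(text)
    if n < SUBSTUB_LIMIT:
        return "substub"
    if n <= STUB_LIMIT:
        return "stub"
    return "article"


def tally(texts):
    """Count how many texts fall into each class."""
    return Counter(classify(t) for t in texts)


# Placeholder texts of 40, 120 and 900 characters (not real articles).
samples = ["x" * 40, "x" * 120, "x" * 900]
print(tally(samples))
```

Changing the two constants is enough to try another definition, such
as 50 or 100 instead of 70.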

The Swedish Wikipedia has 421 articles (0.4% of 102K) shorter than 
70 bytes and the Danish has 351 (1.1% of 31K).  As a comparison, 
the Dutch Wikipedia has 79 (0.09% of 89K) and the Polish has 387 
(0.4% of 93K).  This makes the Polish look just as bad as the 
Swedish, since both have 0.4% of articles shorter than 70 bytes.
But perhaps a substub should be defined at 50 bytes instead?
Or 100 bytes or 150?
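
The percentages above are simply the short-article count divided by
the total article count.  A minimal sketch in Python that recomputes
them from the figures quoted in this post; the totals are the rounded
values (102K, 31K, 89K, 93K), so the results are approximate, and the
names SHORT_COUNTS and substub_share are made up for illustration.

```python
# Figures quoted in this post:
# (articles shorter than 70 bytes, total articles, rounded to thousands).
SHORT_COUNTS = {
    "Swedish": (421, 102_000),
    "Danish": (351, 31_000),
    "Dutch": (79, 89_000),
    "Polish": (387, 93_000),
}


def substub_share(short, total):
    """Return the percentage of articles below the size threshold."""
    return 100.0 * short / total


shares = {lang: substub_share(s, t) for lang, (s, t) in SHORT_COUNTS.items()}

# Print the languages from the highest share of very short articles down.
for lang, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:<8} {pct:.2f}%")
```

The same calculation works for any other threshold once the matching
counts have been read off Special:Shortpages.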

[1] Article count (alternate), longer than 200 bytes,
http://en.wikipedia.org/wikistats/EN/TablesArticlesTotalAlt.htm

[2] Articles over 0.5 KB or 500 bytes,
http://en.wikipedia.org/wikistats/EN/TablesArticlesGt500Bytes.htm

As of July 2005,
                 [1]    [2]
  All languages  1.6 M  62%
  English        595 K  73%
  Japanese       129 K  52%
  French         122 K  72%
  Dutch           75 K  74%  75K is more than Swedish's 68K
  Polish          68 K  65%
  Italian         47 K  76%
  Swedish         68 K  42%  Low percentage
  Spanish         53 K  70%
  Portuguese      48 K  52%
  Chinese         33 K  38%  Even lower percentage
  Hebrew          20 K  75%
  Norwegian       25 K  52%
  Finnish         24 K  64%
  Russian         20 K  58%
  Esperanto       22 K  51%
  Danish          20 K  45%  Almost as low percentage
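
To see which languages stand out in column [2], the table can be
sorted by that column.  A small sketch in Python using only the
percentages listed above; the names PCT_OVER_500_BYTES and lowest are
made up for illustration.

```python
# Column [2] of the July 2005 table: percentage of articles over 0.5 KB.
PCT_OVER_500_BYTES = {
    "English": 73, "Japanese": 52, "French": 72, "Dutch": 74,
    "Polish": 65, "Italian": 76, "Swedish": 42, "Spanish": 70,
    "Portuguese": 52, "Chinese": 38, "Hebrew": 75, "Norwegian": 52,
    "Finnish": 64, "Russian": 58, "Esperanto": 51, "Danish": 45,
}


def lowest(table, n=3):
    """Return the n (language, percentage) pairs with the lowest percentage."""
    return sorted(table.items(), key=lambda kv: kv[1])[:n]


bottom_three = lowest(PCT_OVER_500_BYTES)
for lang, pct in bottom_three:
    print(f"{lang:<10} {pct}%")
```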


-- 
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se


