[Wikipedia-l] Quality vs Quantity

Craig Franklin craig at halo-17.net
Sat May 5 04:41:13 UTC 2007


Hi Lars,

I should have said that trying to make a qualitative comparison using
quantitative measures is quite difficult given the nature of Wikipedias.

To illustrate using the "article count" used on www.wikipedia.org, I had a 
looksie at two Wikis, Quechua (2,166 articles), and Friulian (2,041 
articles).  Using this quantitative comparison, they should be about equal 
in comprehensiveness.

However, after hitting "Random" five times on each, I got the following 
pages:

Quechua:

http://qu.wikipedia.org/wiki/Chhukruna (stub)
http://qu.wikipedia.org/wiki/Minsk (stub)
http://qu.wikipedia.org/wiki/Yatawaki (stubby, most of the article is just a 
bullet-point list)
http://qu.wikipedia.org/wiki/Kiru_ismu (stub)
http://qu.wikipedia.org/wiki/T'aklla (stub)

Friulian:

http://fur.wikipedia.org/wiki/1622 (stub)
http://fur.wikipedia.org/wiki/Timp_coordenât_universâl (short, maybe a
Start-class article on en:)
http://fur.wikipedia.org/wiki/Lauc (full article, longer by word count than
the version on en:, and about the same size as the one on it: if you remove
the large bar graph from the it: version)
http://fur.wikipedia.org/wiki/Toponims_Talian_Furlan_D (list)
http://fur.wikipedia.org/wiki/Islam (full article)

Anecdotal maybe, but once you eyeball it, the Friulian Wikipedia clearly has
the edge over Quechua in terms of comprehensiveness and quality.  Trouble
is, I don't see how this could be determined by an automated process.
You're going to need humans to look at these things and make the
determination, and really, who has time to do that for 200+ Wikipedias?
Not to mention the accusations of bias, poor article selection and the like
that will be made.

Using word counts is going to be a problem too.  Languages like Norfuk,
which have lots of small particle words and the like, will show inflated
word counts compared to languages like Mandarin, which don't have written
words in the Western sense; languages like Gaeilge or Cymraeg, where the
very notion of a "word" is a pretty nebulous one; or languages like
Kalaallisut, which compress many meanings and affixes into each and every
word.

Cheers,
Craig Franklin

> Date: Thu, 3 May 2007 11:30:53 +0200 (CEST)
> From: Lars Aronsson <lars at aronsson.se>
> Subject: Re: [Wikipedia-l] Quality vs Quantity
> To: wikipedia-l at lists.wikimedia.org
> Message-ID: <Pine.LNX.4.64.0705031128240.15940 at localhost.localdomain>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
> Craig Franklin wrote:
>
>> You show me a way of quantitatively comparing two Wikis using an
>> automated process, and I'll show you a language or Wiki that
>> will break it.
>
> Today the front page www.wikipedia.org measures and compares the
> number of articles.  You can begin to "break" that method.  And
> then you can figure out some method that might perhaps be slightly
> better.  I've suggested two already.
>
>
> -- 
>  Lars Aronsson (lars at aronsson.se)
>  Aronsson Datateknik - http://aronsson.se
