[Wikipedia-l] Quality vs Quantity
Craig Franklin
craig at halo-17.net
Sat May 5 04:41:13 UTC 2007
Hi Lars,
I should have said that trying to make a qualitative comparison using
quantitative measures is quite difficult given the nature of Wikipedias.
To illustrate with the "article count" shown on www.wikipedia.org, I had a
looksie at two Wikis, Quechua (2,166 articles) and Friulian (2,041
articles). By this quantitative measure, they should be about equal
in comprehensiveness.
However, after hitting "Random" five times on each, I got the following
pages:
Quechua:
http://qu.wikipedia.org/wiki/Chhukruna (stub)
http://qu.wikipedia.org/wiki/Minsk (stub)
http://qu.wikipedia.org/wiki/Yatawaki (stubby, most of the article is just a
bullet-point list)
http://qu.wikipedia.org/wiki/Kiru_ismu (stub)
http://qu.wikipedia.org/wiki/T'aklla (stub)
Friulian:
http://fur.wikipedia.org/wiki/1622 (stub)
http://fur.wikipedia.org/wiki/Timp_coordenât_universâl (short, maybe a Start
class on en:)
http://fur.wikipedia.org/wiki/Lauc (full article, longer by word count than
the version on en:, and about the same size as the one on it: if you remove
the large bar graph from the it: version)
http://fur.wikipedia.org/wiki/Toponims_Talian_Furlan_D (list)
http://fur.wikipedia.org/wiki/Islam (full article)
Anecdotal maybe, but clearly once you eyeball it, the Friulian Wikipedia has
the edge in terms of comprehensiveness and quality over Quechua. Trouble
is, I don't see how this can be algorithmically determined using an
automated process. You're going to need humans to look at these things and
make the determination, and really, who has time to do that for 200+
wikipedias? Not to mention the accusations of bias, poor article selection,
and other such things that will be made.
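
As a rough illustration, the ten hand-assigned labels above can be tallied
in a few lines of Python. This is only a sketch: the name share_full and the
choice to count only pages labelled "full" are assumptions made for the
example, and the labels themselves still come from a human reading each page.

```python
from collections import Counter

# Hand-assigned labels copied from the two random samples above.
SAMPLES = {
    "Quechua": ["stub", "stub", "stubby", "stub", "stub"],
    "Friulian": ["stub", "short", "full", "list", "full"],
}


def share_full(labels):
    """Fraction of sampled pages that a human labelled as a full article."""
    if not labels:
        return 0.0
    return Counter(labels)["full"] / len(labels)


for wiki, labels in SAMPLES.items():
    print(f"{wiki}: {share_full(labels):.0%} full articles in sample")
```

The tally separates the two wikis even though their article counts are
nearly identical, but the step that matters, deciding what counts as a stub,
is still manual.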
Using word counts is going to be a problem too: languages like Norfuk
that have lots of small particle words and the like will show inflated word
counts compared to languages like Mandarin that don't have written words in
the Western sense, languages like Gaeilge or Cymraeg where the very notion
of "word" is a pretty nebulous one, or languages like Kalaallisut which
compress many meanings and affixes into each and every word.
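
A minimal sketch of the word-count problem, assuming the naive approach of
splitting text on whitespace. The function name and the two sample sentences
are made up for illustration; they are not taken from any Wikipedia.

```python
def whitespace_word_count(text):
    """Count whitespace-separated tokens, the naive notion of a 'word'."""
    return len(text.split())


english = "I like to read books"
mandarin = "我喜欢读书"  # roughly the same meaning, written without spaces

print(whitespace_word_count(english))   # 5 tokens
print(whitespace_word_count(mandarin))  # 1 token: there is nothing to split on
print(len(mandarin))                    # 5 characters
```

Two sentences of similar content give word counts of 5 and 1, so any
cross-language ranking built on this count mostly measures orthography.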
Cheers,
Craig Franklin
> Date: Thu, 3 May 2007 11:30:53 +0200 (CEST)
> From: Lars Aronsson <lars at aronsson.se>
> Subject: Re: [Wikipedia-l] Quality vs Quantity
> To: wikipedia-l at lists.wikimedia.org
> Message-ID: <Pine.LNX.4.64.0705031128240.15940 at localhost.localdomain>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
> Craig Franklin wrote:
>
>> You show me a way of quantitatively comparing two Wikis using an
>> automated process, and I'll show you a language or Wiki that
>> will break it.
>
> Today the front page www.wikipedia.org measures and compares the
> number of articles. You can begin to "break" that method. And
> then you can figure out some method that might perhaps be slightly
> better. I've suggested two already.
>
>
> --
> Lars Aronsson (lars at aronsson.se)
> Aronsson Datateknik - http://aronsson.se