[Wikipedia-l] Quality vs Quantity

Mark Williamson node.ue at gmail.com
Sat May 5 22:11:24 UTC 2007


Yes, Lars suggested using the size after compression, which would
solve most if not all of the problems you noted.
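
To make the compressed-size idea concrete, here is a minimal sketch. It assumes zlib's DEFLATE as the compressor (the thread does not name one), and the function names and sample strings are hypothetical. The point it illustrates is that repetitive, list-like stub text shrinks far more under compression than varied text of similar raw length, so padding adds little to the score.

```python
import zlib


def raw_size(text: str) -> int:
    """Size in bytes of the text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def compressed_size(text: str, level: int = 9) -> int:
    """Size in bytes after UTF-8 encoding and zlib (DEFLATE) compression.

    Assumption: zlib stands in for whatever compressor would really be used.
    """
    return len(zlib.compress(text.encode("utf-8"), level))


# Hypothetical samples: a repetitive bullet list versus more varied content.
repetitive = "* item\n" * 200
varied = " ".join(str(i * i) for i in range(300))

for name, sample in (("repetitive", repetitive), ("varied", varied)):
    print(name, raw_size(sample), compressed_size(sample))
```

In practice one would feed in the article text of a whole wiki, for example from a database dump, rather than toy strings. The result still depends on the script, the encoding, and the compressor chosen, so it is a rough proxy and not an exact measure.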

Mark

On 04/05/07, Craig Franklin <craig at halo-17.net> wrote:
> Hi Lars,
>
> I should have said that trying to make a qualitative comparison using
> quantitative measures is quite difficult given the nature of Wikipedias.
>
> To illustrate using the "article count" used on www.wikipedia.org, I had a
> looksie at two Wikis, Quechua (2,166 articles), and Friulian (2,041
> articles).  Using this quantitative comparison, they should be about equal
> in comprehensiveness.
>
> However, after hitting "Random" five times on each, I got the following
> pages:
>
> Quechua:
>
> http://qu.wikipedia.org/wiki/Chhukruna (stub)
> http://qu.wikipedia.org/wiki/Minsk (stub)
> http://qu.wikipedia.org/wiki/Yatawaki (stubby, most of the article is just a
> bullet-point list)
> http://qu.wikipedia.org/wiki/Kiru_ismu (stub)
> http://qu.wikipedia.org/wiki/T'aklla (stub)
>
> Friulian:
>
> http://fur.wikipedia.org/wiki/1622 (stub)
> http://fur.wikipedia.org/wiki/Timp_coordenât_universâl (short, maybe a Start
> class on en:)
> http://fur.wikipedia.org/wiki/Lauc (full article, longer by word count than
> version on en:, about the same size as that on it: if you remove the large
> bar graph from the it: version)
> http://fur.wikipedia.org/wiki/Toponims_Talian_Furlan_D (list)
> http://fur.wikipedia.org/wiki/Islam (full article)
>
> Anecdotal maybe, but clearly once you eyeball it, the Friulian Wikipedia has
> the edge in terms of comprehensiveness and quality over Quechua.  Trouble
> is, I don't see how this can be algorithmically determined using an
> automated process.  You're going to need humans to look at these things and
> make the determination, and really, who has time to do that for 200+
> wikipedias?  Not to mention the accusations of bias, poor article selection,
> and other such things that will be made.
>
> Using word counts is going to be a problem too: languages like Norfuk
> that have lots of small particle words and the like will show inflated word
> counts compared to languages like Mandarin that don't have written words in
> the Western sense, languages like Gaeilge or Cymraeg where the very notion
> of "word" is a pretty nebulous one, or languages like Kalaallisut which
> compress many meanings and affixes into each and every word.
>
> Cheers,
> Craig Franklin
>
> > Date: Thu, 3 May 2007 11:30:53 +0200 (CEST)
> > From: Lars Aronsson <lars at aronsson.se>
> > Subject: Re: [Wikipedia-l] Quality vs Quantity
> > To: wikipedia-l at lists.wikimedia.org
> > Message-ID: <Pine.LNX.4.64.0705031128240.15940 at localhost.localdomain>
> > Content-Type: TEXT/PLAIN; charset=US-ASCII
> >
> > Craig Franklin wrote:
> >
> >> You show me a way of quantitatively comparing two Wikis using an
> >> automated process, and I'll show you a language or Wiki that
> >> will break it.
> >
> > Today the front page www.wikipedia.org measures and compares the
> > number of articles.  You can begin to "break" that method.  And
> > then you can figure out some method that might perhaps be slightly
> > better.  I've suggested two already.
> >
> >
> > --
> >  Lars Aronsson (lars at aronsson.se)
> >  Aronsson Datateknik - http://aronsson.se
>
>
> _______________________________________________
> Wikipedia-l mailing list
> Wikipedia-l at lists.wikimedia.org
> http://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>


-- 
Refije dirije lanmè yo paske nou posede pwòp bato.


