[Wikipedia-l] Look Who's Using Wikipedia

Neil Harris usenet at tonal.clara.co.uk
Fri Mar 16 18:25:21 UTC 2007


Lars Aronsson wrote:
> Delirium wrote:
>
>   
>> So I'd be surprised if we're "done" covering even the top-tier subjects 
>> before we get to 3 million articles, if even then.
>>     
>
> After "size" (number of "real" articles), the next measurement 
> must be "coverage".  But how do we define that?  Surely, coverage 
> was 0% on the first day and will be 100% if Wikipedia can answer 
> any possible question, but this is an unrealistic limit.  Maybe 
> Britannica is at 90% and Wikipedia at 80%?  But we don't know how 
> to define the "coverage" of an encyclopedia to begin with.
>
> Corpus linguists first collect a very large body (a corpus) of 
> text, then they count how many times each word occurs.  If every 
> 20th word (or 5% of the corpus) is "the", then a dictionary only 
> containing "the" will "cover" 5% of this corpus.  Most spelling 
> dictionaries can cover 95-98% of any normal corpus.  But creating 
> a dictionary that covers the last few percent is hard, because 
> any normal text will contain a few very uncommon words.  There is 
> a very long, thin tail.
>
> Could we compile a "corpus" of questions, and see how large a 
> percentage of them can be answered by Wikipedia?  That probably 
> requires artificial intelligence, if not science fiction.  (Hey, 
> did somebody write a novel about this already?  Sci-fi can be a 
> great source of inspiration.)  I guess we could compile a list of 
> famous places and people, and see what percentage of them have 
> articles of reasonable length.  But having an entry on St 
> Petersburg, Florida, is less important than having entries on 
> London or Paris.  So the list must be weighted.  Search companies 
> like Google or MSN keep logs of every query that people type in, 
> so they know exactly how many times more often people search for 
> London than for St Petersburg, Florida.  That's the kind of 
> weights we would need to compute coverage, I guess.
>
>
>   
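The corpus-linguistics measure in the quoted message is simple to state in code. Below is a minimal sketch that assumes the corpus has already been split into lowercase word tokens (real corpus work needs a proper tokenizer). The function name and the toy corpus are illustrative only and do not come from any existing tool:

```python
from collections import Counter


def dictionary_coverage(tokens: list[str], dictionary: set[str]) -> float:
    """Fraction of running words (tokens) whose word form is in `dictionary`.

    Every occurrence counts, so a frequent word like "the" contributes
    far more than a rare one.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(n for word, n in counts.items() if word in dictionary)
    return covered / len(tokens)


# Toy corpus of 20 tokens in which "the" occurs once (every 20th word).
corpus = ("the cat sat on a mat while a dog slept near an old red barn "
          "in quiet rural southern sweden").split()
print(dictionary_coverage(corpus, {"the"}))  # 0.05, i.e. 5% of the corpus
```

Because coverage counts tokens rather than distinct words, a small list of common words already covers much of a text, while each rare word in the long tail adds very little.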

http://en.wikipedia.org/wiki/Wikipedia:List_of_encyclopedia_topics

and the various other lists linked from that page would be a good 
place to start...
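A topic list like that could be scored the way Lars suggests, by weighting each topic by how often people search for it. Below is a minimal sketch of that ratio. It assumes per-topic search counts are available, which the topic list itself does not provide. The function name, topics and numbers are hypothetical, not real search-log data:

```python
def weighted_coverage(query_counts: dict[str, int], covered_topics: set[str]) -> float:
    """Share of all searches that land on a topic with an adequate article.

    query_counts maps each topic to how often it was searched for (its weight);
    covered_topics holds the topics judged to have an article of reasonable length.
    """
    total = sum(query_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for topic, n in query_counts.items() if topic in covered_topics)
    return hits / total


# Hypothetical search counts, for illustration only.
queries = {"London": 1000, "Paris": 900, "St Petersburg, Florida": 100}

# Missing the small city costs little; missing London costs a lot,
# even though both cases cover 2 of 3 topics when unweighted.
print(weighted_coverage(queries, {"London", "Paris"}))                  # 0.95
print(weighted_coverage(queries, {"Paris", "St Petersburg, Florida"}))  # 0.5
```

This is the same calculation as the dictionary coverage in the quoted message, with search counts in place of word frequencies.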

-- Neil
