[Wikipedia-l] Look Who's Using Wikipedia
Neil Harris
usenet at tonal.clara.co.uk
Fri Mar 16 18:25:21 UTC 2007
Lars Aronsson wrote:
> Delirium wrote:
>
>> So I'd be surprised if we're "done" covering even the top-tier subjects
>> before we get to 3 million articles, if even then.
>>
>
> After "size" (number of "real" articles), the next measurement
> must be "coverage". But how do we define that? Surely, coverage
> was 0% on the first day and will be 100% if Wikipedia can answer
> any possible question, but this is an unrealistic limit. Maybe
> Britannica is at 90% and Wikipedia at 80%? But we don't know how
> to define the "coverage" of an encyclopedia to begin with.
>
> Corpus linguists first collect a very large body (a corpus) of
> text, then they count how many times each word occurs. If every
> 20th word (or 5% of the corpus) is "the", then a dictionary only
> containing "the" will "cover" 5% of this corpus. Most spelling
> dictionaries can cover 95-98% of any normal corpus. But creating
> a dictionary that covers the last few percent is hard, because
> any normal text will contain a few very uncommon words. There is
> a very long, thin tail.
>
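As a rough illustration of the corpus-coverage measure described above,
here is a minimal Python sketch. The function name corpus_coverage and
the toy corpus are made up for this example; a real corpus would need
proper tokenization and case folding.

```python
from collections import Counter


def corpus_coverage(tokens, dictionary):
    """Return the fraction of running words in `tokens` found in `dictionary`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each occurrence counts, so frequent words contribute more coverage.
    covered = sum(n for word, n in counts.items() if word in dictionary)
    return covered / total


# Toy corpus (assumption): 20 tokens, exactly one of which is "the".
corpus = ["the"] + ["w%d" % i for i in range(19)]
print(corpus_coverage(corpus, {"the"}))
```

Because coverage is counted per occurrence rather than per distinct
word, a handful of very common words covers a large share of the text,
while the long tail of rare words adds very little each.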
> Could we compile a "corpus" of questions, and see how large a
> percentage of them can be answered by Wikipedia? That probably
> requires artificial intelligence, if not science fiction. (Hey,
> did somebody write a novel about this already? Sci-fi can be a
> great source of inspiration.) I guess we could compile a list of
> famous places and people, and see what percentage of them have
> articles of reasonable length. But having an entry on St
> Petersburg, Florida, is less important than having entries on
> London or Paris. So the list must be weighted. Search companies
> like Google or MSN keep logs of every query that people type in,
> so they know exactly how many times more often people search for
> London than for St Petersburg, Florida. That's the kind of
> weights we would need to compute a coverage, I guess.
>
http://en.wikipedia.org/wiki/Wikipedia:List_of_encyclopedia_topics
and the various other lists linked on that page would be a good
place to start...
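
Given a topic list like that, the weighted coverage idea from the quoted
message could be sketched roughly as follows. Everything here is an
assumption for illustration: the function name weighted_coverage, the
min_length threshold standing in for "reasonable length", and the query
counts and article lengths, which are invented numbers, not real search
logs.

```python
def weighted_coverage(query_counts, article_lengths, min_length=1000):
    """Return the share of query volume whose topic has an adequate article.

    query_counts: topic -> how often people asked about it (the weight).
    article_lengths: topic -> article length in characters; missing topics
    are treated as having no article.
    min_length: hypothetical cut-off for "reasonable length".
    """
    total = sum(query_counts.values())
    if total == 0:
        return 0.0
    covered = sum(
        count
        for topic, count in query_counts.items()
        if article_lengths.get(topic, 0) >= min_length
    )
    return covered / total


# Invented example data: weights are NOT taken from any real query log.
query_counts = {"London": 60, "Paris": 35, "St Petersburg, Florida": 5}
article_lengths = {"London": 40000, "Paris": 35000}
print(weighted_coverage(query_counts, article_lengths))
```

With these made-up weights, missing the least-searched topic costs only
a small share of coverage, even though it is one of three topics on the
list. That is the point of weighting the list rather than just counting
entries.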
-- Neil