[WikiEN-l] And now, for something completely different

Mark Wagner carnildo at gmail.com
Wed Feb 8 05:49:09 UTC 2006

I've finally finished my survey of biographical articles on Wikipedia.  I
randomly selected 100 biographical articles, and evaluated them on a number
of points such as the century the subject lived in, the length and quality
of coverage, the number of sources cited, the number of links to the
article, and the number of Google hits.  The results are at

Not surprisingly, the average coverage of subjects is fairly poor.  64% of
articles were rated "low" or "stub", indicating they did not have even a
basic chronology of the subject's life, and 29% were rated "medium",
indicating a basic chronology but nothing more.  6% were rated "good", with
a relatively complete chronology, and one article was approaching "featured"
quality.  While doing the survey, one of the biographies was deleted for
lack of notability, one as being unverifiable, and two were listed as

Sourcing is about average, with 13% of articles citing sources, versus 15%
for articles in general.  Given the sample size, the difference is
insignificant.  The number of sources cited per article is higher, though,
at 2.8 sources per article for biographies, versus 2.3 sources per article
in general.  The difference is even more pronounced when you take into
account that in the general survey, one article provided almost half the
total sources.
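
As a rough check on the "insignificant" claim, here is a minimal sketch of
a 95% confidence interval for the 13-out-of-100 figure.  It assumes a simple
normal-approximation (Wald) interval and treats the 15% general figure as a
fixed reference point, since the sample size of the general survey is not
given here.  The function name is purely illustrative.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    z=1.96 corresponds to roughly 95% confidence. This is an assumed,
    simplified method, not necessarily the one used in the survey.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se

# 13 of the 100 sampled biographies cited sources.
low, high = proportion_interval(13, 100)
print(f"95% interval: {low:.3f} to {high:.3f}")

# Assumption: the 15% figure for articles in general is taken as exact.
general_rate = 0.15
print("15% inside interval:", low < general_rate < high)
```

The interval runs from roughly 6% to 20%, so with only 100 articles a rate
of 15% is well within what the sample could plausibly produce.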

A rough categorization indicates that about 43% of biographies are of
"cultural" figures such as writers, actors, musicians, and athletes.  25%
are of "political" figures such as politicians and aristocrats.  11% are
scholars, scientists, inventors, and the like, and 6% are biographies of
fictional characters.

Coverage is generally Western-centric, with 70% of biographies being of
people from Western Europe, North America, and Australia.  The majority of
biographies are for the 1800s and 1900s, with only 22% being of people who
are currently engaged in whatever they're famous for.

More abstract results from the survey are that there is no obvious correlation
between article coverage and Google hits, between inbound links and article
coverage, or between inbound links and Google hits.  More recent subjects
tend to have more Google hits, but there are significant exceptions.  For
example, [[Sandro Botticelli]] is from the 1400s, but has 345,000 Google
hits, while [[Romeo Munoz Cachola]], from the 2000s, has only 7.

There is also no apparent correlation between article coverage and sources:
four of the six "good" or "high" articles have no sources, while seven of
the thirteen articles with sources were rated "low" or "stub".

While doing the survey, I came across 303 articles that were not
biographies.  Assuming the results I got were typical, about 25% of articles
on Wikipedia are biographies.  Another 46 are "pop culture" subjects such as
albums (12), bands (6), movies (6), books (7), or fictional objects (4).
There were five schools, and 16 Rambot-created articles.  An alarming number
of articles had serious problems, including two copyvios and two articles
that needed speedy deletion.
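
The "about 25%" estimate follows directly from the two counts above.  A
minimal sketch of the arithmetic, assuming the 100 biographies and the 303
non-biographies together make up everything that "random page" returned:

```python
# Counts taken from the survey description.
biographies = 100
non_biographies = 303

# Assumption: these two groups are the whole random sample.
total_sampled = biographies + non_biographies
biography_share = biographies / total_sampled

print(f"{biographies} of {total_sampled} sampled articles "
      f"= {biography_share:.1%} biographies")
```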

During the survey, "random page" gave me the same article twice: [[FX-6]].

