On Sep 16, 2004, at 12:16 PM, samuel wrote:
> I am about to conduct a study of the quality of Wikipedia's content.
> In order to do this, I will need to randomly select articles, and for
> this, the "random page" feature appears a natural choice.
Depending on what you're trying to measure, simply taking a random
selection of pages and looking at them in isolation may or may not be
useful to you. Wikipedia is a permanent work in progress; its purpose
is to generate content and grow and develop it over time. The actual
distribution of a published or "stable" encyclopedia is a distinct
project from the free-for-all editing on the wiki.
Due to Wikipedia's nature it can be fully expected that many, many
pages at any given time will be found wanting, because they are new or
unpopular subjects and thus insufficiently developed. By the same
token, Wikipedia can be fully expected to cover many topics that more
traditional encyclopedias don't cover at all, or cover in less detail.
If you're interested in comparing the quality of Wikipedia articles
against that of more traditional encyclopedias, consider also taking a
random selection of topics covered by those encyclopedias, then looking
up the same topics in Wikipedia for comparison. (And vice versa!)
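A rough sketch of that two-way sampling, assuming you already have plain
lists of topic titles for each work. The function name and the
exact-title matching are simplifications for illustration only; matching
real topics across encyclopedias would need manual judgement.

```python
import random


def sample_and_match(source_topics, other_topics, k, seed=None):
    """Draw k topics uniformly from source_topics and report, for each,
    whether the same title also appears in other_topics.

    Hypothetical helper: exact title equality stands in for the manual
    work of deciding whether two works cover the same topic.
    """
    rng = random.Random(seed)
    # Sort first so a given seed always yields the same sample.
    sample = rng.sample(sorted(set(source_topics)), k)
    other = set(other_topics)
    return [(topic, topic in other) for topic in sample]
```

Run it once with the traditional encyclopedia as the source and once
with Wikipedia as the source to cover both directions.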
> However, I cannot find any information on how it works, and it is
> essential that such information is described in the methodology
> section of the study. The questions I have regarding the selection
> are the following:
> 1) What counts as a page?
Returnable pages are those in the article namespace (not talk pages,
user pages, etc.) and not marked as redirects.
> 2) Is the selection done from all pages/articles or a subset of them?
From all articles that meet the above criteria, with one exception: on
en.wikipedia.org the random selection has been hacked not to return
pages last edited by the accounts Ram-Man or Rambot. This is a crude
(and in my view unfortunate) hack to appease complaints that the random
page function returned articles on cities and towns in the US too
often.
> 3) Is there any weight attached to outcomes (for example, so that a
> very popular or frequently edited article would have a higher chance
> of appearing as a random article)?
No.
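The rules in these answers can be sketched roughly as follows. This is
an illustration only, not the actual MediaWiki code: the Page record and
its field names are hypothetical, and namespace 0 is assumed to stand
for the article namespace.

```python
import random
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical record; field names are illustrative, not MediaWiki's schema.
    title: str
    namespace: int       # assumed: 0 is the article namespace
    is_redirect: bool
    last_editor: str


# The en.wikipedia.org exception described above.
EXCLUDED_EDITORS = {"Ram-Man", "Rambot"}


def is_returnable(page):
    """True if the page meets the criteria stated in the answers."""
    return (
        page.namespace == 0
        and not page.is_redirect
        and page.last_editor not in EXCLUDED_EDITORS
    )


def random_article(pages, rng=random):
    """Pick one returnable page with equal probability (no weighting)."""
    eligible = [p for p in pages if is_returnable(p)]
    if not eligible:
        raise ValueError("no returnable pages")
    return rng.choice(eligible)
```

For a study, passing a seeded random.Random instance as rng makes the
selection reproducible.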
You are of course welcome to download the article database from
http://download.wikimedia.org/ and select & sort articles in any
fashion you see fit.
-- brion vibber (brion @ pobox.com)