On Sep 16, 2004, at 12:16 PM, samuel wrote:
I am about to conduct a study of the quality of Wikipedia's content. In order to do this, I will need to randomly select articles, and for this, the "random page" feature appears to be a natural choice.
Depending on what you're trying to measure, simply taking a random selection of pages and looking at them in isolation may or may not be useful to you. Wikipedia is a permanent work in progress; its purpose is to generate content and grow and develop it over time. The actual distribution of a published or "stable" encyclopedia is a distinct project from the free-for-all editing on the wiki.
Due to Wikipedia's nature it can be fully expected that many, many pages at any given time will be found wanting, because they are new or unpopular subjects and thus insufficiently developed. By the same token, Wikipedia can be fully expected to cover many topics that more traditional encyclopedias don't cover at all, or cover in less detail.
If you're interested in comparing the quality of Wikipedia articles against that of more traditional encyclopedias, consider also doing random selection of topics covered by those encyclopedias, then seeking the same particular topics in Wikipedia for comparison. (And vice-versa!)
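A rough Python sketch of that comparison, in case it helps with the study design. Everything here is an assumption for illustration: the function name, the idea that you already have a list of topic names from the other encyclopedia and a list of Wikipedia article titles, and above all the naive case-insensitive title match, which will miss articles filed under a different name.

```python
import random

def sample_and_match(reference_topics, wikipedia_titles, k, seed=None):
    """Draw k topics at random from a reference encyclopedia's topic list
    and report whether each has a same-named Wikipedia article.

    Hypothetical helper: matching is by case-insensitive title only.
    """
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    # Sort first so the draw does not depend on the input's ordering.
    sample = rng.sample(sorted(reference_topics), k)
    wiki = {title.casefold() for title in wikipedia_titles}
    return {topic: topic.casefold() in wiki for topic in sample}
```

For the vice-versa direction, swap the two lists.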
However, I cannot find any information on how it works, and it is essential that such information is described in the methodology section of the study. The questions I have regarding the selection are the following:
- What counts as a page?
Returnable pages are those in the article namespace (not talk pages, user pages, etc.) and not marked as redirects.
- Is the selection done from all pages/articles or a subset of them?
From all articles that meet the above criteria, with one exception: on en.wikipedia.org the random selection has been hacked not to return pages last edited by the accounts Ram-Man or Rambot. This is a crude (and in my view unfortunate) hack to appease complaints that the random page function returned articles on cities and towns in the US too often.
- Is there any weight attached to outcomes (for example, so that a very popular or frequently edited article would have a higher chance of appearing as a random article)?
No.
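To put the three answers above in one place, here is a rough Python sketch of the selection rule as described. It is not the MediaWiki code. The Page record, its field names and both function names are made up for illustration, and it assumes namespace 0 denotes the article namespace.

```python
import random
from dataclasses import dataclass

@dataclass
class Page:
    # Hypothetical record; field names are illustrative, not the real schema.
    title: str
    namespace: int      # assumed: 0 is the article namespace
    is_redirect: bool
    last_editor: str

# The en.wikipedia.org exception: skip pages last edited by these accounts.
EXCLUDED_LAST_EDITORS = {"Ram-Man", "Rambot"}

def is_returnable(page, apply_en_exception=True):
    """True if the page may be returned by the random page function."""
    if page.namespace != 0 or page.is_redirect:
        return False
    if apply_en_exception and page.last_editor in EXCLUDED_LAST_EDITORS:
        return False
    return True

def pick_random_page(pages, rng=random, apply_en_exception=True):
    """Pick one returnable page, every eligible page being equally likely."""
    eligible = [p for p in pages if is_returnable(p, apply_en_exception)]
    if not eligible:
        raise ValueError("no returnable pages")
    return rng.choice(eligible)  # unweighted: popularity and edit count are ignored
```

For a methodology section, the point to note is that on en.wikipedia.org the sampling frame is not quite "all articles", because of the last-editor exclusion.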
You are of course welcome to download the article database from http://download.wikimedia.org/ and select & sort articles in any fashion you see fit.
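If you go that route, one way to draw a reproducible sample without holding the whole database in memory is reservoir sampling. The sketch below is plain Python and makes no assumption about the dump format: it takes any iterable of items (say, article titles you have already extracted and filtered yourself), and the function name is mine.

```python
import random

def reservoir_sample(items, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of
    unknown length, in a single pass (reservoir sampling, Algorithm R).

    If the iterable yields fewer than k items, all of them are returned.
    """
    rng = random.Random(seed)  # record the seed in your methodology section
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Using a fixed seed means anyone with the same input list in the same order can reproduce your exact sample.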
-- brion vibber (brion @ pobox.com)