Brion Vibber wrote:
On Sep 16, 2004, at 12:16 PM, samuel wrote:
I am about to conduct a study of the quality of Wikipedia's content. In order to do this, I will need to randomly select articles, and for this, the "random page" feature appears a natural choice.
Depending on what you're trying to measure, simply taking a random selection of pages and looking at them in isolation may or may not be useful to you. Wikipedia is a permanent work in progress; its purpose is to generate content and grow and develop it over time. The actual distribution of a published or "stable" encyclopedia is a distinct project from the free-for-all editing on the wiki.
Due to Wikipedia's nature it can be fully expected that many, many pages at any given time will be found wanting, because they are new or unpopular subjects and thus insufficiently developed. By the same token, Wikipedia can be fully expected to cover many topics that more traditional encyclopedias don't cover at all, or cover in less detail.
Yes, I am quite familiar with the nature of Wikipedia and how it works. The object of the study is to measure the quality of the current online version of Wikipedia (by "current" I mean the Wikipedia that exists at a certain, as yet undetermined, moment in time), since that is what people actually use. There has been a lot of debate as to how reliable this version of Wikipedia is, and it is that debate I am interested in settling.
If you're interested in comparing the quality of Wikipedia articles against that of more traditional encyclopedias, consider also doing random selection of topics covered by those encyclopedias, then seeking the same particular topics in Wikipedia for comparison. (And vice-versa!)
This is indeed what will be done. It is for the "vice-versa" part that I considered using the "random page" feature to select articles.
However, I cannot find any information on how it works, and it is essential that such information be described in the methodology section of the study. The questions I have regarding the selection are the following:
- What counts as a page?
Returnable pages are those in article namespace (not talk pages, user pages, etc) and not marked as redirects.
- Is the selection done from all pages/articles or a subset of them?
From all articles that meet the above criteria, with one exception: on en.wikipedia.org the random selection has been hacked not to return pages last edited by the accounts Ram-Man or Rambot. This is a crude (and in my view unfortunate) hack to appease complaints that the random page function returned articles on cities and towns in the US too often.
Interesting. Are Ram-Man and Rambot actual users who submitted such changes, or were the accounts created for the specific purpose of identifying the articles to be ignored?
- Is there any weight attached to outcomes (for example, so that a very popular or frequently edited article would have a higher chance of appearing as a random article)?
No.
According to other posters to the list, there are two ways that certain articles could effectively be weighted against: the first applies to articles that have cur_random = 0, and the second to articles that share a cur_random value with at least one other article.
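To make those two cases concrete, here is a minimal Python sketch. It rests on an assumption that has not been confirmed in this thread: that the feature draws a random number r and returns the first article, in cur_random order, whose cur_random is at least r. The function names, the sample titles, and the choice to return None when no article qualifies are all hypothetical and are not taken from the MediaWiki source.

```python
import bisect
import random
from collections import Counter

# Hypothetical sample data: (cur_random, title), sorted by cur_random.
ARTICLES = [(0.0, "Zero"), (0.45, "A"), (0.45, "B"), (0.9, "C")]


def pick_random_article(articles, rng):
    """Return the first title whose cur_random is >= a uniform draw.

    ASSUMPTION: this selection rule is a guess at how the feature
    behaves. Returns None when no article has a large enough value.
    """
    keys = [key for key, _ in articles]
    r = rng.random()                     # uniform in [0.0, 1.0)
    i = bisect.bisect_left(keys, r)      # first index with keys[i] >= r
    return articles[i][1] if i < len(articles) else None


def tally(draws, seed):
    """Count how often each title comes back over many draws."""
    rng = random.Random(seed)
    return Counter(pick_random_article(ARTICLES, rng) for _ in range(draws))


if __name__ == "__main__":
    for title, count in sorted(tally(10_000, seed=0).items(), key=str):
        print(title, count)
```

Under this assumed rule, "Zero" can only be returned if the draw is exactly 0.0, and "B" is never returned because "A" shares its value and sorts first. If the real query behaves differently, the conclusion may not carry over.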
You are of course welcome to download the article database from http://download.wikimedia.org/ and select & sort articles in any fashion you see fit.
That might very well be the best idea, and what I will actually do in the end. I just thought it convenient to use the "random page" feature if it indeed worked in the way that the study demands.
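A minimal sketch of that offline approach, in Python, could look like the following. It assumes the dump has already been parsed into a list of dictionaries; the field names (title, namespace, is_redirect, last_editor) and the function names are hypothetical, not the actual dump schema. The filter mirrors the criteria given above: article namespace only (namespace 0 in MediaWiki), no redirects, and optionally the Ram-Man/Rambot exclusion.

```python
import random

# Accounts excluded by the en.wikipedia.org hack described above.
EXCLUDED_EDITORS = {"Ram-Man", "Rambot"}


def eligible(page, skip_bot_edits=False):
    """True for non-redirect pages in the article namespace.

    ASSUMPTION: `page` is a dict with hypothetical keys 'namespace',
    'is_redirect' and 'last_editor'; adapt these to the real dump.
    """
    if page["namespace"] != 0 or page["is_redirect"]:
        return False
    if skip_bot_edits and page["last_editor"] in EXCLUDED_EDITORS:
        return False
    return True


def sample_articles(pages, k, seed, skip_bot_edits=False):
    """Draw k distinct titles uniformly, without replacement.

    Titles are sorted first so that the same seed always yields the
    same sample, regardless of the order of the input records.
    """
    pool = sorted(p["title"] for p in pages if eligible(p, skip_bot_edits))
    return random.Random(seed).sample(pool, k)
```

Recording the seed and the dump date in the methodology section would let others reproduce the exact sample. Every eligible article has the same chance of selection here, which sidesteps the cur_random questions raised above.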
-- brion vibber (brion @ pobox.com)
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l