Note that the "Wikipedia 0.5" WikiProject on en:wp is tackling this issue with some energy, and could use more input and nominations:
http://en.wikipedia.org/wiki/Wikipedia:Version_0.5_Nominations
On 6/13/06, Michael Snow <wikipedia@earthlink.net> wrote:
Delirium wrote:
We've discussed on and off that it'd be nice to vet specific revisions of Wikipedia articles so readers can either choose to read only quality articles, or at least have an indication of how good an article is. This is an obvious prerequisite for a Wikipedia 1.0 print edition, and would be nice on the website as well.
There is a lengthy list of proposals here: http://meta.wikimedia.org/wiki/Article_validation_proposals
I wanted to try to rekindle the process by summarizing some of the proposals, which I think can be grouped into three main types, and then suggest some ideas on where to go from there.
Thank you for taking the time to address this.
Ditto.
Proposal #1: Fork or freeze, then bring up to our quality standards.
Some cons: it either disrupts normal editing through a freeze, or results in duplicated effort with a fork. It is also likely to be a fairly slow process, so the reviewed version of each article may be replaced with an updated version quite infrequently; most articles will have no reviewed version at all, so this doesn't do much to raise the typical quality of presentation on the website.
Duplication of effort is bad. Branching, rather than forking, for a very limited time duration, makes sense for various end uses. For instance, a single good revision of an article might support a dozen branches each of which pared it down to a different length. We will need a better notion of 'article revision history' that supports branching, or non-linear revisions, to properly allow for this. I believe there is some theoretical work being done on distributed version control for text...
Michael Snow writes:
This option would work well, I think, for two possible uses. One is for offline distribution, since there's less point in creating a fork that will just be another online variation on the same theme.
It will be helpful to distinguish between branching (which ends after a point and either remerges with the main trunk or is at least never modified again) and forking (starting a separate revision history with different end goals, to continue indefinitely).
Each offline copy gets modified slightly for format reasons anyway. The question is whether to provide for such branching within a central Wikipedia database.
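To make the branching idea concrete, here is a rough sketch (Python, with made-up names; not anything MediaWiki actually implements) of a revision history that allows non-linear revisions: each revision can record more than one parent, so the history forms a DAG, and a branch is just a named pointer to a head revision.

    from dataclasses import dataclass, field

    @dataclass
    class Revision:
        rev_id: int
        text: str
        parent_ids: list = field(default_factory=list)   # more than one parent = a merge

    class ArticleHistory:
        def __init__(self):
            self.revisions = {}                 # rev_id -> Revision
            self.branches = {"trunk": None}     # branch name -> head rev_id
            self._next_id = 1

        def commit(self, branch, text, extra_parents=()):
            head = self.branches.get(branch)
            parents = ([head] if head is not None else []) + list(extra_parents)
            rev = Revision(self._next_id, text, parents)
            self.revisions[rev.rev_id] = rev
            self.branches[branch] = rev.rev_id
            self._next_id += 1
            return rev.rev_id

        def branch_from(self, new_branch, source_branch):
            # e.g. cut a short-lived "print edition" branch from the trunk head
            self.branches[new_branch] = self.branches[source_branch]

A dozen pared-down versions of one good revision would then just be a dozen short branches sharing the same parent.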
The second possibility I think we would benefit from is the "freeze" option of presenting stable, reviewed versions by default to users who do not log in.
This seems a poor and less-scalable way to present stable versions to users; see other methods below.
Delirium:
Proposal #2: Institute a rating and trust-metric system
Wikipedians rate revisions, perhaps on some scale from "complete crap" to "I'm an expert in this field and am confident of its accuracy and quality."
Naive, single-scale ratings have many problems that I don't see being overcome. (The Advogato suggestions are no panacea.) Allowing groups of editors (self-selecting, or auto-selected by user properties) to provide revision metadata that others can choose to see or not see as they please would be more scalable and less gameable. Some of these groups could provide metadata of the form 'decent and not vandalized content'.
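As a rough sketch of what I mean (Python, names invented for illustration): groups attach assertions to specific revisions, and each reader decides which groups' assertions to see.

    from collections import defaultdict

    # (page_title, rev_id) -> list of (group, assertion) pairs
    revision_metadata = defaultdict(list)

    def tag_revision(page, rev_id, group, assertion):
        """A group of editors attaches an assertion to one specific revision."""
        revision_metadata[(page, rev_id)].append((group, assertion))

    def assertions_for_reader(page, rev_id, trusted_groups):
        """Readers filter the metadata by whichever groups they choose to trust."""
        return [(g, a) for g, a in revision_metadata[(page, rev_id)]
                if g in trusted_groups]

    tag_revision("Example_article", 12345, "WikiProject Chemistry",
                 "decent and not vandalized content")
    print(assertions_for_reader("Example_article", 12345, {"WikiProject Chemistry"}))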
Proposal #3: Extend a feature-article-like process
I'm not sure what you meant by your example -- for instance by 'work on revisions rather than articles', as the goal is still a better article (you can't change a historical revision) -- but this is effectively what the en:wp validation effort is attempting. This scales in that it can be split up among topic-centered WikiProjects. See for instance this list:
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/WikiProjec...
Avoiding hard-coded metrics for quality, and encouraging editors active within a topic to work together to reach quality decisions, seems in line with how editing has evolved. This is like peer review and FAC review that already takes place, but can be applied to a wider spectrum of quality.
--SJ
On Wed, 14 Jun 2006, SJ wrote:
Note that the "Wikipedia 0.5" WikiProject on en:wp is tackling this issue with some energy, and could use more input and nominations:
http://en.wikipedia.org/wiki/Wikipedia:Version_0.5_Nominations
I have a related but orthogonal request, regarding the process of assembling a CD or other snapshot. My interest is less to do with quality and more to do with the process. My end result is either a CD or a Plucker document for PalmOS.
http://en.wikipedia.org/wiki/Wikipedia_talk:Version_0.5
I see the job as too big to be done via hand selection. I am also more interested in coverage than quality - I figure the quality will just get better. So I want automated methods, both for selecting good coverage and (less important at the moment) for version selection. I would also like to target a size: 128 MB, 512 MB, 600 MB, 1 GB, or 4 GB. I am also interested in post-processing: stripping redlinks, including "main article" references on core articles like "History of South Africa", etc. I want to be able to tweak parameters, then press a button and get a new CD (from my downloaded XML dump of en and a picture collection, and possibly via a live MediaWiki snapshot of that content).
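For the size targets, what I have in mind is something as simple as this (a Python sketch; the coverage scores stand in for whatever automated measure we settle on, e.g. link counts or hit counts):

    def select_articles(candidates, budget_bytes):
        """candidates: list of (title, size_in_bytes, coverage_score) tuples.
        Greedily take the best-scoring articles that still fit the target size."""
        chosen, used = [], 0
        for title, size, score in sorted(candidates, key=lambda c: c[2], reverse=True):
            if used + size <= budget_bytes:
                chosen.append(title)
                used += size
        return chosen

    # e.g. a 600 MB CD target:
    # cd_articles = select_articles(scored_articles, 600 * 1024 * 1024)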
This is what I have tried, mostly with available tools and a bit of Perl.
* Download a recent XML dump.
* Download the list of articles from a category (currently using the WPCD template).
* Trim the full dump to the above article list (natively performed by mwdumper --exactlist).
* Import this into MySQL.
* Import the (full) category dump into MySQL (SQL dump downloaded from Wikipedia).
* Use mediawiki/maintenance/dumpHTML.php to convert this to HTML.
* A Perl script removes categories with fewer than four included items from the HTML dump.
* Redlink removal, by un-anchoring HTML with class=new (red links) - but not categories, which always seem to appear red (a rough sketch of this step follows below).
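The redlink step, in outline, amounts to something like this (a Python/BeautifulSoup sketch rather than the actual Perl, and the "catlinks" id is an assumption about the dumpHTML output):

    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    def strip_redlinks(html):
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", class_="new"):
            # leave category links alone; they sit inside the catlinks div
            if a.find_parent(id="catlinks"):
                continue
            a.unwrap()  # keep the link text, drop the anchor itself
        return str(soup)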
Problems I have come across:-
* Templates (particularly {{main|History of Country}} and the like) do not make it through dumpHTML.php. Maybe I have to hack the PHP.
* Remove all the dross at the end, like inter-wiki links.
Could this be done by tweaking the CSS from dumpHTML?
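Alternatively, the same post-processing approach as the redlink sketch above could strip it outright, which would also keep the pages smaller (the "p-lang" id here is an assumption about what dumpHTML emits for interlanguage links):

    from bs4 import BeautifulSoup

    def strip_dross(html, junk_ids=("p-lang",)):
        """Drop interlanguage links and similar end-of-page clutter entirely."""
        soup = BeautifulSoup(html, "html.parser")
        for junk_id in junk_ids:
            element = soup.find(id=junk_id)
            if element is not None:
                element.decompose()
        return str(soup)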
Cheers, Andy!
SJ wrote:
Note that the "Wikipedia 0.5" WikiProject on en:wp is tackling this issue with some energy, and could use more input and nominations:
http://en.wikipedia.org/wiki/Wikipedia:Version_0.5_Nominations
I wasn't aware of that; thanks for pointing it out. It does seem to be tackling a different problem, though. My concern is with particular *revisions* of articles---that a specific version of the article has been reviewed and determined to be good. Wikipedia 0.5 seems to be using a process similar to featured articles, where articles are tagged, which doesn't necessarily guarantee that the current version of any included article is actually good (although it may raise the probability).
Naive, single-scale ratings have many problems that I don't see being overcome. (The Advogato suggestions are no panacea.) Allowing groups of editors (self-selecting, or auto-selected by user properties) to provide revision metadata that others can choose to see or not see as they please would be more scalable and less gameable. Some of these groups could provide metadata of the form 'decent and not vandalized content'.
I agree there are more things that could be presented, but I think we need a fairly simple display for non-logged-in users to see by default. More data available for those interested is certainly fine, but a passerby should, IMO, be able to tell at a glance how much credence to put in the article they're about to read.
Proposal #3: Extend a feature-article-like process
I'm not sure what you meant by your example -- for instance by 'work on revisions rather than articles', as the goal is still a better article (you can't change a historical revision) -- but this is effectively what the en:wp validation effort is attempting. This scales in that it can be split up among topic-centered WikiProjects. See for instance this list:
The issue with revisions instead of articles is that I think there should be some indication that a particular revision has been reviewed. A reader can then read it with confidence that someone has checked it, or, if the current version hasn't been checked, perhaps even ask for the last reviewed version. At the moment there's no such process---even with a featured article, I can't necessarily trust any of the facts without reading through the history to make sure it hasn't been vandalized in the last 10 minutes with some sneaky change, or even completely wrecked in the days/weeks since the last review. And even if I know it's in flux, there's no easy way for me to find the last good version without wading through the history.
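Concretely, I'm imagining something as simple as this kind of lookup (a sketch with made-up names, not an existing MediaWiki feature): the site records which revisions have been reviewed, and if the current head hasn't been, the reader can be offered the newest one that has.

    def last_reviewed_revision(history, reviewed_ids):
        """history: an article's rev_ids, oldest first.
        reviewed_ids: the set of rev_ids that someone has actually checked.
        Returns the newest reviewed revision, or None if none exists."""
        for rev_id in reversed(history):
            if rev_id in reviewed_ids:
                return rev_id
        return None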
-Mark