[Foundation-l] moving forward on article validation

Wed Jun 14 02:54:55 UTC 2006

Delirium wrote:

> We've discussed on and off that it'd be nice to vet specific revisions 
> of Wikipedia articles so readers can either choose to read only 
> quality articles, or at least have an indication of how good an 
> article is.  This is an obvious prerequisite for a Wikipedia 1.0 print 
> edition, and would be nice on the website as well.
>
> There is a lengthy list of proposals here:
> http://meta.wikimedia.org/wiki/Article_validation_proposals
>
> I wanted to try to rekindle the process by summarizing some of the 
> proposals, which I think can be grouped into three main types, and 
> then suggest some ideas on where to go from there.

Thank you for taking the time to address this.

> Proposal #1: Fork or freeze, then bring up to our quality standards.
> ---
> Wikipedians would look around for articles that look reasonably good 
> (perhaps starting with feature articles) and nominate them to be 
> worked on.  Then either freeze them (by placing a notice or some sort 
> of technical measure), or else fork them off to a copy.  The articles 
> would then be checked for referencing, accuracy, grammar, and so on, 
> possibly only by users who've met some bar for participation in the 
> clean-up process, resulting in an article suitable for publication.  
> Forking or freezing is to ensure the cleanup process actually 
> terminates rather than trying to clean up a moving target; there are 
> of course pros and cons to forking vs. freezing.
>
> Some pros: Fairly straightforward; follows successful methods of 
> "stable release" management in the software-development world; allows 
> a certain amount of editorial work not normally suitable for an 
> in-progress encyclopedia (like cutting out an entire section because 
> it's too far from being done to go in the print version); is easy to 
> integrate "expert review" into as a last vetting step before it goes 
> out the door.
>
> Some cons: Either disrupts normal editing through a freeze, or results 
> in duplicated effort with a fork.  Also is likely to result in a 
> fairly slow process, so the reviewed version of each article may be 
> replaced with an updated version quite infrequently; most articles 
> will have no reviewed version, so doesn't do much for increasing the 
> typical quality of presentation on the website.

This option would work well, I think, for two possible uses. One is for 
offline distribution, since there's less point in creating a fork that 
will just be another online variation on the same theme. The second 
possibility I think we would benefit from is the "freeze" option of 
presenting stable, reviewed versions by default to users who do not log in.
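
To make the "freeze" idea concrete, here is a minimal sketch of the selection rule, assuming each revision carries a simple reviewed flag. The names Revision, reviewed, and revision_to_show are hypothetical and used only for illustration; they are not part of MediaWiki or any existing proposal.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    """Hypothetical revision record; 'reviewed' marks a vetted revision."""
    rev_id: int
    reviewed: bool


def revision_to_show(history: Sequence[Revision], logged_in: bool) -> Optional[Revision]:
    """Pick the revision to display; 'history' is ordered oldest first.

    Logged-in users get the newest revision. Anonymous readers get the
    newest reviewed revision, falling back to the newest revision when
    nothing has been reviewed yet.
    """
    if not history:
        return None
    if logged_in:
        return history[-1]
    for rev in reversed(history):
        if rev.reviewed:
            return rev
    return history[-1]
```

Falling back to the newest revision is one possible choice for articles that have never been reviewed; showing a notice instead would be another.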

> Proposal #2: Institute a rating and trust-metric system
> ---
> Wikipedians rate revisions, perhaps on some scale from "complete crap" 
> to "I'm an expert in this field and am confident of its accuracy and 
> high quality".  Then there is some way of coming up with a score for 
> that revision, perhaps based on the trustworthiness of the raters 
> themselves (determined through some method).  Once that's done, the 
> interface can do things like display the last version of an article 
> over some score, if any, or a big warning that the article sucks 
> otherwise (and so on).
>
> Some pros: Distributed; no duplicated effort; good revisions are 
> marked good as soon as enough people have vetted them; humans review 
> the articles, but the "process" itself is done automatically; most 
> articles will have some information about their quality to present to 
> a reader
>
> Some cons: Gaming-proof trust metric systems are notoriously hard to 
> design.

Aside from the manipulation problem, with this kind of approach I 
wonder, "To what end?" Simply attaching a number to things is not that 
interesting in and of itself. It needs to be put to some kind of use, 
and while that's certainly possible, I'm more excited about potential 
uses to which the other approaches better lend themselves.

Another potential concern I would point to with these is the possibility 
of what might be called grade inflation. People might well start 
criticizing the use of low scores as "biting newcomers" or something 
like that. This would be an unfortunate reversal of the current trend 
for featured articles, for which candidates have been held to 
progressively higher standards. It also would undermine our hopes that 
generally speaking, the content tends to improve over time.
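
For reference, the mechanism described in Proposal #2 might look roughly like the sketch below: a trust-weighted average of ratings per revision, and a lookup for the last revision over some score. This is only an illustration under assumptions of my own: scores run from 0.0 to 1.0, each rater has a non-negative trust weight obtained by some unspecified method, and all names (Rating, RatedRevision, weighted_score, latest_above) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rating:
    score: float  # assumed scale: 0.0 (worst) to 1.0 (best)
    trust: float  # assumed non-negative weight for the rater


@dataclass(frozen=True)
class RatedRevision:
    rev_id: int
    ratings: Sequence[Rating]


def weighted_score(ratings: Sequence[Rating]) -> Optional[float]:
    """Trust-weighted mean of the scores, or None if there is no usable rating."""
    total_trust = sum(r.trust for r in ratings)
    if total_trust <= 0:
        return None
    return sum(r.score * r.trust for r in ratings) / total_trust


def latest_above(history: Sequence[RatedRevision], threshold: float) -> Optional[RatedRevision]:
    """Newest revision whose weighted score meets the threshold ('history' is oldest first)."""
    for rev in reversed(history):
        score = weighted_score(rev.ratings)
        if score is not None and score >= threshold:
            return rev
    return None
```

A plain weighted mean like this does nothing about manipulation or grade inflation; it only shows where a score would plug into the interface.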

> Proposal #3: Extend a feature-article-like process
> ---
> Extend a feature-article type process to work on revisions rather than 
> articles.  For example, nominate revision X of an article as a 
> featured article; improve it during the process until it gets to a 
> revision Y that people agree is good.  Then sometime later, nominate a 
> new revision Z, explain what the differences are, and discuss whether 
> this should supersede the old featured version.  Can also have 
> sub-featured statuses like "good" or "mediocre, but at least nothing 
> is outright wrong".  In principle can be done with no code changes, 
> though there are some that could ease things along greatly.
>
> Some pros: Gets at the effect of proposal #2 but with a flexible 
> human-run system instead of an automatic system, and therefore less 
> likely to be brittle.
>
> Some cons: Will need carefully-designed software assistance to keep 
> all the information and discussion manageable and avoid descending 
> into a morass of thousands upon thousands of messy talk pages

One of the weaknesses of directly modeling the featured article system 
is that it doesn't scale well. I suppose in a sense that's part of what 
you're saying, but you seem to suggest that it could be made to scale. 
That might be possible, but right now I don't see how. Could you perhaps 
elaborate on how you would design this? The suggestions I extrapolate 
from this outline would to my mind mostly add complexity to the system, 
and I'd expect them if anything to scale worse and not better.

> These are not necessarily mutually exclusive.

That's certainly true, although personally I mostly prefer elements of 
the first approach.

--Michael Snow