Sheldon Rampton wrote:
Ray Saintonge wrote:
My first impression from your response is that you would end up with something even more complicated than what I would imagine. :-( While I see the value of having one's preferences set so that a certain version is preferred, the drive-by viewer just looking for information is not likely to know about this. He can, however, be guided by whether an article has (in big numbers) a reliability rating of 2.6 or 7.9.
I think user rating of article versions ought to be as simple as possible: one-click approval. It ought to be as easy as clicking on the "watch this article" tab to add an article to your watchlist. This means that all users do is decide yes or no for approval. Anything further adds complexity to the system but offers little gain in utility.
We did have something of this sort back in 2002(?) before we went to the 1.4 version of the software. Adding information to the watchlist is only good for established users. It does nothing for the casual visitor who just wants to look up information, and these are the people who are most concerned about reliability. Insiders already have their own ways of judging the worth of an article, but these are by no means uniform.
I don't think it's a good idea to try to have each user attach a numerical rating to their approval level or to break out approval into different categories. Both of those would just complicate the system. For example, if we have a numerical rating, what numerical range should we use for the scale? 1 through 3? 1 through 5? 1 through 10? If it's 1 through 10, does 1 mean "exceptional" and 10 mean "horrible" or is it vice versa? These are trivial questions, but they have to be answered, and the interface has to convey the answers to these questions so plainly that even new users don't get confused.
Although I prefer a 10- or 100-point scheme with the good articles at the high end, I realize that whatever scale we use is arbitrary. Whatever decision is made on these points, it should be easy to explain. Whether an article is good could be related to the general valuation of all articles. When people are asked to rate things, the average rating tends to be higher than the expected midpoint of 5 on a 10-point scale. An average article would simply carry a rating equal to the average for all articles. To make things more readable, the numerical rating of articles within one standard deviation of that average (between -1 and +1) could be shown in black. We could use orange for those between -2 and -1, red for those below -2, blue for those between +1 and +2, and green for those above +2.
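To make the colour-band idea concrete, here is a minimal sketch in Python. The ratings data is invented for illustration, and the banding thresholds simply follow the standard-deviation cut-offs described above; a real system would of course pull scores from the wiki's database rather than a hard-coded dictionary.

    from statistics import mean, stdev

    # Hypothetical data: article title -> list of user ratings on a 1-10 scale.
    ratings = {
        "Alpha": [8, 9, 7, 8],
        "Beta":  [5, 6, 6, 5],
        "Gamma": [2, 3, 1, 2],
    }

    article_means = {title: mean(scores) for title, scores in ratings.items()}
    site_mean = mean(article_means.values())
    site_sd = stdev(article_means.values())

    def colour_band(article_mean):
        """Map an article's mean rating to a display colour according to
        how many standard deviations it sits from the site-wide average."""
        z = (article_mean - site_mean) / site_sd
        if z < -2:
            return "red"
        if z < -1:
            return "orange"
        if z <= 1:
            return "black"
        if z <= 2:
            return "blue"
        return "green"

    for title, m in article_means.items():
        print("%s: %.1f shown in %s" % (title, m, colour_band(m)))

The point of normalizing against the site-wide mean is that the display adapts automatically to whatever inflation the raw ratings suffer from.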
With Wikipedia, moreover, users have the ability to edit articles themselves to come up with a version that they are willing to approve, so there's less need to rate versions according to DEGREE of approval. If someone sees an article that they would rate 7 on a scale of 1 to 10, they can just edit it themselves into a condition where they think it rates 10 and then approve that version. And if someone subsequently edits it into an even BETTER condition, they can just click to approve the new version, superseding their previous choice.
In the context of a simple edit war, both combatants are likely to rate their own versions as a 10. :-)
A statistical model should be capable of marginalizing the effect that idiots have on an article's rating. What goes into the formula would be public information, but the calculations would still operate in the background.
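One simple way such a model could damp the influence of extreme or bad-faith ratings is a trimmed mean, which discards the highest and lowest slice of scores before averaging. This is only an illustrative assumption, not a description of any existing MediaWiki feature; the trim fraction and the sample data below are made up.

    from statistics import mean

    def trimmed_mean(scores, trim_fraction=0.2):
        """Drop the top and bottom trim_fraction of scores, then average
        the remainder, so a few spite 1s or self-serving 10s count less."""
        if not scores:
            raise ValueError("no scores to aggregate")
        ordered = sorted(scores)
        k = int(len(ordered) * trim_fraction)
        trimmed = ordered[k:len(ordered) - k] or ordered  # keep at least one score
        return mean(trimmed)

    # Two combatants rate their own version 10 and the rival version 1;
    # the trimmed mean keeps the result near the broader consensus.
    print(trimmed_mean([1, 10, 6, 7, 6, 5, 7, 6, 10, 1]))  # roughly 6.2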
I think it would be an even bigger mistake to try to set up a system that scores article versions according to multiple criteria such as "accuracy," "neutrality," "comprehensiveness," etc. Not only would this complicate the rating system and the user interface, it would inevitably be arbitrary in its choice of rating criteria, because there are any number of criteria that could be used, and the system would have to arbitrarily choose a subset: OK, we'll rate according to "accuracy" and "neutrality" but not according to "clarity" or "fairness" or "grammar" or "well-referenced" or "suitable for children" or "appropriate use of graphics."
I wouldn't call it a mistake, but I do fully appreciate the difficulties that you raise. Until a workable system is established for giving a single overall rating, it would be pointless to try adding any greater sophistication. The single rating would need to be debugged first. After that, additional criteria could be added individually, or could be made to apply only to articles within a particular category.
Ec