Sheldon Rampton wrote:
Ray Saintonge wrote:
My first impression from your response is that you would end up with something even more complicated than what I would imagine. :-( While I see the value of having one's preferences set so that a certain version is preferred, the drive-by viewer just looking for information is not likely to know about this. He can, however, be guided by whether an article displays (in big numbers) a reliability rating of 2.6 or 7.9.
I think user rating of article versions ought to be as simple as possible: one-click approval. It ought to be as easy as clicking on the "watch this article" tab to add an article to your watchlist. This means that all users have to do is decide yes or no on approval. Anything further adds complexity to the system but offers little gain in utility.
We did have something of this sort back in 2002(?), before we went to the 1.4 version of the software. Adding information to the watchlist is only good for established users. It does nothing for the casual visitor who just wants to look up information, and these are the people who are most concerned about reliability. Insiders already have their own ways of judging the worth of an article, but these are by no means uniform.
I don't think it's a good idea to try to have each user attach a numerical rating to their approval level or to break out approval into different categories. Both of those would just complicate the system. For example, if we have a numerical rating, what numerical range should we use for the scale? 1 through 3? 1 through 5? 1 through 10? If it's 1 through 10, does 1 mean "exceptional" and 10 mean "horrible" or is it vice versa? These are trivial questions, but they have to be answered, and the interface has to convey the answers to these questions so plainly that even new users don't get confused.
Although I prefer a 10 or 100 point scheme with the good articles at the high end, I realize that whatever scale we use is arbitrary. Whatever decision is made on these points, it should be easy to explain. Whether an article is good could be judged relative to the general valuation of all articles. When people are asked to rate things, the average rating tends to be higher than the expected midpoint of 5 on a 10 point scale, so an "average" article would be one whose rating equals the mean for all articles rather than the midpoint of the scale. To make things more readable, the number rating of articles within one standard deviation of the mean could be shown in black. We could use orange for those between -2 and -1 standard deviations, red for those below -2, blue for those between +1 and +2, and green for those above +2.
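To make the arithmetic concrete, here is a minimal sketch of that colour scheme in Python. It assumes each article's score is simply the mean of its user ratings and that the full list of site-wide scores is available; the function and variable names are purely illustrative, not anything that exists in MediaWiki.

from statistics import mean, pstdev

def rating_colour(article_score, all_scores):
    """Pick a display colour from how many standard deviations an
    article's score sits above or below the site-wide mean score."""
    mu = mean(all_scores)
    sigma = pstdev(all_scores) or 1.0   # guard against a zero spread
    z = (article_score - mu) / sigma
    if z < -2:
        return "red"      # well below average
    if z < -1:
        return "orange"   # somewhat below average
    if z <= 1:
        return "black"    # within one standard deviation of the mean
    if z <= 2:
        return "blue"     # somewhat above average
    return "green"        # well above average

# Hypothetical site-wide scores and one article to classify.
site_scores = [6.1, 7.3, 5.8, 8.9, 4.2, 6.7, 7.0]
print(rating_colour(8.9, site_scores))   # "blue": about 1.7 standard deviations above the mean

An article more than two standard deviations above the site-wide mean would come out green, one more than two below would come out red, and the bulk in the middle would stay black.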
With Wikipedia, moreover, users have the ability to edit articles themselves to come up with a version that they are willing to approve, so there's less need to rate versions according to DEGREE of approval. If someone sees an article that they would rate 7 on a scale of 1 to 10, they can just edit it themselves into a condition where they think it rates 10 and then approve that version. And if someone subsequently edits it into an even BETTER condition, they can just click to approve the new version, superseding their previous choice.
In the context of a simple edit war, both combatants are likely to rate their own versions as a 10. :-)
A statistical model should be capable of marginalizing the effect that idiots have on an article's score. What goes into the formula would be public information, but the calculations would still operate in the background.
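As one illustration of how such a model could discount extreme votes (only a sketch of a standard robust statistic, not a proposal for the actual formula), a trimmed mean drops the highest and lowest ratings before averaging, so the two edit warriors' 10 and 1 largely fall out of the result:

def trimmed_mean(ratings, trim_fraction=0.2):
    """Average the ratings after dropping the top and bottom
    trim_fraction of values, so a handful of extreme votes
    carries little weight in the published score."""
    if not ratings:
        return None
    ordered = sorted(ratings)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] or ordered   # keep everything if the list is tiny
    return sum(kept) / len(kept)

# Seven ordinary raters plus the two edit warriors voting 10 and 1.
votes = [7, 8, 6, 7, 7, 8, 6, 10, 1]
print(trimmed_mean(votes))   # 7.0 -- the 10 and the 1 are both discarded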
I think it would be an even bigger mistake to try to set up a system that scores article versions according to multiple criteria such as "accuracy," "neutrality," "comprehensiveness," etc. Not only would this complicate the rating system and the user interface, it would inevitably be arbitrary in its choice of rating criteria, because there are any number of criteria that could be used, and the system would have to arbitrarily choose a subset: OK, we'll rate according to "accuracy" and "neutrality" but not according to "clarity" or "fairness" or "grammar" or "well-referenced" or "suitable for children" or "appropriate use of graphics."
I wouldn't call it a mistake, but I do fully appreciate the difficulties that you raise. Until a workable system is established for giving a single overall rating, it would be pointless to try adding any greater sophistication. The single rating would need to be debugged first. After that, additional criteria could be added individually, or could be made to apply only to articles within a particular category.
Ec