[Wikipedia-l] trust metrics

Sat Feb 14 15:10:52 UTC 2004

I am not advocating anything in this post; I'm just sharing some of my
thoughts over the past few days.

There are perennial discussions of trust metrics for things like
automatic sysopping and general "reputation management" systems.  It is
rightly pointed out (by me and many others!) that such systems are
difficult to design properly and often easy to "game".  At the same
time, the hope is that a well-designed system would be scalable and
informative, while not oppressive or empowering of tyrants.

The system at Slashdot is clearly broken, as many have said.

One system that I think does actually work, though, is the system at
Ebay.  At Ebay, after every transaction, buyers and sellers can leave
feedback for each other, positive or negative.  Whenever people want
to buy or sell something, they can look at the feedback of the
potential counterparty and see how much total feedback there is, how
many positive, how many negative.

For us, we have nothing like a 'transaction', but the system could be
generalized from that.  Each person could assign positive or negative
feedback to others.  Just a simple 'up' or 'down' rating with an
optional comment.  Some people might give an 'up' rating to everyone,
some might give a 'down' rating to everyone, some might just abstain
totally.  

But most would adopt a personal policy of giving mostly positives or
abstaining, reserving negatives for worst case scenarios.

Newcomers would have no rating at all, obviously.  Very prominent
people would have lots of ratings, mostly positive I would have to
assume.  I would probably have a 95% positive rating, but not perfect,
since beloved though I am and obviously deserve to be (*wink*), I am a
target.

We'd likely see perfect positive ratings for people like Michael
Hardy, who keeps his nose to the grindstone editing topics that aren't
controversial, and who stays out of internal politics almost
completely as far as I know.

Some sysops have taken enormous and weighty responsibilities on
themselves to do important but controversial work like VfD or banning
trolls or mediating disputes or editing articles about the Middle
East.  We'd naturally expect them to get mixed reviews, but we might
be surprised... lots of people would give them positive ratings just
for doing those jobs, acknowledging the difficulty and risk involved.

Some virtues of this concept:

1.  Easy to program... it's just a single table in the database,
feedback, with 5 fields -- 'from', 'to', 'up/down', 'comment',
'timestamp'.  (The timestamp is so old feedback can be expired.  Maybe
positive votes expire after 1 year, negatives after 1 month, so as to
encourage more positivity!)

2.  Difficult to game -- it is NOT automated, so there's no way to
game the system by engaging in repetitive actions to score points.

3.  Easy for end users -- no complex system of approving or
disapproving of individual edits.  You can just give someone a smile
or a frown, as you wish, when you wish, or not.

4.  Rewards co-operativeness and friendliness and neutrality, because
to get a high rating, you have to please lots of people.

5.  It would be a relatively simple matter to also calculate a
"weighted" score, where the weight is based on the raw scores of those
who have rated this user.  What I mean is that if someone has a
perfect positive score, then their impact on the weighted score
calculations would be higher than the impact of someone with a high
negative score.  I consider this weighted score to be optional and
possibly dangerous, but it is at least easy enough to do.

6.  It *might* lead to far fewer demands and less bickering.  It isn't
uncommon for someone to write to the lists or to me personally asking
that such-and-such prominent wikipedian be banned or desysopped.  This
is exhausting and divisive.  Possibly instead we could have a system
where people have a way to express displeasure, and to privately
advocate that others express displeasure, but in a way that doesn't
involve long drawn out flamewars.

7.  It is reflective of what we actually do in practice, i.e. it's
just a formalization of how we actually do operate.  Everyone has an
opinion, and when I think about who is good and prominent, I think
about my *own* opinion, but I also think about the opinions of
*others*.  But today those can only be guessed at, not measured.  Tim
Starling held a quick poll on whether 172 should be de-sysopped, and
when it went heavily in one direction, he followed through.  If the
system I am talking about were in place, we'd instead just see 172's
positive rating start to evaporate.
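
A minimal sketch of the single table from point 1, with the expiry idea
and an Ebay-style tally.  Everything not named in the text above is an
assumption made for illustration: the use of SQLite, the column names
(from_user and to_user, since FROM is an SQL keyword), the rule of one
rating per pair of users, and the helper names open_db, rate and tally.
The 1 year / 1 month windows are the "maybe" figures from point 1, not
settled values.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumed expiry windows, taken from the tentative figures in point 1.
POSITIVE_TTL = timedelta(days=365)
NEGATIVE_TTL = timedelta(days=30)


def open_db():
    """Create an in-memory database holding the single feedback table."""
    db = sqlite3.connect(":memory:")
    # Five fields: from, to, up/down, comment, timestamp.
    # "up" is stored as 1 (up) or 0 (down); "ts" is an ISO 8601 UTC string.
    # The primary key (one rating per pair of users) is an assumption.
    db.execute(
        """CREATE TABLE feedback (
               from_user TEXT NOT NULL,
               to_user   TEXT NOT NULL,
               up        INTEGER NOT NULL,
               comment   TEXT,
               ts        TEXT NOT NULL,
               PRIMARY KEY (from_user, to_user)
           )"""
    )
    return db


def rate(db, from_user, to_user, up, comment=None, now=None):
    """Record (or replace) one user's up/down rating of another."""
    now = now or datetime.now(timezone.utc)
    db.execute(
        "INSERT OR REPLACE INTO feedback VALUES (?, ?, ?, ?, ?)",
        (from_user, to_user, int(up), comment, now.isoformat()),
    )


def tally(db, user, now=None):
    """Count unexpired feedback for a user: total, positive, negative."""
    now = now or datetime.now(timezone.utc)
    pos = neg = 0
    rows = db.execute("SELECT up, ts FROM feedback WHERE to_user = ?", (user,))
    for up, ts in rows:
        age = now - datetime.fromisoformat(ts)
        if up and age <= POSITIVE_TTL:
            pos += 1
        elif not up and age <= NEGATIVE_TTL:
            neg += 1
    return {"total": pos + neg, "positive": pos, "negative": neg}
```

A newcomer with no rows simply comes back with a total of zero, which
matches "no rating at all" rather than a bad rating.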

-------

Some possible downsides, and there are many...

1.  Unintended consequences -- I'm imagining this working in one way,
but it might work in a totally different way in practice.  Perhaps the
system would encourage some bad behaviors that don't happen now, while
at the same time not discouraging any of our current bad behaviors.

2.  People might be dissuaded from taking controversial and brave
stands, if it's going to get them some negative feedback.

3.  People might be incentivized to create sham accounts just to give
themselves positive feedback.  This could be minimized, maybe, with a
second-order calculated metric which would take into account that
positive feedback from people with no feedback is essentially
meaningless.

4.  Well, I thought up the system, so I'm having a hard time seeing
other downsides, but I'm sure they exist.  :-)
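
One possible reading of the "weighted" score and of the second-order
metric from downside 3, sketched under stated assumptions.  Here a
rater's weight is taken to be the rater's own raw fraction of positive
feedback, and a rater with no feedback at all gets weight zero.  That
choice of weight, and the function names raw_scores, rater_weight and
weighted_score, are hypothetical; the text above does not fix a formula.

```python
def raw_scores(feedback):
    """feedback is a list of (from_user, to_user, up) tuples.

    Returns {user: (positive_count, negative_count)} for rated users.
    """
    counts = {}
    for _, to_user, up in feedback:
        pos, neg = counts.get(to_user, (0, 0))
        counts[to_user] = (pos + 1, neg) if up else (pos, neg + 1)
    return counts


def rater_weight(counts, user):
    """Assumed weight: the rater's own raw positive fraction, 0 if unrated."""
    pos, neg = counts.get(user, (0, 0))
    total = pos + neg
    return pos / total if total else 0.0


def weighted_score(feedback, user):
    """Weighted positive fraction for a user, or None if no weight applies."""
    counts = raw_scores(feedback)
    up_weight = down_weight = 0.0
    for from_user, to_user, up in feedback:
        if to_user != user:
            continue
        weight = rater_weight(counts, from_user)
        if up:
            up_weight += weight
        else:
            down_weight += weight
    total = up_weight + down_weight
    return up_weight / total if total else None
```

With this weighting, 'up' ratings from brand-new sham accounts count for
nothing.  Sham accounts that rate each other would still pick up weight,
so this reduces the problem rather than removing it.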

-------

At least initially, such a system should have *no* real-world
consequences.  It would just be an indicator, which might be ignored
or not.  That's the way Ebay is -- you can have a pretty mediocre
rating, and it doesn't really affect anything automatically.  It may
inform others, though.

But with experience, we would surely organically come to some customs.
After some minimum number of feedbacks, with some percentage of them
positive, people could be sysopped.  When feedback gets sufficiently
negative, people could be desysopped.
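
Such a custom might eventually be written down as something like the
sketch below.  The function name suggested_status and every number in
it (20 feedbacks, 90% and 50% positive) are placeholders invented for
illustration; the actual figures would have to come from experience.

```python
def suggested_status(positive, negative, is_sysop,
                     min_feedback=20, sysop_ratio=0.9, desysop_ratio=0.5):
    """Suggest a change based on feedback counts; all thresholds are made up."""
    total = positive + negative
    if total < min_feedback:
        # Too little feedback to say anything (newcomers land here).
        return "no change"
    ratio = positive / total
    if not is_sysop and ratio >= sysop_ratio:
        return "sysop"
    if is_sysop and ratio < desysop_ratio:
        return "desysop"
    return "no change"
```

The result is only a suggestion, in keeping with the idea that the
rating is an indicator which might be ignored or not.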

--Jimbo