Maury,
perhaps I can help explain the behavior you saw in the UCSC system (I am one of the developers). New text is always somewhat orange, to signal to visitors that it has not yet been fully reviewed. The higher the author's reputation, the lighter the shade of orange, but orange it still is (I have no idea how high your computed reputation was when you started writing that article).
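A rough sketch of the idea, in Python (the names, constants, and color mapping below are only illustrative, not the actual WikiTrust code):

# Toy sketch: new text starts with a trust value bounded by its author's
# reputation, and trust is rendered as an orange shade that lightens toward
# white as trust grows. All constants here are hypothetical.

MAX_TRUST = 10.0        # assumed trust ceiling
MAX_REPUTATION = 10.0   # assumed reputation ceiling

def initial_trust(author_reputation):
    """New text is only as trusted as its author's reputation allows."""
    return MAX_TRUST * min(author_reputation, MAX_REPUTATION) / MAX_REPUTATION

def background_shade(trust):
    """Map trust in [0, MAX_TRUST] to an orange-to-white CSS color."""
    lightness = trust / MAX_TRUST             # 0 -> saturated orange, 1 -> white
    r = 255
    g = int(165 + (255 - 165) * lightness)
    b = int(255 * lightness)
    return f"#{r:02x}{g:02x}{b:02x}"

print(background_shade(initial_trust(2.0)))   # darker orange: low-reputation author
print(background_shade(initial_trust(9.0)))   # pale orange: high-reputation author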
Text background becomes white when other people revise it without drastically changing it: this indicates consensus. In our more recent code version, we also have a "vote" button; using it, text can gain trust more quickly, without the need for many revisions. In a live experiment, where people can click on the vote button, I presume the trust of the text would rise more rapidly. Note that the code prevents double voting, voting from sock-puppet accounts, and so on.
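Again only as a sketch (my own simplification of the idea with made-up constants, not the shipped algorithm):

# Sketch: unchanged text inherits part of the trust of whoever revises it,
# a vote pulls trust up faster, and each account can vote only once.

class TextFragment:
    def __init__(self, trust):
        self.trust = trust
        self.voters = set()                  # accounts that already voted

    def survives_revision(self, reviser_trust, pull=0.3):
        """Text left intact by a revision moves toward the reviser's trust."""
        if reviser_trust > self.trust:
            self.trust += pull * (reviser_trust - self.trust)

    def vote(self, voter_id, voter_trust, pull=0.6):
        """A vote raises trust faster than a revision; double votes are ignored."""
        if voter_id in self.voters:
            return                           # prevents double voting
        self.voters.add(voter_id)
        if voter_trust > self.trust:
            self.trust += pull * (voter_trust - self.trust)

frag = TextFragment(trust=2.0)
frag.survives_revision(reviser_trust=8.0)    # trust creeps up with each quiet revision
frag.vote("some_account", voter_trust=8.0)   # a vote gains trust more quickly
frag.vote("some_account", voter_trust=8.0)   # second vote from the same account does nothing
print(round(frag.trust, 2))                  # 6.32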
So, based on what you say, I don't think the system is tripping over diffs. It simply considers new text less trusted and revised text more trusted, which is what we wanted. It appears, however, that we don't do a very good job of describing the algorithm on the web site (I guess we put most of the description work into writing the papers... we will try to improve the web site).
We don't measure "edit work" in number of edits, but in number of words changed. As you say, in our system changing 1000 words across separate edits is worth the same (provided the edits are all kept, i.e., not reverted) as providing a single 1000-word contribution. We did consider giving a larger prize to larger contributions: precisely, making the reputation increment proportional to n^a, where n is the number of words changed and a > 1. This did not work well for Wikipedia, because it ended up under-rewarding the many editors who clean and polish the articles through many small edits. Technically it would be trivial to change the code to such a non-linear reward scheme (rewards proportional to n^a rather than n); whether it is desirable, I have no idea. It does not lead to better quantitative performance of the system, i.e., the resulting trust is no better at predicting future text deletions.
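To make the comparison concrete, here is a small numerical illustration (the scale constant and the exponent are arbitrary choices for the example, not values from our code):

def reward(words_changed, a=1.0, scale=0.01):
    """Reputation increment for a kept (non-reverted) edit of n words: scale * n^a."""
    return scale * words_changed ** a

# Linear scheme (a = 1): 1000 one-word edits earn exactly as much as one 1000-word edit.
print(sum(reward(1) for _ in range(1000)))         # 10.0
print(reward(1000))                                # 10.0

# Superlinear scheme (a = 1.5): the single large contribution now earns far more,
# so the many small clean-up edits get only a small fraction of the total reward.
print(sum(reward(1, a=1.5) for _ in range(1000)))  # 10.0
print(reward(1000, a=1.5))                         # about 316.2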
Luca
The UCSC system did work, but gave me odd results. Apparently I have a very bad reputation, because when I look in the History at the first versions, which I wrote in their entirety, they are colored all yellow!
Newer versions of the same articles had much more white, even though huge portions of the text were still from the original. This may be due to diff problems -- I consider diff to be largely random in effectiveness: sometimes it works, but other times a single whitespace change, especially a vertical one, will make it think the entire article was edited.
My guess is that the system is tripping over diffs like this, and thus considering the article to have been re-written by another editor. When this happens, MY reputation goes down, or so I understand it.
I don't think this system could possibly work if it's based on the wiki's diffs. If it's going to work, it's going to need a much more reliable diffing system.
Another problem I see with it is that it will rank an author whose contributions are 1000 unchanged comma inserts as just as reliable as an author who created a perfect 1000-character article (or perhaps rate the first even higher). There should be some sort of length bias: if an author makes a big edit, out of character, that's important to know.
Maury