This is a very good question.  In general, we would be delighted to find a small wiki (i.e., smaller than the En / De / Fr ones) that wants to try this out, and we think that this is realistic.  We think it would be essential to experiment first on smaller wikis, to gain experience with the load and with how to distribute it across the hardware infrastructure, before jumping to the largest wikis.  This is of course fairly obvious.
Note that in any case you can throttle down the load of the extension without affecting the main wiki, as it is all implemented in an asynchronous fashion (the main wiki never needs to wait for the extension when serving requests).

Let me now give a somewhat detailed answer on the load.

- When there is an edit, an asynchronous job is started to analyze it.  The key word is asynchronous: it does not slow down the HTTP processing of the edit on the main server.  Right now, the analysis job runs on the same machine that runs MediaWiki; this can change if desired.  The analysis takes a fraction of a second, but requires reading 5-10 revisions from the database, plus some other minor db access.  So edits clearly become more expensive, but even on the English Wikipedia (5 edits / second at most?) a single CPU would suffice.  As long as edits are a small percentage of the reads (which is true of most wikis), we do not think that the analysis of edits is a significant load.

- When you ask to look at the trust information, it is very quick: we just read the trust-annotated markup of the revision, rather than the standard one.

- For each revision that is analyzed for trust, MediaWiki stores the original, unchanged revision, and we store in an additional database table the same revision, annotated for trust and text origin.  If you want trust information for all revisions, this causes the db size to expand by a factor of roughly 2.5: not an issue for small wikis, but an issue for very large ones.  In practice, very few people are interested in the trust information for very old article versions, so we could keep the trust information only for the most recent 50 or so revisions of each article (in fact, there is a better way to determine the threshold, but let's simplify).  This would reduce storage, and make it proportional to the number of articles rather than the number of revisions (see the pruning sketch after this list).  It would be quite easy for us to add a configuration variable that enables such pruning; let me add it to our todo list.

- The hard job is when you have a big existing wiki, and you need to analyze its entire past to bring the extension up to date.  We are trying to decide whether to build special tools that facilitate this "catch-up" starting from XML dumps; if the WMF expressed clear interest in our extension, we would consider it.  The current implementation can "catch up with the past" at something like 10 revisions / second; to limit db usage, one can throttle this down, e.g. to 4 revisions / second (see the throttled-loop sketch after this list).  This "catch-up" analysis can run in the background, and can use a spare CPU connected to the db, or whatever is available.  I believe the main bottleneck would be the db load rather than the CPU load (the CPU load can sit on a separate machine, while the db is shared with the wiki servers).  But if one accepts that the catch-up takes a bit of time, one can simply throttle down the analysis; after all, this needs to be done only once for each wiki.
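
To make the pruning idea above concrete, here is roughly what I have in mind, as a small Python sketch.  The table and column names (trust_markup, page_id, rev_id, rev_timestamp) are placeholders I made up for this email, not the actual schema, and the logic is only illustrative:

# Sketch only: prune the trust-annotated copies, keeping the most
# recent K revisions of each page.  Table and column names are
# placeholders for this email, not the real WikiTrust schema.

KEEP_PER_PAGE = 50

def prune_trust_markup(db, keep=KEEP_PER_PAGE):
    """db is any DB-API 2.0 connection to the wiki database."""
    cur = db.cursor()
    cur.execute("SELECT DISTINCT page_id FROM trust_markup")
    for (page_id,) in cur.fetchall():
        # Newest revisions first; everything past the first `keep` is stale.
        cur.execute("SELECT rev_id FROM trust_markup"
                    " WHERE page_id = %s ORDER BY rev_timestamp DESC",
                    (page_id,))
        stale = [row[0] for row in cur.fetchall()[keep:]]
        if stale:
            marks = ",".join(["%s"] * len(stale))
            cur.execute("DELETE FROM trust_markup"
                        " WHERE rev_id IN (" + marks + ")", stale)
    db.commit()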

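And here is the kind of throttled loop I mean for the catch-up; again just a sketch, with next_unanalyzed_revisions() and analyze_revision() standing in for the real WikiTrust code, and the rate configurable:

# Sketch of a throttled "catch up with the past" loop.  The helper
# functions are stand-ins; only the throttling logic is the point.
import time

def catch_up(db, max_revs_per_sec=4.0, batch_size=100):
    """Analyze old revisions in the background, never exceeding
    max_revs_per_sec, so the load on the shared db stays bounded."""
    min_interval = 1.0 / max_revs_per_sec
    while True:
        revs = next_unanalyzed_revisions(db, batch_size)
        if not revs:
            break                        # caught up with the past
        for rev in revs:
            start = time.time()
            analyze_revision(db, rev)    # reads ~5-10 earlier revisions
            spare = min_interval - (time.time() - start)
            if spare > 0:
                time.sleep(spare)        # spread out the db reads

(At 4 revisions / second this covers about 350,000 revisions per day, so even a wiki with a few million revisions would catch up within a couple of weeks.)
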
Suggestions and comments are welcome.

Luca

On Tue, Aug 26, 2008 at 5:33 PM, mike.lifeguard <mike.lifeguard@gmail.com> wrote:

Do we have any idea what server load is like? Is this something WMF could potentially deploy at this point?

Mike

From: Luca de Alfaro [mailto:luca@dealfaro.org]
Sent: August 23, 2008 10:56 PM
To: wikiquality-l@lists.wikimedia.org
Subject: [Wikiquality-l] WikiTrust v2 released: reputation and trust for your wiki in real-time!


As some of you might remember, we have been working on author
reputation and text trust systems for wikis; some of you may have seen
our demo at WikiMania 2007, or the on-line demo
http://wiki-trust.cse.ucsc.edu/

Since then, we have been busy at work building a system that can be
deployed on any wiki and display the text trust information.
And we finally made it!

We are pleased to announce the release of WikiTrust version 2!

With it, you can compute author reputation and text trust for your
wiki in real-time, as edits to the wiki are made, and you can display
text trust via a new "trust" tab.
The tool can be installed as a MediaWiki extension, and is released
open-source, under the BSD license; the project page is
http://trust.cse.ucsc.edu/WikiTrust

WikiTrust can be deployed both on new, and on existing, wikis.
WikiTrust stores author reputation and text trust in additional
database tables.  If deployed on an existing wiki, WikiTrust first
computes the reputation and trust information for the current wiki
content, and then processes new edits as they are made.  The
computation is scalable, parallel, and fault-tolerant, in the sense
that WikiTrust adaptively fills in missing trust or reputation
information.

On my MacBook, running under Ubuntu in vmware, WikiTrust can analyze
some 10-20 revisions / second of a wiki; so with a little patience,
unless your wiki is truly huge, you can just deploy it and wait a
bit. 
Go to http://trust.cse.ucsc.edu/WikiTrust for more information and for
the code!

Feedback, comments, etc are much appreciated!

Luca de Alfaro
(with Ian Pye and Bo Adler)


_______________________________________________
Wikiquality-l mailing list
Wikiquality-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikiquality-l