Dear All,
We plan to release the code under the GPL (I assume this is fine for
MediaWiki; otherwise, we can release it under whatever license they prefer
if there is interest in an integration). We haven't done a release yet,
mainly because the code is in flux (we just finished implementing trust; we
will now tinker with it a bit until we are satisfied). Note, however, that
the code is currently written to batch-process .xml dumps, where all the
revisions of a page are given sequentially. The reputation system we use is
well suited to becoming an on-line system (the amount of computation needed
for every edit is small), but that would require some code modifications,
since the revisions would arrive in a different order (one at a time,
whenever someone edits).
The main thing that would be needed to implement this as an on-line
(real-time) system in MediaWiki is to store, for each page, a little bit of
extra information in some auxiliary DB table (such as trust and version of
origin for each word of the last revision; reputation value for each
author).
The information would be much smaller than the total history size; it would
be essentially proportional to the size of the last revision (so the storage
requirements would not change much).
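
To make the storage point concrete, here is a minimal Python sketch of the kind of per-page state I have in mind. It is only an illustration, not our code: the names WordInfo, PageState and apply_edit are made up for this example, difflib's SequenceMatcher stands in for our own text diff, and the rule "a new word starts with the author's reputation as its trust" is a placeholder assumption rather than our actual trust computation.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class WordInfo:
    text: str
    trust: float      # trust value of this word
    origin_rev: int   # revision in which the word was introduced


@dataclass
class PageState:
    last_rev: int = 0
    # One WordInfo per word of the last revision only, not of the whole history.
    words: list = field(default_factory=list)


def apply_edit(state, rev_id, new_text, author_reputation):
    """Update the per-page state for one incoming edit (illustrative rule only)."""
    old_tokens = [w.text for w in state.words]
    new_tokens = new_text.split()
    # Placeholder assumption: a new word starts with the author's reputation as trust.
    new_words = [WordInfo(t, author_reputation, rev_id) for t in new_tokens]
    # Stand-in diff: words matched to the previous revision keep their trust and origin.
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            new_words[block.b + k] = state.words[block.a + k]
    state.last_rev = rev_id
    state.words = new_words
    return state
```

Feeding it "the cat sat" as revision 1 and then "the black cat sat" as revision 2 leaves four entries, with only "black" attributed to revision 2. The state kept per page grows with the length of the last revision, not with the number of revisions.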
To implement this system with minimal disruption to the current Wikipedia
servers, one could make sure that every time the main MediaWiki servers get
an edit, they forward a copy of the edit to another server, which runs the
reputation/trust computation and can then serve the colored pages to
interested visitors. This way, the trust/reputation system could come online
while leaving the Wikipedia servers essentially untouched (another advantage
of not requiring user-to-user ratings).
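
As a rough illustration of the forwarding idea, the following Python sketch simulates both servers in one process. Everything in it is hypothetical: main_server_save_edit and trust_server_loop are invented names, not MediaWiki hooks, and an in-memory queue stands in for whatever link would connect the two machines.

```python
import queue
import threading

edit_queue = queue.Queue()  # stand-in for the link between the two servers
processed = {}              # page title -> id of the last revision the trust server has seen


def main_server_save_edit(page, rev_id, text):
    """Hypothetical hook on the main server: save as usual, then forward a copy."""
    # (normal saving of the edit would happen here, unchanged)
    edit_queue.put((page, rev_id, text))  # unbounded queue, so this returns immediately


def trust_server_loop():
    """Hypothetical trust server: consume forwarded edits in arrival order."""
    while True:
        item = edit_queue.get()
        if item is None:      # sentinel used only to stop the demo
            break
        page, rev_id, text = item
        # the reputation/trust update for this edit would run here
        processed[page] = rev_id


def run_demo():
    worker = threading.Thread(target=trust_server_loop)
    worker.start()
    main_server_save_edit("Cat", 1, "the cat sat")
    main_server_save_edit("Dog", 7, "dogs bark")
    main_server_save_edit("Cat", 2, "the black cat sat")
    edit_queue.put(None)
    worker.join()
    return dict(processed)
```

The point of the design is that the main server only enqueues a copy and carries on; all the trust work, and any slowness in it, stays on the other side of the queue.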
As for the namespaces: yes, we know, and we analyze only the main namespace
(or at least we try -- let me know if you find some article in the demo that
does not belong there).
Best, Luca
On 7/31/07, Daniel Mayer <maveric149(a)yahoo.com> wrote:
--- Luca de Alfaro <luca(a)soe.ucsc.edu> wrote:
Thanks for the comments! Yes, we very much wanted a system that
- does not change the day-to-day Wikipedia experience (it worked so
  well so far, let's not change what worked; would people be put off, or
  strange behaviors encouraged, by user-to-user ratings?),
- encourages lasting content, but does not punish people whose content
  is rewritten/improved: people are mainly punished for reversions or
  partial reversions.
- does not display reputation associated with authors (newbies to the
  Wikipedia provide a good share of the factual content, as they include
  many domain experts, so it's important not to put them off)
- but still gives useful information to visitors on the trust of text
  (and lots more can be done, e.g., getting on request the last high
  trust version, ...)
As you point out, getting text diff to work is not trivial, and it took us a
long time to get something we liked; we had to write it from scratch... the
idea is given in the WWW07 paper: a greedy algorithm, that matches longest
substrings first, giving however a bias in favor of substrings that occur in
the same relative position in the pages. Moreover, we keep track not only
of the text present in a page (the "live" text), but also of the text that
used to be present, but has been deleted (the "dead" text). If you don't do
this, reverts (and partial reverts) are not dealt with correctly.
We think that even better can be done, in fact (everything can always be
improved), but we haven't had a chance yet.
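
A toy Python version of the matching idea described above may help readers of the thread. It is a simplified sketch and not the implementation from the paper: the function names are invented here, the position bias is reduced to a tie-breaker among equally long runs (an assumption; the real bias may weigh in more strongly), and "dead" text is not handled at all.

```python
def _best_run(old, new, used_old, used_new):
    """Find the longest still-unmatched common run of words, or None."""
    best = None
    for i in range(len(old)):
        for j in range(len(new)):
            k = 0
            while (i + k < len(old) and j + k < len(new)
                   and not used_old[i + k] and not used_new[j + k]
                   and old[i + k] == new[j + k]):
                k += 1
            if k == 0:
                continue
            # How far apart the run sits in the two texts, as a fraction of length.
            drift = abs(i / len(old) - j / len(new))
            key = (k, -drift)  # longer first; among equals, closer relative position
            if best is None or key > best[0]:
                best = (key, i, j, k)
    return best


def greedy_match(old, new):
    """Return (old_start, new_start, length) runs, picked greedily, longest first."""
    used_old = [False] * len(old)
    used_new = [False] * len(new)
    matches = []
    while True:
        found = _best_run(old, new, used_old, used_new)
        if found is None:
            return matches
        _, i, j, k = found
        for d in range(k):
            used_old[i + d] = True
            used_new[j + d] = True
        matches.append((i, j, k))
```

For example, if two three-word phrases swap places between revisions, both are still matched as moved blocks instead of being counted as a deletion plus an insertion. This brute-force search is far too slow for real pages; it is only meant to show the order in which matches are chosen.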
Please continue your work! :) Also, will UCSC allow you to license your work
under the GPL or a compatible license? It would be a shame if your work
could not be incorporated into MediaWiki one day.
Oh - I'm sure you already thought of this, but please make sure this tool
only works on the main namespace, since that is where all the articles
(content) hang out. Your analysis will be skewed if it looks at non-article
content, since much of it tends to be preserved as is, without modification,
for years.
I also don't want to reward those who do almost all their editing outside of
articles by giving their edits an incorrectly higher rank. We really need to
encourage editing of content, not arguing on talk and policy pages.
-- mav