On 7/31/07, Daniel Mayer <maveric149@yahoo.com> wrote:

--- Luca de Alfaro <luca@soe.ucsc.edu> wrote:
> Thanks for the comments!  Yes, we very much wanted a system that
>    - does not change the day-to-day Wikipedia experience (it worked so
>    well so far, let's not change what worked; would people be put off, or
>    strange behaviors encouraged, by user-to-user ratings?),
>    - encourages lasting content, but does not punish people whose content
>    is rewritten/improved: people are mainly punished for reversions or partial
>    reversions.
>    - does not display reputation associated with authors (newcomers to
>    Wikipedia provide a good share of the factual content, as they include many
>    domain experts, so it's important not to put them off)
>    - but still gives visitors useful information on the trustworthiness of text
>    (and lots more can be done, e.g., retrieving on request the last high-trust
>    version, ...)
>
> As you point out, getting text diff to work is not trivial, and it took us a
> long time to get something we liked; we had to write it from scratch. The
> idea is described in the WWW07 paper: a greedy algorithm that matches the
> longest substrings first, but with a bias in favor of substrings that occur in
> the same relative position in the pages.  Moreover, we keep track not only
> of the text present in a page (the "live" text), but also of the text that
> used to be present, but has been deleted (the "dead" text).  If you don't do
> this, reverts (and partial reverts) are not dealt with correctly.
> We think that even better can be done, in fact (everything can always be
> improved), but we haven't had a chance yet.
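
The matching idea quoted above can be illustrated with a small sketch. This is not the UCSC code or the exact algorithm from the WWW07 paper. It is a hypothetical Python illustration that assumes word-level tokens, a made-up `greedy_match` function, and an arbitrary `position_weight` parameter for the positional bias. The new revision is matched against the previous revision's live text and against each chunk of dead text, taking the best-scoring (roughly longest) common run first.

```python
from typing import List, Optional, Tuple

# A match: (chunk index, start in chunk, start in new revision, length in tokens)
Match = Tuple[int, int, int, int]


def greedy_match(new: List[str], chunks: List[List[str]],
                 position_weight: float = 0.5) -> List[Match]:
    """Hypothetical greedy longest-run-first matcher (illustration only).

    chunks[0] is assumed to be the live text of the previous revision and
    chunks[1:] the dead (previously deleted) text. Each token, on either
    side, is used in at most one match.
    """
    used_new = [False] * len(new)
    used_old = [[False] * len(c) for c in chunks]
    matches: List[Match] = []
    while True:
        best: Optional[Match] = None
        best_score = 0.0
        for ci, chunk in enumerate(chunks):
            for i in range(len(chunk)):
                for j in range(len(new)):
                    # Length of the common run starting at chunk[i] / new[j],
                    # stopping at tokens already claimed by an earlier match.
                    k = 0
                    while (i + k < len(chunk) and j + k < len(new)
                           and not used_old[ci][i + k] and not used_new[j + k]
                           and chunk[i + k] == new[j + k]):
                        k += 1
                    if k == 0:
                        continue
                    score = float(k)
                    if ci == 0:
                        # Assumed bias: live-text runs lose score as their
                        # relative positions in the two revisions drift apart.
                        drift = abs(i / len(chunk) - j / len(new))
                        score *= 1.0 - position_weight * drift
                    if score > best_score:
                        best, best_score = (ci, i, j, k), score
        if best is None:
            # No unclaimed common run remains; report matches in page order.
            return sorted(matches, key=lambda m: m[2])
        ci, i, j, k = best
        for t in range(k):
            used_old[ci][i + t] = True
            used_new[j + t] = True
        matches.append(best)


# Revision 1 was "a b c d"; revision 2 deleted "c d" (now dead text);
# revision 3 reverts to "a b c d".
print(greedy_match("a b c d".split(), [["a", "b"], ["c", "d"]]))
```

Here the call returns `[(0, 0, 0, 2), (1, 0, 2, 2)]`: "a b" comes from the live text and "c d" from the dead text, so the revert is recognized as restoring old text rather than as new text written by the reverting editor. Without the dead-text chunk, "c d" would look brand new. The brute-force search is cubic per step and is only meant to show the idea.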

Please continue your work! :) Also, will UCSC allow you to license your work under the GPL or a
compatible license? It would be a shame if your work could not be incorporated into MediaWiki one
day.

Oh - I'm sure you've already thought of this, but please make sure this tool only works on the main
namespace, since that is where all the articles (content) live. Your analysis will be skewed if
it looks at non-article content, since much of it tends to be preserved as-is, without
modification, for years.

I also don't want to reward those who do almost all their editing outside of articles by giving
their edits an unjustifiably high rank. We really need to encourage editing of content, not
arguing on talk and policy pages.

-- mav
