[Foundation-l] Wikipedia meets git
Luca de Alfaro
luca at dealfaro.org
Sat Oct 17 23:48:37 UTC 2009
Dear James,
you are absolutely right that we were lacking demos: we worked flat out to
produce some, and if you visit http://wikitrust.soe.ucsc.edu/ , you can see
that there are now a couple of Wikipedias on which you can try this.
We wrote our own text analysis engine. The reason is that the typical diff
algorithms you find in git, svn, etc., are too fragile for the analysis of
wiki text:
- They are typically not able to deal with text reordering. If you swap
the order of two paragraphs, it will look to them as if you deleted one of
the two paragraphs and inserted it anew. We wanted to be able to trace text
across block moves.
- They typically compare only the two most recent revisions. We
wanted to be able to remember which text used to be present and has
subsequently been deleted, so that if the text is later reinserted, we can
still correctly attribute it to the original author. Otherwise, if I want
to look like the author of some text, I can simply delete (or replace) the
content of a page, do a few quick-fire edits to confuse the system, and then
reinsert the content with some minor changes.
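
To make the two points above concrete, here is a minimal sketch in Python.
It is not the WikiTrust engine, only an illustration under simplifying
assumptions: words are split on whitespace, matching is a greedy
longest-run search built on the standard difflib module, and runs shorter
than min_len words are treated as new text. The names moved_blocks,
attribute and min_len are made up for this example.

```python
from difflib import SequenceMatcher


def moved_blocks(old, new, min_len=2):
    """Greedily pair the longest common word runs, in any order.

    Returns (old_start, new_start, size) triples. Matched words are
    replaced by unique sentinels so that they cannot be matched twice.
    """
    old, new = list(old), list(new)
    blocks = []
    while True:
        sm = SequenceMatcher(None, old, new, autojunk=False)
        m = sm.find_longest_match(0, len(old), 0, len(new))
        if m.size < min_len:
            break
        blocks.append((m.a, m.b, m.size))
        for k in range(m.size):
            old[m.a + k] = object()
            new[m.b + k] = object()
    return blocks


def attribute(revisions, min_len=2):
    """revisions: list of (author, text), oldest first.

    Returns [(word, original_author), ...] for the last revision. Every
    earlier revision is kept, so text that was deleted and later
    reinserted is still credited to whoever wrote it first.
    """
    seen = []      # attributed word lists of earlier revisions, newest first
    labeled = []
    for author, text in revisions:
        words = text.split()
        owners = [None] * len(words)
        for past in seen:
            past_words = [w for w, _ in past]
            # Hide words that already have an owner from further matching.
            pending = [w if o is None else object()
                       for w, o in zip(words, owners)]
            for a, b, size in moved_blocks(past_words, pending, min_len):
                for k in range(size):
                    owners[b + k] = past[a + k][1]
        # Anything still unmatched is new text by the current author.
        labeled = [(w, o if o is not None else author)
                   for w, o in zip(words, owners)]
        seen.insert(0, labeled)
    return labeled


if __name__ == "__main__":
    history = [
        ("alice", "the quick brown fox jumps over the lazy dog"),
        ("bob", "page blanked"),
        ("bob", "the quick brown fox jumps over the lazy dog today"),
    ]
    for word, owner in attribute(history):
        print(owner, word)
```

In this toy history, the text that bob deletes and then reinserts stays
credited to alice, and only the word he appended counts as his. Swapping
two blocks of words is handled the same way, because each block is matched
on its own rather than in document order.
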
We took great pains to make sure that the text attribution system is robust
with respect to these kinds of phenomena. I am sure it is not
perfect yet, and we welcome all feedback.
Luca
On Fri, Oct 16, 2009 at 5:17 AM, jamesmikedupont at googlemail.com <
jamesmikedupont at googlemail.com> wrote:
> On Fri, Oct 16, 2009 at 2:08 PM, Gerard Meijssen
> <gerard.meijssen at gmail.com> wrote:
> > Hoi,
> > After a minute of googling I find http://wikitrust.soe.ucsc.edu/home ..
> I am
> > sure it is there for you as well.
>
>
> Yes the page is there, it seems to be a good idea.
>
> only I am missing some html pages so that we can see what it looks
> like, a wordlevel blame.
> the colorized pages are missing.
>
> On this page: http://wikitrust.soe.ucsc.edu/home
> it says : "In the meantime, you can look at our list of colored pages,
> or look at screenshots of English Wikipedia pages analyzed by
> WikiTrust. " and the colored pages link to
> http://wikitrust.soe.ucsc.edu/index.php/Colored_pages which are
> missing....
>
> mike