[Wikiquality-l] Working on blame tool

jamesmikedupont at googlemail.com jamesmikedupont at googlemail.com
Sat Oct 17 19:54:55 UTC 2009


On Sat, Oct 17, 2009 at 9:50 PM, Brian J Mingus
<Brian.Mingus at colorado.edu> wrote:
>
>
> On Sat, Oct 17, 2009 at 12:40 PM, jamesmikedupont at googlemail.com
> <jamesmikedupont at googlemail.com> wrote:
>>
>> I was not able to find any examples.
>> I think that such a blame and trust tool belongs in git, not in
>> wikipedia because there are many other usages for it.
>> mike
>>
>> On Sat, Oct 17, 2009 at 8:33 PM, John Erling Blad
>> <john.erling.blad at jeb.no> wrote:
>> > There is a student at UiO looking into alternate trust coloring schemes.
>> > John Erling /jeblad
>> >
>> > jamesmikedupont at googlemail.com wrote:
>> >> On Sat, Oct 17, 2009 at 7:39 PM, Platonides <platonides at gmail.com>
>> >> wrote:
>> >>
>> >>> jamesmikedupont at googlemail.com wrote:
>> >>>
>> >>>> FYI,
>> >>>> I am working on a blame tool for wikipedia
>> >>>>
>> >>>> http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html
>> >>>> thanks,
>> >>>> mike
>> >>>>
>> >>> Importing an article history into git for using git blame doesn't seem
>> >>> like a good method...
>> >>>
>> >>
>> >> Well importing it just for blame is bad. I agree. I read about the
>> >> wikiblame.
>> >>
>> >> my purpose is to port the wikipedia over to git...
>> >>
>> >> mike
>
> So far all of the implementations of blame tools for the full history dump
> of a wiki do not have the features of an ideal blame tool.
>
> Given an arbitrary string of text an ideal blame tool can scan Wikipedia's
> entire history - and ideally the history of all WMF wikis - and tell you the
> authors of that text.
>
> The design of such a system is essentially a search engine where each
> revision is a page with an associated author. The engine works iteratively,
> first finding all page blobs (where a page blob contains all text across all
> revisions for an article) that contain all of the words being searched for,
> and then iteratively working forwards in time on the revisions of that
> article in an effort to find the earliest authors. This isn't a complete
> spec, but it gives the general idea.

Nice description.

Well, imagine the problem of finding and removing some copyrighted code
from the Linux kernel, or tracking down where a bug entered some software.
We need to make sure that git has these features.
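
As a rough illustration of the two-stage search Brian describes above
(first filter page blobs that contain all the query words, then walk each
matching page's revisions forward in time to find the earliest author),
here is a minimal in-memory sketch. The Revision record, the
earliest_authors function, and the integer timestamps are all hypothetical
names chosen for this example; a real tool would read a history dump and
use an index rather than scanning every blob.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    # Hypothetical record; field names are assumptions for this sketch.
    page: str
    timestamp: int
    author: str
    text: str


def earliest_authors(revisions, query):
    """Map each matching page to the author of the earliest revision
    that contains the query string (case-insensitive)."""
    needle = query.lower()
    words = needle.split()

    # Group revisions by page so each page can be treated as one blob.
    pages = {}
    for rev in revisions:
        pages.setdefault(rev.page, []).append(rev)

    results = {}
    for page, revs in pages.items():
        # Stage 1: cheap filter. The blob is all text across all revisions;
        # skip pages whose blob lacks any query word (substring match).
        blob = " ".join(r.text for r in revs).lower()
        if not all(w in blob for w in words):
            continue
        # Stage 2: walk forward in time and stop at the first revision
        # that contains the whole query string.
        for rev in sorted(revs, key=lambda r: r.timestamp):
            if needle in rev.text.lower():
                results[page] = rev.author
                break
    return results


history = [
    Revision("A", 1, "alice", "hello world"),
    Revision("A", 3, "carol", "the quick brown fox jumps"),
    Revision("A", 2, "bob", "hello world, the quick brown fox"),
    Revision("B", 1, "dave", "quick fix"),
]
found = earliest_authors(history, "quick brown fox")
print(found)
```

This only reports who first introduced the exact string on each page; it
does not handle text that was moved between pages or reworded, which a
complete blame tool would have to address.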


