---------- Forwarded message ----------
From: jamesmikedupont@googlemail.com
Date: Sun, Oct 18, 2009 at 3:39 AM
Subject: Re: [Foundation-l] Wikipedia meets git
To: Wikimedia Foundation Mailing List <foundation-l@lists.wikimedia.org>
See my new blog post on word-level blaming for Wikipedia via git and Perl:
http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.ht...
The next step is ready:
1. I have a single script that pulls a given article and checks its revisions into git. It is not perfect, but it works.
http://bazaar.launchpad.net/~jamesmikedupont/+junk/wikiatransfer/revision/8
You run it like this, from inside a git repo:
perl GetRevisions.pl "Article_Name"
git blame Article_Name/Article.xml
git push origin master
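As a rough illustration of what "checking the revisions into git" can look like, here is a minimal Perl sketch that builds the git commands for one revision. It is not taken from GetRevisions.pl: the helper name git_commands_for_revision, the revision fields (id, user, timestamp, comment), the sample values and the placeholder e-mail domain are all assumptions made for the example. Recording the wiki editor as the commit author is what would let git blame show who wrote each line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from GetRevisions.pl): build the git commands
# that would record one wiki revision as one commit. Nothing is executed
# here; the caller decides whether to pass each list to system().
sub git_commands_for_revision {
    my ($file, $rev) = @_;

    # git wants "Name <email>"; the address is a placeholder on a reserved domain.
    my $author = "$rev->{user} <$rev->{user}\@wikipedia.invalid>";

    return (
        [ 'git', 'add', $file ],
        [ 'git', 'commit',
          "--author=$author",
          "--date=$rev->{timestamp}",    # assumes a timestamp format git can parse
          '-m', "Revision $rev->{id}: $rev->{comment}" ],
    );
}

# Made-up revision data, for illustration only.
my %rev = (
    id        => 1,
    user      => 'ExampleUser',
    timestamp => '2009-10-18T03:39:00Z',
    comment   => 'example edit',
);

for my $cmd (git_commands_for_revision('Article_Name/Article.xml', \%rev)) {
    print join(' ', @$cmd), "\n";
    # Inside a git repo, each command could be run with:
    # system(@$cmd) == 0 or die "git failed: @$cmd";
}
```

Building the commands as lists rather than one shell string avoids quoting problems when a user name or edit comment contains spaces or shell metacharacters.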
The code that splits up the lines is in Process File. It turns every space into a newline, so each word ends up on its own line. Since git blame works line by line, that gives us a word-level blame.
if ($insidetext) {
    ## split all lines on the space
    s/ /\n/g;
    print OUT $_;
}
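To see why the split gives a word-level blame, here is a small self-contained Perl sketch of the same substitution. The helper name split_words and the two sample sentences are made up for the example and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the script): put every word of the text on
# its own line, so that a line-based tool such as git blame sees one word
# per line.
sub split_words {
    my ($text) = @_;
    $text =~ s/ /\n/g;    # each space becomes a newline
    return $text;
}

# Two made-up revisions of one sentence; the second inserts a single word.
my $rev1 = split_words("the quick fox jumps");
my $rev2 = split_words("the quick brown fox jumps");

print "$rev1\n---\n$rev2\n";
```

Without the split, the whole sentence is one line and any edit reassigns the entire line to the latest editor. With the split, a line-based diff between the two revisions touches only the line holding the inserted word, so the other words keep their original author in the blame.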
The article is here: http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_de...
Here are the blame results: http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_de...
The problem is that GitHub does not like this amount of processor power being used and kills the process, but you can run git blame locally instead.
Now we have a tool to easily create a repository from Wikipedia, or from any other export-enabled MediaWiki.
mike