[Foundation-l] Wikipedia meets git
jamesmikedupont at googlemail.com
Sat Oct 17 16:39:48 UTC 2009
See my new blog post: word-level blaming for Wikipedia via git and Perl.
http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html
The next step is ready:
1. I have a single script that pulls a given article and checks its
revisions into git.
It is not perfect, but it works.
http://bazaar.launchpad.net/~jamesmikedupont/+junk/wikiatransfer/revision/8
You run it like this, from inside a git repo:
perl GetRevisions.pl "Article_Name"
git blame Article_Name/Article.xml
git push origin master
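For readers who want the idea without reading the script: checking one revision into git could look roughly like the sketch below. This is not the code from GetRevisions.pl. The helper name commit_revision, its arguments, and the placeholder e-mail address are assumptions made for illustration. The only point is that git commit accepts --author and --date, so each commit can carry the wiki editor and the edit time, and git blame can later report them.

```shell
#!/bin/sh
# Sketch only: record one wiki revision as one git commit.
# commit_revision and its arguments are hypothetical, not part of GetRevisions.pl.
#
# usage: commit_revision FILE EDITOR DATE COMMENT
commit_revision() {
    file=$1 editor=$2 date=$3 comment=$4

    git add -- "$file" || return 1

    # --author and --date set the author identity and author time of the commit.
    # The wiki export has no e-mail address, so a placeholder is used here.
    # GIT_COMMITTER_DATE additionally backdates the committer time.
    GIT_COMMITTER_DATE="$date" git commit --quiet \
        --author="$editor <editor@wiki.invalid>" \
        --date="$date" \
        -m "$comment"
}
```

It would be called once per revision, oldest first, after writing that revision's text to the file. A committer identity (user.name and user.email) has to be configured already.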
The code that splits up the lines is in Process File. It replaces every
space with a line break, so each word ends up on its own line.
That way we get a word-level blame.
if ($insidetext)
{
    ## Break the line at every space: each space becomes a backslash
    ## followed by a newline, so every word ends up on its own line.
    s/(\ )/\\\n/g;
    print OUT $_;
}
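The same idea as a stand-alone filter, for anyone who wants to try it without the full script. This is a sketch in shell and awk, not the Perl from the script. It assumes the article text sits between <text ...> and </text> as in the MediaWiki XML export, the function name split_text_words is made up, and it writes a bare newline where the Perl above writes a backslash and a newline.

```shell
#!/bin/sh
# Sketch only: put every word of the article text on its own line.
# split_text_words is a hypothetical name; it reads XML on stdin.
split_text_words() {
    awk '
        /<text/    { inside = 1 }          # text element starts on this line
        inside     { gsub(/ /, "\n") }     # one word per line inside the text
                   { print }               # print every line, changed or not
        /<\/text>/ { inside = 0 }          # text element ends on this line
    '
}
```

Lines outside the text element pass through unchanged, so titles and ids stay on one line. Spaces inside the opening <text ...> tag itself are split as well, which is harmless for blame purposes.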
The article is here:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/article.xml
Here are the blame results:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/wordblame.txt
The problem is that GitHub does not like this much processor power
being used and kills the process, but you can run git blame locally.
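Running the blame locally can be as simple as the sketch below, which also totals the words per editor. It assumes the file was checked in with one word per line and that each commit carries the wiki editor as its author. The function name words_per_author is made up. git blame --line-porcelain prints a full header, including an "author" line, for every line of the file.

```shell
#!/bin/sh
# Sketch only: count how many lines (words, after splitting) git blame
# attributes to each author. words_per_author is a hypothetical name.
#
# usage: words_per_author FILE   (run inside the git repo)
words_per_author() {
    git blame --line-porcelain -- "$1" |
        sed -n 's/^author //p' |   # keep only the author name of each line
        sort | uniq -c |           # count lines per author
        sort -rn                   # most words first
}
```

Content lines in the porcelain output start with a tab, so a word in the article that happens to be "author" is not counted as a header.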
Now we have a tool to easily create a repository from Wikipedia, or
from any other export-enabled MediaWiki.
mike