On 06/04/2012 16:01, Alex Buie wrote:
> Excellent, thanks guys. I'm assuming that I shouldn't have to worry
> about malformed xml (hopefully, haha), which makes it even easier/faster.
The dumps are well-formed XML, of course. The problem is that the tags
are not always in the same order, and the revisions are not always in
chronological order... and of course the revision text is a real mess!
I suggest you have a look at our library and start by using it to build
simple scripts. It's really easy! All you have to do is write a
process_<tag> method for every tag you want to handle (e.g.
process_title for the title tag). Have a look at
https://github.com/volpino/wiki-network/blob/master/revisions_page.py
for an example: it's a simple script that takes a pages-meta-history
dump and extracts the revisions of a specific set of pages to a CSV file.
Feel free to write me for more information ;)
--
f.
"Always code as if the guy who ends up maintaining your code will be a
violent psychopath who knows where you live."
(Martin Golding)
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
http://about.me/fox91