Excellent, thanks guys. I'm assuming that I shouldn't have to worry about malformed XML (hopefully, haha), which makes it even easier/faster.
Alex

On Apr 6, 2012 4:43 AM, "fox" <fox91@anche.no> wrote:
On 06/04/2012 10:15, emijrp wrote:
2012/4/3 Alex Buie <abuie@archive.org>

I wonder how well Python's lxml handles multigigabyte XML files... Guess we'll see :)
Pywikipediabot uses cElementTree for Python, which is fast as hell.
We've been using cElementTree for a long time in wiki-network (https://github.com/volpino/wiki-network), a suite of scripts for analyzing Wikipedia dumps, in particular for social network analysis. It's really fast even on huge dumps like enwiki-pages-meta-history.
It's open source, so you are welcome to use it and contribute to the project!
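
For reference, the usual streaming pattern for dumps this size looks something like the sketch below. This is not code from wiki-network; the dump filename and the export-schema version in the namespace URI are assumptions, so check them against the root tag of your own dump.

# Minimal sketch: streaming a MediaWiki XML dump with cElementTree's
# iterparse so memory stays flat regardless of dump size.
# (On Python 3.3+, plain xml.etree.ElementTree uses the C accelerator
# automatically and is just as fast.)
import xml.etree.cElementTree as etree

# Hypothetical schema version; real dumps range from export-0.3 upward.
NS = '{http://www.mediawiki.org/xml/export-0.6/}'

def iter_pages(path):
    """Yield (title, revision_count) per <page> without building the whole tree."""
    context = etree.iterparse(path, events=('start', 'end'))
    _, root = next(context)  # the first 'start' event hands us the root element
    for event, elem in context:
        if event == 'end' and elem.tag == NS + 'page':
            title = elem.findtext(NS + 'title')
            revisions = len(elem.findall(NS + 'revision'))
            yield title, revisions
            # Crucial for multi-gigabyte files: drop the finished <page>
            # subtree from the root so the in-memory tree never grows.
            root.clear()

if __name__ == '__main__':
    # Hypothetical filename for illustration.
    for title, nrev in iter_pages('enwiki-pages-meta-history.xml'):
        print('%s\t%d' % (title, nrev))

The root.clear() after each page is what keeps memory constant; with elem.clear() alone, the emptied <page> elements would still accumulate as children of the root.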
-- f.
"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." (Martin Golding)
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l