On 06/04/2012 10:15, emijrp wrote:
2012/4/3 Alex Buie <abuie@archive.org>: I wonder how well Python's lxml handles multi-gigabyte XML files... Guess we'll see :)
Pywikipediabot uses cElementTree for Python, which is fast as hell.
We've been using cElementTree for a long time in wiki-network (https://github.com/volpino/wiki-network), a suite of scripts for analyzing Wikipedia dumps, in particular for social network analysis. It's really fast even on huge dumps such as enwiki-pages-meta-history.
It's open source so you are welcome to use it and contribute to the project!
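For anyone curious what streaming a dump with ElementTree can look like, here is a minimal sketch. It is not taken from wiki-network or Pywikipediabot; the function names, the tiny SAMPLE document and the export namespace version in it are made up for illustration. It uses xml.etree.ElementTree, which picks up the C accelerator automatically in Python 3 (cElementTree was the Python 2 module name), and it clears each page after use so memory should stay roughly flat on large files.

```python
import io
import xml.etree.ElementTree as ET  # C-accelerated in Python 3

# Hypothetical miniature dump, only for illustration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page><title>A</title><revision><id>1</id></revision><revision><id>2</id></revision></page>
  <page><title>B</title><revision><id>3</id></revision></page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, revision_count) for each <page>, freeing it afterwards.

    `source` is a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    title, revisions = None, 0
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            revisions += 1
            elem.clear()  # drop the revision's text and children right away
        elif name == "page":
            yield title, revisions
            title, revisions = None, 0
            root.clear()  # detach finished pages from the root


pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

A compressed dump could presumably be read the same way by passing a file object from bz2.open(path, "rb") instead of the BytesIO used here.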