Hi.

A new version of the Python parser in WikiXRay, along with improved documentation, can be found here:

http://meta.wikimedia.org/wiki/WikiXRay_Python_parser

Basically, I've developed two flavors. The standard one is for people who want an alternative to other tools for processing Wikipedia's dumps (including the text table). The other one is for research purposes: it ignores the text itself and instead extracts useful information on the fly.
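
A minimal sketch of the "extract on the fly" idea follows. It is not WikiXRay's code. It assumes a MediaWiki XML dump read with the standard library's xml.etree.ElementTree.iterparse. It also assumes that the revision text length is the statistic of interest, as a stand-in for whatever the research flavor actually extracts. Each revision is reduced to a small tuple and then cleared, so text bodies never accumulate in memory.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def revision_stats(source):
    """Yield (page_title, rev_id, text_length) for each revision in a dump.

    Hypothetical helper: only the length of <text> is kept, never the text.
    """
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id, length = None, 0
            # Only direct children, so <contributor><id> is not mistaken
            # for the revision id.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = int(child.text)
                elif child_name == "text":
                    length = len(child.text or "")
            yield title, rev_id, length
            elem.clear()  # drop the revision's children (including text)
        elif name == "page":
            elem.clear()  # drop the finished page subtree


if __name__ == "__main__":
    sample = (b"<mediawiki><page><title>Foo</title><id>1</id>"
              b"<revision><id>10</id><text>hello</text></revision>"
              b"</page></mediawiki>")
    for row in revision_stats(io.BytesIO(sample)):
        print(row)
```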

Both flavors use extended inserts (you can tune the maximum size of each insert and the number of rows it packs), and the --monitor mode calls a database access module to avoid timeout errors.
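
For readers unfamiliar with extended inserts, here is a rough sketch of the idea. Several rows are packed into one INSERT ... VALUES (...),(...) statement, and a new statement starts when either a row limit or a size limit is reached. The names max_rows and max_chars are hypothetical and stand in for whatever tuning options the parser exposes. The simple string quoting assumes MySQL-style backslash escaping and is for illustration only.

```python
def sql_literal(value):
    """Render a Python value as a MySQL-style SQL literal (illustrative only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def build_extended_inserts(table, columns, rows, max_rows=500, max_chars=1_000_000):
    """Yield INSERT statements that each pack up to max_rows rows.

    A statement is also closed before it would exceed max_chars characters
    (counting the trailing ';'). A single row longer than max_chars is still
    emitted on its own.
    """
    header = f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
    batch = []
    size = len(header) + 1  # +1 for the closing ';'
    for row in rows:
        values = "(" + ",".join(sql_literal(v) for v in row) + ")"
        extra = len(values) + (1 if batch else 0)  # +1 for the ',' separator
        if batch and (len(batch) >= max_rows or size + extra > max_chars):
            yield header + ",".join(batch) + ";"
            batch = []
            size = len(header) + 1
            extra = len(values)
        batch.append(values)
        size += extra
    if batch:
        yield header + ",".join(batch) + ";"
```

Bounding the statement size matters because the server rejects statements larger than its configured packet limit, while packing many rows per statement cuts per-query overhead.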

Further improvements (--skipnamespaces and --inject; the latter should be very easy) are on the way.

Best,

Felipe.
