[Labs-l] Processing dumps with Wikimedia Utilities

Morten Wang nettrom at gmail.com
Mon May 12 15:30:08 UTC 2014


Hi Emilio,

You're probably aware of it, but one way to handle your own installs is to
use virtual environments: https://virtualenv.pypa.io/en/latest/
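
For what it's worth, here is a minimal sketch of creating such an environment from Python itself. It assumes Python 3.3 or later, where the standard-library venv module exists (on 2.x you would use the virtualenv tool linked above instead). The helper name make_env and the directory name are placeholders, not anything tools-dev provides.

```python
import os
import tempfile
import venv


def make_env(env_dir):
    """Create an isolated environment in env_dir and return the path to its Python."""
    # with_pip=False keeps this offline; pass True to bootstrap pip as well.
    venv.create(env_dir, with_pip=False)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


if __name__ == "__main__":
    # Placeholder location; in practice this would live in your home directory.
    target = os.path.join(tempfile.mkdtemp(), "dump-env")
    print(make_env(target))
```

Anything you install with that environment's interpreter stays inside the environment, so no admin is needed.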

BTW, the Python utilities you pointed to are now deprecated in favour of a
newer version, but the newer version is Python 3.x only:
http://pythonhosted.org/mediawiki-utilities/

I have the older version of his utilities installed in my virtual
environment. When I processed the English dump about a month ago, I used
tools-dev for testing and then submitted jobs to the job servers once the
code was ready, running over the smaller split files of the dump for
parallelisation and lower memory usage.
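
To illustrate the low-memory part, here is a rough sketch of streaming through a single split file using only the Python 3 standard library. It is not the Wikimedia Utilities API. The function names are made up for the example, and it assumes the file is a bz2-compressed MediaWiki XML export with <page> and <title> elements.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(path):
    """Yield the title of each page in one bz2-compressed XML dump file."""
    with bz2.open(path, "rb") as stream:
        # iterparse reads incrementally instead of loading the whole file.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title = None
            for child in elem:
                if local_name(child.tag) == "title":
                    title = child.text
                    break
            yield title
            # Release the page's children and text once it has been handled.
            elem.clear()
```

Each job would then call something like iter_page_titles on one split file, given as a command-line argument, so the files can be processed independently.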

From what I've heard the newer library is considerably faster than the 2.x
version, but I haven't yet had a project where I could test that.


Regards,
Morten



On 11 May 2014 13:10, Emilio J. Rodríguez-Posada <emijrp at gmail.com> wrote:

> Hi;
>
> I would like to process some Wikipedia dumps. Is tools-dev the right place
> for this? I don't see Wikimedia Utilities[1] available there.
>
> Do I have to install it myself, or is this a task for an admin?
>
> Regards
>
> [1] https://bitbucket.org/halfak/wikimedia-utilities/wiki/Home
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>
>