I'm seeing some issues with the history phase of the wikidata dumps taking a huge amount of memory and causing the server they are on to swap. I've shot the jobs and left fewer workers running on the one host for now; I'll investigate in depth tomorrow.
Ariel
There is a memory leak when we use a spawned fetchTextPass.php; it only affects wikis with millions of respawns, and of course there is only one of those: wikidata. I restarted this yesterday morning without spawning a separate process, and right now it's about 2/3 of the way done for the May run. Once that completes successfully I'll get the rest of that run going, and eventually backtrack and get the missing bits of the previous run as well.
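To make the failure mode concrete, here is a minimal sketch in Python (purely illustrative, not the real dump code and not a diagnosis of the actual leak; the command line and batching are placeholders): a long-lived pass that spawns one helper per batch grows without bound if it holds on to every finished Popen handle and its buffered output, which only bites on a wiki that needs millions of spawns.

    # Illustrative sketch only -- not the real dump code, and not a diagnosis
    # of the actual leak. It shows how a loop that spawns one helper per
    # batch can accumulate memory if finished Popen handles and their
    # buffered output are retained instead of being dropped after use.
    import subprocess

    def leaky_pass(batches):
        results = []                              # grows by one entry per spawn
        for batch in batches:                     # batch: bytes fed to the helper's stdin
            proc = subprocess.Popen(
                ["php", "fetchTextPass.php"],     # placeholder invocation
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
            )
            out, _ = proc.communicate(input=batch)
            results.append((proc, out))           # keeping proc and out pins memory

    def streaming_pass(batches):
        for batch in batches:
            proc = subprocess.Popen(
                ["php", "fetchTextPass.php"],     # placeholder invocation
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
            )
            out, _ = proc.communicate(input=batch)
            yield out                             # hand the text off; proc can be freed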
Ariel
On Thu, 16-05-2013, at 18:53 +0300, Ariel T. Glenn wrote:
I'm seeing some issues with the history phase of the wikidata dumps taking a huge amount of memory and causing the server they are on to swap. I've shot the jobs and left fewer workers running on the one host for now; I'll investigate in depth tomorrow.
Ariel
Things are looking glum.
First off, I just discovered that I managed to run against the wrong stubs file for the one full history dump. OK, it's fixable; I just need to move it to the right directory, update the status file, etc. But more troubling is that even without spawning a separate job, we see memory grow when the python wrapper is used. All the python wrapper does is call the php script with a Popen. Simple stuff. And yet...
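For reference, the spawn step in the wrapper amounts to roughly this (a minimal sketch; the function name, command, and output handling are placeholders, not the real command line):

    # Minimal sketch of what the python wrapper does: start the php
    # maintenance script as a child process and wait for it to finish.
    # The function name, command, and output handling are placeholders.
    import subprocess

    def run_php_step(command, output_path):
        # command is something like ["php", "fetchTextPass.php", ...]
        with open(output_path, "wb") as out:
            proc = subprocess.Popen(command, stdout=out)
            return proc.wait()                    # the wrapper itself does very little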
I'll start up the step by hand with the php command tomorrow as a temporary measure, and look for another workaround for the long term.
Ariel
On Wed, 22-05-2013, at 18:26 +0300, Ariel T. Glenn wrote:
First off, I just discovered that I managed to run against the wrong stubs file for the one full history dump. OK, it's fixable; I just need to move it to the right directory, update the status file, etc.
Not in fact true; I was just too depressed from the memory issue to read my own commands correctly :-D
A.
We are back to running four workers for small wikis. I'll have to shoot the wikidata job when it comes around, as I don't have a scriptable workaround in place yet for the history dumps. I'm running the missing bits of the 20130514 wikidata dump manually; they should complete in a few days.
Thanks for your patience,
Ariel