I don't know if this issue has come up already. If it has and was
dismissed, I beg your pardon. If it hasn't...
I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used
to compress the XML dumps instead of bzip2. Why? Because its sibling
(pbunzip2) has a bug that bunzip2 doesn't have. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see
https://bugs.launchpad.net/pbzip2/+bug/922804), together with some test
results obtained a few hours before that.
The results indicate that:
bzip2 and pbzip2 are compatible in both directions: each one can create
archives that the other one can read. But when it comes to
uncompressing, only pbzip2-compressed archives work well with pbunzip2.
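
A minimal Python sketch of why this kind of compatibility is possible.
It assumes (this is not taken from the test results above) that a
parallel compressor splits the input into chunks, compresses each chunk
as an independent bzip2 stream, and concatenates the streams. The names
compress_in_chunks, count_streams and CHUNK_SIZE are made up for the
illustration and are not part of pbzip2.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size, chosen only for illustration.
CHUNK_SIZE = 900_000


def compress_in_chunks(data: bytes, chunk_size: int = CHUNK_SIZE,
                       workers: int = 4) -> bytes:
    """Compress each chunk as its own bzip2 stream and concatenate them."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Chunks are independent, so they can be compressed concurrently.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


def count_streams(blob: bytes) -> int:
    """Count the independent bzip2 streams contained in blob."""
    count = 0
    while blob:
        decompressor = bz2.BZ2Decompressor()
        decompressor.decompress(blob)
        # Whatever follows the end of this stream belongs to the next one.
        blob = decompressor.unused_data
        count += 1
    return count


if __name__ == "__main__":
    sample = b"wikipedia dump line\n" * 200_000
    blob = compress_in_chunks(sample)
    # A multi-stream-aware reader restores the original bytes.
    assert bz2.decompress(blob) == sample
    print("streams:", count_streams(blob))
```

A file produced by a single compression call holds just one stream, so
a reader has nothing to split across CPUs. That is one plausible
explanation for the asymmetry, not a confirmed diagnosis of the bug.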
I propose compressing the archives with pbzip2 for the following
reasons:
1) If your archiving machines are SMP systems, this could lead to
better use of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2,
so everything should run for these people as usual.
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a
speedup that scales nearly linearly with the number of CPUs.
So to sum up: it's a no-lose, two-win situation if you migrate to
pbzip2. And all that just because pbunzip2 is slightly buggy. Isn't that
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com       Managing Director: Richard Jelinek
Human Language Technology Experts    Registered office: Fürth
69216618 Mind Units                  Register court: AG Fürth, HRB-9201
Three months ago, "they" changed the way dumps are done. "They"
wanted stubs for every language done first and work on the rest of the
Old way: Two dumps a month for all languages except enwiki. A new
dump was started when one completed. Except for larger languages,
dumps were done in under a day.
New way: One dump a month for the majority of languages. Dumps have to
be started manually. All dumps take 2-4 weeks to complete.
Last month, French, German, Japanese, Russian, Spanish and others were
It's about to be August 5. No dumps have started this month. Only
one dump has started in over two weeks.
Does anybody know how to get dumps working again?
I am working on WP-MIRROR 0.8. The concept is that WP-MIRROR will be ``WMF
in a microcosm''. To this end, I am enhancing the --dump option of
WP-MIRROR. I would like it to be able to generate XML/SQL dumps just like
WMF does. I will also add the capability to generate ZIM files, media
Could you please tell me what code is being used at WMF to generate your
monthly XML/SQL dumps? Also let me know if there is any relevant
I would like to consider rolling the dump generating code into a DEB
package, as was done for your `mwxml2sql'. As before, I would be happy to
write the man page.