Hi,
I don't know if this issue has come up already. If it has and was
dismissed, I beg your pardon. If it hasn't...
I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used
to compress the XML dumps instead of bzip2. Why? Because its sibling
(pbunzip2) has a bug that bunzip2 doesn't have. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see
https://bugs.launchpad.net/pbzip2/+bug/922804) together with some test
results obtained a few hours before that.
The results indicate that:
bzip2 and pbzip2 are mutually compatible: each one can create
archives that the other one can read. But when it comes to uncompressing,
only pbzip2-compressed archives are good for pbunzip2.
I propose compressing the archives with pbzip2 for the following
reasons:
1) If your archiving machines are SMP systems, this could lead to
better usage of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2,
so everything should run for these people as usual.
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a
speedup that scales nearly linearly with the number of CPUs in the
host.
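
Points 2 and 3 can be illustrated with a small sketch. It rests on an
assumption not spelled out above: that a pbzip2 archive is a series of
independently compressed bzip2 streams written one after another. The
code does not call pbzip2 itself. It imitates that layout with Python's
standard bz2 module, and the chunk size, worker count and function names
are made up for illustration.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 900_000  # hypothetical chunk size, chosen only for this sketch


def compress_in_chunks(data: bytes, chunk_size: int = CHUNK_SIZE,
                       workers: int = 4) -> bytes:
    """Compress each chunk as its own bzip2 stream and concatenate them."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


def decompress_sequentially(archive: bytes) -> bytes:
    """Read the archive the way a regular single-threaded reader would."""
    # bz2.decompress accepts multi-stream input (Python 3.3 and later).
    return bz2.decompress(archive)


def count_streams(archive: bytes) -> int:
    """Count the independent bzip2 streams contained in the archive."""
    count = 0
    while archive:
        decompressor = bz2.BZ2Decompressor()
        decompressor.decompress(archive)
        # Whatever follows the end of this stream belongs to the next one.
        archive = decompressor.unused_data
        count += 1
    return count
```

Because every stream is self-contained, a sequential reader can walk
through them one by one, while a parallel reader is free to hand each
stream to a different CPU.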
So to sum up: it's a no-lose, two-win situation if you migrate to
pbzip2. And that's just because pbunzip2 is slightly buggy. Isn't that
interesting? :-)
cheers,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com Geschäftsführer: Richard Jelinek
Human Language Technology Experts Sitz der Gesellschaft: Fürth
69216618 Mind Units Registergericht: AG Fürth, HRB-9201
Hi everybody,
is there any chance that someone still has the enwiki-20071218-pages-articles.xml.bz2 dump or the one from November (even better)?
I'd need it for a research project and would be very grateful.
Thanks,
Tobias
Hi,
Since the dump from August, we have updated the content, and since then our pages don't render properly.
It seems that the modules/templates contained in double braces {{}} are not rendered properly, though not always.
Soft redirects and some Infobox types are not displayed properly.
Since then, we have updated the software to 1.23.3, updated the extensions and added more of them, but still no luck.
We also updated PCRE to 8.35.
Here are a few examples of pages with these errors:
www.wow.com/wiki/Barack_Obama
www.wow.com/wiki/Robin_Williams
Thanks in advance
Diana
I am trying to get paired articles from Simple English Wikipedia and
English Wikipedia. For that, I am looking for the language links for Simple
English Wikipedia. Are they available?
with regards
Ditty