Hi,
I don't know if this issue has come up already - in case it did and was
dismissed, I beg your pardon. In case it didn't...
I hereby propose that pbzip2 (https://launchpad.net/pbzip2) be used
to compress the XML dumps instead of bzip2. Why? Because its sibling
(pbunzip2) has a bug that bunzip2 doesn't. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see
https://bugs.launchpad.net/pbzip2/+bug/922804) together with some test
results I had run a few hours before that.
The results indicate that bzip2 and pbzip2 are compatible in both
directions for compression: each one can create archives the other can
read. When it comes to decompressing, however, pbunzip2 only handles
pbzip2-compressed archives properly.
I propose compressing the archives with pbzip2 for the following
reasons:
1) If your archiving machines are SMP systems, this could lead to
better use of system resources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2,
so everything should keep working for these people as usual (a quick
sanity check is sketched below).
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a
speedup that scales nearly linearly with the number of CPUs in the
host.
So to sum up: it's a no-lose, two-win situation if you migrate to
pbzip2. And that's just because pbunzip2 is slightly buggy. Isn't that
interesting? :-)
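For what it's worth, here is a minimal sanity check for point 2
(Python 3.3+, assuming the pbzip2 binary is on PATH; the file names
are made up for the example). It compresses a sample file with pbzip2
and reads the result back with Python's standard bz2 module, which,
like bunzip2, handles multi-stream .bz2 archives:

  #!/usr/bin/env python3
  # Minimal sanity check: data compressed by pbzip2 stays readable by a
  # standard bzip2 decompressor. Requires the `pbzip2' binary on PATH;
  # the file names below are made up for the example.

  import bz2
  import subprocess

  SAMPLE = "sample-dump.xml"        # hypothetical test file
  ARCHIVE = SAMPLE + ".bz2"
  PAYLOAD = b"<page>example</page>\n" * 100000

  # Create some test data so the script is self-contained.
  with open(SAMPLE, "wb") as f:
      f.write(PAYLOAD)

  # Compress with pbzip2; like bzip2 it replaces SAMPLE with SAMPLE.bz2.
  # -f forces overwrite of an existing archive.
  subprocess.run(["pbzip2", "-f", SAMPLE], check=True)

  # Python's bz2 module (3.3+) handles multi-stream .bz2 files, just as
  # bunzip2 does, so pbzip2's multi-stream output reads back fine.
  with bz2.open(ARCHIVE, "rb") as f:
      restored = f.read()

  assert restored == PAYLOAD
  print("pbzip2 archive decompressed fine with the standard bz2 module")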
cheers,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com
Human Language Technology Experts
69216618 Mind Units
Geschäftsführer: Richard Jelinek
Sitz der Gesellschaft: Fürth
Registergericht: AG Fürth, HRB-9201
dumps.wikimedia.org and downloads.wikimedia.org will be down on Thursday
June 26 from 13.30 UTC until 14.30 UTC. While we expect the actual
downtime to be much less, we're blocking out one hour just in case.
We will be moving the host to a new rack in preparation for improved
bandwidth, and yes, this means raising download caps.
Services affected: dump and pageview downloads, and any other files hosted
on the server.
Dump runs themselves will be stopped for the duration of the downtime.
They will be restarted as needed once the host is back online.
Ariel
Dear Ariel,
I just pushed a patch <https://gerrit.wikimedia.org/r/#/c/139413/> for your
review.
It extends `mwxml2sql' so that it can handle the database schema for
`mediawiki' 1.24. The current `ariel' branch can only handle up to
`mediawiki' 1.21. I am presently preparing a DEB package of the patched
version.
Please let me know if you have any questions.
Sincerely Yours,
Kent
Hi there,
I wonder if anyone knows when the dumps for May data will be ready?
Usually, preparation of dumps for the previous month's data starts on the
2nd-8th of the next month (http://dumps.wikimedia.org/enwiki/).
However, the June dump run for May data has not started yet:
http://dumps.wikimedia.org/enwiki/latest/
--
Thank you.
Alex Druk
alex.druk(a)gmail.com
(775) 237-8550 Google voice
Hi to all,
I have a problem and I need your help.
It's about the Wikipedia dumps. I'm working on a project which uses the
Wikipedia database to display articles offline.
I need your help on two points:
- First, the problem with the database installation on Ubuntu is that in
my country we have frequent interruptions of electricity, and it's very
difficult for us to get through an installation when the power is cut
roughly every hour. Each time, we have to purge some tables in the
database and start the installation again.
- Second, we do not have good internet bandwidth to download the Wikipedia
image dumps, because the connection is too slow.
After trying to resolve these problems, our progress is still very slow.
We have a friend in the USA who is coming back in a few weeks. Could we
send two hard disks to you so you can help us install a local copy of
Wikipedia in French and English, with the images in the articles, in
MediaWiki? This solution is the best for our environment and situation at
this time.
Thanks for your support and assistance with these dumps for developers,
and for the idea of making information accessible to all and open.
Thanks (sorry for my English)
---------- Forwarded message ----------
From: wp mirror <wpmirrordev(a)gmail.com>
Date: Sun, Jun 1, 2014 at 4:36 PM
Subject: Re: [Xmldatadumps-l] fast dump importer and revision diff
calculator
To: Jeremy Baron <jeremy(a)tuxmachine.com>
Dear Jeremy,
I am interested. But I have a few questions:
o How much RAM is `copious'?
o Do you have any test results?
o Can it be used with `pages-articles' and `pages-meta-current'; or only
with `pages-meta-history'?
Sincerely Yours,
Kent
On Fri, May 30, 2014 at 6:31 AM, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:
> fast dump importer and revision diff calculator.
>
> I believe it's really fast but needs copious RAM.
>
> https://github.com/makoshark/wikiq/blob/master/README
>
> -Jeremy