XML dumps have been resumed, one thread only; if they look ok in another
12 hours or so I'll start multiple batches and they will run as usual.
Old bad dump files have been moved out of the way.
There is no reason to think that the dumps running now won't look fine;
this is just me being cautious.
Dumps have been halted for 1-2 days while code fixes get merged into the
deployment branch of our code (so that the host that runs them does not
have to be excluded from regular updates).
Before restarting them, we'll also be running a job to locate and remove
the broken dumps that have accumulated in the past 10 (?) days.
Is there anything that can be done to alleviate that problem?
By the way, what's the point of producing a .bz2 version of the
pages-meta-history.xml dump? Is it easier on the system to produce .bz2
first and .7z after that? From the user's perspective I can tell that .7z is
all I need, there is simply no point in working with .bz2 (if .7z is
-- Regards, Dmitry
The database dump progress page (
http://dumps.wikimedia.org/backup-index.html) seems to indicate that no dump
has been made for more than a week for any Wikipedia.
The first line is about the enwiki dump which is still in progress and seems
to be updated.
But all the other lines date back to 2010-07-06 or earlier.
I had some questions about the order of pages and revisions in the dump.
As I understand, the order is according to the respective IDs. But
where do these IDs come from? Are they the keys of the corresponding
table in the database? So then they are more or less in order of
creation? If that's the case, why does the dump begin with pages with
titles mostly beginning with "A"?
I was doing a bit of analysis of the dump
enwiki-20100130-pages-meta-history.xml.7z. What I found to my surprise
is that there are (at least) 7 million pages in the main namespace. I
got this figure by grepping for page titles that do not contain a ":"
character. Is this really the case or am I missing something? I'd seen
some Wikimedia stats that said the number of articles currently is about
3.2 million, so I'm not sure why I'm seeing so many pages in the dump.
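
For anyone who wants to reproduce this kind of count without grepping, here
is a minimal Python sketch that streams the XML and tallies pages whose title
has no ":" in it. It also counts how many of those carry a redirect marker,
since redirects are one possible (unconfirmed here) source of the gap: as far
as I know they are not included in the published article count. The function
name and the sample data are made up for illustration, and the sketch assumes
the dump marks redirects with a <redirect /> child of <page>; if the schema
in use does not, that counter simply stays at zero.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def count_main_namespace(stream):
    """Stream a MediaWiki XML export and count pages by the title heuristic.

    Hypothetical helper: 'no_colon' mirrors the grep described above
    (titles without ':'), which is only an approximation of namespace 0.
    """
    counts = {"pages": 0, "no_colon": 0, "no_colon_redirects": 0}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "revision":
            # Revision text is not needed here; free it to keep memory low.
            elem.clear()
            continue
        if name != "page":
            continue
        title = ""
        is_redirect = False
        for child in elem:
            child_name = local(child.tag)
            if child_name == "title":
                title = child.text or ""
            elif child_name == "redirect":
                # Assumption: redirects are flagged with a <redirect /> element.
                is_redirect = True
        counts["pages"] += 1
        if ":" not in title:
            counts["no_colon"] += 1
            if is_redirect:
                counts["no_colon_redirects"] += 1
        elem.clear()
    return counts


# Tiny made-up sample, not real dump content.
SAMPLE = b"""<mediawiki>
  <page><title>Anarchism</title><revision><text>...</text></revision></page>
  <page><title>AccessibleComputing</title><redirect />
    <revision><text>#REDIRECT [[Computer accessibility]]</text></revision></page>
  <page><title>Talk:Anarchism</title><revision><text>...</text></revision></page>
</mediawiki>"""

print(count_main_namespace(io.BytesIO(SAMPLE)))
```

On a real dump the same function could be fed a decompressed stream (for
example sys.stdin.buffer, with the archive decompressed by an external tool)
instead of the in-memory sample. Comparing no_colon with no_colon_redirects
would show how much of the 7 million figure is redirects.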