Yes it is, and the new naming scheme of the "latest" files is a bug; I
need to fix that. Grrr.
Ariel
On Thu, 08-09-2011, at 16:05 -0700, Eric Sun wrote:
Just to confirm, the enwiki-20110901-pages-articles.xml.bz2 file is the
concatenation of all those sub-files, right?
Would it be possible to restore the filename of this file
to enwiki-latest-pages-articles.xml.bz2 for consistency with all the
other wikipedias?
For example, the latest full dump in
http://dumps.wikimedia.org/dewiki/latest/
is called dewiki-latest-pages-articles.xml.bz2 and it's the same in
all other languages.
Thanks,
Eric
On Thu, Sep 8, 2011 at 12:22 AM, Ariel T. Glenn <ariel(a)wikimedia.org>
wrote:
The September en wikipedia dumps are done. Folks who use them, note
that this is the first run with the generation of a pile of smaller
files. The naming scheme, as you will have noticed, has an additional
string: -p<first-page-id-contained>p<last-pageid-contained>
Expect the specific groupings to change from one run to the next; the
split is time-based, rather than based on the number of pages or
revisions.
You may notice a gap of a few numbers between files; this would
indicate that those pages were deleted and not included in the dump at
all.
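
A minimal sketch of reading the page-id range back out of a file name
and spotting the gaps mentioned above. It assumes only the
-p<first>p<last> string described in this message; the sample file
names in the usage note are hypothetical, and the function names
parse_page_range and find_gaps are illustrative, not part of any dump
tooling.

```python
import re

# Matches the "-p<first-page-id>p<last-page-id>" string in a file name.
# Assumption: both ids are plain decimal digits.
PAGE_RANGE = re.compile(r"-p(\d+)p(\d+)")


def parse_page_range(filename):
    """Return (first_page_id, last_page_id), or None if no range is present."""
    match = PAGE_RANGE.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_gaps(filenames):
    """Return (last_id_of_previous_file, first_id_of_next_file) for each gap."""
    ranges = sorted(r for r in map(parse_page_range, filenames) if r is not None)
    gaps = []
    for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
        # A jump of more than one id means some page ids are in neither file.
        if next_first > prev_last + 1:
            gaps.append((prev_last, next_first))
    return gaps
```

For example, with two hypothetical names ending in -p10p100.bz2 and
-p104p200.bz2, find_gaps reports one gap between ids 100 and 104. Names
without a range, such as enwiki-latest-pages-articles.xml.bz2, are
skipped.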
Since there were no issues with the network, database servers, broken
MW deployments etc., the run finished without any need for restarts of
a particular step; this is probably the fastest we'll ever see it run,
in a little under 8 days.
Any issues, please let me know. I expect people will need a script to
download these files easily; didn't someone on this list have a tool in
the works?
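
In the meantime, a rough sketch of what such a download script might
look like, using only the Python standard library. It is not the tool
asked about above. The base URL and the list of file names must be
supplied by the caller; build_urls and download_all are hypothetical
helper names, and the fetch parameter exists only so the download step
can be swapped out.

```python
import os
from urllib.parse import urljoin
from urllib.request import urlretrieve


def build_urls(base_url, filenames):
    """Join each file name onto the dump directory URL."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [urljoin(base_url, name) for name in filenames]


def download_all(urls, dest_dir, fetch=urlretrieve):
    """Download each URL into dest_dir and return the local paths.

    Files that already exist locally are skipped. Assumption: an existing
    file is complete; a partial download is NOT detected here.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(target):
            fetch(url, target)
        saved.append(target)
    return saved
```

Usage would be along the lines of
download_all(build_urls(base, names), "dumps"), where base is the
directory URL for the run and names is the list of sub-file names taken
from the dump listing. Since the groupings change from one run to the
next, the list has to be rebuilt for every run.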
Ariel
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l