[Xmldatadumps-l] another month, another dump. ho hum :-P

Eric Sun paradox at gmail.com
Thu Sep 8 23:05:21 UTC 2011


Just to confirm, the enwiki-20110901-pages-articles.xml.bz2 file is the
concatenation of all those sub-files, right?

Would it be possible to restore the enwiki-latest-pages-articles.xml.bz2
name for this file, for consistency with all the other Wikipedias?

For example, the latest full dump in
http://dumps.wikimedia.org/dewiki/latest/
is called dewiki-latest-pages-articles.xml.bz2 and it's the same in all
other languages.

Thanks,
Eric

On Thu, Sep 8, 2011 at 12:22 AM, Ariel T. Glenn <ariel at wikimedia.org> wrote:

> The September en wikipedia dumps are done.  Folks who use them, note
> that this is the first run with the generation of a pile of smaller
> files.  The naming scheme, as you will have noticed, has an additional
> string: -p<first-page-id-contained>p<last-page-id-contained>.  Expect the
> specific groupings to change from one run to the next; the split is
> time-based, rather than based on the number of pages or revisions.
>
> You may notice a gap of a few numbers between files; this would indicate
> that those pages were deleted and not included in the dump at all.
>
> Since there were no issues with the network, database servers, broken MW
> deployments etc., the run finished without any need for restarts of a
> particular step; this is probably the fastest we'll ever see it run, in
> a little under 8 days.
>
> Any issues, please let me know.  I expect people will need a script to
> download these files easily; didn't someone on this list have a tool in
> the works?
>
> Ariel
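
For anyone scripting against the new files, here is a minimal Python sketch of how the -p<first>p<last> string described above could be parsed and used to spot gaps between pieces. The filenames in it are made up for illustration, and the regular expression is only an assumption about the naming scheme, not anything taken from the dump scripts themselves.

```python
import re

# Assumed pattern for the "-p<first-page-id>p<last-page-id>" string.
PAGE_RANGE = re.compile(r"-p(\d+)p(\d+)")


def page_range(filename):
    """Return (first_id, last_id) parsed from a piece filename, or None."""
    match = PAGE_RANGE.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def find_gaps(filenames):
    """Return (previous_last_id, next_first_id) pairs where page ids are skipped."""
    ranges = sorted(r for r in map(page_range, filenames) if r is not None)
    gaps = []
    for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
        if next_first > prev_last + 1:
            gaps.append((prev_last, next_first))
    return gaps


# Hypothetical filenames, for illustration only.
names = [
    "enwiki-20110901-pages-articles2.xml-p000010003p000025000.bz2",
    "enwiki-20110901-pages-articles1.xml-p000000010p000010000.bz2",
    "enwiki-20110901-pages-articles.xml.bz2",  # combined file, no page range
]

print(find_gaps(names))
```

Since the groupings are expected to change from one run to the next, a script along these lines would probably want to read the ranges from the filenames each time rather than hard-code them.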