Those of you following along will notice that dewiki and wikidatawiki have
more files than usual for the page content dumps (pages-meta-history).
We'll have more of this going forward; if I get the work done in time,
starting in April we'll split up these jobs ahead of time into small files
that can be rerun immediately when they fail, rather than waiting for
MediaWiki to split up the output based on run time and then waiting for a
set of MW jobs to complete before retrying failures. This will mean more resiliency
when dbs are pulled out of the pool for various reasons (schema changes,
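A rough sketch of the "split ahead of time, rerun a failed piece right away" idea, not the actual dumps code. It assumes the pieces are page ID ranges, which is not stated above; split_page_ranges, run_with_retries and the dump_range callback are hypothetical names used only for illustration.

```python
def split_page_ranges(first_id, last_id, chunk_size):
    """Return inclusive (start, end) ranges covering first_id..last_id.

    Assumption: pieces are defined by page ID; the real split criterion
    may well be different.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ranges = []
    start = first_id
    while start <= last_id:
        end = min(start + chunk_size - 1, last_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def run_with_retries(ranges, dump_range, max_attempts=3):
    """Call dump_range(start, end) for each piece.

    A piece that raises is retried immediately, up to max_attempts times,
    instead of waiting for the whole batch to finish. Returns the list of
    pieces that still failed after the last attempt.
    """
    failed = []
    for start, end in ranges:
        for attempt in range(1, max_attempts + 1):
            try:
                dump_range(start, end)
                break
            except Exception:
                if attempt == max_attempts:
                    failed.append((start, end))
    return failed
```

Because each piece is small, a db being pulled from the pool costs one short rerun rather than a long-running job.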
Later on during the current run, I hope we will see dumps of magic words
and namespaces, provided as JSON files. Let me put it this way: the code
is tested and deployed; now we shall see.
As of now, the status of a given dump can be retrieved from a file in
the current run directory: <wikiname>/20170320/dumpstatus.json
These files are updated frequently during the run. The status of all
current runs is also available at https://dumps.wikimedia.org/index.json
Thanks to Hydriz for the idea on how a status API could be implemented
cheaply. This will probably need some refinement, but feel free to play.
More information at
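For anyone who wants to poke at the status files from a script, here is a minimal Python sketch. The URL pattern comes from the paths given above; the JSON layout (a top-level "jobs" mapping whose entries carry a "status" field) is an assumption on my part and may not match what the files actually contain, so treat summarize_jobs as illustrative only.

```python
import json
import urllib.request

BASE_URL = "https://dumps.wikimedia.org"


def status_url(wiki, run_date):
    """Build the per-run status URL, e.g. <wikiname>/20170320/dumpstatus.json."""
    return f"{BASE_URL}/{wiki}/{run_date}/dumpstatus.json"


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_jobs(status):
    """Map job name -> status string.

    Assumed layout (not confirmed by the announcement):
    {"jobs": {"<job name>": {"status": "<state>", ...}, ...}}
    Entries without a "status" field are reported as "unknown".
    """
    jobs = status.get("jobs", {})
    return {
        name: info.get("status", "unknown")
        for name, info in jobs.items()
        if isinstance(info, dict)
    }


if __name__ == "__main__":
    # Network access is needed here; the wiki and date are just examples.
    status = fetch_json(status_url("dewiki", "20170320"))
    for name, state in sorted(summarize_jobs(status).items()):
        print(name, state)
```

The same fetch_json helper can be pointed at https://dumps.wikimedia.org/index.json to look at all current runs at once; since that file's layout is not described here, inspect the result before relying on any particular key.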
The last of the UI updates went live; thanks to Ladsgroup for all of those
fixups. It's nice to enter the new century at last :-)
And finally, we moved all the default config info out into a yaml file
(thanks to Adam Wight for the first version of that changeset). There were
a couple of hiccups with that, so I ended up starting the en wikipedia
run manually this time, though via the standard script.
Again thanks to Ladsgroup, this is a change to the per-dump index.html
page, and you can see sample screenshots here:
Please weigh in on the ticket. I'd like to get any issues resolved and
have this in play by the time the next dump run starts on March 20; this
run is already in progress, so it's too late for this one.
For those who like to download dump files via torrent, good news!
I added torrents for "enwiki-20170301-pages-meta-current.xml.bz2" and "enwiki-20170301-pages-articles.xml.bz2", plus a bonus torrent with all files for 20170220: "enwiki-20170220-all-files".
Check them out at: https://meta.wikimedia.org/wiki/Data_dump_torrents
Please don't forget to seed.
Thank you for your attention,
Felipe L. Ewald