Over the next month, all dumps will be moved to run out of our data center in Ashburn. Because the hardware we'll be using has more cores and more memory, I can change how many workers run at a time and restructure the way some dumps are produced.
Here's the plan:
-------
Phase 1 -- enwiki
This is in progress right now; runs will continue as they have, but from the new location.
Phase 2 -- "big" wikis (ru, es, de, pt, pl, nl, fr, it, ja)
We are waiting for the current runs of these to complete in the old location. Once they are done, we will start them up in the new location with 2 workers instead of the 3 or 4 we have had to date. However, these 2 workers will do each wiki in 4 parts simultaneously, much as enwiki runs in 27 parts. We will recombine all files *except for the bz2 and 7z history files*.
***WARNING*** If you are working with these dumps, start planning now! This will take effect in 10 days to two weeks from today.
The reason for reconfiguring these dump runs is that several of these wikis take many days to run (e.g. ruwiki takes 11 days, and it's not the slowest). We could instead run more workers and let each wiki run serially over many days, but a short run time for each wiki is better for maintenance, security updates, recovering from database server issues, etc.
Phase 3 -- the rest
Once the above phases are complete, we'll wait for the workers for the rest of the wikis to finish their current jobs, and then we'll start up jobs in the new location. We should be able to safely run 6 workers as opposed to the typical 3 or 4, but apart from the number, these jobs will continue to run as they have in the past.
-------
All of this should mean that we'll have about a 2-week cycle for the big wikis and, once again, about 10 days for the rest, with the standard once-a-month cycle for enwiki.
And sometime this fall, if we are lucky, we'll switch it all over to incremental dumps :-)
Ariel
xmldatadumps-l@lists.wikimedia.org