Over the next month all dumps will be moved to run out of our data
center in Ashburn. Because the hardware we'll be using has more cores
and more memory, I can rearrange the number of workers that run at a
time, as well as restructure the way some dumps are produced.
Here's the plan:
-------
Phase 1 -- enwiki
This is in progress right now. Runs will continue as they have, but from
the new location.
Phase 2 -- "big" wikis (ru, es, de, pt, pl, nl, fr, it, ja)
We are waiting for current runs of these to complete in the old
location; once they are done we will start them up in the new location
with 2 workers instead of 3 or 4 as we have had to date. However, these
2 workers will do each wiki in 4 parts simultaneously, much as enwiki
runs in 27 parts. We will recombine all files *except for the bz2 and
7z history files*, which will stay split into parts.
***WARNING***
If you are working with these dumps, start planning now! This will take
effect in 10 days to two weeks from today.
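For anyone who reads the bz2 history files directly, here is a minimal
sketch of walking several part files in order. The file name pattern
(a part number right before ".xml.bz2") and the helper names are
assumptions made for illustration; check them against the actual files
once the new runs appear.

```python
import bz2
import re
from pathlib import Path
from typing import Iterable, Iterator, List


def part_number(path: Path) -> int:
    """Extract the part number that sits right before '.xml.bz2'.

    Assumes names like 'xxwiki-YYYYMMDD-pages-meta-history3.xml.bz2'
    (hypothetical pattern). Returns 0 if no number is found.
    """
    match = re.search(r"(\d+)\.xml\.bz2$", path.name)
    return int(match.group(1)) if match else 0


def find_parts(dump_dir: Path, pattern: str) -> List[Path]:
    """Return matching part files sorted numerically, so part 10 follows part 9."""
    return sorted(dump_dir.glob(pattern), key=part_number)


def iter_dump_lines(part_paths: Iterable[Path]) -> Iterator[str]:
    """Yield text lines from each bz2 part in turn, decompressing on the fly."""
    for path in part_paths:
        with bz2.open(path, mode="rt", encoding="utf-8") as handle:
            for line in handle:
                yield line
```

Note that each part may well be a standalone XML document with its own
header and footer, so a parser should probably be run once per part
rather than over the concatenated lines.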
The reason for reconfiguring these dump runs is that several of these
wikis take days and days to run (e.g. ru wiki takes 11 days and it's not
the slowest). While we could instead run more workers and let each wiki
run in serial taking many days, it's better for maintenance, security
updates, recovering from database server issues etc. to have a short run
time for each wiki.
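As a rough illustration of that trade-off, the sketch below computes an
idealized per-wiki run time when a wiki is split into parts. It assumes
the parts are equal in size and that splitting and recombining cost
nothing, neither of which holds in practice, so treat the result as a
lower bound and not a prediction. The function names are made up for
this example; only the 11 day figure comes from the text above.

```python
def per_wiki_days(serial_days: float, parts: int) -> float:
    """Idealized wall-clock days for one wiki split into equal parallel parts."""
    return serial_days / parts


def concurrent_jobs(workers: int, parts: int) -> int:
    """Number of dump jobs running at once across all workers."""
    return workers * parts


ru_serial_days = 11  # ru wiki run time quoted in this announcement

print(per_wiki_days(ru_serial_days, 4))  # 2.75
print(concurrent_jobs(2, 4))             # 8
```

Eight workers each running one wiki serially would keep the same number
of jobs busy, but any single wiki would still be tied up for its full
serial run time.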
Phase 3 -- the rest
Once the above are concluded, we'll wait for the workers for the rest of
the wikis to complete their current jobs, and then we'll start up jobs
in the new location. We should be able to safely run 6 workers as
opposed to the typical 3 or 4, but apart from the number, these jobs
will continue to run as they have in the past.
-------
This should mean that we'll have about a 2-week cycle for big wikis and
about a 10-day cycle again for the rest, with the standard once a month
for enwiki.
And sometime this fall, if we are lucky, we'll switch it all up for
incremental dumps :-)
Ariel