Over the next month all dumps will be moved to run out of our data
center in Ashburn. Because the hardware we'll be using has more cores
and more memory, I can rearrange the number of workers that run at a
time, as well as restructure the way some dumps are produced.
Here's the plan:
-------
Phase 1 -- enwiki
This is in progress right now; runs will continue as they have, but from
the new location.
Phase 2 -- "big" wikis (ru, es, de, pt, pl, nl, fr, it, ja)
We are waiting for current runs of these to complete in the old
location; once they are done we will start them up in the new location
with 2 workers instead of the 3 or 4 we have had to date. However, these
2 workers will process each wiki in 4 parts simultaneously, much as en wp
runs in 27 parts. We will recombine all files *except for the bz2 and
7z history files*.
***WARNING***
If you are working with these dumps, start planning now! This will take
effect ten days to two weeks from today.
The reason for reconfiguring these dump runs is that several of these
wikis take days and days to run (e.g. ru wiki takes 11 days, and it's
not the slowest). While we could instead run more workers and let each
wiki run serially over many days, a short run time for each wiki is
better for maintenance, security updates, recovering from database
server issues, etc.
Phase 3 -- the rest
Once the above are concluded, we'll wait for the workers for the rest of
the wikis to complete their current jobs, and then we'll start up jobs
in the new location. We should be able to safely run 6 workers as
opposed to the typical 3 or 4, but apart from the number, these jobs
will continue to run as they have in the past.
-------
This should mean that we'll have about a 2-week cycle for the big wikis,
about 10 days again for the rest, and the standard once-a-month run for
en wiki.
And sometime this fall, if we are lucky, we'll switch it all over to
incremental dumps :-)
Ariel
Hi everyone.
I'm wondering if there's any update on the image dumps from
http://ftpmirror.your.org/. They were being generated on a roughly
monthly basis from April 2012 to December 2012. The last complete set
was in December 2012 [1]. I believe there was a January 2013 post [2]
that indicated there was an issue at the hosting web site. Was there
an outcome, or is the situation still in progress?
The image dumps were incredibly useful (though also extremely large).
I can understand if monthly updates were too much, but I'm hoping they
will be available again -- perhaps on a quarterly / semi-annual basis?
[1] http://ftpmirror.your.org/pub/wikimedia/imagedumps/tarballs/fulls/20121201/
[2] http://lists.wikimedia.org/pipermail/xmldatadumps-l/2013-January/000665.html
For folks wondering why I haven't kicked off the English-language
Wikipedia dumps yet: I'm determined to run these from eqiad this
month, and I'm trying to puppetize a bit more as I move things. Every month
I put off the last changes because of other priorities, but this month I've
drawn a line in the sand. Having said that, I won't delay past
Monday, nice and puppetized or not.
Ariel