For future dumps, we might have to resort to some form of snapshot
server that is fed all updates, either from the memcacheds or from the
MySQL servers. That would also allow a live backup to be performed, so
it would be useful for more than just dumps.
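The snapshot-server idea above can be sketched in a few lines. This is only a toy illustration, assuming some feed relays every text update to the process (the feed itself, memcached- or binlog-based, is out of scope); all class and method names are made up here:

```python
# Toy sketch of a "snapshot server": it is fed every text update and
# keeps its own copy, so a dump or live backup can be cut from it at
# any time without touching the production databases.
import copy
import threading

class SnapshotServer:
    def __init__(self):
        self._lock = threading.Lock()
        self._store = {}            # page_id -> latest text

    def apply_update(self, page_id, text):
        """Called for every update streamed from the live site."""
        with self._lock:
            self._store[page_id] = text

    def snapshot(self):
        """Return a point-in-time copy suitable for dumping."""
        with self._lock:
            return copy.deepcopy(self._store)
```

Because `snapshot()` copies under the lock, updates arriving while a dump is being written never bleed into it, which is what makes this usable as a live backup and not just for dumps.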
Possibly, but the crux of it is simply the page text from external
storage. Fetching the metadata, while lengthy on its own, takes very
little time in the grand scheme of things.
Is it possible to suspend individual slaves
temporarily during off-peak hours to flush the database to disk, and
then copy the database files to another computer? If not, we may still
be able to use "stale" database files copied to another computer, as
long as we only use data from them that is at least a few days old, so
we know it has been flushed to disk (not sure how MySQL flushes the data...).
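The suspend-and-copy idea could look roughly like the following. This is a dry-run sketch only; the host names and paths are invented, and the safe part is that a clean mysqld shutdown flushes everything to disk, so the copied files are consistent:

```python
# Sketch of suspending one replication slave to copy its files.
# DRY_RUN stays True here, so commands are printed, not executed.
import subprocess

DRY_RUN = True
SLAVE = "db-slave.example.org"      # hypothetical slave host
BACKUP = "backup.example.org"       # hypothetical backup host
DATADIR = "/var/lib/mysql"

def run(*cmd):
    """Print the command in dry-run mode, otherwise execute it."""
    if DRY_RUN:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)

# 1. Stop replication so the slave receives no further writes.
run("mysql", "-h", SLAVE, "-e", "STOP SLAVE;")
# 2. Shut mysqld down cleanly; a clean shutdown flushes all data
#    to disk, so the on-disk files are consistent.
run("mysqladmin", "-h", SLAVE, "shutdown")
# 3. Copy the raw database files while the server is down.
run("rsync", "-a", f"{SLAVE}:{DATADIR}/", f"{BACKUP}:/backups/mysql/")
# 4. Restart mysqld; replication then catches up from the master.
run("ssh", SLAVE, "/etc/init.d/mysql", "start")
```

Shutting the server down entirely sidesteps the question of how MySQL flushes data: a `FLUSH TABLES WITH READ LOCK` approach would also work, but the lock only holds while the issuing session stays connected, so a full stop is simpler to script.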
Spinning down a slave won't help us much, since external storage is the
slowdown. But mirroring that content elsewhere might be the way to go.
External storage by itself is just a set of MySQL databases. I'm curious
to see whether there might be a better storage subsystem to optimize for this.
AFAIK External Storage is used with direct assignment: a wiki gets
assigned an ES cluster and uses it for a long period of time. Thus, from
one dump to the next, all text will be on the same ES cluster with high
probability.
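That cluster pinning is what a dumper can exploit. In MediaWiki, a text row that lives in External Storage holds a pointer URL such as `DB://cluster5/12345` (optionally with a trailing item for concatenated blobs); a minimal sketch of parsing those pointers and batching them per cluster, with field names assumed rather than taken from the actual code:

```python
# Sketch: resolve External Storage pointer URLs and group fetches by
# cluster, so one dump run hits each ES MySQL cluster in batches.
from typing import NamedTuple, Optional

class ESPointer(NamedTuple):
    cluster: str            # which ES cluster holds the blob
    blob_id: int            # row id in that cluster's blob table
    item: Optional[str]     # sub-item inside a concatenated blob

def parse_es_url(url: str) -> ESPointer:
    """Parse a DB:// external storage URL into its components."""
    if not url.startswith("DB://"):
        raise ValueError(f"not an external storage URL: {url!r}")
    parts = url[len("DB://"):].split("/")
    if len(parts) < 2:
        raise ValueError(f"malformed external storage URL: {url!r}")
    item = parts[2] if len(parts) > 2 else None
    return ESPointer(parts[0], int(parts[1]), item)

def group_by_cluster(urls):
    """Bucket blob ids per cluster for batched fetching."""
    groups = {}
    for url in urls:
        p = parse_es_url(url)
        groups.setdefault(p.cluster, []).append(p.blob_id)
    return groups
```

Since a wiki stays pinned to one cluster for long stretches, nearly all of a dump's pointers land in a single bucket, which is exactly why running the dumper on (or near) that cluster's slave looks attractive.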
Could the dump be run on one of the current ES cluster slaves?
Moving the old dump might be a bigger problem than getting the articles
(though it's much 'easier' to move, since everything is in a single data
block and pipelines nicely), but the dumper machine could become an ES slave.