What's the end goal? Just to have a complete copy that can be accessed
from MapReduce?
Maybe you should do periodic dumps from a prod slave to e.g. CSV and
then load that into HDFS after each run.
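Something like this is what I have in mind; just a sketch, and the
slave host, database, table, and HDFS paths are all made up:

    # Sketch: dump one table from a prod slave as TSV and push it into
    # HDFS. The slave host, database, table, and paths are made up.
    import subprocess

    SLAVE = "db-analytics-slave.example"
    DB = "enwiki"
    TABLE = "revision"
    LOCAL = "/tmp/%s.tsv" % TABLE
    HDFS_DIR = "/user/hdfs/raw/%s" % TABLE

    # mysql --batch writes tab-separated rows (with a header line) to
    # stdout; capture that into a local file.
    with open(LOCAL, "w") as out:
        subprocess.check_call(
            ["mysql", "--batch", "-h", SLAVE, DB,
             "-e", "SELECT * FROM %s" % TABLE],
            stdout=out)

    # Replace the previous snapshot in HDFS after each run.
    subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR])
    subprocess.check_call(["hdfs", "dfs", "-put", "-f", LOCAL, HDFS_DIR])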
On Thu, Oct 17, 2013 at 6:23 PM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
> drdee mentioned a way to pass a comment with select statements to
> make them lower priority; is this documented somewhere?
I imagine you could either
* take a slave out of rotation during a non-peak time of day
* use a Tampa slave in case the sites are using only eqiad hosts
  (would need to double-check that it's actually not in rotation; I
  guess take a host that's in https://noc.wikimedia.org/dbtree/ but
  not in https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php
  and then double-check with springle)
* or both
Depends on how heavy your use would be. I guess the comment method may
also work but it couldn't take as much load as the other methods
above. Maybe better to just not risk degrading prod at all.
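For what it's worth, tagging the query is just a standard SQL comment,
e.g. something like the following; the tag text here is invented, and
whether anything on the prod side actually deprioritizes on it is
exactly the part I haven't seen documented:

    # Sketch: tag the dump query with an identifying comment. The
    # comment syntax is standard MySQL; the tag text is invented, and
    # whether prod tooling keys off it is the undocumented part.
    TAG = "/* analytics dump -- ok to deprioritize */"
    query = "SELECT %s rev_id, rev_timestamp FROM revision" % TAG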
> Could we just stand up the MySQL backups and import them?
Sure, but if hot spares are sitting idle then maybe easier to use them instead.
> Could we import from the XML dumps?
I'm not sure how much might be missing from those (although I guess
you could also get access to the private XML parts).
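If the dumps did cover enough, streaming them is easy; rough sketch
below (the file name and the export-schema version in the namespace
are guesses, check against the dump header):

    # Sketch: stream page titles out of a MediaWiki XML dump without
    # loading it all into memory. The file name and export-schema
    # version in the namespace are assumptions.
    import xml.etree.ElementTree as ET

    NS = "{http://www.mediawiki.org/xml/export-0.8/}"

    for event, elem in ET.iterparse("enwiki-latest-pages-articles.xml"):
        if elem.tag == NS + "page":
            print(elem.findtext(NS + "title"))
            elem.clear()  # free the subtree we just processed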
> Is there a way to do incremental importing once an initial load is
> done?
I guess this may be related to the incremental dumping that was part
of GSoC this year. Not sure what the status is there.
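In the meantime, a crude watermark on rev_id would get you incremental
pulls; sketch below (the host, database, and all the paths, including
the state file, are invented):

    # Sketch of a watermark-based incremental import: remember the
    # highest rev_id already loaded and only pull newer rows next run.
    # Host, database, and all paths here are invented.
    import os
    import subprocess
    import time

    STATE = "/var/tmp/last_rev_id"
    last = open(STATE).read().strip() if os.path.exists(STATE) else "0"

    query = ("SELECT rev_id, rev_page, rev_timestamp FROM revision "
             "WHERE rev_id > %s ORDER BY rev_id" % last)

    # mysql --batch writes a header line, then tab-separated rows.
    with open("/tmp/revision_delta.tsv", "w") as out:
        subprocess.check_call(
            ["mysql", "--batch", "-h", "db-analytics-slave.example",
             "enwiki", "-e", query],
            stdout=out)

    # Load the delta as its own dated file (append support varies by
    # HDFS version, so one file per run is simpler).
    dest = "/user/hdfs/raw/revision/delta-%d.tsv" % int(time.time())
    subprocess.check_call(
        ["hdfs", "dfs", "-put", "/tmp/revision_delta.tsv", dest])

    # Advance the watermark to the rev_id of the last row pulled.
    rows = open("/tmp/revision_delta.tsv").read().splitlines()[1:]
    if rows:
        open(STATE, "w").write(rows[-1].split("\t")[0])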
-Jeremy