What's the end goal? Just to have a complete copy that can be accessed from MapReduce?
Maybe you should do periodic dumps from a prod slave to e.g. CSV and then load that into HDFS after each run.
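Roughly this shape, as a sketch only (the slave host, database, table, and HDFS path below are all made up, and I'm assuming the stock mysql and hadoop CLIs):

    #!/usr/bin/env python
    # Rough sketch of the periodic dump-and-load idea, not a tested
    # script. Host, database, table, and HDFS path are placeholders.
    import subprocess

    SLAVE = "db1047.example"           # hypothetical analytics slave
    QUERY = "SELECT * FROM revision"   # whatever slice each run needs

    def dump_and_load(out="revision.tsv", hdfs_dir="/wmf/raw/revision/"):
        # `mysql --batch` emits tab-separated rows, which MapReduce
        # and Hive can consume directly; CSV would need quoting too.
        with open(out, "w") as f:
            subprocess.check_call(
                ["mysql", "-h", SLAVE, "--batch", "-e", QUERY, "enwiki"],
                stdout=f)
        # Push into HDFS; removing the previous run's copy first is
        # elided here.
        subprocess.check_call(["hadoop", "fs", "-put", out, hdfs_dir])

    if __name__ == "__main__":
        dump_and_load()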
On Thu, Oct 17, 2013 at 6:23 PM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
> drdee mentioned a way to pass a comment with SELECT statements to make them lower priority; is this documented somewhere?
I imagine you could either:
* take a slave out of rotation during a non-peak time of day
* use a Tampa slave, in case the sites are using only eqiad hosts (would need to double-check that it's actually not in rotation; I guess take a host that's in https://noc.wikimedia.org/dbtree/ but not in https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php, and then double-check with springle)
* or both
It depends on how heavy your use would be. I guess the comment method may also work, but it couldn't take as much load as the options above. Maybe it's better not to risk degrading prod at all.
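I don't know where (or whether) the comment trick is documented. If it's just the convention of tagging queries so server-side tooling can spot and deprioritize or kill them, it would look something like this; the exact comment text that prod honors is my guess, not whatever drdee was referring to:

    # Illustrative only: tag a bulk SELECT with an inline comment so
    # a server-side watchdog can identify it. The comment text here
    # is a guess, not a documented convention.
    query = ("SELECT /* analytics: low-priority bulk read */ "
             "rev_id, rev_timestamp FROM revision")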
> Could we just stand up the MySQL backups and import them?
Sure, but if hot spares are sitting idle, it may be easier to use them instead.
> Could we import from the XML dumps?
I'm not sure how much might be missing from those (although I guess you could also get access to the private XML parts).
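For what it's worth, streaming a pages dump with the Python stdlib is straightforward. A sketch that pulls only page titles, assuming the 0.8 export namespace (the version varies from dump to dump) and a made-up filename:

    # Stream the dump instead of loading it all; clear elements as we
    # go so memory stays flat on multi-GB files.
    import bz2
    import xml.etree.ElementTree as ET

    NS = "{http://www.mediawiki.org/xml/export-0.8/}"

    def page_titles(path="enwiki-pages-meta-history.xml.bz2"):
        with bz2.BZ2File(path) as f:
            for _, elem in ET.iterparse(f):
                if elem.tag == NS + "page":
                    yield elem.findtext(NS + "title")
                    elem.clear()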
> Is there a way to do incremental importing once an initial load is done?
I guess this may be related to the incremental dumping that was part of GSoC this year. I'm not sure what the status is there.
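Failing that, the usual fallback is a high-water mark: record the max key already loaded and pull only newer rows each run. A sketch (table and column names are placeholders, and this only works for append-only data):

    # Works only for tables whose rows are immutable and keyed by a
    # monotonically increasing id (rev_id-style); tables updated in
    # place would need a different scheme.
    def delta_query(last_rev_id):
        return ("SELECT * FROM revision WHERE rev_id > %d "
                "ORDER BY rev_id" % last_rev_id)

    def next_watermark(rows):
        # Persist this somewhere (e.g. a small state file in HDFS)
        # for the next run; rows are (rev_id, ...) tuples here.
        return max(row[0] for row in rows)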
-Jeremy