I would be surprised if sqoop doesn't let you dial back the load so you don't
crush the source DB. I've used it in production and it's been fine.
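For what it's worth, Sqoop 1 does let you bound the load: `--num-mappers` caps the number of parallel map tasks (and therefore concurrent queries against the source), and `--fetch-size` controls how many rows each query pulls at a time. A rough sketch — the host, database, table, and user names below are placeholders, not our actual setup:

```shell
# Hypothetical sketch: limit Sqoop to 2 parallel map tasks so the source
# MySQL server only ever sees 2 concurrent import queries, and fetch in
# modest batches. All connection details here are made up.
sqoop import \
  --connect jdbc:mysql://db-host.example.org/wikidb \
  --username import_user -P \
  --table revision \
  --num-mappers 2 \
  --fetch-size 1000
```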
There's also some interesting work being done on Sqoop 2 right now.
-Toby
On Oct 17, 2013, at 11:23 AM, Dan Andreescu
<dandreescu(a)wikimedia.org> wrote:
Hi,
I spoke to Dario today about investigating uses for our Hadoop cluster. This is an
internal cluster but it's mirrored on labs so I'm posting to the public list in
case people are interested in the technology and hearing what we're up to.
The questions we need to answer are:

- What's an easy way to import lots of data from MySQL without killing the source servers? We've used Sqoop and drdee's sqoopy, but we think these would hammer the prod servers too hard.
- drdee mentioned a way to pass a comment with SELECT statements to make them lower priority; is this documented somewhere?
- Could we just stand up the MySQL backups and import them?
- Could we import from the XML dumps?
- Is there a way to do incremental importing once an initial load is done?
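On the last question: Sqoop 1 has built-in incremental import via `--incremental append`, which re-imports only rows whose check column exceeds the last imported value. A sketch, assuming a monotonically increasing `rev_id` column (column, table, and connection names are hypothetical):

```shell
# Hypothetical sketch: after an initial full load, pull only rows with
# rev_id greater than the last value seen. Sqoop prints the new
# --last-value at the end of the run; a saved Sqoop job can track it
# automatically between runs.
sqoop import \
  --connect jdbc:mysql://db-host.example.org/wikidb \
  --username import_user -P \
  --table revision \
  --incremental append \
  --check-column rev_id \
  --last-value 123456789
```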
Once we figure this out, the fun starts. What are some useful questions we could answer once we have access to the core MediaWiki db tables across all projects?
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics