Hi Nuria,

On Fri, Aug 22, 2014 at 2:06 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:

Sean: 

Could you explain a little bit why the following bug affects EL data going public (for the schemas that have public data and can be made public more easily than others)?


Performance.

The labs servers must replicate multiple clusters like analytics-store does, plus they must deal with large volumes of unpredictable load from labs users. Given that researchers with decent SQL can cause replag on analytics-store, imagine what a legion of labs users with poor SQL can achieve :-)

While I realize that only certain EL tables will be replicated, the following are the important points from an Ops POV:

- Requesting this bug fix up front will improve both labsdb and analytics-store. It is likely that the relevant EL tables will have to replicate to labs /via/ analytics-store, so the effect will be two-fold (just as the penalty would be).

- Even though only specific tables will be replicated, that filtering necessarily occurs on the target slave servers and not on the master. This means the entire load from EL replication will hit labs or sanitarium servers, even though only the public EL tables will be processed and exposed.

- Optimization requests like bug 67450 are easy to put off because everyone (understandably) wants to get on with exciting new projects. In a well-meaning and respectful way, this is an opportunity to stonewall and request that performance be addressed now.
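
To illustrate the second point, here is a rough toy model in Python, not a description of the actual labsdb or sanitarium setup. It assumes the filter is a simple table whitelist applied on the replica (similar in spirit to a replicate-do-table style option); the table names and event sizes are hypothetical. The point is only that the replica pays to receive every event, while the filter reduces what gets applied.

```python
# Hypothetical stream of events from the EL master: (table, size_in_bytes).
# Table names and sizes are invented for illustration only.
MASTER_EVENTS = [
    ("PublicSchema_1", 400),
    ("PrivateSchema_2", 900),
    ("PublicSchema_1", 350),
    ("PrivateSchema_3", 1200),
]

# Hypothetical whitelist of tables that may be exposed publicly.
PUBLIC_TABLES = {"PublicSchema_1"}


def replicate(events, public_tables):
    """Model replica-side filtering: receive everything, apply only a subset."""
    received_bytes = 0  # everything the replica must fetch and read
    applied = []        # only events for whitelisted tables are applied
    for table, size in events:
        received_bytes += size  # this cost is paid regardless of the filter
        if table in public_tables:
            applied.append((table, size))
    return received_bytes, applied


received_bytes, applied = replicate(MASTER_EVENTS, PUBLIC_TABLES)
applied_bytes = sum(size for _, size in applied)
print(f"received {received_bytes} bytes, applied {applied_bytes} bytes")
```

In this sketch the replica still handles the full stream even though only a fraction is applied, which is why reducing the volume at the source helps every downstream server.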

BR
Sean

--
DBA @ WMF