Hi Nuria,
On Fri, Aug 22, 2014 at 2:06 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> Sean:
> Could you explain a little bit why the following bug affects EL data going
> public (for the schemas that have public data and can be made public more
> easily than others)?
> https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
Performance.
The labs servers must replicate multiple clusters like analytics-store
does, plus they must deal with large volumes of unpredictable load from
labs users. Given that researchers with decent SQL can cause replag on
analytics-store, imagine what a legion of labs users with poor SQL can
achieve :-)
While I realize that only certain EL tables will be replicated, the
following are the important points from an Ops POV:
- Requesting this bug fix up front will improve both labsdb and
analytics-store. It is likely that the relevant EL tables will have to
replicate to labs /via/ analytics-store, so the effect will be two-fold
(just as the penalty would be).
- Even though only specific tables will be replicated, that filtering
necessarily occurs on the target slave servers and not on the master. This
means the entire load from EL replication will hit labs or sanitarium
servers, even though only the public EL tables will be processed and
exposed.
- Optimization requests like bug 67450 are easy to put off because everyone
(understandably) wants to get on with exciting new projects. In a
well-meaning and respectful way, this is an opportunity to stonewall and
request that performance be addressed now.
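To illustrate the second point, here is a minimal Python sketch of filtering on the slave. It is only a toy model, not real replication code: the table names, event sizes, and the `replicate` helper are all hypothetical. It shows that the slave receives every event in the stream and only then decides which ones to apply.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Event:
    table: str  # hypothetical table name, e.g. "log.PublicSchema_1"
    size: int   # hypothetical size of the event in the replication stream


def replicate(events, public_patterns):
    """Toy model of slave-side filtering.

    Every event is received from the master; the filter only decides
    which of the received events are applied locally.
    """
    received = 0
    applied = []
    for ev in events:
        received += ev.size  # the slave pays for every event
        if any(fnmatchcase(ev.table, p) for p in public_patterns):
            applied.append(ev)  # only public tables are applied
    return received, applied


events = [
    Event("log.PublicSchema_1", 100),
    Event("log.PrivateSchema_2", 400),
    Event("log.PrivateSchema_3", 500),
]
received, applied = replicate(events, ["log.PublicSchema_*"])
print(received, sum(ev.size for ev in applied))
```

In this sketch the slave receives all 1000 units of traffic even though only 100 units belong to a public table, which is why reducing the volume at the source helps every downstream server.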
BR
Sean
--
DBA @ WMF