Awesome; thanks for such a detailed update! Can you give details on the query-over-multiple-machines risk and what could cause it? I have no wish to respond to people going out of their way to prevent us breaking things by, ah, breaking things ;).

I knew there was a reason I was sending a bottle of single malt to Athens.


On 2 April 2014 18:24, Sean Pringle <springle@wikimedia.org> wrote:
Hi!

In another thread Oliver asked about the progress of One Machine To Rule Them All :-)

In fact it looks like it will now be two machines to rule them all, or rather, two machines to cooperatively rule them all in roughly equal capacity. I know, it doesn't have the same ring to it...

I posted the following to RT 6383, but who knows who reads RT, so here it is again:

-- quote --
An update on this. Some Analytics folk have probably already heard bits and pieces via mailing lists, but my fellow Opsen on RT duty rightly begin to wonder about this ticket.

We have procured dbstore100[12], both of which will replicate shards S[1-7] into a single MariaDB 10 instance each, using the new multi-source replication. The boxes are still being set up (because recombining the shards requires a full dump/reload, plus getting all seven in sync, plus compressing tables -- slow going). The x1 shard and event logging will replicate to dbstore too, but that's pending RT 7081. Analytics will have direct, but read-only, access to dbstore1002.
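
For anyone who hasn't seen multi-source replication, here is a rough sketch of the idea: each upstream shard gets its own named replication connection on the one instance. The host names, user, password, and binlog coordinates below are placeholders for illustration, not the real configuration.

```sql
-- Hypothetical sketch only: hosts, credentials, and binlog positions are placeholders.
-- One named connection per shard; repeat for 's2' through 's7'.
CHANGE MASTER 's1' TO
  MASTER_HOST     = 's1-master.example',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'CHANGE_ME',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;

-- Start a single connection by name...
START SLAVE 's1';

-- ...or start every configured connection at once.
START ALL SLAVES;

-- One row per named connection, so lag can be checked per shard.
SHOW ALL SLAVES STATUS;
```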

db1047, the current s1-analytics-slave, has the required disk space, so it will likely become a slave to dbstore1002, or else make use of the MariaDB 10 CONNECT engine to access the data (like federation, but better than the FEDERATED engine was, thanks to ECP: engine condition pushdown). As ever, Analytics will have read/write access to db1047 with scratch space.
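
To illustrate the CONNECT option, a minimal sketch of a local table on db1047 that proxies a remote table on dbstore1002 might look like the following. The local table name, the account, and the host name are made up for the example, and it assumes the CONNECT plugin is available on the box.

```sql
-- Hypothetical sketch only: table name, account, and host are placeholders.
-- Load the CONNECT storage engine plugin if it is not already installed.
INSTALL SONAME 'ha_connect';

-- A local table definition whose rows live on the remote server.
-- With TABLE_TYPE=MYSQL and no column list, CONNECT discovers the
-- column definitions from the remote table.
CREATE TABLE enwiki_revision_remote
  ENGINE     = CONNECT
  TABLE_TYPE = MYSQL
  CONNECTION = 'mysql://analytics_ro@dbstore1002.example/enwiki/revision';

-- With condition pushdown, a WHERE clause like this one can be evaluated
-- on the remote side rather than after pulling the whole table across.
SELECT COUNT(*)
FROM enwiki_revision_remote
WHERE rev_timestamp >= '20140301000000';
```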

This setup should give us:

- cross-wiki joins/unions on any wiki, on either db1047 or dbstore1002
- the ability to spread load across both boxes with a single SQL query
- less chance of blocking others due to locking
- less chance of causing replag
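
As an example of the first point, once all shards live in one instance a query can reference several wiki databases directly. The sketch below assumes the databases keep their usual wiki names (enwiki, dewiki) and the standard MediaWiki revision table; the date cutoff is arbitrary.

```sql
-- Illustrative sketch: count recent revisions on two wikis in one query.
-- Assumes both wiki databases are present on the same instance.
SELECT 'enwiki' AS wiki, COUNT(*) AS recent_revisions
FROM enwiki.revision
WHERE rev_timestamp >= '20140301000000'
UNION ALL
SELECT 'dewiki' AS wiki, COUNT(*) AS recent_revisions
FROM dewiki.revision
WHERE rev_timestamp >= '20140301000000';
```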

I'm happy to go into more technical detail if anyone is interested.

When will it be ready, you ask? :-)  Not until after the Ops meet in Athens, which realistically means: in May.
-- endquote --

The bit about spreading load across two machines with one query will require people to be a bit careful in designing the SQL. Alternatively you guys might simply choose to dictate which box should run expensive queries, to avoid tripping each other up.

Incidentally, MariaDB 10 has the Cassandra storage engine which might be of some use to you guys in time. But so far I've only been trialing CONNECT+ECP.

BR
Sean

--
DBA @ WMF

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Oliver Keyes
Research Analyst
Wikimedia Foundation