Hi everybody,

some updates about the status of the Analytics databases refactoring:

1) analytics-slave.eqiad.wmnet's CNAME now points to db1108, the new host. The staging database that was on db1047 (the old CNAME target) has been copied to db1108, so all the previous data is there. We are working with the owners of the remaining databases to figure out what to back up and what not; after that we'll finally be able to decommission db1047.

2) s[12]-analytics.eqiad.wmnet now point to dbstore1002.eqiad.wmnet (analytics-store). Previously they were pointing to db1047 (old CNAME for analytics-slave).

3) the log database on analytics-store.eqiad.wmnet (dbstore1002.eqiad.wmnet) is going to be dropped soon (it was scheduled for the 20th but we decided to wait a bit longer). Some people already followed up in https://phabricator.wikimedia.org/T156844 and as far as I can see there is no opposition to proceeding; please ping us otherwise.
The log database is scheduled to be dropped from dbstore1002 on Tuesday 28th. After that, the log database will be available only on db1108 (analytics-slave.eqiad.wmnet).

Thanks a lot!


2017-11-08 12:02 GMT+01:00 Luca Toscano <ltoscano@wikimedia.org>:
Hi everybody,

the Analytics team needs to make some changes to the current configuration and deployment of the Analytics databases. Before starting, a quick refresher so that we are all on the same page:

- db1046 - eventlogging master database
- db1047 - also known as analytics-slave.eqiad.wmnet - replicates s1/s2 via mysql, and the log database (on db1046) using a custom replication script.
- dbstore1002 - also known as analytics-store.eqiad.wmnet and x1-analytics-slave.eqiad.wmnet - replicates most of the S shards and X1 via mysql, and the log database using a custom replication script.
- db1108 (brand new host) - replicates the log database using a custom replication script.

Over the past months we have been suffering from space and performance issues on dbstore1002 (https://phabricator.wikimedia.org/T168303), so we came up with the following plan:

- db1108, a brand new host with SSD disks, replaces db1047 and becomes the target of the analytics-slave.eqiad.wmnet CNAME. This new host will be a replica of the log database only; no other database will be replicated.
- dbstore1002 will lose the support of the log database, which will be dropped from the host.
- db1047 will eventually be decommissioned (after backing up data and alerting people beforehand - T156844).

This will allow us to:
1) Reduce the load on dbstore1002 and free a lot of space on the host.
2) Offer a more performant way to query eventlogging analytics data.
3) Reduce the current performance issues that we have been experiencing while trying to sanitize/purge old event-logging data (https://phabricator.wikimedia.org/T156933).

The timeline is the following:

- November 13th: the analytics-slave CNAME moves from db1047 to db1108
- November 20th: the log database will be dropped from dbstore1002/analytics-store together with the event-logging replication script
- December 4th: shutdown of db1047 (after a backup of the non-log database tables)

To summarize what will change from the users' perspective:

- dbstore1002 (analytics-store) will offer all the S/X shards replication (wikis) and all the databases like staging that everybody is used to working with. It will only lose the support of the log database.
- db1108 will offer the log database replication and a staging database.
- db1047's (analytics-slave) staging database will be moved or copied under a different name (like staging_db1047) to dbstore1002.
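
For anyone updating scripts, the layout above could be sketched roughly as follows. This is only an illustration under stated assumptions, not something we ship: the function name host_for is made up, and it assumes that "log" is the only database that moves and that everything else (wikis, staging) keeps being queried on analytics-store.

```python
# Illustrative sketch only: host_for is a hypothetical helper, not an existing tool.
# Assumption: after the migration only the "log" database lives on analytics-slave
# (db1108); wiki replicas and staging are still queried on analytics-store (dbstore1002).

ANALYTICS_SLAVE = "analytics-slave.eqiad.wmnet"  # CNAME, will point to db1108
ANALYTICS_STORE = "analytics-store.eqiad.wmnet"  # dbstore1002


def host_for(database: str) -> str:
    """Return the host alias expected to serve the given database."""
    if database == "log":
        return ANALYTICS_SLAVE
    # Everything else (wiki databases, staging, ...) stays on analytics-store.
    return ANALYTICS_STORE


if __name__ == "__main__":
    for name in ("log", "enwiki", "staging"):
        print(name, "->", host_for(name))
```

Note that db1108 will also carry a staging database, so scripts that join log tables against staging tables may prefer to use analytics-slave for both.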

Please let us know your opinion in T156844. We'd love to hear some feedback before proceeding, especially about extra requirements that we haven't thought of.


Luca (on behalf of the Analytics team)