[Toolserver-announce] Changes to database access

River Tarnell river.tarnell at wikimedia.de
Sat Aug 7 08:41:23 UTC 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

This mail describes two separate changes to how database access works.  Please 
read all of it and make the necessary changes to your tools.

  Summary
  =======

This is a brief summary of the changes you should make:

* If you currently use the fast server (sql-sX-fast or XXwiki-p.fastdb), change 
  back to the normal server, then follow the rest of these instructions.

* For tools which only connect to 'sql', no changes are necessary.

* For tools which use database servers but *do not* use user databases:

  * If you currently connect to sql-sX.toolserver.org, instead connect to
    sql-sX-rr.toolserver.org

  * If you currently connect to XXwiki-p.db.toolserver.org, instead connect to
    XXwiki-p.rrdb.toolserver.org

* For tools which use database servers and *do* use user databases:

  * If you currently connect to sql-sX.toolserver.org, instead connect to
    sql-sX-user.toolserver.org

  * If you currently connect to XXwiki-p.db.toolserver.org, instead connect to
    XXwiki-p.userdb.toolserver.org

* If you have any queries which could run for longer than 10 minutes when
  working correctly, add the string SLOW_OK somewhere in the query, e.g.:

    SELECT /* SLOW_OK */ * FROM table...

I will update the documentation on the wiki shortly to reflect these changes.

The rest of this mail describes the changes in detail.

  RR servers
  ==========

A while ago, we introduced the idea of 'fast' database servers, which only 
allowed queries running for less than 60 seconds.  The idea was that since 
there were no long queries to create load on the server, it would be less 
likely to have replication lag than the normal servers.

Since the introduction of fast servers, we have seen very little take-up of 
them, even for tools which could usefully use them.  Additionally, we have not 
had much issue with replication lag recently.

We will therefore be retiring the fast servers, and replacing them with RR 
servers.  To connect to an RR server, the following hostnames should be used:

  * sql-sX-rr.toolserver.org
  * XXwiki-p.rrdb.toolserver.org

Unlike the normal aliases, the RR server will randomly connect you to one of 
the two servers in each cluster.  (For example, when connecting to sql-s1-rr, 
you will randomly connect to either thyme or rosemary.)

There is no disadvantage for users to connect to the RR alias (since there is 
no limit on query execution time), and this will allow us to better distribute 
load among the database servers, which will reduce replication lag for 
everyone.  It also makes it easier for us to add additional database servers to 
a cluster in the future.

The only tools which cannot use the RR servers are tools which access user 
databases, since these databases are still only present on a single server.  
These tools should instead connect to the new "user" aliases:

  * sql-sX-user.toolserver.org
  * XXwiki-p.userdb.toolserver.org

The user aliases will always point to the server which currently contains the 
user databases.

  Long query killer
  =================

To help prevent replication lag, we will be introducing an automatic query 
killer on all servers.  This will work as follows:

* If the replication lag is under 10 minutes, no queries will be killed.

* If the replication lag is 10 minutes or more, but less than 30 minutes,
  queries will be killed if the following two conditions are both true:

    1. The query does not contain the text SLOW_OK
 
    2. The query has been running for X seconds or more, where X is the current 
       replication lag.

* If the replication lag is 30 minutes or more, queries will be killed if the 
  following condition is true:

    1. The query has been running for X seconds or more, where X is the current 
       replication lag.

This is intended to only kill queries which are causing replication lag (in 
particular, queries which cause InnoDB lock wait timeouts).  We will monitor 
the performance of the query killer and might adjust the parameters in the 
future.

If you find that your queries are being killed and you don't think they should 
have been, please open a request in JIRA.

	- river.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)

iEYEARECAAYFAkxdHDMACgkQIXd7fCuc5vJi0wCdH8e8MkbHdVukfXxR9JGDTHz7
e6sAoLgvCpkNW39zJ1uhkGWnRTK61XF3
=n+x9
-----END PGP SIGNATURE-----



More information about the Toolserver-announce mailing list