Hello,
Running the same queries over a given set of wikis (typically to gather statistics) is apparently one of the most common tasks performed by toolserver users.
I have started https://wiki.toolserver.org/view/Iterating_over_wikis and would appreciate it if you could share your code excerpts there, in whatever language you use. (Feel free to add yours if it's missing.)
There are doubtless many ways to do this task, good and bad, efficient and inefficient, resource-hungry or not, so one of the aims is to have the most efficient approaches published there, so that everyone can use them right away.
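As a starting point, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the database names, the host naming convention (dbname "enwiki_p" mapping to "enwiki-p.db.toolserver.org"), and the `run_query` callback are hypothetical stand-ins, not a confirmed toolserver API.

```python
# Sketch: iterate over a set of wikis, running the same query on each.
# The dbname -> host mapping below is an assumption based on common
# toolserver conventions; adjust it for your own setup.

def db_host(dbname):
    """Map a wiki database name such as 'enwiki_p' to a host alias."""
    return dbname.replace("_", "-") + ".db.toolserver.org"

def iterate_wikis(dbnames, query, run_query):
    """Run `query` against each wiki and collect the results per database.

    `run_query(host, dbname, query)` is a stand-in for your actual
    database call (e.g. via MySQLdb); it is not provided here.
    """
    results = {}
    for dbname in dbnames:
        results[dbname] = run_query(db_host(dbname), dbname, query)
    return results
```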
Thank you for your cooperation.
Kind regards
Danny B.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
I've just added a new feature to the automated query killer which can be used
to limit the execution time of queries.
To use it, include the string LIMIT:<n> in your query. For example:
SELECT /* LIMIT:60 */ * FROM ...
would limit the query execution time to 60 seconds.
This could be used in web tools to prevent excessively long page load times
(where the user will have given up by the time the query finishes), or as a
sanity check on queries which usually finish quickly, but sometimes take too
long.
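Only the `/* LIMIT:<n> */` comment convention comes from the announcement above; the wrapper below is a hypothetical convenience for tools that build queries as strings, not a toolserver API.

```python
# Sketch: tag a query so the automated query killer aborts it after
# `seconds` seconds. The killer only looks for the string LIMIT:<n>
# anywhere in the query, so a leading comment is sufficient.

def with_time_limit(sql, seconds):
    """Prefix a query with a /* LIMIT:<n> */ comment."""
    return "/* LIMIT:%d */ %s" % (seconds, sql)
```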
- river.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (FreeBSD)
iEYEARECAAYFAkxj8k4ACgkQIXd7fCuc5vJQWwCfaVYnSWoxSmEd8HesFcfUikeH
dLsAn3sgbrnXQwpeGoyys48fad+gbQKO
=HE6k
-----END PGP SIGNATURE-----
Hello,
I wanted to get an OK (or not) to return the page title and timestamp
for a given deleted revid. It might also be helpful to return the
username. Some information on the tool is available at
http://toolserver.org/~lifeguard/page/tools/get-deleted-data
This is needed as a workaround for bug 23489 (No way to get the page
title of a deleted revision by its rev_id:
https://bugzilla.wikimedia.org/show_bug.cgi?id=23489). Specifically, the
database of link additions maintained by COIBot has some data corrupted
by mixed charsets, which needs to be repaired. We need the page titles for
the corrupted rows, and since we have only a revid, we cannot get them from
MediaWiki at present.
Thanks,
-Mike
Hi,
This mail describes two separate changes to how database access works. Please
read all of it and make the necessary changes to your tools.
Summary
=======
This is a brief summary of the changes you should make:
* If you currently use the fast server (sql-sX-fast or XXwiki-p.fastdb), change
back to the normal server, then follow the rest of these instructions.
* For tools which only connect to 'sql', no changes are necessary.
* For tools which use database servers but *do not* use user databases:
* If you currently connect to sql-sX.toolserver.org, instead connect to
sql-sX-rr.toolserver.org
* If you currently connect to XXwiki-p.db.toolserver.org, instead connect to
XXwiki-p.rrdb.toolserver.org
* For tools which use database servers and *do* use user databases:
* If you currently connect to sql-sX.toolserver.org, instead connect to
sql-sX-user.toolserver.org
* If you currently connect to XXwiki-p.db.toolserver.org, instead connect to
XXwiki-p.userdb.toolserver.org
* If you have any queries which could run for longer than 10 minutes when
working correctly, add the string SLOW_OK somewhere in the query, e.g.:
SELECT /* SLOW_OK */ * FROM table...
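The host-selection rules summarized above can be sketched as a small decision helper. The alias naming (sql-sX-rr / sql-sX-user) comes from the announcement; the function itself is hypothetical and simply encodes that decision table.

```python
# Sketch of the new host-selection rules: tools that use user databases
# must connect to the "user" alias, everything else should use the
# "rr" (random/round-robin) alias.

def choose_host(cluster, uses_user_dbs):
    """Return the server alias for a tool on cluster `cluster` (e.g. 's1')."""
    suffix = "user" if uses_user_dbs else "rr"
    return "sql-%s-%s.toolserver.org" % (cluster, suffix)
```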
I will update the documentation on the wiki shortly to reflect these changes.
The rest of this mail describes the changes in detail.
RR servers
==========
A while ago, we introduced the idea of 'fast' database servers, which only
allowed queries running for less than 60 seconds. The idea was that since
there were no long queries to create load on the server, it would be less
likely to have replication lag than the normal servers.
Since the introduction of fast servers, we have seen very little take-up of
them, even by tools which could usefully use them. Additionally, we have not
had many issues with replication lag recently.
We will therefore be retiring the fast servers, and replacing them with RR
servers. To connect to an RR server, the following hostnames should be used:
* sql-sX-rr.toolserver.org
* XXwiki-p.rrdb.toolserver.org
Unlike the normal aliases, the RR server will randomly connect you to one of
the two servers in each cluster. (For example, when connecting to sql-s1-rr,
you will randomly connect to either thyme or rosemary.)
There is no disadvantage to connecting to the RR alias (since there is
no limit on query execution time), and this will allow us to better distribute
load among the database servers, which will reduce replication lag for
everyone. It also makes it easier for us to add additional database servers to
a cluster in the future.
The only tools which cannot use the RR servers are tools which access user
databases, since these databases are still only present on a single server.
These tools should instead connect to the new "user" aliases:
* sql-sX-user.toolserver.org
* XXwiki-p.userdb.toolserver.org
The user aliases will always point to the server which currently contains the
user databases.
Long query killer
=================
To help prevent replication lag, we will be introducing an automatic query
killer on all servers. This will work as follows:
* If the replication lag is under 10 minutes, no queries will be killed.
* If the replication lag is 10 minutes or more, but less than 30 minutes,
queries will be killed if the following two conditions are both true:
1. The query does not contain the text SLOW_OK
2. The query has been running for X seconds or more, where X is the current
replication lag.
* If the replication lag is 30 minutes or more, queries will be killed if the
following condition is true:
1. The query has been running for X seconds or more, where X is the current
replication lag.
This is intended to only kill queries which are causing replication lag (in
particular, queries which cause InnoDB lock wait timeouts). We will monitor
the performance of the query killer and might adjust the parameters in the
future.
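The kill rules listed above can be sketched as follows. The thresholds and the SLOW_OK marker come from the announcement; the function name and signature are hypothetical, and all times are in seconds.

```python
# Sketch of the long query killer's decision rules:
#   lag < 10 min            -> kill nothing
#   10 min <= lag < 30 min  -> kill untagged queries running >= lag seconds
#   lag >= 30 min           -> kill any query running >= lag seconds

SLOW_OK = "SLOW_OK"

def should_kill(lag, runtime, query):
    """Decide whether the query killer would kill a query.

    lag     -- current replication lag, in seconds
    runtime -- how long the query has been running, in seconds
    query   -- the SQL text (may contain the SLOW_OK marker)
    """
    if lag < 10 * 60:
        return False                              # no lag problem: kill nothing
    if lag < 30 * 60:
        # moderate lag: SLOW_OK still protects the query
        return SLOW_OK not in query and runtime >= lag
    # severe lag: SLOW_OK no longer protects the query
    return runtime >= lag
```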
If you find that your queries are being killed and you don't think they should
have been, please open a request in JIRA.
- river.
Hi,
Since no one reported any problems with Perl 5.12, this is now the default Perl
version (/usr/bin/perl) on the Solaris systems.
Additionally, Python 2.7 is now available as /usr/bin/python2.7. This is not
yet the default version, but will become so if there are no issues.
- river.
It would be a nice time-saver to turn the names of servers or services
accessible via http(s) at http://status.toolserver.org/ into links, such as
JIRA or ortelius (which is, by the way, currently missing from the list).
Can that be done without too much hassle?
Greetings - Purodha
Hi,
yarrow (sql-s[25]-fast) is currently offline due to a problem with its disk
array. All load for these clusters is being served by daphne, and there should
be no impact on users.
- river.
Hi,
I'm about to shut down JIRA for an upgrade. This should take under 30 minutes.
- river.
Hello Admins!
I've created a little problem for myself: I did a 'svn remove' on the wrong
directory and thus lost my bot code... :( or ;) (can't decide...)
Could you perhaps give me back the most recent data from the backup of
.../drtrigon/pywikipeda? If not, I think I can restore the code
myself, but the data would also be useful... ;)
Sorry for this issue and thank you!
Greetings
DrTrigon
I'm going to suspend replication on cassia (sql-s4) for a few hours, in order to
generate a missing index on globalimagelinks. It should be done by tonight.
We'll probably do the same on the other servers soon.
-- daniel