On Wednesday I reported
https://jira.toolserver.org/browse/TS-1693 z-dat-s5-b: ERROR 2013
(HY000): Lost connection to MySQL server at 'reading initial
No TS admin took care of this initial problem. On Thursday there
was no space left on /sql (according to Munin).
Since then, many expected rows have been missing from dewiki tables. My
bot scans dewiki for pages with missing categories or pagelinks and has
found many wrong results in the last 48 hours.
Is there an estimate of when s5-user will be usable again? I think a
reimport is needed because of corrupted data (dewiki on sql-s5-rr
(cassia) seems to be OK). s5 is growing fast because of Wikidata.
This week I also reported replication problems with other database servers:
* TS-1687: wikidatawiki replication on cassia (sql-s5-rr) stopped at
Sept 30th 2013
* TS-1688: commonswiki replication on cassia (sql-s5-rr) stopped at Sept
* TS-1689: commonswiki replication on z-dat-s5-b (sql-s5-user) stopped
at Oct 8th 2013
* TS-1690: wikidatawiki replication on z-dat-s6-a (sql-s6-user/rr)
stopped at Aug 10th 2013
* TS-1691: wikidatawiki replication on z-dat-s7-a (sql-s7-user/rr)
stopped at Aug 10th 2013
* TS-1694: toolserver.servermapping wrong for s5
We had set a deadline (Sept 30th) for asking WMDE for support in
migrating your tools to Tool Labs. As very few of you have made use of
this so far, Johannes still has time to help you, so we are still
accepting support requests. Please think about whether you'll have the
time and the know-how to migrate on your own. Your tools must have left
the Toolserver by June 30th, 2014. This deadline is serious.
If you would like Johannes to give you a hand, please tell us by
November 15th 2013.
* Write to me or directly to johannes.kroll(a)wikimedia.de.
* Give some details about your tool: Did you try to migrate already but
something fails to work? What works? What doesn't? What exactly do you
want Johannes to do?
Internal IT Management and Toolserver Project Management
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
In the last few days Toolserver experienced web page outages which were caused by too many queries from only a few hosts.
They are using OSM images and - please don't ask me why - single IPs tend to request about 40-50 pictures per second for minutes or hours; peaks can be worse.
At some point our web server then gives up.
Yes, sorry ;). I can proudly say that today alone about 11.7 million web queries were answered somehow.
I tried to mitigate the problem of "too many requests per IP" by blocking, but that is not an option.
One problem is that users of at least one portal then complain, and another is that the IP addresses seem random - coming even from dial-up ranges.
There might be something badly wrong with the Cache-Control headers for the images (or perhaps we can tweak something at that point), or it is something else - I don't know what it could be.
To make a long story short - I rate-limited the OSM tile delivery to 40 images per second per IP; the allowed burst is 55.
Users will get a 503 error while their rate exceeds the limit, until it decreases again - but delivery isn't stopped completely.
It seems to work, since I have some notices showing which IPs were throttled, and these are IPs with heavy usage.
I used this module to throttle: http://nginx.org/en/docs/http/ngx_http_limit_req_module.html
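The effect of these numbers (40 images per second per IP, burst of 55) can be pictured with a small leaky-bucket model. This is only a simplified sketch of the general technique, not nginx's actual implementation and not the configuration used on Toolserver; the names `LeakyBucket` and `allow_request` and the example IP address are made up for illustration.

```python
from collections import defaultdict


class LeakyBucket:
    """Simplified leaky bucket: drains at `rate` requests per second and
    tolerates up to `burst` requests above the steady rate."""

    def __init__(self, rate=40.0, burst=55.0):
        self.rate = rate
        self.burst = burst
        self.excess = 0.0   # requests currently "above" the steady rate
        self.last = None    # time of the last accepted request, in seconds

    def allow(self, now):
        if self.last is None:
            # The first request from a client is always accepted.
            self.last = now
            return True
        elapsed = now - self.last
        # The bucket drains over time; each new request adds one unit.
        excess = max(0.0, self.excess - self.rate * elapsed + 1.0)
        if excess > self.burst:
            return False  # over the limit: the server would answer 503
        self.excess = excess
        self.last = now
        return True


# One bucket per client IP (hypothetical helper, for illustration only).
buckets = defaultdict(LeakyBucket)


def allow_request(ip, now):
    return buckets[ip].allow(now)


if __name__ == "__main__":
    # 100 requests from one IP at the same instant.
    served = sum(allow_request("192.0.2.1", 0.0) for _ in range(100))
    print(served, "of 100 requests served")
```

A client that stays below the steady rate never sees a rejection; one that hammers the server gets rejections only until its bucket has drained again. Note that nginx by default delays requests that fall within the burst instead of serving them at once; this model only shows the accept/reject decision.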
I don't want to keep this option configured forever - I rather hope we can do something about caching, or give the projects themselves the pictures they need (I doubt we have to deliver hill shading pictures for everyone - this is Toolserver).
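One way to start on the caching question is to check what the Cache-Control header of a tile response actually allows. A minimal sketch, assuming the header value has already been fetched by some other means; the function name `cache_lifetime` and the sample header values are hypothetical and not taken from the Toolserver setup.

```python
def cache_lifetime(cache_control):
    """Return the max-age in seconds that a client may reuse a response for,
    or 0 if the header forbids reuse without revalidation or gives no lifetime."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    if "no-store" in directives or "no-cache" in directives:
        return 0
    try:
        return max(0, int(directives.get("max-age", "0")))
    except ValueError:
        return 0  # malformed max-age: treat as not cacheable
```

For example, `cache_lifetime("public, max-age=86400")` gives 86400, while `cache_lifetime("no-cache")` gives 0. If the tiles turn out to carry a very short or missing lifetime, that might explain part of the repeated requests.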
If anyone has an idea what to do, or any questions, please let me know.
To add an SSL certificate I need to restart the web load balancer
and/or the web server (not sure at the moment). I plan to do this on
SUNDAY, 13:00 UTC.
If all goes well, everything will be done in <1 minute, but if there
are problems there could be some downtime (max. 1/2 h) for web pages.
As the WMF announced, the password hashes and email addresses of
many users were publicly accessible in WikiLabs (and so ToolLabs) for 6
So please make sure that you and your bots get a new password as soon as
possible! A well-known bot in the wrong hands is dangerous, so change
the password now – don’t wait to see whether you get a mail from the WMF
(I got none, but am affected AFAICS).