Re: [Multimedia] [Ops] Brief image scalers outage, Mon Apr 21 03:12 UTC

21 Apr 2014


On Mon, Apr 21, 2014 at 8:04 AM, Ori Livneh <ori@wikimedia.org> wrote:
> ...
> The number of Apache busy workers on the image scalers spiked between 2:55
> and 3:15 UTC, peaking at about 3:12 and overwhelming
> rendering.svc.eqiad.wmnet for about a minute.
>
> The outage correlates fairly well with a spike of fatals in
> TimedMediaHandler, consisting almost entirely of requests to this URL:
> <http://commons.wikimedia.org/w/thumb_handler.php/2/2c/Closed_Friedmann_unive...>.
>
> The full stack trace is included in
> <https://bugzilla.wikimedia.org/show_bug.cgi?id=64152>, filed by Reedy
> yesterday. It appears File::getMimeType is returning 'unknown/unknown' and
> that File::getHandler is consequently not able to find a handler.

The problem happened again this morning between 8:25 and 8:35 UTC. This
time the load was so high that Ganglia stopped graphing data. From an
analysis of the logs, while it is true that we have a lot of fatals for the
URL above, it is also true that the number of requests for that URL is quite
low and shows no spike in that interval. So the problem is genuine load,
probably caused by some heavy processing.

The problem resolved itself before I could strace the Apache processes, so
I don't have more details. Faidon was investigating as well and may have
more info.

Giuseppe
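
For anyone repeating the log check described above, here is a minimal
sketch of counting requests for one URL per minute and testing whether a
given window stands out. It is an illustration only: the line format (an
ISO 8601 timestamp, a space, then the rest of the line), the function
names and the threshold factor are all assumptions, not the actual scaler
log layout or the method used in this thread.

```python
from collections import Counter
from datetime import datetime


def requests_per_minute(lines, url_fragment):
    """Count log lines containing url_fragment, bucketed by minute.

    Assumes each line starts with an ISO 8601 timestamp followed by a
    space (hypothetical format, not the real scaler log layout).
    """
    counts = Counter()
    for line in lines:
        if url_fragment not in line:
            continue
        stamp, _, _ = line.partition(" ")
        minute = datetime.fromisoformat(stamp).replace(second=0, microsecond=0)
        counts[minute] += 1
    return counts


def has_spike(counts, start, end, factor=3.0):
    """Return True if any minute in [start, end] exceeds factor times the
    mean of the minutes outside that window.

    Minutes with zero matching requests are absent from counts, so the
    baseline is the mean over minutes that saw at least one request.
    """
    inside = [n for m, n in counts.items() if start <= m <= end]
    outside = [n for m, n in counts.items() if not (start <= m <= end)]
    if not inside or not outside:
        return False
    baseline = sum(outside) / len(outside)
    return max(inside) > factor * baseline
```

If has_spike returns False for the outage window while the load graphs
show a clear peak, that points to something other than request volume for
that URL, which is the conclusion reached above.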
