Re: [Multimedia] [Ops] Brief image scalers outage, Mon Apr 21 03:12 UTC

21 Apr 2014


      On 21.04.2014 12:05, Faidon Liambotis wrote:
...
On Mon, Apr 21, 2014 at 10:56:40AM +0200, Giuseppe Lavagetto wrote:
...
The problem resolved before I could get to strace the apache processes, so
I don't have more details - Faidon was investigating as well and may have
more info.
Indeed, I do: this had nothing to do with TMH. The trigger was Commons
User:Fæ uploading hundreds of 100-200MB multipage TIFFs via GWToolset
over the course of 4-5 hours (multiple files per minute), and then
random users/bots viewing Special:NewFiles, which attempts to display a
thumbnail for all of those new files in parallel in realtime, thus
saturating the imagescalers' MaxClients setting and inadvertently
DoSing them.
The issue was temporary because of
https://bugzilla.wikimedia.org/show_bug.cgi?id=49118 but since the user
kept uploading new files, it was recurrent, with different files every
time. Essentially, we would keep having short outages every now and then
for as long as the upload activity continued.
I left a comment over at https://commons.wikimedia.org/wiki/User_talk:F%C3%A6
and contacted Commons admins over at #wikimedia-commons, as a courtesy
to both before I used my root to elevate my privileges and ban a
long-time prominent Wikimedia user as an emergency countermeasure :)
It was effective, as Fæ immediately responded and ceased the activity
until further discussion; the Commons community was also helpful in the
short discussion that followed.
Andre also pointed out that Fæ had previously begun the "Images so big
they break Commons" thread at the Commons Village Pump:
https://commons.wikimedia.org/wiki/Commons:Village_pump#Images_so_big_they_b...
As for the more permanent solution: there's not much we, as ops, can do
about this but say "no, don't upload all these files", which is
obviously not a great solution :) The root cause is an architecture
issue with how imagescalers behave with regard to resource-intensive
jobs coming in a short period of time. Perhaps a combination of
poolcounter per file and more capacity (servers) would alleviate the
effect, but ideally we should be able to have some grouping &
prioritization of imagescaling jobs so that large jobs can't completely
saturate and DoS the cluster.
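
To make the "poolcounter per file" plus "grouping & prioritization" idea
concrete, here is a minimal Python sketch of admission control for
thumbnail jobs. It is only an illustration and does not show how
PoolCounter or the imagescalers are actually implemented: the class
name, the parameters (total_slots, large_slots, per_file_slots,
large_bytes) and the size threshold are all assumptions made up for
this example.

```python
import threading


class ScalerLimiter:
    """Toy admission control for thumbnail jobs (all names hypothetical)."""

    def __init__(self, total_slots, large_slots, per_file_slots, large_bytes):
        self._lock = threading.Lock()
        self.total_slots = total_slots        # all workers, cf. MaxClients
        self.large_slots = large_slots        # cap on concurrent large jobs
        self.per_file_slots = per_file_slots  # cap on concurrent jobs per file
        self.large_bytes = large_bytes        # size at which a job is "large"
        self.busy_total = 0
        self.busy_large = 0
        self.busy_per_file = {}

    def try_acquire(self, file_key, size_bytes):
        """Return True and reserve a worker, or False if the job must wait."""
        is_large = size_bytes >= self.large_bytes
        with self._lock:
            if self.busy_total >= self.total_slots:
                return False
            if self.busy_per_file.get(file_key, 0) >= self.per_file_slots:
                return False
            if is_large and self.busy_large >= self.large_slots:
                return False
            self.busy_total += 1
            self.busy_per_file[file_key] = self.busy_per_file.get(file_key, 0) + 1
            if is_large:
                self.busy_large += 1
            return True

    def release(self, file_key, size_bytes):
        """Give back a worker previously reserved with try_acquire."""
        with self._lock:
            self.busy_total -= 1
            self.busy_per_file[file_key] -= 1
            if self.busy_per_file[file_key] == 0:
                del self.busy_per_file[file_key]
            if size_bytes >= self.large_bytes:
                self.busy_large -= 1
```

With per_file_slots=1, a second request for the same TIFF is refused
while the first render is still running. With large_slots set below
total_slots, large jobs can never hold more than large_slots workers,
so the remaining workers stay available to small thumbnails.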
Commons has great difficulty dealing with big TIFF files, and this is a
serious issue, in particular for Wikipedians in Residence. To me it
looks like using the VipsScaler would help to fix the worst ones.
Here is an email I sent to Andre and Greg a few days ago. I am making it
public in the hope that it might help.
===========
As a GLAM volunteer and WIR at the Swiss National Library, I encourage
institutions to upload high quality pictures to increase digital
sustainability. But in the worst case (big TIFF files), Commons is not
able to deal with them and fails to compute the thumbnails.
You have a perfect example of the problem with this recently uploaded 
collection of historical plans of the Zurich main station:
https://commons.wikimedia.org/wiki/Category:Historical_plans_of_Zurich_Main_...
It seems that this problem might be fixed by using the VipsScaler for
TIFF pictures, and Greg has already worked on this and proposed a patch.
But this patch has been awaiting review for 7 months:
https://bugzilla.wikimedia.org/show_bug.cgi?id=52045
IMO it would be great if you could do something to increase the priority
and the urgency of this ticket. The movement invests considerable
resources in building successful collaborations with GLAMs, and many of
them are held back by this "silly" bug.
Hope you can help us.
===========
-- 
Volunteer
Technology, GLAM, Trainings
Zurich
+41 797 670 398
