I don't think vips would help this situation very much. (Also, the vips patch
is only half done. It's waiting on someone to do the other half; it is not
waiting for review.)
At first glance (by which I mean going only by what is said in this thread,
without any other investigation), it appears that the upload rate of files
from GWToolset is higher than the rate at which we can generate the
thumbnails that are "instantly" demanded. The obvious solution would be to
slow down GWToolset.
AFAIK there are currently multiple job runner processes that process
GWToolset jobs (--maxthreads). Let's reduce that to 1. If that's still too
speedy, we could add a sleep() call to the job code. After that we could
restart Fæ's batch upload, but with just the next 50 files. Watch the Ganglia
stats like a hawk; if nothing bad happens, do the next 100, then 200, etc.,
until we are reasonably sure that the GWToolset upload rate is sustainable.
Once we are sure, tell the users they can go wild again.
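A rough sketch of that ramp-up, in case it helps the discussion. This is
not GWToolset code: upload, is_healthy and delay_s are hypothetical
stand-ins for the real job, a manual or scripted look at the Ganglia
graphs, and the sleep() between files.

```python
import time
from typing import Callable, Iterator, List


def ramp_batches(items: List[str], first: int = 50) -> Iterator[List[str]]:
    """Yield batches of 50, 100, 200, ... items, doubling each round."""
    start, size = 0, first
    while start < len(items):
        yield items[start:start + size]
        start += size
        size *= 2


def run_ramped_upload(
    items: List[str],
    upload: Callable[[str], None],    # hypothetical: uploads one file
    is_healthy: Callable[[], bool],   # hypothetical: "do the graphs look OK?"
    delay_s: float = 0.0,             # per-file throttle, like the sleep() idea
    first: int = 50,
) -> int:
    """Upload in growing batches; stop as soon as a health check fails.

    Returns the number of files uploaded before stopping.
    """
    done = 0
    for batch in ramp_batches(items, first):
        for item in batch:
            upload(item)
            done += 1
            time.sleep(delay_s)
        # Check between batches, never mid-batch, so each step is observable.
        if not is_healthy():
            break
    return done
```

With 400 files and the default first batch of 50, the batches come out as
50, 100, 200 and a final 50. The single-threaded loop here corresponds to
running with one job runner.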
--bawolff
On Apr 22, 2014 2:27 PM, "Fabrice Florin" <fflorin(a)wikimedia.org> wrote:
Dear Faidon,
Your point is well taken that a major outage should trump a feature
release.
We will discuss this issue with the multimedia team in tomorrow’s sprint
planning meeting and see if we can take it on right away. If we do, this
could push back our release of Media Viewer in coming weeks.
For now, I have filed this high-priority 'spike' ticket for evaluation by
our team. We will respond here and onwiki, once our team has had a chance
to investigate possible solutions.
https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/482
Thanks again to everyone on your team for taking on this issue!
Regards as ever,
Fabrice
______________________________________
#482 Investigate solution for image scalers outage
Narrative
As a power user, I can upload large TIFF image files using GWToolset, so
that others can view them without crashing the system.
Investigate possible solutions for the image scalers outage that took place
over Easter weekend.
User:Fæ uploaded hundreds of 100-200MB multipage TIFFs via GWToolset over the
course of 4-5 hours (multiple files per minute). Random users/bots then
viewed Special:NewFiles, which attempts to display a thumbnail for all of
those new files in parallel in realtime, thus saturating the imagescalers'
MaxClients setting and basically inadvertently DoSing them, as reported by
Faidon.
The outage's symptoms seem to have been alleviated since, but the
Commons/GLAM communities are waiting for a response from us to resume their
work. They've responded to our "pause" request and in turn requested our
feedback at:
https://commons.wikimedia.org/wiki/Commons:Village_pump#Images_so_big_they_…
It appears this project is establishing the limits of what Commons can
currently handle, and we invite ideas on how the strain on the servers for
the large images involved can be reduced.
According to Emmanuel, it seems that this problem might be fixed by using
the VipsScaler for TIFF pictures, and Greg has already worked on this and
proposed a patch. But this patch has been awaiting review for 7 months:
wrote:
> Fabrice,
>
> I don't see how a feature release can be of a higher priority than
> troubleshooting an outage but regardless:
>
> The outage's symptoms seem to have been alleviated since, but the
> Commons/GLAM communities are waiting for a response from us to resume
> their work. They've responded to our "pause" request and in turn
> requested our feedback at:
>
> https://commons.wikimedia.org/wiki/Commons:Batch_uploading/NYPL_Maps
> (see the large red banner with the stop sign at the bottom)
>
> ...which is also linked from:
> https://commons.wikimedia.org/wiki/User_talk:F%C3%A6#Large_file_uploads
> https://commons.wikimedia.org/wiki/Commons:Village_pump#Images_so_big_they_…
>
> Sadly, I don't have much to offer them, as I previously explained. I
> certainly wouldn't commit to anything considering your response on the
> matter.
>
> Could you communicate your team's priorities to Fæ and the rest of the
> Commons/GLAM community directly?
>
> Thanks,
> Faidon
>
> On Mon, Apr 21, 2014 at 08:57:39AM -0700, Fabrice Florin wrote:
>>
>> Dear Faidon, Emmanuel and Giuseppe,
>>
>> Thanks so much for investigating this issue so quickly and sharing the
>> likely cause of the problem with us.
>>
>> This quarter, our team’s top priority is to address serious issues
>> related to Upload Wizard, and this seems like a good one for us to take
>> on.
>>
>> However, we are still in the process of releasing Media Viewer, which
>> is likely to take most of our attention for the next few weeks.
>>
>> So we may not be able to troubleshoot it right away. But we are filing
>> tickets about this issue, so we can hit the ground running in early May.
>>
>> Thanks again for your fine work, as well as for your patience and
>> understanding.
>>
>>
>> Fabrice
>>
>>
>> On Apr 21, 2014, at 3:53 AM, Emmanuel Engelhart <emmanuel.engelhart(a)wikimedia.ch> wrote:
>>
>>> On 21.04.2014 12:05, Faidon Liambotis wrote:
>>>
>>>> On Mon, Apr 21, 2014 at 10:56:40AM +0200, Giuseppe Lavagetto wrote:
>>>>
>>>>> The problem resolved before I could get to strace the apache
>>>>> processes, so I don't have more details - Faidon was investigating
>>>>> as well and may have more info.
>>>>
>>>> Indeed, I do: this had nothing to do with TMH. The trigger was Commons
>>>> User:Fæ uploading hundreds of 100-200MB multipage TIFFs via GWToolset
>>>> over the course of 4-5 hours (multiple files per minute), and then
>>>> random users/bots viewing Special:NewFiles, which attempts to display
>>>> a thumbnail for all of those new files in parallel in realtime, and
>>>> thus saturating imagescalers' MaxClients setting and basically
>>>> inadvertently DoSing them.
>>>
>>>> The issue was temporary because of
>>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=49118 but since the
>>>> user kept uploading new files, it was recurrent, with different files
>>>> every time. Essentially, we would keep having short outages every now
>>>> and then for as long as the upload activity continued.
>>>
>>>> I left a comment over at
>>>> https://commons.wikimedia.org/wiki/User_talk:Fæ
>>>> and contacted Commons admins over at #wikimedia-commons, as a courtesy
>>>> to both before I used my root to elevate my privileges and ban a
>>>> long-time prominent Wikimedia user as an emergency countermeasure :)
>>>
>>>> It was effective, as Fæ immediately responded and ceased the activity
>>>> until further discussion; the Commons community was also helpful in
>>>> the short discussion that followed.
>>>
>>>> Andre also pointed out that Fæ had previously begun the "Images so big
>>>> they break Commons" thread at the Commons Village Pump:
>>>> https://commons.wikimedia.org/wiki/Commons:Village_pump#Images_so_big_they_…
>>>
>>>> As for the more permanent solution: there's not much we, as ops, can
>>>> do about this but say "no, don't upload all these files", which is
>>>> obviously not a great solution :) The root cause is an architecture
>>>> issue with how imagescalers behave with regards to resource-intensive
>>>> jobs coming in a short period of time. Perhaps a combination of
>>>> poolcounter per file and more capacity (servers) would alleviate the
>>>> effect, but ideally we should be able to have some grouping &
>>>> prioritization of imagescaling jobs so that large jobs can't
>>>> completely saturate and DoS the cluster.
>>>
>>>
>>> Commons has great difficulty dealing with big TIFF files and this is a
>>> serious issue, in particular for Wikipedians in Residence. To me it
>>> looks like using the VipsScaler would help to fix the worst ones.
>>>
>>> Here is an email I sent to Andre and Greg a few days ago. I am making
>>> it public in the hope that it might help.
>>>
>>> ===========
>>> As a GLAM volunteer and WIR at the Swiss National Library, I encourage
>>> institutions to upload high quality pictures to increase digital
>>> sustainability. But, in the worst case (big TIFF files), Commons is not
>>> able to deal with them and fails to compute the thumbnails.
>>>
>>> You have a perfect example of the problem with this recently uploaded
>>> collection of historical plans of the Zurich main station:
>>> https://commons.wikimedia.org/wiki/Category:Historical_plans_of_Zurich_Main…
>>>
>>> It seems that this problem might be fixed by using the VipsScaler for
>>> TIFF pictures, and Greg has already worked on this and proposed a
>>> patch. But this patch has been awaiting review for 7 months:
>>> https://bugzilla.wikimedia.org/show_bug.cgi?id=52045
>>>
>>> IMO it would be great if you could do something to increase the
>>> priority and the urgency of this ticket. The movement invests
>>> considerable resources to build successful collaborations with GLAMs,
>>> and many of them are held back by this "silly" bug.
>>>
>>> Hope you can help us.
>>> ===========
>>>
>>>> --
>>>> Volunteer
>>>> Technology, GLAM, Trainings
>>>> Zurich
>>>> +41 797 670 398
>>>
>>>> _______________________________________________
>>>> Multimedia mailing list
>>>> Multimedia(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/multimedia
>>>
>>>
>>> _______________________________
>>>
>>> Fabrice Florin
>>> Product Manager
>>> Wikimedia Foundation
>>>
>>> http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
>>>
>>>
>>>
>>
>>> _______________________________________________
>>> Ops mailing list
>>> Ops(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/ops
>>
>>