[Wikitech-l] Recent thumbnail problems and problem reporting.

17 Sep 2007


Two days ago the disk filled up on one of our servers, Bacon
(http://ganglia.wikimedia.org/pmtpa/graph.php?c=Miscellaneous&h=bacon.wik...).
The full disk caused some thumbnails to fail to render.
The root problem was resolved, but some of the failed thumbnails
remained broken. They could be fixed by purging the image page, or
by simply waiting for their cache entries to expire. The technical
team considered the matter closed.

Sometime today, awareness of broken thumbnails on English Wikipedia shot up.
Rather than the problem being brought to the tech team's attention, a series
of inaccurate sitenotices were placed on English Wikipedia and on
several other language Wikipedias. The English notice in particular
was displayed to the general public.

The notices claimed that the issue was being worked on. This was not
correct. The notice most likely discouraged people from reporting the
problems they were seeing.

None of the active tech team members were aware of any ongoing issue. It was
understood that some images would fail to display until their cache entries
expired, but this was not believed to be an issue significant enough in
scale to justify any action.

When I happened to browse over to enwp as a reader, I saw the notice.
I asked ST47 to remove the notice. I got hold of our resident
caching god, Mark Bergsma, and went ahead and mass-purged all the
thumbnails.

Some time after that, the incorrect notice was restored on English
Wikipedia and revised several times; its last version
gave incorrect directions on how to purge images. It is
generally inadvisable to instruct the general public to purge pages on
a wide scale, for a number of reasons.

All in all, this issue was handled poorly on all sides. On the tech side,
a status report should have gone out after the fix, and on the
Wikipedia admins' side, no claim should ever be made that a problem is
being worked on unless you are darn sure that it is the case.

There are also some issues related to how we communicate with the
public, but I'll leave it to someone else to complain about that.

My biggest fear is that, had there been a second issue, it might have
persisted for days with the techs unaware of the problem. I've seen
prior examples in our user communities of over-eagerness to claim that
something is being worked on, which is why this
frightens me.

Hopefully future events will be handled better, and this message will
increase awareness of the potential issues involved.

Thanks for your time.