On Tue, May 13, 2014 at 02:45:21PM -0700, Gergo Tisza wrote:
- the scalers did not have enough processing power to handle all the thumbnail requests that were coming in simultaneously. This was presumably because Special:NewFiles and Special:ListFiles were filled with the NYPL maps, and users looking at those pages sent dozens of thumbnailing requests in parallel.
Sort of, yes. CPU was spiking, but MaxClients limits were also being hit: requests were piling up because those large scaling requests take so long, both to actually run and to transfer the files.
It's unlikely we'd survive this if we increased MaxClients, though.
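As a rough illustration of why long-running requests exhaust the worker slots, Little's law says the mean number of in-flight requests equals the arrival rate times the mean time each request spends in the system. The sketch below uses hypothetical numbers, not measurements from this incident, and the function name is made up for the example.

```python
def in_flight_requests(arrival_rate_per_s: float, mean_duration_s: float) -> float:
    """Little's law: mean in-flight requests = arrival rate * mean time in system."""
    return arrival_rate_per_s * mean_duration_s


# Hypothetical figures, for illustration only.
typical = in_flight_requests(arrival_rate_per_s=50, mean_duration_s=0.5)
huge_originals = in_flight_requests(arrival_rate_per_s=50, mean_duration_s=20)

print(f"typical thumbnails: ~{typical:.0f} busy workers")
print(f"very large originals: ~{huge_originals:.0f} busy workers")
```

At the same request rate, a 40x longer request ties up 40x as many workers, so a higher MaxClients would mostly move the bottleneck to CPU and network.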
- Swift traffic was saturated by GWToolset-uploaded files, making the serving of everything else very slow. I assume this was because of the scalers fetching the original files? Or could this be directly caused by the uploading somehow?
The former. The network spike graphs correlated exactly with equivalent imagescaler network spike graphs. Note that this has a secondary effect: when the network gets saturated, imagescaler original transfers become slower and hence scaling requests pile up (see above).
- GWToolset jobs piling up in the job queue (Faidon said he cleared out 7396 jobs).
Not exactly, no. I found 34 XML files in Swift under the container wikipedia-commons-gwtoolset-metadata starting with "Fæ/". A "grep -hr '<filename>' | sort -u | wc -l" showed 7396 *files* (all containing "NYPL" in the name) that would eventually be uploaded (AIUI), not jobs in the job queue. I'm not exactly sure how GWToolset maps these into jobs in the jobqueue, but I remember reading something about a "master" job that uploads multiple files when it runs.
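For anyone wanting to repeat the count without the shell pipeline, here is a minimal Python sketch of the same idea: collect every line containing "<filename>" across a directory tree and count the distinct ones. It assumes the XML files have already been downloaded from Swift into a local directory; the function name and directory layout are hypothetical.

```python
from pathlib import Path


def count_unique_matching_lines(root: str, needle: str = "<filename>") -> int:
    """Count distinct lines containing `needle` in all files under `root`.

    Roughly equivalent to: grep -hr NEEDLE ROOT | sort -u | wc -l
    """
    unique_lines = set()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    unique_lines.add(line.rstrip("\n"))
    return len(unique_lines)
```

Like the grep version, this counts distinct matching lines rather than parsing the XML, so it only equals the number of files if each "<filename>" element sits on its own line.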
So I am not confident that throttling would be enough to avoid further meltdowns. I think Dan is working on a patch to make the upload jobs pre-render the thumbnails; we might have to wait for that before allowing GWToolset uploads again.
That sounds acceptable to me, preferably in combination with the reference thumbnail idea.
Faidon