We already use PoolCounter for thumbnails at the per-file level (e.g. no more than 2 processes at a time rendering thumbnails of any given original file). Since PoolCounter calls cannot be nested, we can't add another layer of pool counting based on file-type grouping or anything of the sort.
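For reference, that per-file limit lives in the PoolCounter configuration; a rough sketch of the relevant bit (the pool name and numbers are illustrative, not necessarily what production runs):

    $wgPoolCounterConf['FileRender'] = [
        'class'    => 'PoolCounter_Client',
        'timeout'  => 8,   // give up waiting for the lock after 8 s
        'workers'  => 2,   // at most 2 processes rendering thumbs of one original file
        'maxqueue' => 50,  // fail fast once this many are already working/waiting on that file
    ];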

The biggest holes right now are:
a) A bunch of new files come in quickly, say 100. There could be 200 workers rendering those files (given the 2-per-file PoolCounter limit). Many more, up to 50 * 100, could also be waiting on PoolCounter until they time out, tying up thumb.php even more (though at least not using CPU or bandwidth). The throttling config change could help with this if low limits are picked.
b) Files come in more slowly, but nobody views them until there are, say, 100 of them, and then they all get viewed at once for some reason. I'm not sure how likely this is, but it's not impossible. Handling this would probably require pre-rendering some thumbnails via jobs or something, in addition to the throttling config change.
c) In any case, someone could still request a bunch of non-standard sizes and tie up dozens of processes for a while before getting rate limited for a short time (and they could repeat the process). The number of threads this could tie up is lower than in (b), since rate limiting would kick in before the pool queues get that large. Still, it would use a lot of bandwidth and CPU. Throttling weighted by cost (instead of counting 1 for every file) could help here; see the sketch below.
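To make (c) concrete, a weighted throttle would charge each request by something like its output pixel area rather than counting every thumbnail as 1. A minimal sketch, assuming a weighted limiter we don't actually have yet:

    // Hypothetical: charge roughly one "point" per screenful of output pixels
    // instead of counting every request as 1.
    $cost = max( 1, (int)ceil( ( $width * $height ) / ( 1280 * 1024 ) ) );
    // $thumbLimiter is an imaginary weighted limiter keyed on the client;
    // nothing like it exists in thumb.php today.
    if ( !$thumbLimiter->consume( $requestIp, $cost ) ) {
        wfThumbError( 429, 'Too many expensive thumbnail requests; try again later.' );
    }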

I'm not worried about the ~7000 jobs in the queue, though; that just seems to create a backlog that doesn't take up much space.


On Tue, May 13, 2014 at 2:50 PM, Gergo Tisza <gtisza@wikimedia.org> wrote:
On Tue, May 13, 2014 at 2:45 PM, Gergo Tisza <gtisza@wikimedia.org> wrote:
For the first problem, we can make an educated guess at the level of throttling required: if we want to keep the number of simultaneous GWToolset-related scaling requests below X, then Special:NewFiles and Special:ListFiles should not have more than X/2 GWToolset files on them at any given time. Those pages show the last 50 files, so GWToolset should not upload more than X files in the time it takes normal users to upload 100 of them. I counted the number of uploads per hour on Commons on a weekday, and there were 240 uploads in the slowest hour, which works out to about 25 minutes per 100 files. So GWToolset should be limited to X files per 25 minutes, for some value of X that ops are happy with.
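If that could be expressed as an ordinary MediaWiki rate limit (rather than the job queue knob we actually have to work with), the shape would be roughly the following, where both the 'gwtoolset' limiter group and the chosen X are placeholders:

    // Illustrative only: X uploads per 25-minute (1500 s) window for a
    // hypothetical 'gwtoolset' limiter group; X = 50 is a pure placeholder.
    $wgRateLimits['upload']['gwtoolset'] = [ 50, 1500 ];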

This is about the best we can do with the current throttling options of the job queue, I think, but it has a lot of holes. The rate of normal uploads could drop extremely low for a short time for some reason. New file patrollers could be looking at the special pages with non-default settings (500 images instead of 50). Someone could look at the associated category (200 thumbnails at a time). This is not a problem if people are continuously keeping watch on Special:NewFiles, because that would mean the thumbnails get rendered soon after the uploads; but that's an untested assumption.

Maybe we could create scaling priority groups? Tag GWToolset-uploaded images as belonging to an "expensive" group, then use PoolCounter to ensure that no more than X expensive thumbnails are scaled at the same time. That would throttle thumbnail rendering directly, instead of throttling the upload speed and guessing at how that translates into a throttle on rendering.
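Concretely, the PoolCounter side of that could look something like the sketch below; the 'FileRenderExpensive' pool name, the numbers, and the mechanism for tagging files as expensive are all placeholders at this point:

    // A second (non-nested) pool for files tagged as "expensive", e.g. GWToolset
    // uploads. 'slots' caps concurrent scalings across all such files, which is
    // the "no more than X expensive thumbnails at once" part of the idea.
    $wgPoolCounterConf['FileRenderExpensive'] = [
        'class'    => 'PoolCounter_Client',
        'timeout'  => 8,
        'workers'  => 2,    // still at most 2 per original file
        'slots'    => 8,    // X: cluster-wide cap on expensive scalings
        'maxqueue' => 100,
    ];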

_______________________________________________
Ops mailing list
Ops@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ops




--
-Aaron S