Hi Fae,
On Mon, Jun 30, 2014 at 2:38 PM, Fæ <faewik@gmail.com> wrote:
> This is a simple-sounding question, but I have two uploads going on in parallel right now, one using 8 processing threads and the other using 16, so a total of 24. None of these files is huge; they seem to be under 15 MB, with an occasional outlier around 45 MB (though quite a few drawing scans break the TIFF max size barrier of 50MP even though these are only a minuscule ~2.5 MB in file size).
> GWT was designed for a maximum of 20 threads, and I don't know whether to feel guilty about running 24 threads this way, even though these uploads are unlikely to break anything.
The recent outages were not directly related to upload volume. Uploads do not (yet) cause thumbnails to be rendered; the thumbnail requests (which overloaded the image scaling servers) were caused by people looking at those images (maybe on Special:NewFiles, or some category page). So it is really the "image view volume" that counts; there is some relation to the upload speed (more uploads -> more images on Special:NewFiles -> more views) but it's rather indirect.
The upload volume in itself is tiny; you mention 3K uploads of 15 MB images per day, which consumes about 0.5 MB/s (roughly 4 Mbit/s) of bandwidth, while the capacity is in the gigabyte-per-second range. As long as the images are small and creating thumbnails for them is not particularly processing-intensive, I don't think lots of threads would be problematic. However...
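(For reference, a rough check of that bandwidth figure. This is only a sketch: it assumes decimal units, treats every file as the full 15 MB, and spreads the uploads evenly over the day. The function and variable names are made up for illustration.)

```python
# Rough check of the average upload bandwidth (decimal units: 1 MB = 10**6 bytes).
UPLOADS_PER_DAY = 3_000          # figure quoted in this mail
MB_PER_UPLOAD = 15               # upper-end file size quoted in this mail
SECONDS_PER_DAY = 24 * 60 * 60


def average_upload_rate(uploads_per_day, mb_per_upload):
    """Return the average rate as (megabytes per second, megabits per second)."""
    mbyte_per_s = uploads_per_day * mb_per_upload / SECONDS_PER_DAY
    return mbyte_per_s, mbyte_per_s * 8


mbyte_per_s, mbit_per_s = average_upload_rate(UPLOADS_PER_DAY, MB_PER_UPLOAD)
print(f"{mbyte_per_s:.2f} MB/s, {mbit_per_s:.2f} Mbit/s")  # 0.52 MB/s, 4.17 Mbit/s
```

Since most files are smaller than 15 MB, the real average is likely lower still.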
> Any thoughts? If what I'm doing is somehow self-regulating, I would be tempted to add another job and bump the "volume" to 40 or more threads, as this particular upload has over 100,000 images (potentially 200,000) and I'd rather it didn't take over a month to complete (which is what it is looking like right now at a rate of 2,800 images per day).
...exactly because GWToolset has self-regulating limits, there is not much point in doing that. The job throttling has recently been changed to be global instead of per-user or per-process, so adding more jobs will not speed things up. Actually, running one upload with 20 threads should be faster than running two with 24 (as the number of threads is currently ignored, while the number of uploads is factored into the throttling). (Note that this is a recent change and not yet tested in the real world, so take this with a grain of salt. See https://gerrit.wikimedia.org/r/#/c/132112/ for details.)
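(A toy model of why a global limit makes extra jobs pointless. This is not GWToolset's actual throttling code; the limit and demand numbers are hypothetical, and it only illustrates the "adding more jobs will not speed things up" part, not the per-upload factor.)

```python
def effective_rate(global_limit, batch_demands):
    """Total uploads per hour when every batch draws from one shared limit.

    global_limit  -- hypothetical cap across all users and batches
    batch_demands -- uploads per hour each batch would do if unthrottled
    """
    return min(global_limit, sum(batch_demands))


GLOBAL_LIMIT = 120  # hypothetical number, for illustration only

one_batch = effective_rate(GLOBAL_LIMIT, [200])
two_batches = effective_rate(GLOBAL_LIMIT, [200, 200])
print(one_batch, two_batches)  # 120 120
```

Once the shared cap is reached, a second batch only competes with the first for the same budget.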
A more direct way of throttling thumbnail generation will be deployed next Thursday (bug 65691 https://bugzilla.wikimedia.org/show_bug.cgi?id=65691); we might want to reconsider GWToolset throttling limits after that.