SUMMARY: This week I experienced an issue when uploading several hundred very high resolution maps as part of the NYPL maps project.[1] Discussion has been going on in several places, and this thread is an attempt to bring it together in one place so all users can benefit.
[Gilles, Could you join this low volume open email list to keep track of GWT issues and be a voice for WMF Operations to help us reach a recommendation for end user best practices?]
HISTORY By the standards of our GLAM projects, my upload was unusually stressful for the WMF servers. Individual map scans are images of up to 300 MB, and resolutions can exceed 80 megapixels (80 million pixels). There are 20,000 tiff images to be uploaded, of which I have completed around 12%. I used the GLAMtoolset at full capacity (20 threads), though I had broken the xml file up so that runs were a few hundred images at a time. My intention was to ramp this up to a couple of thousand per upload "tranche".
I was contacted on Tuesday by operations asking me to suspend the upload, as the demand for attempted thumbnail rendering of the tiff images was placing too high a load on WMF servers.[2] Over 500 of the tiff images were greater than 50 megapixels and, as a consequence, Commons fails to render any thumbnails for them (thumbnails are created for jpegs above this limit; this is a tiff-specific constraint).[3]
CURRENT STATE With no obvious immediate fix or work-around on the table from WMF ops, I have proposed to re-start my uploads for this project with an effective throttle by using 2 threads (this is a setting on the first screen of the GWToolset). In practice, having tried a run of a couple of hundred, this means that the tool is uploading 100MB sized images at a rate of 2 every 5 minutes. This does not seem to be causing any issues.
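For a rough sense of what the throttle means for the project timeline, here is a minimal back-of-envelope sketch in Python. It assumes the observed rate of 2 images every 5 minutes holds for all remaining files and that uploads run around the clock; neither is guaranteed. The function name is purely illustrative and not part of the GWToolset.

```python
def estimate_days_remaining(total_images, fraction_done, images_per_batch, minutes_per_batch):
    """Estimate days left, assuming a constant upload rate running 24 hours a day."""
    remaining = total_images * (1 - fraction_done)
    images_per_day = images_per_batch * (24 * 60 / minutes_per_batch)
    return remaining / images_per_day

# Figures from this thread: 20,000 tiffs, around 12% done, 2 images per 5 minutes.
days = estimate_days_remaining(20000, 0.12, 2, 5)
print(round(days, 1))  # roughly 30.6 days
```

On those assumptions the remaining 88% would take about a month of continuous uploading at 2 threads.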
WAY FORWARD In the longer term the WMF is looking at alternatives for rendering tiff thumbnails which will enable 50MP+ images to be handled; this may or may not help solve the problem seen this week.[4]
I recommend that the GWToolset on-wiki guides include a recommendation about how to choose the number of processing threads based on the types of images to be uploaded. To date, no other project has seen these problems, probably because the image resolutions fall well under the 50MP threshold. The maximum allowed number of threads is 20, with a default of 10. For the time being I suggest that we agree a best practice that upload projects with tiffs over 50MP use no more than 2 threads; these problems do not appear to exist for projects uploading smaller resolution files.
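The suggested rule of thumb could be written down as something like the following Python sketch. This is only an illustration of the proposed best practice, not anything the GWToolset does today; the function and constant names are hypothetical, and falling back to the default of 10 threads for all other files is my assumption.

```python
TIFF_THUMBNAIL_LIMIT_MP = 50   # tiff thumbnail rendering limit discussed in this thread
DEFAULT_THREADS = 10           # GWToolset default
THROTTLED_THREADS = 2          # proposed cap for very large tiffs

def suggested_threads(file_format, width_px, height_px):
    """Suggest a thread count for one image (hypothetical helper)."""
    megapixels = width_px * height_px / 1_000_000
    is_tiff = file_format.lower() in ("tif", "tiff")
    if is_tiff and megapixels > TIFF_THUMBNAIL_LIMIT_MP:
        return THROTTLED_THREADS
    return DEFAULT_THREADS

def suggested_threads_for_batch(images):
    """Use the most cautious setting needed by any (format, width, height) in the batch."""
    return min(suggested_threads(fmt, w, h) for fmt, w, h in images)

batch = [("tiff", 10000, 9000), ("jpeg", 10000, 9000), ("tiff", 5000, 4000)]
print(suggested_threads_for_batch(batch))  # 2, because one tiff is 90MP
```

Taking the minimum across the batch reflects that a single thread setting applies to the whole upload run.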
I propose that WMF Operations consider finding ways of testing the peak loads possible from the GWT and decide if this can be fixed by future operational improvements, whether the tool might benefit from some simple "load management" changes, or if establishing a best practice for our (relatively) small number of GWT users would be a sufficient community based control.
Links
1. https://commons.wikimedia.org/wiki/Commons:Batch_uploading/NYPL_Maps
2. https://commons.wikimedia.org/wiki/Commons_talk:Batch_uploading/NYPL_Maps
3. https://commons.wikimedia.org/wiki/Category:NYPL_maps_%28over_50_megapixels%...
4. https://bugzilla.wikimedia.org/show_bug.cgi?id=52045
Fae