On 01/07/2014, Gergo Tisza <gtisza@wikimedia.org> wrote:
>> Any thoughts? If what I'm doing is somehow self-regulating, I would be tempted to add another job and bump the "volume" to 40 or more threads, as this particular upload has over 100,000 images (potentially 200,000), and I'd rather it didn't take over a month to complete (which is how it looks right now, at a rate of 2,800 images per day).
> ...exactly because GWToolset has self-regulating limits, there is not much point in doing that. The job throttling was recently changed to be global instead of per-user or per-process, so adding more jobs will not speed things up. In fact, running one upload with 20 threads should be faster than running two with 24, since the number of threads is currently ignored but the number of uploads is factored into the throttling.
Ah, good to know. I am currently running it exactly this way: one job with 20 threads.
> we might want to reconsider GWToolset throttling limits after that.
Good. Though it works for now, throttling the tool so that we (all users combined) can only upload 100,000 images (or the equivalent) per month, i.e. roughly 1 million a year, looks low in the long term. In fact, if it stays this way I would probably end up annoying all the other users by hogging the capacity myself.
To put this in context, the HABS project at the Library of Congress is expected to have a quarter of a million images on its own, and that is just one of many archives in the LoC. I have also previously discussed a UK GLAM with several million potential images. GWT would seem more than a little defective if we had to say it would take several years to upload that many. :-)
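As a rough illustration of the arithmetic above, here is a minimal Python sketch. It assumes the observed rate of 2,800 images per day stays constant and that the global throttle is the only limiting factor. The function names and the 3 million figure standing in for "several million" are hypothetical, chosen only for illustration.

```python
# Observed GWToolset throughput reported in this thread (images per day).
# Assumption: the rate stays constant and the global throttle is the only limit.
OBSERVED_RATE_PER_DAY = 2_800


def days_to_upload(n_images: int, rate_per_day: int = OBSERVED_RATE_PER_DAY) -> float:
    """Days needed to upload n_images at a constant daily rate (hypothetical helper)."""
    return n_images / rate_per_day


def yearly_capacity(rate_per_day: int = OBSERVED_RATE_PER_DAY) -> int:
    """Total images per 365-day year at a constant daily rate (hypothetical helper)."""
    return rate_per_day * 365


if __name__ == "__main__":
    scenarios = {
        "current upload (low estimate)": 100_000,
        "current upload (high estimate)": 200_000,
        "HABS (approx.)": 250_000,
        "UK GLAM (illustrative 3 million)": 3_000_000,
    }
    for label, n in scenarios.items():
        d = days_to_upload(n)
        print(f"{label}: {n:,} images -> {d:.0f} days ({d / 365:.1f} years)")
    print(f"Yearly capacity: {yearly_capacity():,} images")
```

At this rate the yearly capacity works out to 1,022,000 images, which matches the "roughly 1 million a year" estimate, and a 3 million image collection would take close to three years.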
Fae