SUMMARY: This week I ran into a problem when uploading several
hundred very high resolution maps as part of the NYPL maps project.[1]
Discussion has been going on in several places, and this thread is an
attempt to bring it together in one place so all users can benefit.
[Gilles, Could you join this low volume open email list to keep track
of GWT issues and be a voice for WMF Operations to help us reach a
recommendation for end user best practices?]
HISTORY
By the standards of our GLAM projects, my upload was unusually
stressful for the WMF servers. Individual map scans are up to 300 MB,
and resolutions can exceed 80 megapixels (80 million pixels). There
are 20,000 tiff images to be uploaded, of which I have completed
around 12%. I used the GLAMwiki Toolset (GWToolset) at full capacity
(20 threads), though I had broken the xml file up so that runs were a
few hundred images at a time. My intention was to ramp this up to a
couple of thousand per upload "tranche".
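As an illustration of the "tranche" approach described above, here is a minimal sketch of splitting a list of records into fixed-size runs. The function name, the placeholder file names and the tranche size of 300 are assumptions for illustration only; this is not how GWToolset itself splits work.

```python
def tranches(records, size):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

# Placeholder record names (hypothetical, not the real NYPL file names).
records = [f"map_{i:05d}.tif" for i in range(1000)]

batches = list(tranches(records, 300))
print([len(b) for b in batches])  # [300, 300, 300, 100]
```

Each batch could then be written out as its own xml file and run separately.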
On Tuesday, operations contacted me and asked me to suspend the
upload, as the attempted thumbnail rendering of the tiff images was
putting too high a load on WMF servers.[2] Over 500 of the tiff
images are greater than 50 megapixels, and as a consequence Commons
fails to render any thumbnails for them (thumbnails are created for
jpegs above this limit; this is a tiff-specific constraint).[3]
CURRENT STATE
With no obvious immediate fix or work-around on the table from WMF
ops, I have proposed to restart my uploads for this project with an
effective throttle of 2 threads (this is a setting on the first
screen of the GWToolset). In practice, having tried a run of a couple
of hundred, this means that the tool uploads 100 MB images at a rate
of 2 every 5 minutes. This does not seem to be causing any issues.
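To put the throttled rate in perspective, here is a rough back-of-envelope sketch using only the figures quoted in this message (20,000 files, around 12% done, 2 files every 5 minutes). The helper name is hypothetical, and the result assumes uninterrupted uploading at a constant rate.

```python
def remaining_days(total_files, fraction_done, files_per_batch, minutes_per_batch):
    """Estimate the days needed to upload the remaining files at a fixed rate."""
    remaining = total_files * (1 - fraction_done)
    minutes = remaining / files_per_batch * minutes_per_batch
    return minutes / (60 * 24)

# 20,000 tiffs, about 12% done, 2 files every 5 minutes.
estimate = remaining_days(20_000, 0.12, 2, 5)
print(f"about {estimate:.0f} days of continuous uploading")  # about 31 days
```

So at 2 threads the remaining 17,600 or so files would take roughly a month of continuous running.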
WAY FORWARD
In the longer term the WMF is looking at alternatives for rendering
tiff thumbnails which will enable 50MP+ images to be handled; this may
or may not help solve the problem seen this week.[4]
I recommend that the GWToolset on-wiki guides include a recommendation
about how to choose the number of processing threads based on the
types of images to be uploaded. To date, no other project has seen
these problems, probably because the image resolutions fall well under
the 50MP threshold. The maximum allowed number of threads is 20, and
the default is 10. For the time being I suggest that we agree on a
best practice that upload projects with tiffs over 50MP use no more
than 2 threads; these problems do not appear to exist for projects
uploading lower resolution files.
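The suggested best practice could be sketched as a simple rule. The 50MP threshold, the default of 10 threads and the throttle of 2 threads come from this message. The function and its input format are hypothetical and are not part of GWToolset.

```python
TIFF_MEGAPIXEL_LIMIT = 50   # tiffs above this fail thumbnail rendering
DEFAULT_THREADS = 10        # GWToolset default quoted above
THROTTLED_THREADS = 2       # proposed limit for large-tiff projects

def megapixels(width, height):
    """Image area in millions of pixels."""
    return width * height / 1_000_000

def recommended_threads(images):
    """images: iterable of (filename, width, height) tuples (assumed format)."""
    for name, width, height in images:
        is_tiff = name.lower().endswith((".tif", ".tiff"))
        if is_tiff and megapixels(width, height) > TIFF_MEGAPIXEL_LIMIT:
            return THROTTLED_THREADS
    return DEFAULT_THREADS

# A 10000 x 9000 tiff is 90 megapixels, so the batch is throttled.
print(recommended_threads([("map.tif", 10000, 9000)]))   # 2
print(recommended_threads([("map.jpg", 10000, 9000)]))   # 10
```

A single oversized tiff is enough to throttle the whole batch here, which is a deliberately cautious choice.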
I propose that WMF Operations consider ways of testing the peak loads
possible from the GWT, and decide whether this can be fixed by future
operational improvements, whether the tool might benefit from some
simple "load management" changes, or whether establishing a best
practice for our (relatively) small number of GWT users would be a
sufficient community-based control.
Links
1. https://commons.wikimedia.org/wiki/Commons:Batch_uploading/NYPL_Maps
2. https://commons.wikimedia.org/wiki/Commons_talk:Batch_uploading/NYPL_Maps
3. https://commons.wikimedia.org/wiki/Category:NYPL_maps_%28over_50_megapixels…
4. https://bugzilla.wikimedia.org/show_bug.cgi?id=52045
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
Hi all,
I am testing in the Beta cluster today. I can see how to add a source
template (which in this case is Stichting Natuurbeelden). I would also like
to add the institution template (in our case {{institution: Nederlands
Instituut voor Beeld en Geluid}}) for Current Location, as I have seen, for
example, with Fae's upload from the
Rijksmuseum <http://commons.wikimedia.beta.wmflabs.org/wiki/File:Adelaar_met_aap_in_zijn…>.
I might later ask you for some feedback on some test-uploads.
Regards,
Jesse
--
Kind regards,
*Jesse de Vos*
GLAM-wiki coordinator
*T* 035 - 677 39 37
*Available:* Mon, Tue, Thu
*Nederlands Instituut voor Beeld en Geluid*
*Media Parkboulevard 1, 1217 WE Hilversum | Postbus 1060, 1200 BB Hilversum*
*beeldengeluid.nl* <http://www.beeldengeluid.nl/>
Thanks all for clarifying - good to know that it is widely acknowledged to be weird, but ultimately not a problem.
-Ally
Ally Crockford
Wikimedian-In-Residence
National Library of Scotland
George IV Bridge
Edinburgh EH1 1EW
Scotland, UK
e: a.crockford(a)nls.uk
t: (0) 131 623 3797
w: http://www.nls.uk
Follow us on Twitter and Facebook
National Library of Scotland, Scottish Charity, No: SCO11086
This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.
www.nls.uk
Having got the upload process itself down, I've been trying with different file-sizes. However, when I've uploaded larger files (2500px) to the Beta cluster, the image re-sizes to the smaller, 500px version whenever I try to view or download the image at full resolution, despite all indications being that the current version should be the larger format. I'm not sure whether this is an issue with the Beta cluster or with the Toolset, or if it only has to do with uploading subsequent versions of files. Has anyone run into this kind of problem before?
Example link: http://commons.wikimedia.beta.wmflabs.org/wiki/File:Imaginative_depiction_o…
I don't expect this to be an issue when uploading to Commons proper in most cases, but I find it strange nonetheless.
-Ally
dear all,
we currently have several patches in gerrit that address issues you have filed in bugzilla. we need help with reviewing these patches before they can be deployed to production. if any of you have +2 privileges, please take a look at these patches and let me know if you feel they need any work or +2 them if you feel they are okay:
https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions…
thanks for your help!
with kind regards,
dan
I have been getting "Our servers are currently experiencing a
technical problem" when using GWT there. Is there a current known
problem, or is it me?
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
Hi Ally,
> After experimenting with the toolset on the Beta Cluster for the last
> couple of weeks, I’m getting close to uploading content onto Commons proper
>
This is wonderful to hear. I had a quick look at your images on the Beta
Cluster, and they look good.
> but still don’t have access to the toolset there. Could I therefore be
> given access?
>
You may request access on
<https://commons.wikimedia.org/wiki/Commons:Bureaucrats%27_noticeboard>.
Please provide a short rationale of why you need it and what you are
planning to do. Folks there might ask you for some clarification, but
nothing daunting :)
--
Jean-Frédéric
After experimenting with the toolset on the Beta Cluster for the last couple of weeks, I'm getting close to uploading content onto Commons proper but still don't have access to the toolset there. Could I therefore be given access? I'd like to start uploading content as soon as the metadata for the first collection is available (this collection will be small, approx. 90 images, and the procedure has worked properly for a sample of the images in the Beta Cluster).
Thanks,
Ally
After the 3 file "preview" when running GWT on a fresh batch, it is
possible to get the following odd looking error:
Internal error
[8292dd50] 2014-05-09 22:50:47: Fatal exception of type MWException
It looks disastrous, but actually (in my experience of two instances)
indicates that GWT has encountered an attempt to re-upload the same
file. This was caused by a mistake in my xml where some initial
records were duplicated (I am quite unsure why), and though the text
page was overwritten, the image was only uploaded once (a good thing).
I don't think this is a bug, more a potential feature, particularly as
the run then continues regardless. It would be nicer if the warning
message was a bit less cryptic and did not look like the user had just
crashed something badly (which "Fatal" normally indicates), but it is
worth ensuring that it is an exceptional case, potentially created by
a poor xml source, and that it is mentioned in the user guide.
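Since the error appears to come from duplicated records in the xml source, one way to catch it before a run is to check the file for repeated entries. The following is a minimal sketch using Python's standard library. The element names (`record`, `url`) and the sample data are assumptions for illustration, and a real metadata file will use whatever names its own schema defines.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_values(xml_text, record_tag, key_tag):
    """Return the key values that occur in more than one record."""
    root = ET.fromstring(xml_text)
    keys = [
        record.findtext(key_tag, default="").strip()
        for record in root.iter(record_tag)
    ]
    counts = Counter(k for k in keys if k)
    return sorted(k for k, n in counts.items() if n > 1)

# Hypothetical sample with the first record repeated at the end.
sample = """<records>
  <record><title>Map A</title><url>http://example.org/a.tif</url></record>
  <record><title>Map B</title><url>http://example.org/b.tif</url></record>
  <record><title>Map A</title><url>http://example.org/a.tif</url></record>
</records>"""

print(duplicate_values(sample, "record", "url"))
# ['http://example.org/a.tif']
```

An empty list would suggest that no file is referenced twice, at least by the chosen key.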
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae