SUMMARY: This week I experienced an issue when uploading several
hundred very high resolution maps as part of the NYPL maps project.[1]
Discussion has been going on in several places, and this thread is an
attempt to bring it together in one place so all users can benefit.
[Gilles, could you join this low-volume open email list to keep track
of GWT issues and be a voice for WMF Operations, to help us reach a
recommendation on best practices for end users?]
HISTORY
By the standards of our GLAM projects, my upload was unusually
stressful for the WMF servers. Individual map scans are up to 300 MB
in size, and resolutions can exceed 80 megapixels (80 million pixels).
There are 20,000 tiff images to be uploaded; I have completed around
12%. I used the GWToolset at full capacity (20 threads), though I had
broken the xml file up, so runs were a few hundred images at a time. My
intention was to ramp this up to a couple of thousand per upload
"tranche".
I was contacted on Tuesday by operations, who asked me to suspend the
upload because attempted thumbnail rendering of the tiff images was
putting too high a load on WMF servers.[2] Over 500 of the tiff images
were greater than 50 megapixels, and as a consequence Commons fails to
render any thumbnails for them (thumbnails are created for jpegs above
this limit; this is a tiff-specific constraint).[3]
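A minimal sketch of the 50-megapixel check described above, assuming the image dimensions are already known (for example from the source metadata). The function names and the example dimensions are hypothetical; only the 50 MP figure and the tiff-only nature of the limit come from this thread.

```python
# Sketch: flag tiffs above the 50-megapixel limit discussed in this thread.
# Names and example dimensions are hypothetical.

MEGAPIXEL_LIMIT = 50  # tiff thumbnail limit quoted in the thread


def megapixels(width: int, height: int) -> float:
    """Image size in megapixels (millions of pixels)."""
    return width * height / 1_000_000


def tiff_thumbnail_at_risk(width: int, height: int, fmt: str) -> bool:
    """True for a tiff larger than the limit; jpegs are not affected."""
    is_tiff = fmt.lower() in ("tif", "tiff")
    return is_tiff and megapixels(width, height) > MEGAPIXEL_LIMIT


if __name__ == "__main__":
    print(megapixels(10_000, 8_000))                      # 80.0
    print(tiff_thumbnail_at_risk(10_000, 8_000, "tiff"))  # True
    print(tiff_thumbnail_at_risk(10_000, 8_000, "jpg"))   # False
```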
CURRENT STATE
With no obvious immediate fix or work-around on the table from WMF ops,
I have proposed to restart my uploads for this project with an
effective throttle of 2 threads (this is a setting on the first screen
of the GWToolset). In practice, having tried a run of a couple of
hundred, this means that the tool uploads 100 MB images at a rate of 2
every 5 minutes. This does not seem to be causing any issues.
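As a rough check on what this throttle means for the whole project, the arithmetic below assumes the observed rate of 2 images every 5 minutes holds continuously, which a real run will not guarantee. Under that assumption the rate is about 576 images a day, so the remaining ~88% of the 20,000 tiffs would take roughly a month.

```python
# Back-of-envelope estimate for the throttled run described above.
# Assumes a steady 2 images every 5 minutes with no pauses or failures.

TOTAL_IMAGES = 20_000
FRACTION_DONE = 0.12        # "around 12%" already uploaded
IMAGES_PER_INTERVAL = 2
INTERVAL_MINUTES = 5


def images_per_day() -> float:
    intervals_per_day = 24 * 60 / INTERVAL_MINUTES  # 288 intervals
    return intervals_per_day * IMAGES_PER_INTERVAL


def days_remaining() -> float:
    remaining = TOTAL_IMAGES * (1 - FRACTION_DONE)  # about 17,600 images
    return remaining / images_per_day()


if __name__ == "__main__":
    print(images_per_day())            # 576.0
    print(round(days_remaining(), 1))  # 30.6
```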
WAY FORWARD
In the longer term the WMF is looking at alternatives for rendering
tiff thumbnails which will enable 50MP+ images to be handled; this may
or may not help solve the problem seen this week.[4]
I recommend that the GWToolset on-wiki guides include advice on how to
choose the number of processing threads based on the types of images
to be uploaded. To date, no other project has seen these problems,
probably because their image resolutions fall well under the 50MP
threshold. The maximum allowed number of threads is 20, and the default
is 10. For the time being, I suggest that we agree a best practice that
upload projects with tiffs over 50MP use no more than 2 threads; these
problems do not appear to exist for projects uploading lower resolution
files.
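The proposed rule of thumb could be written down as a small helper for anyone preparing a batch. This is a sketch of a community convention, not a GWToolset feature; `recommended_threads` and its input format are hypothetical, while the limits (20 maximum, 10 default, 2 for tiffs over 50MP) are the ones given above.

```python
# Sketch of the proposed best practice, not a GWToolset feature.
# Function name and input format are hypothetical.

MAX_THREADS = 20        # highest setting the tool allows
DEFAULT_THREADS = 10    # tool default
LARGE_TIFF_THREADS = 2  # proposed cap for batches with tiffs over 50 MP
MEGAPIXEL_LIMIT = 50


def recommended_threads(images, requested=DEFAULT_THREADS):
    """Suggest a thread count for one batch.

    images: iterable of (width, height, format) tuples.
    requested: the thread count the uploader would otherwise choose.
    """
    threads = min(requested, MAX_THREADS)
    for width, height, fmt in images:
        is_tiff = fmt.lower() in ("tif", "tiff")
        if is_tiff and width * height > MEGAPIXEL_LIMIT * 1_000_000:
            # One large tiff is enough to throttle the whole batch.
            return min(threads, LARGE_TIFF_THREADS)
    return threads


if __name__ == "__main__":
    print(recommended_threads([(10_000, 8_000, "tiff")], requested=20))  # 2
    print(recommended_threads([(4_000, 3_000, "tiff")]))                 # 10
```

Throttling the whole batch because of a single large tiff errs on the side of caution; splitting the xml so that large tiffs run in their own batches would let the rest go at normal speed.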
I propose that WMF Operations consider ways of testing the peak loads
the GWT can generate, and decide whether this can be fixed by future
operational improvements, whether the tool might benefit from some
simple "load management" changes, or whether establishing a best
practice for our (relatively) small number of GWT users would be a
sufficient community-based control.
Links
1. https://commons.wikimedia.org/wiki/Commons:Batch_uploading/NYPL_Maps
2. https://commons.wikimedia.org/wiki/Commons_talk:Batch_uploading/NYPL_Maps
3. https://commons.wikimedia.org/wiki/Category:NYPL_maps_%28over_50_megapixels…
4. https://bugzilla.wikimedia.org/show_bug.cgi?id=52045
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
Hi,
I'm about to upload a few hundred images that have been released by the
British Library.
I am all set to go, with carefully designed Commons filenames, but the
GWToolset uploader is wrecking all the commas and brackets in my filenames.
What I want is:
File:Large flowering sensitive plant (Mimosa grandiflora) - New illustration of the Sexual System of Carolus von Linnaeus (1807) - BL.jpg
File:Cherries - Pomona Britannica (1812), pl.10 - BL.jpg
File:Rape threshing - The costume of Yorkshire (1814), plate XV - BL.jpg
What it's giving me is:
File:Large flowering sensitive plant -Mimosa grandiflora- - New illustration of the Sexual System of Carolus von Linnaeus -1807- - BL.jpg
File:Cherries - Pomona Britannica -1812-- pl.10 - BL.jpg
File:Rape threshing - The costume of Yorkshire -1814-- plate XV - BL.jpg
How do I turn this behaviour off, please, or how do I work around it, to
get the more easily human-readable names that I want?
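Judging only from the three examples above, the uploader appears to replace each "(", ")" and "," with "-". The hypothetical helper below reproduces that substitution so filenames can be previewed before an upload; it is reverse-engineered from these examples, not taken from the GWToolset source, and other characters may be affected too.

```python
# Reproduces the substitution seen in the three examples above:
# "(", ")" and "," each become "-". Reverse-engineered, not GWToolset code.

OBSERVED_REPLACEMENTS = str.maketrans({"(": "-", ")": "-", ",": "-"})


def predict_gwt_title(title: str) -> str:
    """Return the filename the uploader appears to produce for `title`."""
    return title.translate(OBSERVED_REPLACEMENTS)


def will_be_altered(title: str) -> bool:
    """True if the observed substitution would change this filename."""
    return predict_gwt_title(title) != title


if __name__ == "__main__":
    wanted = "Cherries - Pomona Britannica (1812), pl.10 - BL.jpg"
    print(predict_gwt_title(wanted))
    # Cherries - Pomona Britannica -1812-- pl.10 - BL.jpg
```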
Thanks,
James Heald.
Hi,
I had an odd problem with files not being created, which I think I can
put down to how long filenames are handled by GWT.
As an example, my xml specified (A) but GWT created (B):
A. File:Index Map No.2 of a part of Suffolk County. South Side - Ocean
Shore, Long Island. Part of Islip and Part of Brookhaven. Published by
E. Belcher Hyde. 97 Liberty Street, Brooklyn. 5 Beekman Street,
NYPL1633883.tiff (209 chars) (see link)
B. File:Index Map No. 2 of a part of Suffolk County. South Side -
Ocean Shore, Long Island. Easthampton. Published by E. Belcher Hyde.
97 Liberty Street, Brooklyn. 5 Beekman Street, Manhattan. 1916. Volume
NYPL1633.tiff (206 chars)
This seems an easy thing to warn the user about when reading the xml.
In terms of behaviour, I would expect the tool to reject the xml as
malformed and warn about the maximum allowed filename length, rather
than truncate the name; in this case, truncation corrupted the unique
NYPL identifier.
It would be better if GWT allowed the maximum title length that
Commons allows (240 bytes, the number of visible characters varying by
charset).
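A sketch of the up-front check suggested here, assuming the 240-byte limit quoted above applies to the title without the "File:" prefix. Counting UTF-8 bytes rather than characters matters because accented and non-Latin characters take more than one byte each. The function names are hypothetical.

```python
# Sketch: reject over-long titles when the xml is read, instead of truncating.
# Assumes the 240-byte Commons limit applies to the title without "File:".

MAX_TITLE_BYTES = 240


def title_length_error(title: str):
    """Return an error message if `title` is too long, else None."""
    size = len(title.encode("utf-8"))  # bytes, not visible characters
    if size > MAX_TITLE_BYTES:
        return f"{size} bytes (max {MAX_TITLE_BYTES}): {title[:40]}..."
    return None


def validate_titles(titles):
    """Report every over-long title; nothing is shortened."""
    return [err for err in map(title_length_error, titles) if err is not None]


if __name__ == "__main__":
    print(validate_titles(["Short title.tiff", "\u00e9" * 121]))
```

Because every problem is reported and no title is shortened, an identifier at the end of a long name (such as the NYPL number above) can never be silently cut off.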
I vaguely recall the Steering Committee discussing this last year, so
I'm unsure if this is worth raising in bugzilla. Suggestions?
Links
1. https://commons.wikimedia.org/wiki/File:Index_Map_No.2_of_a_part_of_Suffolk…
2. https://bugzilla.wikimedia.org/show_bug.cgi?id=30202
3. https://commons.wikimedia.org/wiki/Commons:Filenames
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
For those on this group who have not signed up for an account on
bugzilla, I strongly encourage you to do so. The GWToolset is likely
to have a number of changes logged on bugzilla and the more user
feedback we can add to the changes the better.
The most recent is Steinsplitter's
<https://bugzilla.wikimedia.org/show_bug.cgi?id=64634>, raised to avoid
using GWT-created hidden comment fields as markers on the image page.
Primarily this marks where the xml source data is logged on an image
page, which could be very useful if things are going wrong with the
templates on the page, or if someone wants to datamine using original
data, rather than the mapped result. Note that the Commons API can do
interesting things to support automated changes by identifying
sub-sections of an image page and returning the text, but this relies
on wiki-style sections rather than hidden comments.
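To illustrate the point about sections: the MediaWiki API can list a page's sections (`action=parse` with `prop=sections`) and return the wikitext of a single section (`prop=revisions` with `rvsection`), but it has no equivalent for text delimited by hidden comments. This sketch only builds the request URLs and sends nothing; the example title is hypothetical.

```python
# Sketch: MediaWiki API requests that address wiki-style sections.
# Only builds URLs; nothing is sent. The page title is a hypothetical example.
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def list_sections_url(title: str) -> str:
    """URL listing the sections of `title` (action=parse, prop=sections)."""
    return API + "?" + urlencode(
        {"action": "parse", "page": title, "prop": "sections", "format": "json"}
    )


def section_text_url(title: str, section: int) -> str:
    """URL returning the wikitext of one section via rvsection."""
    return API + "?" + urlencode(
        {
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "content",
            "rvsection": section,
            "format": "json",
        }
    )


if __name__ == "__main__":
    print(list_sections_url("File:Example.jpg"))
    print(section_text_url("File:Example.jpg", 1))
```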
It would be cool if someone could occasionally report back here if
there are bugs reported that would benefit from more user feedback
either on bugzilla or separately on this list.
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
Dear all,
I'd like to take this opportunity to introduce Rromir, who is doing an internship on the GWToolset for Wikimedia CH.
Rromir is not an IT guy, and neither am I; we will work together to test the GWToolset with three real batches of pictures coming from some of the WMCH projects I'm leading.
The idea behind this internship is to evaluate the usability of the GWToolset in the context of Wikimedia CH projects.
Rromir is first getting familiar with the tool; then he will evaluate how accessible it is to non-IT people and whether it is suited to WMCH needs.
If everything goes well, the internship should lead to the creation of a handbook for GLAMs.
Is there any similar initiative somewhere?
Charles
___________________________________________________________
Charles ANDRES, Chief Science Officer
"Wikimedia CH" – Association for the advancement of free knowledge –
www.wikimedia.ch
Office +41 (0)21 340 66 21
Mobile +41 (0)78 910 00 97
Skype: charles.andres.wmch
IRC://irc.freenode.net/wikimedia-ch
http://prezi.com/user/Andrescharles/
Hi,
Currently on Commons, access to the GWT is limited to a small group
of "happy few" (who have the GWT permission).
Becoming part of this group seems to be difficult, even for prominent
players in our community:
https://commons.wikimedia.org/wiki/Commons:Bureaucrats%27_noticeboard#GWToo…
In my opinion, we have nothing to lose (and maybe a lot to win) by
opening the GWT to all Commons users. This is the way we usually work,
and it is what makes us successful. I don't see why we should proceed
differently here.
AFAIK, the only risk of this move would be a flood of inadequate
files. That's why we should maybe limit the number of parallel GWT
downloads to 1, or limit the overall number of uploads to 10 per XML
file, for users without the GWT permission. Another solution would be
to adapt the current admin tools so that admins can deal efficiently
with this new kind of challenge.
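A sketch of the per-file cap suggested above, for illustration only: the cap of 10 comes from the proposal, while the `record` element name and the permission flag are assumptions, since the record element differs between metadata files.

```python
# Sketch of a per-file cap for users without the GWT permission.
# The cap, the "record" element name and the permission flag are assumptions.
import xml.etree.ElementTree as ET

UNPRIVILEGED_RECORD_CAP = 10  # "10 uploads per XML file", as proposed


def count_records(xml_text: str, record_tag: str = "record") -> int:
    """Count elements named `record_tag` anywhere in the document."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(record_tag))


def batch_allowed(xml_text: str, has_gwt_permission: bool,
                  record_tag: str = "record") -> bool:
    """Permitted users are unrestricted; others get at most the cap."""
    if has_gwt_permission:
        return True
    return count_records(xml_text, record_tag) <= UNPRIVILEGED_RECORD_CAP


if __name__ == "__main__":
    sample = "<records>" + "<record/>" * 12 + "</records>"
    print(batch_allowed(sample, has_gwt_permission=False))  # False
    print(batch_allowed(sample, has_gwt_permission=True))   # True
```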
Do you see any other risk?
Regards
Emmanuel
--
Volunteer
Technology, GLAM, Trainings
Zurich
+41 797 670 398
I'm happy to chime in and describe some of our experiences uploading
AV material using the GWToolset. We're expecting to make a donation
sometime in the next few weeks. Let me know if that is needed!
Regards,
Jesse
2014-04-28 14:01 GMT+02:00 <glamtools-request(a)lists.wikimedia.org>:
> Send Glamtools mailing list submissions to
> glamtools(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/glamtools
> or, via email, send a message with subject or body 'help' to
> glamtools-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> glamtools-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Glamtools digest..."
>
>
> Today's Topics:
>
> 1. Re: Internship at Wikimedia CH around the GWToolset (Kippelboy)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 28 Apr 2014 11:29:07 +0200
> From: Kippelboy <kippelboy(a)gmail.com>
> To: Conversations revolving around the development of GLAM Digital
> Tools <glamtools(a)lists.wikimedia.org>
> Cc: "rromir.imami(a)wikimedia.ch" <rromir.imami(a)wikimedia.ch>,
> "terburg(a)wikimedia.nl" <terburg(a)wikimedia.nl>
> Subject: Re: [Glamtools] Internship at Wikimedia CH around the
> GWToolset
> Message-ID: <18C24DF8-58B3-4493-BD31-C4913DE2BA87(a)gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Great news! Looking forward to seeing the uploaded content and your learnings :-)
> Kippelboy
>
> Sent from my Casiotone
>
> On 27/04/2014, at 23:55, Jean-Frédéric <jeanfrederic.wiki(a)gmail.com>
> wrote:
> Hi Charles & list,
>
>
> >> If everything goes well, the internship should lead to the creation of a
> >> handbook for GLAMs.
> > WMSE are looking into making documentation for batch uploads better.
> > Working together on the GWT handbook sounds like a brilliant idea!
>
> Yay indeed! Sounds very good. I'd be happy to be looped into this work.
>
> If I understood correctly, Sebastiaan ter Burg from WMNL (cc-ed, not sure
> if he is on that list) is working on a GWT manual.
>
> >> Is there any similar initiative somewhere?
>
> Some time ago I documented a bit how I work for batch uploads. This is not
> totally relevant now that we have the GWT but in case it might be of some
> interest
> <
> https://commons.wikimedia.org/wiki/User:Jean-Fr%C3%A9d%C3%A9ric/Batch_uploa…
> >
> <https://github.com/Commonists/MassUploadLibrary/>
>
> Cheers,
> -- Jean-Frédéric
>
>
> > Dear all,
> >
> > I take the opportunity to introduce Rromir, he's doing an internship
> about the GWToolset for Wikimedia CH.
> >
> > Rromir is not an IT guy, neither am I, we will work together to test the
> GWToolset with three real batch of pictures coming from some of WMCH
> projects I'm leading.
> >
> > The idea behind this internship is to evaluate the usability of the
> GWToolset in the context of Wikimedia CH projects.
> >
> > Rromir is first getting familiar with the tool, then he will evaluate
> its accessibility to non-IT guy and if it's adapted to WMCH needs.
> >
> > If everything is fine, the internship should lead to the creation of an
> handbook for GLAMs.
> >
> > Is there any similar initiative somewhere?
> >
> > Charles
>
> _______________________________________________
> Glamtools mailing list
> Glamtools(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/glamtools
>
Could a list admin check the subscribers and compare with
<https://commons.wikimedia.org/w/index.php?title=Special:ListUsers&group=gwt…>?
I'm thinking it might be useful to post a standard welcome notice for
new users of the tool, including an invite to join this list to
discuss their experiences and initial reactions. Once you have used
the tool a couple of times, it is easy to forget which niggles took
the most time to get your head around.
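Once the list subscribers have been matched to Commons usernames (presumably a manual step, since the list software does not know anyone's wiki account), finding who to invite is a set difference. This sketch approximates MediaWiki's username normalisation (underscores treated as spaces, first letter capitalised); the function names and inputs are hypothetical.

```python
# Sketch: which GWToolset users are not yet list subscribers?
# Inputs are hypothetical lists of Commons usernames.


def normalize_username(name: str) -> str:
    """Approximate MediaWiki: "_" equals a space, first letter capitalised."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]


def users_to_invite(gwt_users, subscriber_usernames):
    """GWT users (as given) whose normalised name is not subscribed."""
    subscribed = {normalize_username(n) for n in subscriber_usernames}
    return sorted(u for u in gwt_users if normalize_username(u) not in subscribed)


if __name__ == "__main__":
    print(users_to_invite(["Fae", "Example_user"], ["fae"]))  # ['Example_user']
```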
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
Hello!
I submitted an account request on the wmflabs site through the provided link, but I haven't heard anything about it over the last couple of weeks. Could someone set up access to the toolset for me? I'm working with the National Library of Scotland to upload content over the next few months, and I think the toolset would be really useful.
Thanks,
Ally
Ally Crockford
Wikimedian-In-Residence
National Library of Scotland
George IV Bridge
Edinburgh EH1 1EW
Scotland, UK
e: a.crockford(a)nls.uk
t: (0) 131 623 3797
w: http://www.nls.uk
Follow us on Twitter and Facebook
National Library of Scotland, Scottish Charity, No: SCO11086
This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.
www.nls.uk