Hi
In case no one noticed it: I opened this bug last week, which prevents me
from finalising my mass upload:
<https://bugzilla.wikimedia.org/show_bug.cgi?id=68285>
Would be awesome if anyone had some time to look into it!
Thanks,
--
Jean-Frédéric
Fabrice Florin, the WMF's head of Multimedia, has just pointed me to an
interesting meeting that will be taking place at the Wikimania Hackathon:
https://wikimania2014.wikimedia.org/wiki/Hackathon#Structured_Data
> Structured Data: The Multimedia and Wikidata teams will host a
> roundtable discussion on our new Structured Data project
> <https://www.mediawiki.org/wiki/Multimedia/Structured_Data>, to implement
> machine-readable data for storing information about images and media files
> on Wikimedia Commons. We plan to develop this project together in fall 2014
> and winter 2015, and it is likely to have a major impact on all aspects of
> multimedia usage, from viewing to uploads, editing, curation and using
> files on articles. We welcome the participation of all users familiar with
> using multimedia on Commons and other Wikimedia sites.
This sounds like exactly the kind of thing that users of the
GLAMwikiToolset who are attending Wikimania would want to know about,
especially given any future development in this field that will help bring
Wikimedia Commons and Wikidata closer together. I'm sure it also fits well
within the scope of Europeana's strategic interests.
You can sign up at the link above (no specific time given just yet).
Sincerely,
-Liam
wittylama.com
Peace, love & metadata
Hi,
Could someone explain what is happening when large uploads have long
breaks? My long-term HABS upload (250,000 files) has had notable
pauses, mostly of around an hour, though this morning saw a 4-hour
break in uploading. I had thought that the upload had completed, but
as soon as I started another tranche, it restarted (possibly a
coincidence, though maybe it unlocked the scheduling for the job in
some way).
It would be nice to know, as it might come up during the sessions at
Wikimania and we would not want users to assume an upload had broken
if all they were seeing was a scheduling "pause".
Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
Hi Fae,
Thanks for this - I will definitely have a look at your book uploads and try to work out what the best approach is. There would definitely be sufficient content for a batch upload project page at some point when we look at uploading larger digitised books, but for the time being I was hoping to start with a smaller-scale test batch.
To the best of my knowledge, the library has no djvu versions of their digitised books, only PDFs, and I'm not yet clear whether they would be suitable for upload (or, to be honest, whether I could make the case in the limited time left in post - they've become comfortable with individual images, but that took time to get across, and when I suggested uploading batches of PDFs I still met resistance).
It might be that book uploads are something that has to wait until autumn, as a separate batch upload project entirely. It sounds like that might be the better approach to take.
Thanks again!
Ally
Ally Crockford
Wikimedian-In-Residence
National Library of Scotland
George IV Bridge
Edinburgh EH1 1EW
Scotland, UK
e: a.crockford(a)nls.uk<mailto:a.crockford@nls.uk>
t: (0) 131 623 3797
w: http://www.nls.uk<http://www.nls.uk/>
Follow us on Twitter and Facebook
National Library of Scotland, Scottish Charity, No: SCO11086
This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.
www.nls.uk
The NLS has some digitised books that I am keen to upload to Commons where appropriate, and I'd like to do so as separate image files (for a number of reasons - I can go into them if you'd like, but would rather save time here).
Has anyone done this in the past and found a way to include the booknavibar template when uploading? I was thinking of incorporating the template into the XML file the organisation generates for upload, as they can shape that pretty freely, but wasn't sure whether I would run into an issue with mapping. Any advice would be greatly appreciated!
Ally Crockford
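One way to picture the idea of putting the navigation template into the generated XML is the rough Python sketch below. It rests on several assumptions: the element names (`URL`, `title`, `navigation`), the file naming pattern, and above all the template parameter names `prev` and `next` are placeholders made up for illustration, not the documented parameters of the booknavibar template. The template's own documentation page on Commons should be checked before using anything like this, and the mapping step in the GWToolset would still need to point the `navigation` element at a suitable field.

```python
import xml.etree.ElementTree as ET


def navibar_wikitext(prev_page, next_page):
    # "prev" and "next" are PLACEHOLDER parameter names (an assumption);
    # replace them with the parameters the real template documents.
    return "{{BookNaviBar|prev=" + str(prev_page) + "|next=" + str(next_page) + "}}"


def build_book_records(base_url, title, page_count):
    """Build one <record> per page scan, each carrying navigation wikitext."""
    records = ET.Element("records")
    for page in range(1, page_count + 1):
        record = ET.SubElement(records, "record")
        # Hypothetical naming scheme: 0001.jpg, 0002.jpg, ...
        ET.SubElement(record, "URL").text = f"{base_url}/{page:04d}.jpg"
        ET.SubElement(record, "title").text = f"{title} - page {page:04d}"
        # First and last pages get an empty neighbour.
        prev_page = page - 1 if page > 1 else ""
        next_page = page + 1 if page < page_count else ""
        ET.SubElement(record, "navigation").text = navibar_wikitext(prev_page, next_page)
    return records


if __name__ == "__main__":
    book = build_book_records("http://example.org/scans/book1", "Example book", 3)
    print(ET.tostring(book, encoding="unicode"))
```

Generating the template text per record keeps the previous/next logic in one place, so the library's export script would not need to hand-edit each page.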
Hi all
I was using the GWToolset (on Commons) to do a batch upload and got this
message:
PHP fatal error in /usr/local/apache/common-local/php-1.24wmf13/includes/OutputPage.php line 1296:
Call to a member function getText() on a non-object
Previously, I did a test upload with one file and it went well:
https://commons.wikimedia.org/wiki/File:Mochila_Cangrejos_de_colores.jpg
This is a sample of my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record>
    <URL>http://mochila_images.s3.amazonaws.com/aaa008.jpg</URL>
    <mochila_titulo>Mochila_Cangrejos de colores</mochila_titulo>
    <description>Cangrejo Halloween en la Isla Iguana, Los Santos</description>
    <author>José Manuel Castrellon</author>
    <source>Fundación Almanaque Azul</source>
    <mochila_relacion_programa_educativo>{{Information field|name=mochila_relacion_programa_educativo|value=Los seres vivos y su ambiente}}</mochila_relacion_programa_educativo>
    <mochila_provincia>{{Information field|name=mochila_provincia|value=Los Santos}}</mochila_provincia>
    <mochila_distrito>{{Information field|name=mochila_distrito|value=Pedasí}}</mochila_distrito>
    <mochila_corregimiento>{{Information field|name=mochila_corregimiento|value=Mariabé}}</mochila_corregimiento>
    <mochila_lugar>{{Information field|name=mochila_lugar|value=Isla Iguana}}</mochila_lugar>
    <Location>{{Location|7.63|-80.00|PA-7>}}</Location>
    <mochila_georef_type>{{Information field|name=mochila_georef_type|value=Centroide}}</mochila_georef_type>
    <date>-</date>
    <Category>Gecarcinus quadratus</Category>
    <mochila_ecosistema>{{Information field|name=mochila_ecosistema|value=}}</mochila_ecosistema>
    <mochila_grupos_humanos>{{Information field|name=mochila_grupos humanos|value=}}</mochila_grupos_humanos>
    <mochila_palabras_clave>{{Information field|name=mochila_palabras_clave|value=Animal, Crustáceo, Cangrejo, Isla}}</mochila_palabras_clave>
    <mochila_tema>{{Information field|name=mochila_tema|value=Interacción hombre- ambiente}}</mochila_tema>
    <mochila_paisaje>{{Information field|name=mochila_paisaje|value=Natural}}</mochila_paisaje>
    <mochila_nombre_comun>{{Information field|name=mochila_nombre_comun|value=Cangrejo de mangle, cangrejo de tierra, cangrejo hóloween}}</mochila_nombre_comun>
    <mochila_codigo_admin>{{Information field|name=mochila_codigo_admin|value=70503}}</mochila_codigo_admin>
    <mochila_geologia>{{Information field|name=mochila_geologia|value=Sedimentarias}}</mochila_geologia>
    <mochila_cuenca>{{Information field|name=mochila_cuenca|value=}}</mochila_cuenca>
    <mochila_SINAP>{{Information field|name=mochila_SINAP|value=R.V.S. Isla Iguana}}</mochila_SINAP>
  </record>
</records>
I appreciate any help.
Monica Mora
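A local sanity check of a metadata file like the one above, run before handing it to the GWToolset, might look like the following sketch. It does not diagnose the PHP fatal error reported in this message; it only confirms that the XML parses and flags records with missing fields or empty template values. The list of required elements is an assumption taken from the sample (`URL` and `mochila_titulo`), not a documented GWToolset requirement, and the sample record is trimmed to a few fields.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample record; the element names come from the message.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<records>
<record>
<URL>http://mochila_images.s3.amazonaws.com/aaa008.jpg</URL>
<mochila_titulo>Mochila_Cangrejos de colores</mochila_titulo>
<mochila_cuenca>{{Information field|name=mochila_cuenca|value=}}</mochila_cuenca>
<date>-</date>
</record>
</records>"""

# Assumption: these are the fields every record should carry.
REQUIRED = ("URL", "mochila_titulo")


def check_records(xml_text):
    """Return a list of human-readable problems found in the metadata."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the XML is malformed.
    root = ET.fromstring(xml_text.encode("utf-8"))
    problems = []
    for number, record in enumerate(root.iter("record"), start=1):
        for name in REQUIRED:
            text = record.findtext(name)
            if text is None or not text.strip():
                problems.append(f"record {number}: <{name}> is missing or empty")
        for field in record:
            text = (field.text or "").strip()
            # An "Information field" template with nothing after "value=".
            if text.endswith("value=}}"):
                problems.append(f"record {number}: <{field.tag}> has an empty template value")
    return problems


if __name__ == "__main__":
    for problem in check_records(SAMPLE):
        print(problem)
```

Whether empty template values matter to the toolset is not established in this thread; the check simply makes them visible so they can be ruled in or out when comparing a failing batch against the single-file test that worked.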
Now that the first GLAM has uploaded their own content with the GWToolset
(Beeld en Geluid) we have just published a blogpost formally announcing the
tool's existence to the wider world:
http://pro.europeana.eu/pro-blog/-/blogs/sharing-multimedia-on-wikipedia-no…
Please share it with your colleagues, GLAMs, chapters, social media...
This post also highlights the work of the most prolific user of the tool
thus far (Fae) and the person who has made the most use of the content
which has been uploaded already (Taketa).
I'll tell people about this blogpost on the cultural partnerships mailing
lists tomorrow, and also the Wikipedia Signpost. Hopefully this will also
generate some interest in Dan's proposed hacking workshop at Wikimania
(sign up here: https://wikimania2014.wikimedia.org/wiki/Hackathon#GWToolset
) and the two presentations at Wikimania in the Sunday 11:30 session (
https://wikimania2014.wikimedia.org/wiki/Programme#Sunday.2C_August_10 ).
Thank you to Dan especially for all the incredibly hard work you've put in
to this project over the years. Now that it's "out there" I suspect that it
will become popular enough that it starts to get peppered with people
trying to push the boundaries of what is possible. And, a bit like the way
Magnus' tools always seem to go from being 'proof of concept' to 'mandatory
tool' very quickly, I am hoping that the GWT will become a standard feature
of GLAM activities very soon.
On a personal note, I realise I've not been involved in the actual
development phase of the software, but it is nice to be at the 'birth' of
this project that I helped introduce
<https://commons.wikimedia.org/w/index.php?title=File%3A2011_GLAMcamp_Amster…>,
back at GLAMcamp Amsterdam in 2011!
Sincerely,
-Liam
SUMMARY: This week I experienced an issue when uploading several
hundred very high resolution maps as part of the NYPL maps project.[1]
Discussion has been going on in several places, and this thread is an
attempt to bring it together in one place so all users can benefit.
[Gilles, could you join this low-volume open email list to keep track
of GWT issues and be a voice for WMF Operations to help us reach a
recommendation for end-user best practices?]
HISTORY
By the standards of our GLAM projects, my upload was unusually stressful
for the WMF servers. Individual map scans are up to 300 MB each, and
resolutions can exceed 80 megapixels (80 million pixels). There are 20,000
TIFF images to be uploaded, of which I have completed around 12%. I used
the GWToolset at full capacity (20 threads), though I had broken the XML
file up so that runs were a few hundred images at a time. My intention was
to ramp this up to a couple of thousand per upload "tranche".
I was contacted on Tuesday by operations, asking me to suspend the upload
because the demand for attempted thumbnail rendering of the TIFF images was
placing too high a load on WMF servers.[2] Over 500 of the TIFF images were
greater than 50 megapixels, and as a consequence Commons fails to render
any thumbnails for them. This is a TIFF-specific constraint: thumbnails are
created for JPEGs above this limit.[3]
CURRENT STATE
With no obvious immediate fix or work-around on the table from WMF ops, I
have proposed to restart my uploads for this project with an effective
throttle by using 2 threads (this is a setting on the first screen of the
GWToolset). In practice, having tried a run of a couple of hundred, this
means that the tool uploads 100 MB images at a rate of 2 every 5 minutes.
This does not seem to be causing any issues.
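To put the throttled rate in perspective, the figures quoted in this message (20,000 images, around 12% done, 2 images every 5 minutes at roughly 100 MB each) can be run through a back-of-envelope calculation. The sketch below assumes the rate stays constant with no scheduling pauses, which is optimistic; the function names are made up for illustration.

```python
def estimate_days(total_files, fraction_done, files_per_batch, minutes_per_batch):
    """Days needed to finish, assuming a constant rate with no pauses."""
    remaining = total_files - round(total_files * fraction_done)
    minutes = remaining / files_per_batch * minutes_per_batch
    return minutes / (60 * 24)


def average_megabytes_per_second(files_per_batch, megabytes_per_file, minutes_per_batch):
    """Average transfer rate implied by the observed upload pace."""
    return files_per_batch * megabytes_per_file / (minutes_per_batch * 60)


if __name__ == "__main__":
    days = estimate_days(20_000, 0.12, 2, 5)
    rate = average_megabytes_per_second(2, 100, 5)
    # prints "about 30.6 days remaining at 0.67 MB/s"
    print(f"about {days:.1f} days remaining at {rate:.2f} MB/s")
```

On these assumptions the remaining 17,600 images would take roughly a month of continuous uploading at 2 threads, which is worth knowing when planning tranches.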
WAY FORWARD
In the longer term the WMF is looking at alternatives for rendering
tiff thumbnails which will enable 50MP+ images to be handled; this may
or may not help solve the problem seen this week.[4]
I recommend that the GWToolset on-wiki guides include a recommendation
about how to choose the number of processing threads based on the types of
images to be uploaded. To date, no other project has seen these problems,
probably because the image resolutions fall well under the 50MP threshold.
The maximum allowed number of threads is 20, and the default is 10. For the
time being I suggest that we agree a best practice that upload projects
with TIFFs over 50MP use no more than 2 threads; these problems do not
appear to exist for projects uploading lower-resolution files.
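The suggested rule of thumb could be written down as a small helper for people preparing a batch. This is only a sketch of the proposal in this message, not a feature of the GWToolset: the function names are hypothetical, the thread count still has to be chosen by hand on the tool's first screen, and the 50MP, 2-thread and 10-thread figures are simply the ones quoted above.

```python
TIFF_THUMBNAIL_LIMIT_MP = 50   # threshold quoted in this message
DEFAULT_THREADS = 10           # GWToolset default quoted in this message
THROTTLED_THREADS = 2          # proposed best practice for large TIFFs


def recommend_threads(extension, width_px, height_px):
    """Suggest a thread count for one file, following the proposed rule."""
    megapixels = width_px * height_px / 1_000_000
    is_tiff = extension.lower().lstrip(".") in ("tif", "tiff")
    if is_tiff and megapixels > TIFF_THUMBNAIL_LIMIT_MP:
        return THROTTLED_THREADS
    return DEFAULT_THREADS


def recommend_for_batch(files):
    """Use the most cautious suggestion across (extension, width, height) tuples."""
    return min(recommend_threads(ext, w, h) for ext, w, h in files)


if __name__ == "__main__":
    batch = [("tiff", 10_000, 8_000), ("tiff", 5_000, 4_000)]
    print(recommend_for_batch(batch))
```

Taking the minimum across the batch means a single oversized TIFF is enough to throttle the whole run, which matches the cautious intent of the proposal.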
I propose that WMF Operations consider finding ways of testing the
peak loads possible from the GWT and decide if this can be fixed by
future operational improvements, whether the tool might benefit from
some simple "load management" changes, or if establishing a best
practice for our (relatively) small number of GWT users would be a
sufficient community based control.
Links
1. https://commons.wikimedia.org/wiki/Commons:Batch_uploading/NYPL_Maps
2. https://commons.wikimedia.org/wiki/Commons_talk:Batch_uploading/NYPL_Maps
3. https://commons.wikimedia.org/wiki/Category:NYPL_maps_%28over_50_megapixels…
4. https://bugzilla.wikimedia.org/show_bug.cgi?id=52045
Fae