Glamtools

glamtools@lists.wikimedia.org

171 discussions

Thinking about Structured Data on Commons ?
by James Heald 17 Aug '14

So now Wikimania has been and gone, can we think about where we're at with the Structured Data initiative for Commons? In particular Liam, I think you said after one of the sessions you were "all over this" -- it would be good to know your thoughts.

I do think, as the GW toolset community, we ought to have a lot to offer here, because essentially we are doing big uploads from data which is *already* structured, so:

(i) we've already got at least some experience of working with data that is in some form structured;
(ii) we may know and be able to flag some awkward edge cases;
(iii) we would like to accompany uploads with data that can be "born structured", rather than converted later;
(iv) in any case we're uploading a lot of images, which somebody is going to have to convert to structured data;
(v) we may have seen (or even written) some of the gnarlier templates on Commons that migration will have to cope with.

It's not clear (at least not yet to me) how the Multimedia and Wikidata teams may best want to be communicated with, but I'm including Keegan (WMF) in cc:, who I think is the staffer with assigned community liaison responsibility.

The biggest message to me from looking through some of the documents after the meetings is just how much of the information is going to be stored as part of central main Wikidata. Essentially, if we upload an image of an object, then it is expected that an 'item' (i.e. a Q-number) for that object will be added to Wikidata, which will contain all the metadata that describes the object rather than just the image.

The Wikidata community is already developing a very strong ontology to describe such objects -- key resources are

https://www.wikidata.org/wiki/Wikidata:WikiProject_Visual_arts/Item_structu…
https://www.wikidata.org/wiki/Wikidata:WikiProject_Books

where there are active and friendly communities involved in refining them.
We can get involved and help the process right now, by trying to identify and fill any gaps in these ontologies, and by being enthusiastic early adopters -- there is no reason we should not be filling in appropriate metadata on Wikidata each time we upload an image to Commons, real-world testing the current ontologies to see what creaks.

Data specific to the image itself (rather than what it shows) will be stored in a separate Commons Wikibase. This will include such things as the file name, a file description, photographer, wikicontributor name, precise geographical location etc.

Commons Wikibase is also likely to contain a tag-like "topic list" -- a list of all the Wikidata Q-numbers that apply to the image. These I think will be gathered by climbing up the Wikidata tree from any specified Subject identified for the image -- so a view of Westminster Abbey might get topics such as "Westminster; London; England; Cathedral; religious building" etc; and games will be invented to encourage people to identify more such topics for the best images.

This should allow WM to introduce a proper combinatorial search engine based on tags for Commons; and many of the most egregious Commons intersection categories will wither on the vine. (There is debate as to whether Commons will end up needing *any* category pages, but I suspect it will, because they are just so convenient to use as places for jotting down facts -- on the other hand, it is possible one might be forced to create an associated Commons article/gallery for that).

It would be nice (IMO) if there could be an interface to the topic list through the wiki source code (wikitext) for the filepage -- I think this would be well-received by the community, allow easy adaptation of existing bots, etc.
But this may be resisted as being too fragile a point of failure, as it would mean that people making hand-edits would have to know (and get right) the meaningless number-strings of individual Q-numbers.

Finally, some very specific text data -- such as the EXIF data describing shutter-speed etc -- is likely to continue to live on the file description page, because it's probably not something that people are primarily going to want to search, and it may be a bit unpredictable.

Part of the immediate effort in the next few weeks is going to be to produce clearer ideas about what information is going to live where, and in particular what information is going to live on the Commons Wikibase, and how it will be structured.

The good news is that much of the most complicated information will be stored on Wikidata, so can be as detailed as we like (and can be accessed live now). On the other hand, the design for Commons Wikibase will initially aim to be as simple as possible, with the aim of evolving it as experience is gained and migrating the edge cases later.

The file description page (or something not entirely unlike it) will continue to exist as a view bringing together all the data. Current templates will be re-written to draw information from Wikidata.

However, this won't be possible until the Wikidata team has implemented the "Arbitrary Access" feature -- the ability for a wikipage to access the properties of an arbitrary Wikidata Q-number. What's causing the hold-up is that if the properties of the Q-number item are edited, then all the pages that access that Q-number need to be marked as dirty and regenerated. That's easy if you only have one page that can access the Q-number, but hard if arbitrary pages can access it, through a chain of properties. (eg: the file page for a painting Q12345 may use property Pnnn to reach its creator Q4567, who has property Pxxx, a date of birth.
If the date of birth gets made more precise, the system has to recurse back to indicate that all the file pages showing pictures of that creator's work need to be regenerated. This is tough, but file-page templates won't be able to draw on Wikidata information until it is in place).

It is hoped to simplify the myriad different templates used on the file pages as quickly as possible, standardising them to draw from the structured data stores.

Templates to display summary information about collection objects, which will draw from Wikidata, may well be standardised so they can easily be used on Wikipedias and other wikis -- or, to put that the other way round, since Wikipedias and other wikis will also be developing standardised templates to display summary object information, it should well be possible to use the same code twice.

However, it would be good to get involved in the development of these templates, to make sure they accurately reflect the information we currently like to show in Commons. (There may be some important details to get right -- for example the Wikidata data-type for dates currently comprises a 'best' value, and an optional numeric range (which is great for sorting). But if the catalogue source data says eg "mid 17th century to early 18th century", do we want to make sure that precise string is still stored? And should it still be possible to make it visible?) This needs close engagement; but probably principally with the community-based development effort in the Wikidata community groups.

Already very standardised are the present Commons creator templates and Commons institution templates. These are likely to be an early quick win.
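The dirty-marking problem described above for "Arbitrary Access" can be illustrated with a small sketch. This is only a toy model under stated assumptions, not how Wikidata or MediaWiki implements it: the class and method names are hypothetical, the file names and Q67890 are invented for illustration, and Q12345 / Q4567 are the placeholder numbers from the example above. The idea is that every item read while rendering a page (including items reached through a chain of properties) is recorded, so an edit to any of them can find the pages to regenerate.

```python
from collections import defaultdict


class DependencyTracker:
    """Toy reverse index: item id -> pages that read it while rendering."""

    def __init__(self):
        self._readers = defaultdict(set)

    def record_access(self, page, item_id):
        # Called each time rendering `page` reads any property of `item_id`,
        # whether directly or after following a chain of properties.
        self._readers[item_id].add(page)

    def pages_to_regenerate(self, edited_item_id):
        # Pages that must be marked dirty when `edited_item_id` is edited.
        return sorted(self._readers.get(edited_item_id, set()))


tracker = DependencyTracker()

# Hypothetical file page for painting Q12345: it reads the painting item,
# then follows the creator property to Q4567 to read a date of birth.
tracker.record_access("File:Painting_A.jpg", "Q12345")
tracker.record_access("File:Painting_A.jpg", "Q4567")

# A second hypothetical painting (Q67890) by the same creator.
tracker.record_access("File:Painting_B.jpg", "Q67890")
tracker.record_access("File:Painting_B.jpg", "Q4567")

# Editing the creator's date of birth dirties both file pages.
print(tracker.pages_to_regenerate("Q4567"))
```

The hard part in practice is the bookkeeping at scale: with one page per item the reverse index is trivial, but with arbitrary access every render can add entries for many items.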
Looking down a typical present-day filepage, that means that it is the Source/Photographer information in the present "Artist" template, which is currently free-form and often a freely composed pulling-together of multiple different sources of metadata, that is likely to need the most work to unpick.

This is also the field most commonly used for the credit link-back templates to the originating GLAM institutions, which are obviously a key consideration for our GLAM partners.

These templates may currently often be very institution-specific, and may do quite complex stuff -- eg the present version of the British Library template

https://commons.wikimedia.org/wiki/Template:British_Library_image

as used at eg

https://commons.wikimedia.org/wiki/File:Cuthbert_discovers_piece_of_timber_…

contains link-backs to a number of catalogues, each with their own corresponding text; and as well as linking back to the information about the underlying object (which is likely to be stored on Wikidata), it will also likely contain a link-back to the source of the original file (in this case the specific file at BL images online), which, being information specific to the file, is likely to live on the Commons Wikibase.

The Source/Photographer field as a whole is (I think) likely to be one of the last on the file page to be assimilated, because it can be so sui generis, and so the present rat's nest of templates may continue to be acceptable for some time -- though even they are likely to need modification, as eg Photographer information moves to the Commons Wikibase.

That said, each institution is only going to need to manage its own template. But it probably would make sense to start an effort to think

* what is the structured data that typically lives in these templates?
* and is there some standardisation we could start to get into the box, even now?

Apart from anything else, something readily customisable might be much easier for new institutions to adapt and adopt.
For the migration project as a whole, an audit of all the source templates of this sort would be useful. That is something the MM/WD project team could perhaps usefully encourage the community to undertake for them.

I have to admit there are lines I am not sure about, as to what gets a Wikidata entry and what does not.

For example, when does a photograph deserve its own entry? Perhaps a bright line is that an image of a photograph one took oneself doesn't get an entry on Wikidata, but a photograph by Man Ray perhaps does. What about a photograph by a photographer of more intermediate notability?

Or instead, perhaps an engraving from a book of 19th century engravings? It makes sense to create an identifier for the book on Wikidata; and also the place depicted. This is often almost enough to identify the particular image, but really one would want to store the page number, and perhaps the scan number as well. (Since one might well have either one or the other or both). It would probably be good to store some identifier for the set of scans as well -- this too probably doesn't belong on Wikidata (although one might identify it as set number <identifier> from eg the Mechanical Curator collection, which itself probably then *would* get a Wikidata identifier).

So the Commons Wikibase probably needs to be able to identify images as having a sequence in a particular set, and that set as perhaps having an identifier that links it to a collection which has a particular Q-number on Wikidata.

This is the kind of thinking we will particularly need to be doing over the next few weeks -- what is the metadata that will *not* be stored on Wikidata, so will *need* to be stored on the Commons Wikibase if it is to continue to be accessible? That is something that we as the community need to evolve, thinking of all the use cases we can.

I am sure that there was something else I meant to say, but this email seems long enough already.
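To make the "sequence in a set, set in a collection" idea above concrete, here is a minimal sketch of what such a record might hold. It is purely illustrative: the class names, field names and all identifier values are hypothetical placeholders, and nothing here reflects an actual Commons Wikibase schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanSet:
    """A set of scans that is not itself expected to get a Wikidata item."""
    set_id: str                    # identifier of the set within its collection
    collection_qid: Optional[str]  # Q-number of the parent collection, if any


@dataclass(frozen=True)
class FileRecord:
    """File-specific data that would have to live on the Commons side."""
    filename: str
    book_qid: str                       # Q-number of the book on Wikidata
    scan_set: ScanSet
    page_number: Optional[int] = None   # either, both, or neither
    scan_number: Optional[int] = None   # may be known

    def has_locator(self) -> bool:
        # True if the image can be located within the book or scan set.
        return self.page_number is not None or self.scan_number is not None


# All values below are made-up placeholders, not real identifiers.
scans = ScanSet(set_id="SET-0001", collection_qid="Q_COLLECTION_PLACEHOLDER")
record = FileRecord(
    filename="File:Example_engraving.jpg",
    book_qid="Q_BOOK_PLACEHOLDER",
    scan_set=scans,
    scan_number=42,
)
print(record.has_locator())
```

Keeping the page and scan numbers optional reflects the point above that one might have either one, the other, or both.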
There's a scratchpad of some bookmarks I started keeping on a subpage of my userpage at Wikidata that people are welcome to:

https://www.wikidata.org/wiki/User:Jheald/bookmarks

This gives a nutshell of where some different fields might be stored:

https://docs.google.com/presentation/d/1x-vOUr-zveLzoIP6uJC1Sz95xwuTFNwaBqJ…

This etherpad is good, esp lines immediately after 140, and "What new fields should be created to complement the old fields?" at 156 (actually in the context of Upload Wizard, but it gives some ideas):

http://etherpad.wikimedia.org/p/multimedia-wikidata-catchup

There's a spreadsheet showing some of the fields they're thinking about

https://docs.google.com/spreadsheets/d/1rk05EcLZpJaqOh5wymK6teIQufH9t0xn6oD…

-- though I think quite a lot of what's down as living on Wikidata should really be Commons Wikibase -- and also a suggestion based on some simple use-cases:

https://docs.google.com/document/d/1C7UTB1kbaf_EisF3LmhpIQkb_ifSkB0rD8IGI9a…

though I think we would probably see that as *too* simple, even for a first build, because for many of our applications sequence-number in set & set-identifier in collection are probably essential quantities to have (as they probably are for the Wikisource collection too).

Finally, this is an etherpad from the Hackathon just been, which has a lot of useful links at the end:

http://etherpad.wikimedia.org/p/structured-data-discussion-7-august-2014

Hope this initial brain dump is of at least some use, to make it worth its length.

All best,

James Heald. (User:Jheald).

4 6

Preparations for Wikimania
by Liam Wyatt 01 Aug '14

Dear all on the Glamtools and Europeana Steering Committee mailing lists.

In preparation for Wikimania I thought I'd send out this summary of what's going on. Obviously not everyone can attend the conference, but for those that can, this will hopefully be of use to you. There is information here about the *Stall*, the *Flyer*, the *GWT Wiki page*, the *Schedule*, and the *Task Force launch*.

*Stall*
We have been confirmed to have a stall for the duration of the conference in the main foyer. Here is the map <https://wikimania2014.wikimedia.org/wiki/Community_Village>. We are kind-of in the corner so I'm hoping one of the other groups that applied for a stall doesn't show up, and we can 'borrow' their space... A banner and lots of Europeana Labs/Europeana API etc. brochures have arrived in London already. I will be trying to sit at this stall during most lunches/breaks - and I would very much appreciate it if YOU could join me. If you see the desk unattended, and you want to sit and help, please do. If past Wikimanias are any guide, chairs during break-times will be a valuable commodity!

*Flyer*
Also we have a very pretty flyer to hand out at this stall and at every GWToolset related presentation/hacking. No one should be able to leave this conference without having heard about the GWT! Wikimedia-UK generously agreed to organise and pay for the printing of this flyer - so we have 1250 of them! They are A5 size. (front side <https://commons.wikimedia.org/wiki/File:GLAMwiki_Toolset_Flyer_-_front_side…> image, back side <https://commons.wikimedia.org/wiki/File:GLAMwiki_Toolset_Flyer_-_back_side.…> image). Especial thanks go to Sebastiaan for his generous assistance in taking my design and turning it into this professional publication!

*GLAMwiki Toolset Wiki page*
As you can see, that flyer tells people to go to bit.ly/GWtoolset (which directs to [[Commons:GLAMwiki Toolset]]).
So, I've been very bold in completely restructuring that page to make it more attractive to new visitors - adding some example uploads, live statistics of usage, links to the different sub-pages (documentation, project history etc.), and trying to give an accurate but easily digestible summary of how to use the tool. I'm applying Cunningham's law <https://meta.wikimedia.org/wiki/Cunningham's_Law> by including the "instructions" section - please change it if it is incorrect. Also, anyone who is particularly good at making mediawiki pages look *pretty* is very welcome to go in and help improve it.

*Schedule*
GLAMwikiToolset/Europeana related events at Wikimania. For this I can't beat the summary that Dan Entous created earlier today which I've copied below. There are lots of relevant things of course but these are the directly related items. (I should also add to Dan's list a panel session about the Europeana Fashion events that is happening on Friday at 12.)

> *Wikimania - overall programme*
> - https://wikimania2014.wikimedia.org/wiki/Programme
>
> *GWToolset Hackathon Sprint*
> - this is for programmers -- the idea is for any programmers who are interested in programming for the tool to attend, in order to learn how to program for it: https://wikimania2014.wikimedia.org/wiki/Hackathon#GWToolset
> - there is no fixed schedule -- just an open hacking sprint during the day: wednesday aug 6 and thursday august 7
>
> *Structured Data*
> - thursday, august 7
> - 10:00 - 11:30
> - https://wikimania2014.wikimedia.org/wiki/Programme#Thursday.2C_August_7
> - this is probably a good discussion to attend
>
> *Europeana Task Force*
> - thursday, august 7
> - 16:00 - 17:00
> - https://wikimania2014.wikimedia.org/wiki/Programme#Thursday.2C_August_7
>
> *GWToolset - What is It*
> - sunday, august 10
> - 12:00 - 12:30
> - https://wikimania2014.wikimedia.org/wiki/Programme#Sunday.2C_August_10
> - this is joris' talk about the toolset
>
> *GWToolset - Training Session*
> - sunday, august 10
> - 12:30 - 13:00
> - https://wikimania2014.wikimedia.org/wiki/Programme#Sunday.2C_August_10
> - this is fae's training session on how to use the tool

*Task Force launch*
As indicated in this list of Dan's, the *Task Force* will be having its first meeting on Thursday afternoon. This is the agenda <https://docs.google.com/document/d/1RgDUkMZX00I_IfdZgnwcYVNb4skHv1da3DsJmMD…>. It will hopefully kick off a very productive process to produce, in 6 months' time, a report on what the relationship between Europeana and Wikimedia should look like in accordance with their respective strategic plans. (The public documentation of this Task Force will be published on the Europeana website <http://pro.europeana.eu/network/task-forces/overview> very soon, I am informed.)

[Relatedly, there will be lots of formal and informal side-meetings happening that are related to Europeana specifically or GLAM-Wiki more generally. Let's share notes!]

Any questions, ideas or problems, please email (or tweet to me during the conference @Wittylama <https://twitter.com/Wittylama>).

And with that - I look forward to seeing many of you at Wikimania!

Your humble Europeana GLAM-Wiki coordinator,
- Liam
wittylama.com
Peace, love & metadata

1 0

Structured Data presentation at Wikimania
by Liam Wyatt 28 Jul '14

Fabrice Florin, the WMF's head of Multimedia, has just pointed me to an interesting meeting that will be taking place at the Hackathon of Wikimania: https://wikimania2014.wikimedia.org/wiki/Hackathon#Structured_Data

> *Structured Data*
> The Multimedia and Wikidata teams will host a roundtable discussion on our new Structured Data project <https://www.mediawiki.org/wiki/Multimedia/Structured_Data>, to implement machine-readable data for storing information about images and media files on Wikimedia Commons. We plan to develop this project together in fall 2014 and winter 2015, and it is likely to have a major impact on all aspects of multimedia usage, from viewing to uploads, editing, curation and using files on articles. We welcome the participation of all users familiar with using multimedia on Commons and other Wikimedia sites.

I think this sounds very much like the kind of thing that users of the GLAMwikiToolset (who are attending Wikimania) would be interested in knowing about - especially in the context of any future development in this field that will help bring Wikimedia Commons and Wikidata closer together. I'm sure this also fits well with the scope of Europeana's strategic interests.

You can sign up at the link above (no specific time given just yet).

Sincerely,
-Liam
wittylama.com
Peace, love & metadata

1 0

GWT may have breaks in uploads for an hour or more
by Fæ 25 Jul '14

Hi,

Could someone explain what is happening when large uploads have long breaks?

My long term HABS upload (250,000 files) has had notable pauses, mostly around an hour, though this morning has seen a 4 hour break in uploading. I had thought that the upload had completed, but as soon as I started another tranche, it restarted (possibly a coincidence, though maybe it unlocked the scheduling for the job in some way).

It would be nice to know, as it might come up during the sessions at Wikimania and we would not want users to assume an upload had broken if all they were seeing was a scheduling "pause".

Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae

2 3

Re: [Glamtools] Uploading books via GWtoolset - query
by Crockford, Ally 23 Jul '14

Hi Fae,

Thanks for this - I will definitely have a look at your book uploads and try to work out what the best approach is. There would definitely be sufficient content for a batch upload project page at some point when we look at uploading larger digitised books, but for the time being I was hoping to start with a smaller-scale test batch.

To the best of my knowledge, the library has no djvu versions of their digitised books, only PDFs, and I'm not yet clear whether they would be suitable for upload (or, to be honest, whether I could make the case in the limited time left in post - they've become comfortable with individual images, but that took time to get across, and when I suggested uploading batches of PDFs I still met resistance).

It might be that book uploads are something that has to wait until Autumn, and they will be a separate batch upload project entirely. It's sounding like that might be the better approach to take.

Thanks again!

Ally

Ally Crockford
Wikimedian-In-Residence
National Library of Scotland
George IV Bridge
Edinburgh EH1 1EW
Scotland, UK
e: a.crockford(a)nls.uk
t: (0) 131 623 3797
w: http://www.nls.uk
Follow us on Twitter and Facebook

National Library of Scotland, Scottish Charity, No: SCO11086

This communication is intended for the addressee(s) only. If you are not the addressee please inform the sender and delete the email from your system. The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of National Library of Scotland. This message is subject to the Data Protection Act 1998 and Freedom of Information (Scotland) Act 2002. No liability is accepted for any harm that may be caused to your systems or data by this message.

www.nls.uk

1 0

Uploading books via GWtoolset - query
by Crockford, Ally 23 Jul '14

The NLS has some digitised books that I am keen to upload to Commons where appropriate and I'd like to do so as separate image files (for a number of reasons - can go into them if you'd like but would rather save time). My question is whether anyone has done this in the past and has found a way to include the booknavibar template when uploading? I was thinking of incorporating the template into the XML file the organisation generates for upload, as they can shape that pretty freely, but wasn't sure if I would run into an issue with mapping.

Any advice would be greatly appreciated!

Ally Crockford
Wikimedian-In-Residence
National Library of Scotland

2 1

GWToolset error message
by Monica Mora 23 Jul '14

Hi all

I was using the GWToolset (on Commons) to do a batch upload and got this message:

PHP fatal error in /usr/local/apache/common-local/php-1.24wmf13/includes/OutputPage.php line 1296: Call to a member function getText() on a non-object

Previously, I did an upload test with one file and it went well: https://commons.wikimedia.org/wiki/File:Mochila_Cangrejos_de_colores.jpg

This is a sample of my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record>
    <URL>http://mochila_images.s3.amazonaws.com/aaa008.jpg</URL>
    <mochila_titulo>Mochila_Cangrejos de colores</mochila_titulo>
    <description>Cangrejo Halloween en la Isla Iguana, Los Santos</description>
    <author>José Manuel Castrellon</author>
    <source>Fundación Almanaque Azul</source>
    <mochila_relacion_programa_educativo>{{Information field|name=mochila_relacion_programa_educativo|value=Los seres vivos y su ambiente}}</mochila_relacion_programa_educativo>
    <mochila_provincia>{{Information field|name=mochila_provincia|value=Los Santos}}</mochila_provincia>
    <mochila_distrito>{{Information field|name=mochila_distrito|value=Pedasí}}</mochila_distrito>
    <mochila_corregimiento>{{Information field|name=mochila_corregimiento|value=Mariabé}}</mochila_corregimiento>
    <mochila_lugar>{{Information field|name=mochila_lugar|value=Isla Iguana}}</mochila_lugar>
    <Location>{{Location|7.63|-80.00|PA-7>}}</Location>
    <mochila_georef_type>{{Information field|name=mochila_georef_type|value=Centroide}}</mochila_georef_type>
    <date>-</date>
    <Category>Gecarcinus quadratus</Category>
    <mochila_ecosistema>{{Information field|name=mochila_ecosistema|value=}}</mochila_ecosistema>
    <mochila_grupos_humanos>{{Information field|name=mochila_grupos humanos|value=}}</mochila_grupos_humanos>
    <mochila_palabras_clave>{{Information field|name=mochila_palabras_clave|value=Animal, Crustáceo, Cangrejo, Isla}}</mochila_palabras_clave>
    <mochila_tema>{{Information field|name=mochila_tema|value=Interacción hombre- ambiente}}</mochila_tema>
    <mochila_paisaje>{{Information field|name=mochila_paisaje|value=Natural}}</mochila_paisaje>
    <mochila_nombre_comun>{{Information field|name=mochila_nombre_comun|value=Cangrejo de mangle, cangrejo de tierra, cangrejo hóloween}}</mochila_nombre_comun>
    <mochila_codigo_admin>{{Information field|name=mochila_codigo_admin|value=70503}}</mochila_codigo_admin>
    <mochila_geologia>{{Information field|name=mochila_geologia|value=Sedimentarias}}</mochila_geologia>
    <mochila_cuenca>{{Information field|name=mochila_cuenca|value=}}</mochila_cuenca>
    <mochila_SINAP>{{Information field|name=mochila_SINAP|value=R.V.S. Isla Iguana}}</mochila_SINAP>
  </record>
</records>

I appreciate any help.

Monica Mora
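As a local sanity check before a batch upload, one can at least confirm that a metadata file like the sample above is well-formed XML and see which element names each record exposes for mapping. The sketch below uses only Python's standard library on a shortened copy of the sample; the function name is hypothetical. It does not reproduce or diagnose the PHP fatal error quoted above, and it says nothing about how GWToolset itself parses the file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample record from the message above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<records>
  <record>
    <URL>http://mochila_images.s3.amazonaws.com/aaa008.jpg</URL>
    <mochila_titulo>Mochila_Cangrejos de colores</mochila_titulo>
    <Location>{{Location|7.63|-80.00|PA-7>}}</Location>
    <date>-</date>
  </record>
</records>
"""


def summarise_records(xml_text):
    """Parse the metadata and return one {element name: text} dict per <record>.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter("record")
    ]


records = summarise_records(SAMPLE)
for number, fields in enumerate(records, start=1):
    print(f"record {number}: {sorted(fields)}")
```

Note that a stray `>` inside element text, as in the `Location` value, is still well-formed XML, so a check like this passes on the sample; whether that value renders as intended on Commons is a separate question.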

2 3

GLAMwiki Toolset announcement blogpost
by Liam Wyatt 22 Jul '14

Now that the first GLAM has uploaded their own content with the GWToolset (Beeld en Geluid) we have just published a blogpost formally announcing the tool's existence to the wider world: http://pro.europeana.eu/pro-blog/-/blogs/sharing-multimedia-on-wikipedia-no…

Please share it with your colleagues, GLAMs, chapters, social media...

This post also highlights the work of the most prolific user of the tool thus far (Fae) and the person who has made the most use of the content which has been uploaded already (Taketa). I'll tell people about this blogpost on the cultural partnerships mailing lists tomorrow and also the Wikipedia Signpost.

Hopefully this will also generate some interest in Dan's proposed hacking workshop at Wikimania (sign up here https://wikimania2014.wikimedia.org/wiki/Hackathon#GWToolset ) and the two presentations at Wikimania in the Sunday 11:30 session ( https://wikimania2014.wikimedia.org/wiki/Programme#Sunday.2C_August_10 )

Thank you to Dan especially for all the incredibly hard work you've put in to this project over the years. Now that it's "out there" I suspect that it will become popular enough that it starts to get peppered with people trying to push the boundaries of what is possible. And, a bit like the way Magnus' tools always seem to go from being 'proof of concept' to 'mandatory tool' very quickly, I am hoping that the GWT will become a standard feature of GLAM activities very soon.

On a personal note, I realise I've not been involved in the actual development phase of the software but it is nice to be at the 'birth' of this project that I helped introduce <https://commons.wikimedia.org/w/index.php?title=File%3A2011_GLAMcamp_Amster…>, back at GLAMcamp:Amsterdam in 2011!

Sincerely,
-Liam
wittylama.com
Peace, love & metadata

1 0

wikimania 2014 hackathon
by dan-nl 22 Jul '14

dear all, just added a sprint to the hackathon page for wikimania 2014. please pass this link on to anyone who you think might be interested -- https://wikimania2014.wikimedia.org/wiki/Hackathon#GWToolset. with kind regards, dan

2 1

GWToolset operational issues for very large TIFFs
by Fæ 18 Jul '14

SUMMARY:
This week I experienced an issue when uploading several hundred very high resolution maps as part of the NYPL maps project.[1] Discussion has been going on in several places and this thread is an attempt to share a discussion in one place so all users can benefit.

[Gilles, could you join this low volume open email list to keep track of GWT issues and be a voice for WMF Operations to help us reach a recommendation for end user best practices?]

HISTORY
For our GLAM projects my upload was unusually stressful for the WMF servers. Individual map scans are up to 300 MB images, and resolutions can exceed 80 megapixels (80 million pixels). There are 20,000 tiff images to be uploaded; I have completed around 12%.

I used the GLAMtoolset at full capacity (20 threads) though I had broken the xml file up, so runs were a few hundred images at a time. My intention was to ramp this up to a couple of thousand per upload "tranche".

I was contacted on Tuesday by operations asking me to suspend the upload as the demand for attempted thumbnail rendering of the tiff images was too high a load on WMF servers.[2] Over 500 of the tiff images were greater than 50 megapixels and as a consequence Commons fails to render any thumbnails for them (thumbnails are created for jpegs above this limit; the constraint is specific to tiffs).[3]

CURRENT STATE
With no obvious immediate fix/work-around on the table from WMF ops, I have proposed to re-start my uploads for this project with an effective throttle by using 2 threads (this is a setting on the first screen of the GWToolset). In practice, having tried a run of a couple of hundred, this means that the tool is uploading 100MB sized images at a rate of 2 every 5 minutes. This seems not to be causing any issues.
WAY FORWARD
In the longer term the WMF is looking at alternatives for rendering tiff thumbnails which will enable 50MP+ images to be handled; this may or may not help solve the problem seen this week.[4]

I recommend that the GWToolset on-wiki guides include a recommendation about how to choose the number of processing threads based on the types of images to be uploaded. To date, no other project has seen these problems, probably because the image resolutions fall well under the 50MP threshold. The maximum allowed number of threads is 20, with the default being 10. For the time being I suggest that we agree a best practice that upload projects with tiffs over 50MP use no more than 2 threads; these problems do not appear to exist for projects uploading smaller resolution files.

I propose that WMF Operations consider finding ways of testing the peak loads possible from the GWT and decide if this can be fixed by future operational improvements, whether the tool might benefit from some simple "load management" changes, or if establishing a best practice for our (relatively) small number of GWT users would be a sufficient community based control.

Links
1. https://commons.wikimedia.org/wiki/Commons:Batch_uploading/NYPL_Maps
2. https://commons.wikimedia.org/wiki/Commons_talk:Batch_uploading/NYPL_Maps
3. https://commons.wikimedia.org/wiki/Category:NYPL_maps_%28over_50_megapixels…
4. https://bugzilla.wikimedia.org/show_bug.cgi?id=52045

Fae
--
faewik(a)gmail.com https://commons.wikimedia.org/wiki/User:Fae
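The suggested best practice and the throttled upload rate reported in this thread can be turned into a rough rule of thumb. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the thresholds are the ones quoted in the message (50 megapixels, default 10 threads, throttle of 2 threads), and the rate of 2 files every 5 minutes is the figure observed for roughly 100MB images, so real throughput will vary. It applies the rule per file for simplicity, whereas the proposal above is per upload project.

```python
TIFF_MEGAPIXEL_LIMIT = 50   # above this, Commons does not render tiff thumbnails
DEFAULT_THREADS = 10        # GWToolset default quoted in the message
THROTTLED_THREADS = 2       # proposed cap for very large tiffs


def suggested_threads(file_format, megapixels):
    """Thread count following the best practice proposed in the thread."""
    is_tiff = file_format.lower() in ("tif", "tiff")
    if is_tiff and megapixels > TIFF_MEGAPIXEL_LIMIT:
        return THROTTLED_THREADS
    return DEFAULT_THREADS


def days_remaining(files_left, files_per_batch=2, minutes_per_batch=5):
    """Rough duration estimate at the observed throttled rate."""
    files_per_day = files_per_batch * (24 * 60 / minutes_per_batch)
    return files_left / files_per_day


# 20,000 tiffs with around 12% done leaves roughly 17,600 files.
files_left = round(20_000 * 0.88)
print(suggested_threads("tiff", 80))            # large map scan
print(suggested_threads("jpeg", 80))            # jpegs are not affected
print(round(days_remaining(files_left), 1))     # days at 2 files per 5 minutes
```

At the throttled rate the remaining NYPL maps would take on the order of a month, which is the trade-off being accepted in exchange for not overloading thumbnail rendering.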

8 36
