Hi, I'm preparing an image donation of some 350 picture books from 1810 to 1880 (taken from the collection http://www.geheugenvannederland.nl/?/en/collecties/prentenboeken_van_1810_to...). For every book I've constructed an XML file describing the pages (metadata). So e.g. for a book of 20 pages I have an XML with 20 records. I can upload these in the normal way via the GWToolset web interface, also assigning a Commons category to the book.
For 1 book that's doable, but for 350 books I would need to upload 350 XML files, 1 by 1, using the GWT web interface (using the same JSON mapping file for all uploads). But this would take me a lot of time (and it's rather boring)...
So I'm wondering if / how I could automate this. Is there a more direct/efficient way?
I can imagine that I could do some command-line interfacing (Pywiki??), with the XML, the JSON mapping and the target Commonscat name as input parameters. Would that be an option?
Any tricks, tips & directions are very welcome
Met vriendelijke groet / With kind regards
Olaf Janssen
Wikipedia & open data coordinator
Koninklijke Bibliotheek - National Library of the Netherlands
olaf.janssen@kb.nl | +31 (0)70 3140 388 | @ookgezellig | www.slideshare.net/OlafJanssenNL
Prins Willem-Alexanderhof 5 | 2595 BE Den Haag | Postbus 90407 | 2509 LK Den Haag | (070) 314 09 11 | www.kb.nl
English version: http://www.kb.nl/en/email | Disclaimer: http://www.kb.nl/disclaimer
Have you considered uploading to the Internet Archive and then uploading to Commons using the IA-upload tool? (This is the normal process for texts.)
I don't know if that can be automated.
https://internetarchive.readthedocs.org/en/latest/cli.html https://tools.wmflabs.org/ia-upload/commons/init https://github.com/Tpt/ia-upload
I'm afraid I've only done them one at a time. Cheers
On Mon, Feb 15, 2016 at 7:18 AM, Olaf Janssen Olaf.Janssen@kb.nl wrote:
Glamtools mailing list Glamtools@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/glamtools
The easiest way is to merge your XML files into one large XML, then put this through the GWT. Though I can imagine ways of automating the front end of GWT, it would be a clumsy way of going about it.
If your concern is that you want to create separate book categories, then add a category field in the XML that can vary by book. You can add several variable categories as an option on the mappings page. For example https://commons.wikimedia.org/wiki/File:DAILY_MENU_%28held_by%29_REVERE_HOUSE_%28at%29_%22BOSTON,_MA%22_%28%28HOTEL%3F%29%29_%28NYPL_Hades-269316-476896%29.tiff was uploaded with the category "NYPL Rare Book Division" automatically generated from the NYPL metadata. (To be fair, I'm not using the GWT for most of the NYPL material for reasons mentioned on the related project page.)
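The merge-plus-per-book-category approach can be scripted in a few lines. A minimal sketch, assuming each book's XML looks like `<records><record>...</record></records>` and that the `record` and `category` element names match your JSON mapping (both are assumptions; adapt to your actual schema):

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def merge_books(xml_dir, out_file):
    """Merge per-book GWToolset XML files into one large batch file,
    adding a per-book <category> element to every <record>."""
    merged = ET.Element("records")
    for book_file in sorted(Path(xml_dir).glob("*.xml")):
        tree = ET.parse(book_file)
        # Derive the Commons category from the file name; in practice you
        # would take it from your own per-book metadata instead.
        category = book_file.stem
        for record in tree.getroot().iter("record"):
            cat = ET.SubElement(record, "category")
            cat.text = category  # constant within a book, varies between books
            merged.append(record)
    ET.ElementTree(merged).write(out_file, encoding="utf-8",
                                 xml_declaration=True)
```

The single merged file then goes through the GWToolset web interface once, with the `category` field mapped as a variable category on the mappings page.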
Responding to J Hayes' comment in this thread: you can mass-upload to IA with off-the-shelf Python modules. However, just as much care should be taken to map out the metadata using IA's metadata options; because they are so open and flexible, the resulting archives tend to be confusingly inconsistent. This would still leave the challenge of finding a good mapping to Commons templates if you then wanted to upload from IA to Commons rather than from somewhere else.
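For the IA route, the off-the-shelf `internetarchive` Python module exposes an `upload()` function taking an item identifier, files, and a metadata dict. A hedged sketch: the `book` field names and the chosen metadata keys below are illustrative assumptions, not a vetted mapping, and `"opensource"` is only a placeholder collection:

```python
# pip install internetarchive  (the off-the-shelf module mentioned above)
try:
    from internetarchive import upload
except ImportError:
    upload = None  # keeps the sketch importable without the module installed

def ia_metadata(book):
    """Map one book record to an IA metadata dict (field names assumed)."""
    return {
        "mediatype": "texts",
        "title": book["title"],
        "date": book["year"],
        "language": book.get("language", "dut"),
        "collection": "opensource",  # placeholder; use your real collection
    }

def upload_book(identifier, image_files, book):
    """One IA item per book; the page images become files of that item."""
    if upload is None:
        raise RuntimeError("pip install internetarchive first")
    return upload(identifier, files=image_files, metadata=ia_metadata(book))
```

Working out a consistent mapping in `ia_metadata` up front is exactly the care Fae recommends, since IA itself will accept almost anything.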
Fae
On 15 February 2016 at 12:18, Olaf Janssen Olaf.Janssen@kb.nl wrote:
Dear all,
Upload using ftp.company.com possible?
--------------------------------------
A scanning company has scanned photos/slides for a Dutch GLAM and offers them on their FTP site, with a username and password for download.
If they take away this username/passwd, would GWToolset accept as upload URL for instance
http://ftp.company.nl/(..directories to be specified..)/1_01.tif
? I would guess yes, but I'd like to hear from you experts out there.
Thanks a lot, hans muller
Update of my question:
1. Would a ftp-domain (not http(s) protocol) like
http://ftp.company.com
be acceptable for GWToolset?
2. If so, would a call with user/passwd be acceptable as an upload URL for GWToolset? (Of course after whitelisting the domain.) Type:
http://ftp.company.com?user=...&password=...&dir=TIFF/1&file=1_0...
Or would, for instance, the access time be too slow for GWToolset (that depends, of course), etc.?
Thank you for considering this question,
hans muller https://commons.wikimedia.org/wiki/User:Hansmuller
On Sun, 28 February 2016, 12:11 pm, Hans Muller wrote:
On Sun, Feb 28, 2016 at 6:49 AM, Hans Muller j.m.muller@hccnet.nl wrote:
Update of my question:
- Would a ftp-domain (not http(s) protocol) like
http://ftp.company.com
be acceptable for GWToolset?

http://ftp.company.com is an http-protocol URL for a server named "ftp". Did you mean ftp://ftp.company.com?
http://ftp.company.com is acceptable (since it starts with http://), but ftp://ftp.company.com would not be.
In principle we might be able to add ftp support (it uses curl on the backend, which supports ftp; I think it's just the validation code that rejects ftp). We'd also need to make sure that the Squid proxy supports ftp (Squid supports ftp in principle, but I have no idea if it's enabled in Wikimedia).
So basically, I'd suggest filing a bug. If anyone was actually maintaining GWToolset it would probably get fixed. Given the current situation, who knows.
- If so, would a call with user/passwd be acceptable as an upload URL for
GWToolset? (Of course after whitelisting the domain.) Type:
http://ftp.company.com?user=...&password=...&dir=TIFF/1&file=1_0...
The syntax for username and password in urls is
ftp://username:password@ftp.company.com/TIFF/1/1_01.tiff
This is also true for http urls when using HTTP authentication.
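The components of such a credential-carrying URL can be inspected with Python's standard `urllib.parse`, a quick way to double-check that a URL you hand to a tool is shaped the way you think (using the hypothetical ftp.company.com host from this thread):

```python
from urllib.parse import urlsplit

# The username:password@host form described above, for ftp:// or http://
url = "ftp://username:password@ftp.company.com/TIFF/1/1_01.tiff"
parts = urlsplit(url)

print(parts.scheme)    # "ftp"  (GWToolset currently accepts only http/https)
print(parts.username)  # "username"
print(parts.password)  # "password"
print(parts.hostname)  # "ftp.company.com"
print(parts.path)      # "/TIFF/1/1_01.tiff"
```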
Or would, for instance, the access time be too slow for GWToolset (that depends, of course), etc.?
Timeouts would be the same as for http. Which are quite high, so it would probably be fine on that count.
-- -bawolff
Thanks for your swift reply, even on Sunday!
Method 1
http://ftp.company.com?user=NAME&password=PASSWD&dir=DIR&file=FI...
works! But Method 2
ftp://username:password@ftp.company.com/DIR/FILE
did not, at least in my hands.
Anyway, I wonder whether the group on Phabricator would whitelist a domain for use with Method 1. However, they did whitelist the domain of a Dutch scanning company,
memorix.nl
according to the list of whitelisted domains on the GWToolset starting page; perhaps that one also sits behind a username/password lock...
Thx and best regards, hans muller https://commons.wikimedia.org/wiki/User:Hansmuller
On Sun, 28 February 2016, 7:36 pm, bawolff wrote: