Hi there!
Over the next year, the Missouri Botanical Gardens plans to identify and extract illustrations from the BHL's 39.3 million scanned pages as part of the Art of Life project [1], and then to publish those illustrations to the Wikimedia Commons [2] (as well as to Flickr [3] and ArtStor). My colleagues and I have spent the last few months developing a metadata schema to provide structured information describing an image -- subjects, "agents" (i.e. publishers, painters, engravers and writers) and inscriptions. Within the Commons, we've created a template to handle this structured data, which we call "Information Art of Life" (based on the ubiquitous Information template): http://commons.wikimedia.org/wiki/Template:Information_Art_of_Life
Since the BHL doesn't have the resources to comprehensively describe all the images itself, our plan is for BHL staff members to minimally describe the illustrations and then to rely on the Commons community to improve metadata, descriptions and categorization. So when images are uploaded to the Commons from the BHL, they will have basic metadata in their "Information Art of Life" templates and basic categorization, and nothing else. We hope to encourage users of BHL illustrations (artists, biologists, humanities scholars, library staff and educators, among others) to take it from there, improving the metadata, descriptions and categorization on the uploaded images. However, as many of them would not have much experience with Wikipedia, we fear that the learning curve in understanding the Commons' template-based metadata system might turn away potential contributors.
To make it easier for non-Wikimedians to contribute, we have been considering developing tools to simplify updating these templates, such as user scripts [5] that provide a form-based interface to our template; maybe something visually similar to the Index page form that the ProofreadPage extension creates on Wikisource [6]. Do such tools already exist for the Commons somewhere? What do you think would be the easiest way to simplify the ways in which non-Wikimedians can use the Commons' cataloging system?
Thanks so much for your attention!
cheers, Gaurav http://commons.wikimedia.org/wiki/User:Gaurav
[1] http://biodivlib.wikispaces.com/Art+of+Life [2] http://commons.wikimedia.org/wiki/Category:Files_from_the_Biodiversity_Herit... [3] http://www.flickr.com/photos/biodivlibrary [4] Based on an external links search, see: http://commons.wikimedia.org/w/index.php?title=Special:LinkSearch&limit=... [5] http://commons.wikimedia.org/wiki/Commons:User_scripts [6] An example of an index page form created by the ProofreadPage extension on Wikisource: http://en.wikisource.org/w/index.php?title=Index:Field_Notes_of_Junius_Hende...
What sort of information are you looking for? The few files I checked on [[Category:Files from the Biodiversity Heritage Library]] (butterflies) seem to be described in detail (mention in description + category for each species), is this what you're aiming at?
Managing the information templates seems a nightmare; perhaps you should aim at categories. HotCat works well enough per se, but you still need to know the category guidelines (or better, the precise name of the category). It would be great if the autocompletion could be fixed so that 1) you don't need to know in advance whether the category you need is e.g. "Churches of Finland" vs. "Finnish churches", 2) redirects and soft-redirects are followed, e.g. from plural to singular and vice versa. If such a feature existed, maybe even files uploaded with the UploadWizard may at some point have categories.
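A minimal sketch of the autocompletion behaviour wished for above, not a description of how HotCat works. It assumes the category titles and redirects are already available as plain Python collections; the function names `resolve` and `suggest`, the sample data, and the choice of which name is canonical are all hypothetical.

```python
def resolve(title, redirects):
    """Follow category redirects to the final target, guarding against loops."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


def suggest(prefix, categories, redirects):
    """Suggest canonical category names for a typed prefix."""
    needle = prefix.casefold()
    suggestions = []
    # Search real categories and redirect titles alike, so either wording matches.
    for title in sorted(set(categories) | set(redirects)):
        if title.casefold().startswith(needle):
            target = resolve(title, redirects)
            if target not in suggestions:
                suggestions.append(target)
    return suggestions


# Hypothetical data: which of the two names is canonical is an assumption.
categories = ["Churches of Finland"]
redirects = {"Finnish churches": "Churches of Finland"}
```

With this data, typing either "finn" or "church" leads to the same canonical category, so the user does not need to know the naming convention in advance.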
Nemo
Heya,
A quick note for metadata fans: since my last e-mail, the BHL has released the first version of the BHL illustration schema for feedback at http://blog.biodiversitylibrary.org/2012/08/interested-in-improving-access-t... -- we'd love your feedback on what information you would like associated with BHL illustrations which would make it easy for you to find images you could use on Wikimedia projects, and then to reuse those images on Wikimedia projects. Do we have adequate copyright information, for instance? Please have a look at our schema and let us know!
On 25 August 2012 00:08, Federico Leva (Nemo) nemowiki@gmail.com wrote:
What sort of information are you looking for? The few files I checked on [[Category:Files from the Biodiversity Heritage Library]] (butterflies) seem to be described in detail (mention in description + category for each species), is this what you're aiming at?
Nemo: SO sorry for the late reply! We're aiming for something like the metadata on the following Commons images: - http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg - http://commons.wikimedia.org/wiki/File:Simonkai.jpg - http://commons.wikimedia.org/wiki/File:PasserMoabiticusWolf.jpg (other examples available at http://commons.wikimedia.org/wiki/Template:Information_Art_of_Life/Gallery)
These images have textual descriptions in the {{Information Art of Life}} template as well as categories corresponding to their subjects; we also use the {{inscription}} and {{Creator}} templates to provide more information about what is actually in the image. The {{Creator}} template automatically adds the images to the appropriate creator category.
Managing the information templates seems a nightmare, perhaps you should aim at categories.
I'm hopeful that eventually we'll be able to use software to smooth this process: an {{Information Art of Life}} record would be automatically generated from the basic metadata available at the BHL when the image is uploaded to the Commons; a script could then re-extract the metadata via the MediaWiki API or by reading hidden "span" or "div" tags, for use in moving fully annotated images into other image repositories, such as ArtStor. Until then, I hope the Information Art of Life template will provide a way for Commons editors to structure information about the illustration, especially as pertains to biological species and other subjects.
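A rough sketch of what the re-extraction step could look like once a file page's wikitext has been fetched (for instance through the MediaWiki API). It is only an illustration under stated assumptions: the parameter names `title`, `agents` and `subjects`, the `{{Agent|...}}` call syntax, and the function name `parse_template` are hypothetical, and the parser only handles well-formed, simply nested wikitext.

```python
def parse_template(wikitext, name):
    """Return the named template's top-level parameters as a dict, or None."""
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    parts = []      # text between top-level "|" separators
    current = []
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            # Entering a template or link: pipes inside must not split fields.
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:
                parts.append("".join(current))
                break
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    else:
        return None  # ran off the end: unbalanced braces
    fields = {}
    for part in parts[1:]:  # parts[0] is the template name itself
        key, sep, value = part.partition("=")
        if sep:  # positional (unnamed) parameters are skipped in this sketch
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical file-page wikitext; the parameter names are assumptions.
sample = """{{Information Art of Life
|title = Green waxbill
|agents = {{Agent|F. W. Frohawk|role=illustrator}}
|subjects = [[:Category:Birds|birds]]
}}"""
fields = parse_template(sample, "Information Art of Life")
```

The resulting dict could then be serialised into whatever format another repository expects; nested templates are kept as raw wikitext and would need a second pass.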
One thing that would help would be more templates that assist with categorizing images. I recently wrote the {{Agent}} template (see http://commons.wikimedia.org/wiki/Template:Agent) which uses {{#ifexist}} to test for a Creator template for the given creator name. If the Creator template exists, it incorporates it into the page, adding the file to the correct creator category in the process. If it doesn't exist, it instead creates a red-link to where the Creator page should be.
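The fallback logic described above can be mirrored in a few lines of Python. This is not the template's actual wikitext, only a sketch of the decision it makes; the function name `render_agent` and the set of existing pages are hypothetical stand-ins for the existence check the wiki performs.

```python
def render_agent(name, existing_pages):
    """Return wikitext for an agent: the Creator template if present, else a link."""
    creator_page = "Creator:" + name
    if creator_page in existing_pages:
        # Transcluding the Creator page also adds the file to the creator category.
        return "{{" + creator_page + "}}"
    # A link to a page that does not exist yet is displayed as a red link.
    return "[[" + creator_page + "|" + name + "]]"


# Hypothetical set of pages that exist on the wiki.
existing = {"Creator:Example Illustrator"}
```

For a name without a Creator page, the red link doubles as a to-do marker showing where the page should be created.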
HotCat works well enough per se, but you still need to know the category guidelines (or better, the precise name of the category). It would be great if the autocompletion could be fixed so that 1) you don't need to know in advance whether the category you need is e.g. "Churches of Finland" vs. "Finnish churches", 2) redirects and soft-redirects are followed, e.g. from plural to singular and vice versa. If such a feature existed, maybe even files uploaded with the UploadWizard may at some point have categories.
That would be awesome to have! As something completely unrelated to everything else, has anybody worked on extracting the Commons categories as a Web Ontology Language (OWL) file? It would be interesting to use OWL inferencing to "check" that categories are organized consistently, although it would be a *huge* project to work on.
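To make the idea a little more concrete, here is a small sketch that writes a category tree as Turtle and runs one very simple consistency check (looking for a cycle). It rests on several assumptions: that categories are modelled as OWL classes linked by rdfs:subClassOf (Commons categories are not strictly "is a" relations), that the `http://example.org/category/` namespace and the sample category names are placeholders, and that a cycle check stands in for real OWL inferencing, which would need a reasoner.

```python
def slug(title):
    """Turn a category title into a simple local name (spaces only; a sketch)."""
    return title.replace(" ", "_")


def to_turtle(parents):
    """Serialise {category: [parent categories]} as Turtle subclass statements."""
    lines = [
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix cat: <http://example.org/category/> .",
    ]
    for child in sorted(parents):
        lines.append(f"cat:{slug(child)} a owl:Class .")
        for parent in sorted(parents[child]):
            lines.append(f"cat:{slug(child)} rdfs:subClassOf cat:{slug(parent)} .")
    return "\n".join(lines)


def find_cycle(parents):
    """Return one cycle of categories as a list, or None if there is none."""
    in_progress, done = set(), set()

    def visit(node, path):
        in_progress.add(node)
        path.append(node)
        for parent in parents.get(node, ()):
            if parent in in_progress:
                # Found a category that is its own ancestor.
                return path[path.index(parent):] + [parent]
            if parent not in done:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in list(parents):
        if node not in done:
            found = visit(node, [])
            if found:
                return found
    return None


# Hypothetical category data.
tree = {"Passer": ["Passeridae"], "Passeridae": ["Birds"]}
turtle = to_turtle(tree)
```

The recursive walk is fine for small samples; the full Commons category graph would need an iterative traversal and a proper escaping scheme for titles.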
cheers, Gaurav
Do we have adequate copyright information, for instance?
I've only looked at one file: http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg
And it looks like you could improve the copyright info:
Here the copyright claim is that the author died more than 70 years ago, but there is no illustrator death date listed. So to verify the claim, we would need to do some research. So if you have the date of death, and if the book was published outside the US (here it was apparently London, UK), please provide it.
Also, note that the current copyright template says (after a big warning sign): "You must also include a United States public domain tag (http://commons.wikimedia.org/wiki/Commons:Copyright_tags#United_States) to indicate why this work is in the public domain in the United States." In this case you should use http://commons.wikimedia.org/wiki/Template:PD-1923.
Toby / User:99of9
Hi Toby,
On 4 September 2012 22:06, Toby Hudson tobyyy@gmail.com wrote:
Do we have adequate copyright information, for instance?
I've only looked at one file: http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg
And it looks like you could improve the copyright info:
Here the copyright claim is that the author died more than 70 years ago, but there is no illustrator death date listed. So to verify the claim, we would need to do some research. So if you have the date of death, and if the book was published outside the US (here it was apparently London, UK), please provide it.
Also, note that the current copyright template says (after a big warning sign): "You must also include a United States public domain tag to indicate why this work is in the public domain in the United States." In this case you should use http://commons.wikimedia.org/wiki/Template:PD-1923.
Toby / User:99of9
Ugh, good catch. It looks like http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg might not actually be out of copyright -- it was first published in the UK (not the US as I thought) in 1899, so it remains in copyright for "70 years from the end of the calendar year in which the last remaining author of the work dies" (as per http://www.copyrightservice.co.uk/copyright/p01_uk_copyright_law). F. W. Frohawk, the illustrator, died in 1946 as per http://en.wikipedia.org/wiki/Frederick_William_Frohawk, so none of his works will enter the public domain until 1946+70+1 = 2017.
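The arithmetic above, written out as a small helper. It assumes a single author and a plain "life plus 70 years, counted from the end of the calendar year of death" rule; the function names are made up for illustration, and real cases (multiple authors, US status, anonymous works) need more than this.

```python
def public_domain_year(death_year, term_years=70):
    """First calendar year in which the work is out of copyright.

    Protection runs until the end of the year death_year + term_years,
    so the work is free from 1 January of the following year.
    """
    return death_year + term_years + 1


def is_public_domain(death_year, current_year, term_years=70):
    """True if the work is already out of copyright in current_year."""
    return current_year >= public_domain_year(death_year, term_years)


# F. W. Frohawk died in 1946, as stated above.
frohawk_pd_year = public_domain_year(1946)
```

Checking `is_public_domain(1946, 2012)` gives False, which matches the conclusion that the image was still in copyright in its country of first publication when this was written.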
I've tagged it for deletion, thanks! We'd still love your feedback on the other images!
cheers, Gaurav
Hi Gaurav,
Glad to be of help. I don't have time to go through all your uploads, but here are some notes on a few random samples:
The general point I made in my other email still applies to some of your files.
Here are some not currently tagged with a US PD notice (e.g. PD-1923 or PD-old-100): http://commons.wikimedia.org/wiki/File:Atlides_halesus_CramerStoll.png http://commons.wikimedia.org/wiki/File:Scotopelia_peliIbisV001P015AA.jpg
Or in some cases they are not tagged with a PD notice applicable to their country of publication: http://commons.wikimedia.org/wiki/File:Bassin_Houiller_Du_Gard.jpg http://commons.wikimedia.org/wiki/File:Die_Gattung_Nepenthes_illustration2.j... http://commons.wikimedia.org/wiki/File:Simonkai.jpg
Also, some have a license (implying a copyright claim) imported from Flickr. Switching to PD would be more accurate: http://commons.wikimedia.org/wiki/File:Aesclepius,_Flora,_Ceres_and_Cupid_ho... http://commons.wikimedia.org/wiki/File:Flickr_-_BioDivLibrary_-_n102_w1150.j...
Best regards, and thanks for the hard work you're doing. Toby / User:99of9
On Thu, Sep 6, 2012 at 10:52 AM, Gaurav Vaidya gaurav@ggvaidya.com wrote:
Hi Toby,
On 4 September 2012 22:06, Toby Hudson tobyyy@gmail.com wrote:
Do we have adequate copyright information, for instance?
I've only looked at one file: http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg
And it looks like you could improve the copyright info:
Here the copyright claim is that the author died more than 70 years ago, but there is no illustrator death date listed. So to verify the claim, we would need to do some research. So if you have the date of death, and if the book was published outside the US (here it was apparently London, UK), please provide it.
Also, note that the current copyright template says (after a big warning sign): "You must also include a United States public domain tag to indicate why this work is in the public domain in the United States." In this case you should use http://commons.wikimedia.org/wiki/Template:PD-1923.
Toby / User:99of9
Ugh, good catch. It looks like http://commons.wikimedia.org/wiki/File:Greenwaxbill.jpg might not actually be out of copyright -- it was first published in the UK (not the US as I thought) in 1899, so it remains in copyright for "70 years from the end of the calendar year in which the last remaining author of the work dies" (as per http://www.copyrightservice.co.uk/copyright/p01_uk_copyright_law). F. W. Frohawk, the illustrator, died in 1946 as per http://en.wikipedia.org/wiki/Frederick_William_Frohawk, so none of his works will enter the public domain until 1946+70+1 = 2017.
I've tagged it for deletion, thanks! We'd still love your feedback on the other images!
cheers, Gaurav
Hi Toby,
On 05-Sep-2012, at 9:23 PM, Toby Hudson wrote:
Glad to be of help. I don't have time to go through all your uploads, but here are some notes on a few random samples:
The general point I made in my other email still applies to some of your files
Here are some not currently tagged with a US PD notice (e.g. PD-1923 or PD-old-100): http://commons.wikimedia.org/wiki/File:Atlides_halesus_CramerStoll.png http://commons.wikimedia.org/wiki/File:Scotopelia_peliIbisV001P015AA.jpg
Or in some cases they are not tagged with a PD notice applicable to their country of publication: http://commons.wikimedia.org/wiki/File:Bassin_Houiller_Du_Gard.jpg http://commons.wikimedia.org/wiki/File:Die_Gattung_Nepenthes_illustration2.j... http://commons.wikimedia.org/wiki/File:Simonkai.jpg
Also, some have a license (implying a copyright claim) imported from Flickr. Switching to PD would be more accurate http://commons.wikimedia.org/wiki/File:Aesclepius,_Flora,_Ceres_and_Cupid_ho... http://commons.wikimedia.org/wiki/File:Flickr_-_BioDivLibrary_-_n102_w1150.j...
Thanks so much again for taking the time to check these images out! I've fixed all the images you mentioned, apart from http://commons.wikimedia.org/wiki/File:Simonkai.jpg, which was published in 1910 and whose authorship is unclear. Hopefully the (Hungarian) journal that image is from can tell us who the author is; I've added this request to the Hungarian Village Pump on the Commons [1].
There are still some images that the Art of Life project is actively working on which need double-checking (see http://commons.wikimedia.org/wiki/Template:Information_Art_of_Life/Gallery), but the bigger task will be to sort out (1) the hundreds of images which I recently helped bulk-upload into the Commons [2], and (2) the thousands of BHL images already in the Commons [3], uploaded by different uploaders at different times. Fun!
I've added some information about dealing with UK/EU copyrights amongst BHL images to the BHL project page, emphasizing that both US and EU copyright tags are necessary for content published in the EU (as a lot of BHL's content is). Hopefully, that will help things a bit! It's at: http://commons.wikimedia.org/wiki/Commons:BHL#Copyrights
Best regards, and thanks for the hard work you're doing.
Thanks for the encouragement -- it's much appreciated! :)
cheers, Gaurav
[1] http://commons.wikimedia.org/wiki/Commons:Kocsmafal#Help_needed_to_check_aut... [2] http://commons.wikimedia.org/wiki/Commons:Flickr_batch_uploading/BHL_Art_of_... [3] http://commons.wikimedia.org/w/index.php?title=Special%3ALinkSearch&targ...