If anyone has a moment today and would like to comment, I would appreciate extra eyes on the test upload of images on beta at: http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=Category:Images_...
The way GWT handles the Artwork template is not ideal, and things like the use of a hyphen in the filename, or the creator template for non-existent authors, are not user-customizable. If I proceed with this run as is, then I will also need to run a little post-upload house-keeping on the way apostrophes have been converted to HTML codes.
This run should be around 2,000 images and is limited to pre-20th century artworks from Japan. If this works well, then I'll look at other collection themes. I may need to tweak some category mapping in the background, but the source of the metadata as well as the images is now direct from the Rijksmuseum. Unfortunately there was limited reliable metadata in English, so I have stuck to one language.
I suggest comments are raised on the project page at https://commons.wikimedia.org/wiki/Commons:Batch_uploading/Art_of_Japan_in_the_Rijksmuseum (opinions section) unless they are of more general interest for all GWT users.
Fae
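A minimal sketch of the post-upload apostrophe house-keeping mentioned above, assuming the apostrophes were written out as HTML character references such as `&#039;` (the exact form GWT produced is not stated in the thread). The function name `fix_apostrophes` is hypothetical. It deliberately rewrites only apostrophe references and leaves other entities such as `&amp;` alone, since those may be intentional in wikitext.

```python
import re

# Assumption: apostrophes appear as numeric or named character references.
# Matches &#39; / &#039; (decimal), &#x27; (hex) and &apos; (named).
_APOSTROPHE_REF = re.compile(r"&(?:#0*39|#[xX]0*27|apos);")


def fix_apostrophes(wikitext: str) -> str:
    """Replace apostrophe character references with a literal apostrophe."""
    return _APOSTROPHE_REF.sub("'", wikitext)


if __name__ == "__main__":
    print(fix_apostrophes("Hokusai&#039;s view of Mount Fuji"))
```

Applying this to live file pages would still need a bot or AWB run; the sketch covers only the text substitution.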
Looks OK to me. At least we can bot-cleanup the uploads.
Regards, Steinsplitter
Date: Sun, 11 May 2014 12:54:00 +0100 From: faewik@gmail.com To: glamtools@lists.wikimedia.org Subject: [Glamtools] GWT Help: Rijksmuseum upload final checks
faewik@gmail.com https://commons.wikimedia.org/wiki/User:Fae
While you're running the clean-up script to fix all the apostrophes, you might also prefer the neater (and warning-free) {{PD-old-70-1923}} rather than {{PD-US}}{{PD-old}} -- though in fact, for these images of 3D objects, the license needed is surely {{Cc-zero}} or something similar, because there needs to be some release of the photographer's copyright.
I agree about the hyphen -- really, it could use an extra space either side of the hyphen, to more clearly separate the object short name from the museum identifier.
Best,
James.
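The template swap suggested above could be folded into the same clean-up pass. This is a rough sketch only: the function name `merge_pd_templates` is hypothetical, and it assumes the two templates sit directly next to each other in the page text, as written in the message. It does not address the separate question of whether {{Cc-zero}} is needed for the photographs.

```python
import re

# Assumption: the uploads carry {{PD-US}} immediately followed by {{PD-old}},
# possibly with whitespace between or inside the braces.
_PD_PAIR = re.compile(r"\{\{\s*PD-US\s*\}\}\s*\{\{\s*PD-old\s*\}\}")


def merge_pd_templates(wikitext: str) -> str:
    """Replace the {{PD-US}}{{PD-old}} pair with {{PD-old-70-1923}}."""
    return _PD_PAIR.sub("{{PD-old-70-1923}}", wikitext)


if __name__ == "__main__":
    print(merge_pd_templates("|permission={{PD-US}}{{PD-old}}"))
```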
On 11/05/2014 12:54, Fæ wrote:
If anyone has a moment today and would like to comment, I would appreciate extra eyes on the test upload of images on beta at: http://commons.wikimedia.beta.wmflabs.org/w/index.php?title=Category:Images_...
The way GWT handles the Artwork template is not ideal, and things like the use of a hyphen in the filename, or the creator template for non-existent authors, are not user-customizable. If I proceed with this run as is, then I will also need to run a little post-upload house-keeping on the way apostrophes have been converted to HTML codes.
This run should be around 2,000 images and is limited to pre-20th century artworks from Japan. If this works well, then I'll look at other collection themes. I may need to tweak some category mapping in the background, but the source of the metadata as well as the images is now direct from the Rijksmuseum. Unfortunately there was limited reliable metadata in English, so I have stuck to one language.
I suggest comments are raised on the project page at https://commons.wikimedia.org/wiki/Commons:Batch_uploading/Art_of_Japan_in_the_Rijksmuseum (opinions section) unless they are of more general interest for all GWT users.
Fae
On 11 May 2014 14:07, James Heald j.heald@ucl.ac.uk wrote:
While you're running the clean-up script to fix all the apostrophes, you might also prefer the neater (and warning-free) {{PD-old-70-1923}} rather than {{PD-US}}{{PD-old}} -- though in fact, for these images of 3D objects, the license needed is surely {{Cc-zero}} or something similar, because there needs to be some release of the photographer's copyright.
I agree about the hyphen -- really, it could use an extra space either side of the hyphen, to more clearly separate the object short name from the museum identifier.
...
With regard to the hyphen, I tried padding the fields with a space, but the padding gets trimmed again by GWT. Unfortunate, particularly as including renaming as part of automated house-keeping is not very good practice, even for new uploads.
Based on a separate discussion with David H., I may defer this run until GWT can allow a custom template option; hopefully in the coming week. Although I would like to use the default Artwork template, these oddities of automated formatting (mandated creator template, forced filename structure, html codes rather than characters) are creating a programming burden on my volunteer time that it would be sensible to avoid.
Fae
On 11/05/2014 14:37, Fæ wrote:
With regard to the hyphen, I tried padding the fields with a space, but the padding gets trimmed again by GWT. Unfortunate, particularly as including renaming as part of automated house-keeping is not very good practice, even for new uploads.
Based on a separate discussion with David H., I may defer this run until GWT can allow a custom template option; hopefully in the coming week. Although I would like to use the default Artwork template, these oddities of automated formatting (mandated creator template, forced filename structure, html codes rather than characters) are creating a programming burden on my volunteer time that it would be sensible to avoid.
Fae
None of the latter (mandated creator template, forced filename structure, html codes rather than characters) are problems with the {{Artwork}} template.
They are all design choices made by GWToolset, regardless of the template used.
The filename thing can be worked around (or at least this is what I did with the BL pics), by pre-processing the XML file to add a new field containing your own filename, and then using this new bespoke field as the "unique identifier". The downside (IIRC) is that GWToolset will still try to add the "Title" field, so this needs to be left empty.
The FIXES needed to GWToolset that therefore suggest themselves are:
(1) Add a space either side of the hyphen when creating compound filenames.
(2) Add a checkbox to turn off the creation of compound filenames or, alternatively, to enable the user to nominate which fields will be compounded, rather than inevitably using the title field.
The html code thing is also something that needs to be FIXed. (IIRC, I flagged the same thing six weeks ago, and eventually fixed mine with an AWB run; but then I was only uploading 430 images).
GWT was also suppressing <br> and <p> tags when I used it. I don't know if that is something it is still doing?
-- J.
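The XML pre-processing workaround described above might look roughly like the following. This is a sketch under stated assumptions, not the script actually used for the BL uploads: the element names `record`, `title` and `objectNumber`, the new field name `commonsFilename`, and the function name `add_filename_field` are all hypothetical and would need to match the real metadata file.

```python
import xml.etree.ElementTree as ET


def add_filename_field(xml_text: str,
                       record_tag: str = "record",         # hypothetical element names
                       title_tag: str = "title",
                       id_tag: str = "objectNumber",
                       new_tag: str = "commonsFilename") -> str:
    """Add a bespoke filename field ("<title> - <identifier>") to each record."""
    root = ET.fromstring(xml_text)
    # Copy the matches into a list so the tree can be modified safely.
    for record in list(root.iter(record_tag)):
        title = (record.findtext(title_tag) or "").strip()
        ident = (record.findtext(id_tag) or "").strip()
        if not title or not ident:
            continue  # leave incomplete records untouched
        field = ET.SubElement(record, new_tag)
        # Space either side of the hyphen, as suggested in the thread.
        field.text = f"{title} - {ident}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ("<records><record><title>Vase</title>"
              "<objectNumber>X-1</objectNumber></record></records>")
    print(add_filename_field(sample))
```

The new field would then be the one mapped as the "unique identifier" in GWToolset, with the title mapping left empty as described in the message.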
On Sun, May 11, 2014 at 3:07 PM, James Heald j.heald@ucl.ac.uk wrote:
While you're running the clean-up script to fix all the apostrophes, you might also prefer the neater (and warning-free) {{PD-old-70-1923}} rather than {{PD-US}}{{PD-old}} -- though in fact, for these images of 3D objects, the license needed is surely {{Cc-zero}} or something similar, because there needs to be some release of the photographer's copyright.
Seconded here. Surely the objects are PD, but the photographs are definitely not, so you probably want to add the CC-zero template as well. It's a bit of a shame that the metadata isn't very clear on the license, but for now i guess just checking if the 'copyrightHolder' field is null or false is the best way forward. I did a little check and copyrighted works have a string in that field *and* the 'webImage' field is false (where PD works have an object there with image information).
Are you in contact with people from the Rijksmuseum by the way? If you want i could introduce you so they could have a look at the sample upload as well.
And thanks so much for the project page describing the upload progress, i'll be adding those for the National Library / Archive uploads as well.
-- Hay
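The check described above could be sketched as follows, assuming each object has already been parsed from the API's JSON into a Python dict. The field names `copyrightHolder` and `webImage` come from the message; the function name `looks_public_domain` and the sample records are made up for illustration.

```python
def looks_public_domain(record: dict) -> bool:
    """Heuristic from the thread: no copyright holder and a webImage object present."""
    holder = record.get("copyrightHolder")
    web_image = record.get("webImage")
    # Copyrighted works reportedly carry a string in copyrightHolder
    # and have webImage set to false.
    holder_empty = holder is None or holder is False
    has_image = isinstance(web_image, dict)
    return holder_empty and has_image


if __name__ == "__main__":
    # Hypothetical records, not real API responses.
    print(looks_public_domain({"copyrightHolder": None, "webImage": {"url": "..."}}))
    print(looks_public_domain({"copyrightHolder": "Some Estate", "webImage": False}))
```

This is only a filter for deciding what to upload; it does not by itself establish the licence of a photograph.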
On 12 May 2014 13:24, Hay (Husky) huskyr@gmail.com wrote:
Seconded here. Surely the objects are PD, but the photographs are definitely not, so you probably want to add the CC-zero template as well. It's a bit of a shame that the metadata isn't very clear on the license, but for now i guess just checking if the 'copyrightHolder' field is null or false is the best way forward. I did a little check and copyrighted works have a string in that field *and* the 'webImage' field is false (where PD works have an object there with image information).
It is a good improvement which I'll implement in the next run. In the permissions field, I may link to the terms as below...
I have updated the project page with: Licenses chosen can be based on this statement ''"All data and all images made available through the API are either in the public domain or are subject to a CC0 license."'' found [https://www.rijksmuseum.nl/en/api/terms-and-conditions-of-use here]. This should mean that the photographs are themselves released as CC0, with copyright of the art object being a separate issue (sticking to a cut-off of before the 20th century should mean PD can apply).
Are you in contact with people from the Rijksmuseum by the way? If you want i could introduce you so they could have a look at the sample upload as well.
Through David H. and Sebastiaan I am. I think it all looks okay, but I'm thinking that when the first 200 go "live" it may be worth pausing for a day or two for any feedback, and have a group call then, if there is anything that could be done better. I'm in no hurry, so waiting for feedback is not an issue.
And thanks so much for the project page describing the upload progress, i'll be adding those for the National Library / Archive uploads as well.
It is good practice; I think we should make it best practice for any GWT projects that are of any significant size (more than 1,000 images?). :-)
*Everyone* please remember to update https://commons.wikimedia.org/wiki/Commons:GWToolset_users with a note about your planned GWT projects, along with links to the project pages so that our volunteer community knows where to go with questions and suggestions.
Fae