Hi,
I'm about to upload a few hundred images that have been released by the British Library.
I am all set to go, with carefully designed Commons filenames, but the GWToolset uploader is wrecking all the commas and brackets in my filenames.
What I want is:
File:Large flowering sensitive plant (Mimosa grandiflora) - New illustration of the Sexual System of Carolus von Linnaeus (1807) - BL.jpg
File:Cherries - Pomona Britannica (1812), pl.10 - BL.jpg
File:Rape threshing - The costume of Yorkshire (1814), plate XV - BL.jpg
What it's giving me is:
File:Large flowering sensitive plant -Mimosa grandiflora- - New illustration of the Sexual System of Carolus von Linnaeus -1807- - BL.jpg
File:Cherries - Pomona Britannica -1812-- pl.10 - BL.jpg
File:Rape threshing - The costume of Yorkshire -1814-- plate XV - BL.jpg
How do I turn this behaviour off, please, or how do I work around it, to get the more easily human-readable names that I want?
Thanks,
James Heald.
hi james,
glad to hear that you're getting ready to upload with gwtoolset. sorry that you're running into an issue. at the moment the following characters are replaced with a '-' in a title without a method to override any of them:
'#','<','>','[',']','|','{','}',':','¬','`','!','"','£','$','^','&','*','(',')','+','=','~','?',',',';',"'",'@'
this list was compiled based on several wiki articles:
* https://commons.wikimedia.org/wiki/Commons:File_naming
* http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(technical_restric...)
* http://www.mediawiki.org/wiki/Help:Bad_title
* http://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist
i'm not sure who would or what process would “approve” the issue of relaxing that restriction to also allow the characters: '(',')',','. maybe someone else on this list would know. my guess is that if the commons admins and community are okay with it, then we can go ahead and allow those characters, but i don't know how that's done. maybe via an rfc or village pump article with votes ...
with kind regards, dan
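For readers following along, the substitution described in the message above can be sketched in a few lines of Python. This only illustrates the behaviour reported in this thread and is not GWToolset's actual code; the names `RESTRICTED` and `replace_restricted` are made up for the example.

```python
# Characters that, per the list in the message above, are each replaced with '-'.
RESTRICTED = '#<>[]|{}:¬`!"£$^&*()+=~?,;\'@'

# Translation table mapping every restricted character to a hyphen.
_TABLE = str.maketrans({ch: "-" for ch in RESTRICTED})


def replace_restricted(title: str) -> str:
    """Return `title` with each restricted character replaced by '-'."""
    return title.translate(_TABLE)


if __name__ == "__main__":
    wanted = "Cherries - Pomona Britannica (1812), pl.10 - BL.jpg"
    # Prints: Cherries - Pomona Britannica -1812-- pl.10 - BL.jpg
    print(replace_restricted(wanted))
```

Applied to the second filename from the original post, this reproduces the `-1812--` form that the uploader returned.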
On Mar 4, 2014, at 10:02 AM, James Heald j.heald@ucl.ac.uk wrote:
Glamtools mailing list Glamtools@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/glamtools
Files are now up, at
https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Libra...
Luckily I asked for and was given file-mover permissions last week, so I am now slowly going through and restoring the punctuation. 35 done, 395 to go...
I'll come back with some more thoughts in a bit; but even in this form thank you for a very useful tool.
Best regards,
James.
2014-03-05 8:57 GMT+01:00 dan entous d_entous@yahoo.com:
i'm not sure who would or what process would "approve" the issue of relaxing that restriction to also allow the characters: '(',')',','. maybe someone else on this list would know. my guess is that if the commons admins and community are okay with it, then we can go ahead and allow those characters, but i don't know how that's done. maybe via an rfc or village pump article with votes ...
Previous batch uploads were usually normalising names with the following (which I think is less stringent):

import re

def cleanUpTitle(title):
    """ Clean up the title of a potential mediawiki page. Otherwise the title of
    the page might not be allowed by the software.
    """
    title = title.strip()
    # Turn angle, curly and square brackets into parentheses
    title = re.sub(u"[<{\\[]", u"(", title)
    title = re.sub(u"[>}\\]]", u")", title)
    title = re.sub(u"[ _]?\\(!\\)", u"", title)
    title = re.sub(u",:[ _]", u", ", title)
    title = re.sub(u"[;:][ _]", u", ", title)
    # Collapse runs of whitespace
    title = re.sub(u"[\t\n ]+", u" ", title)
    title = re.sub(u"[\r\n ]+", u" ", title)
    title = re.sub(u"[\n]+", u"", title)
    title = re.sub(u"[?!]([.\"]|$)", u"\\1", title)
    # Substitute the remaining problem characters
    title = re.sub(u"[&#%?!]", u"^", title)
    title = re.sub(u"[;]", u",", title)
    title = re.sub(u"[/+\\\\:]", u"-", title)
    # Collapse repeated hyphens and commas, and drop them before a full stop or at the end
    title = re.sub(u"--+", u"-", title)
    title = re.sub(u",,+", u",", title)
    title = re.sub(u"[-,^]([.]|$)", u"\\1", title)
    # Spaces become underscores
    title = title.replace(u" ", u"_")
    title = title.strip(u"_")
    return title

https://git.wikimedia.org/blob/pywikibot%2Fcore.git/ffb59e9e241881d13646191a...
Hope that helps,
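To see how the function quoted above treats the filenames from the start of the thread, here is a small table-driven restatement of the same substitution rules, applied in the same order. The rules are copied from the quoted snippet; the names `RULES` and `clean_up_title` and the table layout are choices made for this sketch, not pywikibot's. Titles are passed without the `File:` prefix or the extension.

```python
import re

# (pattern, replacement) pairs, in the order used by the quoted cleanUpTitle.
RULES = [
    (r"[<{\[]", "("),
    (r"[>}\]]", ")"),
    (r"[ _]?\(!\)", ""),
    (r",:[ _]", ", "),
    (r"[;:][ _]", ", "),
    (r"[\t\n ]+", " "),
    (r"[\r\n ]+", " "),
    (r"[\n]+", ""),
    (r'[?!]([."]|$)', r"\1"),
    (r"[&#%?!]", "^"),
    (r"[;]", ","),
    (r"[/+\\:]", "-"),
    (r"--+", "-"),
    (r",,+", ","),
    (r"[-,^]([.]|$)", r"\1"),
]


def clean_up_title(title: str) -> str:
    """Apply each rule in turn, then convert spaces to underscores."""
    title = title.strip()
    for pattern, replacement in RULES:
        title = re.sub(pattern, replacement, title)
    return title.replace(" ", "_").strip("_")


if __name__ == "__main__":
    # Prints: Cherries_-_Pomona_Britannica_(1812),_pl.10_-_BL
    print(clean_up_title("Cherries - Pomona Britannica (1812), pl.10 - BL"))
```

Parentheses, commas and apostrophes pass through these rules unchanged, which is the sense in which they are less stringent than the character list discussed earlier.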
On 05/03/2014 07:57, dan entous wrote:
hi james,
glad to hear that you're getting ready to upload with gwtoolset. sorry that you're running into an issue. at the moment the following characters are replaced with a '-' in a title without a method to override any of them:
'#','<','>','[',']','|','{','}',':','¬','`','!','"','£','$','^','&','*','(',')','+','=','~','?',',',';',"'",'@'
this list was compiled based on several wiki articles:
- https://commons.wikimedia.org/wiki/Commons:File_naming
- http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(technical_restric...)
- http://www.mediawiki.org/wiki/Help:Bad_title
- http://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist
i'm not sure who would or what process would “approve” the issue of relaxing that restriction to also allow the characters: '(',')',','. maybe someone else on this list would know. my guess is that if the commons admins and community are okay with it, then we can go ahead and allow those characters, but i don't know how that's done. maybe via an rfc or village pump article with votes ...
In respect of the list above, the links actually indicate a lot of flexibility.
From http://www.mediawiki.org/wiki/Help:Bad_title it's clear that there is no technical problem with most of the characters -- it gives the example of Some¬`!"£$^&*()_+-=~?/.,;:'@ as something the software could handle, if necessary.
http://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist highlights some characters that would be a problem, but they are a lot more esoteric than anything I want (so long as none of my filenames contain rude parts of the body).
https://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_rest... contains a much shorter list of restricted characters - just # < > [ ] | { }
which leaves https://commons.wikimedia.org/wiki/Commons:File_naming with its rather vague statement to
"Avoid "funny" symbols (control characters, unneeded punctuation, etc.) that might be significant in future wiki markup."
(Vagueness in the draft seems to be one reason why it has never been adopted and is still only a working proposal.)
For me, the apostrophe in St John's Gospel or Cicero's Aratus or Breviari d'Amour *is* essential punctuation, and the strange and unreadable Breviari d--39-Amour is not an acceptable substitute.
Similarly, parentheses and commas are used in the names of so many images that it would break far too much now for them ever to become "significant in future wiki markup".
I can request confirmation at Commons:Village Pump if it is really necessary; but just doing what the other bots do, per Jean-Frédéric's suggestion, seems a good way to go.
Sorry to keep harping on this, but as I said in my other post, it's really blocking me, because a full reupload of the files is going to be the only sane way to sort these issues out; and until that's happened I can't do anything to integrate these files into the wiki, because it will all just get wiped out (as will the work of anybody else who tries to work on or with them).
So if this isn't too big a fix to ask, I would be very very grateful.
All best,
James.
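The odd `d--39-Amour` mentioned in the message above is what one would get if the apostrophe had been turned into the HTML entity `&#39;` before the character substitution ran, since `&`, `#` and `;` are all on the replaced list. That ordering is a guess and is not confirmed anywhere in this thread; the sketch below, with the made-up name `escape_then_replace`, only shows that the guess reproduces the reported name.

```python
# Characters reported in this thread as being replaced with '-'.
RESTRICTED = '#<>[]|{}:¬`!"£$^&*()+=~?,;\'@'
_TABLE = str.maketrans({ch: "-" for ch in RESTRICTED})


def escape_then_replace(title: str) -> str:
    """Escape apostrophes as '&#39;', then replace restricted characters."""
    # Assumed first step: the apostrophe is HTML-escaped as a numeric entity.
    escaped = title.replace("'", "&#39;")
    # Second step: every restricted character becomes '-'.
    return escaped.translate(_TABLE)


if __name__ == "__main__":
    # Prints: Breviari d--39-Amour
    print(escape_then_replace("Breviari d'Amour"))
```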
Hi Dan,
dan entous wrote on 5-3-2014 8:57:
hi james,
glad to hear that you're getting ready to upload with gwtoolset. sorry that you're running into an issue. at the moment the following characters are replaced with a '-' in a title without a method to override any of them:
'#','<','>','[',']','|','{','}',':','¬','`','!','"','£','$','^','&','*','(',')','+','=','~','?',',',';',"'",'@'
We should have a look at it at the hackathon and commit a patch. You, Jean-Fred and I are there so that should work. I think we can relax it a bit or make it less intrusive. Filed https://bugzilla.wikimedia.org/show_bug.cgi?id=64843 so we won't forget.
Maarten
The GWT allows quotes and normal brackets, I have not tested other Commons allowed characters. For example: https://commons.wikimedia.org/wiki/File:%22No.1_Seminary_Building._No._3_Mor...
Some of the problem might be how the XML file is encoded to UTF-8 by the user, rather than how the GWT handles it.
Fae
On 4 May 2014 19:50, Maarten Dammers maarten@mdammers.nl wrote:
Hi Dan,
We should have a look at it at the hackathon and commit a patch. You, Jean-Fred and I are there so that should work. I think we can relax it a bit or make it less intrusive. Filed https://bugzilla.wikimedia.org/show_bug.cgi?id=64843 so we won't forget.
Maarten
Dear colleagues,
Before uploading some 1550 images in 200-odd batches to https://commons.wikimedia.org/wiki/Category:Prints_from_the_Peace_Palace_Lib... i hit on this problem:
When i input xml author metadata
<author>Pieter Christiaansz Bor (1559-1635). Engraver:. Photography: D-vorm, Bert en Lilian Mellink</author>
mapped to template {{Information}} GWT gives
Author Creator:Bert en Lilian Mellink Pieter Christiaansz Bor (1559-1635). Engraver:. Photography: D-vorm
instead of simply
Pieter Christiaansz Bor (1559-1635). Engraver:. Photography: D-vorm, Bert en Lilian Mellink.
Examples
--------
http://commons.wikimedia.beta.wmflabs.org/wiki/File:Bor-Nederlantsche-Oorlog...
https://commons.wikimedia.org/wiki/File:Bor-Nederlantsche-Oorloghen_9140.tif
GWToolset tacitly assumes Template:Creator, e.g.
{{Information | author = {{Creator:Bert en Lilian Mellink Pieter Christiaansz Bor (1559-1635). Engraver:. Photography: D-vorm}}
Is there a workaround? My time is running out (1550 uploads to do before June 1).
Thanks a lot, hans muller
On Sun, 4 May 2014, 9:52 pm, Fæ wrote:
The GWT allows quotes and normal brackets, I have not tested other Commons allowed characters. For example: https://commons.wikimedia.org/wiki/File:%22No.1_Seminary_Building._No._3_Morgan_Hall_Completed_1875._No.2_Library_Building_Dedicated_May_8th_1872._Auburn_Theologigal_Seminary,_%28Incorporated_April_14th_A.D._1820%29_Auburn,_Cayuga_Co._NYPL1583067.tiff
Some of the problem might be how the XML file is encoded to UTF-8 by the user, rather than how the GWT handles it.
Fae
On 5 May 2014 13:01, Hans Muller j.m.muller@hccnet.nl wrote: ...
Author Creator:Bert en Lilian Mellink Pieter Christiaansz Bor
...
This is a bug that has been a problem on my NYPL uploads (which I have yet to fix). What *probably* should be happening, is that the <author> field should be checked for a possible match to a {{creator}} template, and the template *then* used. Unfortunately it seems to use it for everything instead.
No quick fix, this needs to be reported as a bug and fixed for everyone. If when you have completed your uploads you leave me a prompt on my Commons talk page with a link to your parent category for uploads, I can let Faebot do the tidy-up as a housekeeping job (or you could use a tool like VisualFileChange to fix them); I'd probably get around to it within a day of your request.
Fae
I have knocked up a quick fix for Faebot to check all GWT uploads, see whether they have links to non-existent {{creator}} templates, and replace them.
Example: https://commons.wikimedia.org/w/index.php?title=File:Double_Page_Plate_No._1...
I'll get Faebot to run through the entire set in a bit. I'll not run this automatically, so drop a note on my talk page if you need it to re-run due to a new GWT upload. Once the bug is fixed this should be irrelevant.
Fae
On 5 May 2014 13:59, Fæ faewik@gmail.com wrote:
I'll get Faebot to run through the entire set in a bit. I'll not run this automatically, so drop a note on my talk page if you need it to re-run due to a new GWT upload. Once the bug is fixed this should be irrelevant.
On request I reverted to allowing red-links for "Images released by British Library Images Online". If anyone would like Faebot to remove red-linked creator templates from their GWT uploads, let me know what parent category they all appear in and I'll kick Faebot to sort it out. As there are mixed views, I'm not planning on running the fix across all GWT uploads again.
Fae
Hullo!
On 5 May, 2014, at 6:21 am, Fæ faewik@gmail.com wrote:
On 5 May 2014 13:01, Hans Muller j.m.muller@hccnet.nl wrote: ...
Author Creator:Bert en Lilian Mellink Pieter Christiaansz Bor
...
This is a bug that has been a problem on my NYPL uploads (which I have yet to fix). What *probably* should be happening, is that the <author> field should be checked for a possible match to a {{creator}} template, and the template *then* used. Unfortunately it seems to use it for everything instead.
What about checking for the Creator page on the fly? When I was playing around with uploading a bunch of BHL (http://biodiversitylibrary.org/) images to the Commons last year, I wrote a template that, given a creator’s name, would check for a “Creator: …” page with the same name. If it found one, it would insert the appropriate {{Creator}} template automatically; otherwise, it would display the name by itself with a note beside it inviting users to create the {{Creator}} page themselves. You can see an example at https://commons.wikimedia.org/wiki/File:English_Sparrow_making_a_home_in_a_h... where one of the photographers/illustrators has a creator page, but the other does not. If someone were to create a Creator page for Herman T. Bohlman, the template would insert it automatically the next time the page was purged.
It’s a quickly thrown together template, and definitely not the most efficient, but I’d be happy to help improve it if anybody finds it useful! The source is available at https://commons.wikimedia.org/wiki/Template:Agent
cheers, Gaurav
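The approach suggested in this part of the thread, using a {{Creator}} template only when a matching Creator page exists, can be sketched as below. The function `format_author` and the `creator_exists` callback are hypothetical names for this example; a real tool would have to ask the wiki whether the page exists, which is left out here, and this is not the code of Template:Agent or of GWToolset.

```python
from typing import Callable


def format_author(name: str, creator_exists: Callable[[str], bool]) -> str:
    """Return wikitext for an author field.

    Uses {{Creator:<name>}} only if `creator_exists` reports that the page
    "Creator:<name>" exists; otherwise falls back to the plain name.
    """
    name = name.strip()
    if name and creator_exists("Creator:" + name):
        return "{{Creator:" + name + "}}"
    return name


if __name__ == "__main__":
    # Stand-in for a real existence check against the wiki.
    known_pages = {"Creator:Example Person"}
    exists = known_pages.__contains__
    print(format_author("Example Person", exists))
    print(format_author("Pieter Christiaansz Bor (1559-1635)", exists))
```

Passing the existence check in as a callback keeps the decision logic separate from however the lookup is actually done, so the same function could sit behind an upload tool or a clean-up bot.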
as far as i know, https://bugzilla.wikimedia.org/show_bug.cgi?id=62909 took care of this issue with https://gerrit.wikimedia.org/r/#/c/121094/ and https://gerrit.wikimedia.org/r/#/c/125401/
with kind regards, dan
On May 4, 2014, at 21:52 , Fæ faewik@gmail.com wrote:
The GWT allows quotes and normal brackets, I have not tested other Commons allowed characters. For example: https://commons.wikimedia.org/wiki/File:%22No.1_Seminary_Building._No._3_Mor...
Some of the problem might be how the XML file is encoded to UTF-8 by the user, rather than how the GWT handles it.
Fae