Hi,
I had an odd problem with files not being created, which I think I can put down to how long filenames are handled by GWT.
As an example, my xml specified (A) but GWT created (B):
A. File:Index Map No.2 of a part of Suffolk County. South Side - Ocean Shore, Long Island. Part of Islip and Part of Brookhaven. Published by E. Belcher Hyde. 97 Liberty Street, Brooklyn. 5 Beekman Street, NYPL1633883.tiff (209 chars) (see link)
B. File:Index Map No. 2 of a part of Suffolk County. South Side - Ocean Shore, Long Island. Easthampton. Published by E. Belcher Hyde. 97 Liberty Street, Brooklyn. 5 Beekman Street, Manhattan. 1916. Volume NYPL1633.tiff (206 chars)
This seems an easy thing to warn the user about when reading the xml. In terms of behaviour, I would expect the tool to reject the xml as malformed and warn about the maximum allowed filename length rather than truncate the name. In this case truncation corrupted the unique NYPL identifier.
It would be better if GWT allowed the maximum title length that Commons allows (240 bytes, the number of visible characters varying by charset).
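A check of the kind suggested above could be sketched as follows. This is only an illustration in Python, not GWT's actual code: the name `check_title_length` and the constant `MAX_TITLE_BYTES` are made up for the example, and the 240-byte limit is the figure quoted above, assumed here to be counted in UTF-8 bytes.

```python
MAX_TITLE_BYTES = 240  # limit quoted in this thread; assumed to be UTF-8 bytes


def check_title_length(title: str, limit: int = MAX_TITLE_BYTES) -> None:
    """Raise ValueError instead of silently truncating an overlong title."""
    size = len(title.encode("utf-8"))
    if size > limit:
        raise ValueError(
            f"title is {size} bytes, maximum allowed is {limit}: {title!r}"
        )


# Non-ASCII characters take more than one byte, so the visible length differs.
print(len("Fæ"), len("Fæ".encode("utf-8")))  # 2 characters, 3 bytes
```

Rejecting the record this way leaves the decision about what to shorten with the uploader, so an identifier at the end of the name is never cut off unnoticed.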
I vaguely recall the Steering Committee discussing this last year, so I'm unsure if this is worth raising in bugzilla. Suggestions?
Links
1. https://commons.wikimedia.org/wiki/File:Index_Map_No.2_of_a_part_of_Suffolk_...
2. https://bugzilla.wikimedia.org/show_bug.cgi?id=30202
3. https://commons.wikimedia.org/wiki/Commons:Filenames
Fae
Sorry, got my examples confused. The general point about filename truncation is still correct.
Fae
On 30/04/2014, Fæ faewik@gmail.com wrote:
[snip]
title truncation
----------------
i can adjust the max title length to 240, but will subtract the mediafile’s extension from the total title length, and then truncate the title based on that evaluation; e.g., .tiff is 5 bytes, .jpg is 4 bytes. i’ll add this to the list of items i hope to cover during the hackathon.
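The calculation described above might look roughly like this. It is a sketch in Python, not the tool's own code, and `truncate_title` is a hypothetical name. It assumes the limit counts UTF-8 bytes and that the extension is passed with its leading dot.

```python
MAX_TITLE_BYTES = 240  # assumed limit, taken from the discussion above


def truncate_title(title: str, extension: str, limit: int = MAX_TITLE_BYTES) -> str:
    """Trim `title` so that title + extension fits in `limit` UTF-8 bytes."""
    budget = limit - len(extension.encode("utf-8"))  # e.g. 240 - 5 for ".tiff"
    if budget <= 0:
        raise ValueError("extension leaves no room for a title")
    raw = title.encode("utf-8")[:budget]
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    return raw.decode("utf-8", errors="ignore") + extension


name = truncate_title("x" * 300, ".tiff")
print(len(name.encode("utf-8")))  # 240
```

Note that trimming from the end still cuts into a trailing identifier such as the NYPL number, which is the problem reported at the start of this thread.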
title building
--------------
we also have an issue with how the title is built. currently, the values mapped to the template’s title parameter and title identifier are used to create the title. james healed ran into an issue when he was uploading some images from the british library where he didn’t understand this building process and was at first confused. he ended up creating one “special” field in the metadata that contained the _unique_ title he wanted to use. he then mapped that special, unique title to the title identifier parameter without mapping anything to the template title parameter. i suggest that we rename the gwtoolset title identifier parameter to gwtoolset-title and that we recommend that uploaders mimic what james did.
with kind regards, dan
On Apr 30, 2014, at 21:12 , Fæ faewik@gmail.com wrote:
[snip]
Glamtools mailing list Glamtools@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/glamtools
Thanks for looking into this Dan. Two thoughts where other volunteers on this list might help out:
USER GUIDE - EXAMPLE FILE NAME FORMATS
The user guide would really benefit from example formats for file names used in our projects. I tend to use:
"File:<friendly description name up to 200 characters> <project abbreviation><unique identifier from GLAM source>.<extension>"
The project abbreviation is normally 3 or 4 letters (NARA, BL, NYPL) and the unique ID is mostly a string of numbers unique to the image/photograph, though this may be more complex for IDs relating to an artefact with multiple images.
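For a user guide, the format above could be illustrated with a small helper along these lines. This is a Python sketch only; `build_filename` and its parameter names are hypothetical, and the description in the example is shortened from the one discussed earlier in the thread.

```python
def build_filename(description: str, project: str, identifier: str,
                   extension: str, max_description_chars: int = 200) -> str:
    """Compose 'File:<description> <project><identifier>.<extension>'."""
    # Only the description is shortened, so the identifier always survives.
    description = description.strip()[:max_description_chars].rstrip()
    return f"File:{description} {project}{identifier}.{extension}"


print(build_filename("Index Map No.2 of a part of Suffolk County",
                     "NYPL", "1633883", "tiff"))
# File:Index Map No.2 of a part of Suffolk County NYPL1633883.tiff
```

Capping the description in characters does not by itself guarantee a byte limit for non-ASCII text, so a separate length check would still be needed.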
XML GWT VALIDATOR
Though the GWT recommends a standard xml validator, it would be great if we had a "user hackable" xml/ingestion validator of our own. This could check for odd Commons-specific disallowed characters in file names, check field lengths, and highlight missing fields depending on the template chosen.
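One way to make such a validator "user hackable" is to keep each check as a small function in a list that users can extend. The Python sketch below is an assumption-laden illustration, not an existing tool: records are assumed to be already parsed into dictionaries, the field names in `REQUIRED_FIELDS` are invented, and `FORBIDDEN_CHARS` is an assumed set of characters that MediaWiki rejects in page titles.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Check = Callable[[Record], List[str]]

# Assumption: characters MediaWiki rejects in page titles; adjust as needed.
FORBIDDEN_CHARS = set("#<>[]|{}")
# Hypothetical required fields; a real list would depend on the template chosen.
REQUIRED_FIELDS = ["title", "source", "url"]


def check_forbidden_chars(record: Record) -> List[str]:
    """Report forbidden characters found in the record's title."""
    bad = sorted(FORBIDDEN_CHARS & set(record.get("title", "")))
    return [f"title contains forbidden characters: {' '.join(bad)}"] if bad else []


def check_required_fields(record: Record) -> List[str]:
    """Report required fields that are absent or blank."""
    return [f"missing field: {name}" for name in REQUIRED_FIELDS
            if not record.get(name, "").strip()]


# Users "hack" the validator by adding their own functions to this list.
CHECKS: List[Check] = [check_forbidden_chars, check_required_fields]


def validate(record: Record) -> List[str]:
    """Run every registered check and collect the problems found."""
    return [problem for check in CHECKS for problem in check(record)]


print(validate({"title": "Map [draft]", "source": "NYPL"}))
```

A field-length check would be one more function appended to `CHECKS`, which keeps each rule easy to read and to switch off.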
PS email from your Yahoo address keeps ending up in my gmail spam folder and I keep on marking it as 'not spam'. :-)
Fae
On 07/05/2014, dan entous d_entous@yahoo.com wrote:
[snip]