An image metadata standard to automate attribution and license info


Jane Park

29 Sep 2015

11:19 p.m.

Hi everyone,

I lead platform work <https://github.com/creativecommons/platform-initiative> at Creative Commons. As part of that work, we are exploring the potential of a standard field in EXIF that could make attribution and license info stickier across the web. We are currently in the research phase, talking to major image hosting platforms (and platforms that read and ingest images) about what kinds of image metadata they read and retain. Zhou and his engineering team at Wikimedia directed me to this list, as I am seeking feedback from the Wikimedia community.

Ultimately, we want to make it easier for platforms to display provenance and license info, increasing the likelihood that when a user lands on an image, they know who created it and what license it can be used under. For example, images from Wikimedia Commons may get tweeted, but the image metadata is not retained in tweets. How can we work with platforms to use the same metadata standard so that info can be retained across them?

Since we are just in the research phase now, I welcome your thoughts on Wikimedia Commons' and Wikipedia's own uses of image metadata. Specifically:

1. The most common image metadata standards we know about are EXIF and XMP. Which does Wikimedia primarily read and retain? Are there others that are more widely used?
2. Which standard does Wikimedia prefer? What would be easiest to implement, for Wikimedia but also for the platforms that Wikimedia interfaces with? In other words, what are the pros and cons of each?

Lastly, I welcome any general thoughts about the feasibility of and need for such a project.

Best,
Jane

Jane Park
@janedaily
Creative Commons | Los Angeles
Make a donation to support CC in 2015: http://bit.ly/supportcc2015


Pine W

30 Sep 2015

12:26 a.m.


Good idea. I see a lot of EXIF and my hunch is that we would prefer that format, especially if the license data can be made to stick even after a photo has gone through image postprocessing in tools like the Adobe suite.

Pine

On Sep 29, 2015 2:19 PM, "Jane Park" <janepark(a)creativecommons.org> wrote:

...

Brian Wolff

1:47 a.m.


On 9/29/15, Pine W <wiki.pine(a)gmail.com> wrote:

...

Adobe is the inventor of XMP, and Photoshop is probably the most complete implementation of XMP in existence. I'm not that familiar with Photoshop, but it's highly likely that it properly maintains both Exif and XMP metadata after any post-processing.

We prefer Exif and XMP equally (see also my other email). https://commons.wikimedia.org/wiki/Commons:EXIF has some info about how things work at Commons.

Pine W

3:21 a.m.


I see many images on Commons that have replaced info about the camera with info about the processing in Adobe tools. I don't know where exactly that camera metadata is getting stripped out, but I wish that it would be left intact after processing in Adobe tools and uploading to Commons.

Pine

On Sep 29, 2015 4:47 PM, "Brian Wolff" <bawolff(a)gmail.com> wrote:

...


Brian Wolff

4:30 a.m.


That's interesting. Do you have an example I could see? There are supposed to be separate fields: one for the camera (actually two separate fields, for the make and model of the camera), and one for the software used to process the image. Ultimately though, if Adobe is doing that, there's not really anything we can do about it.

--
-bawolff

On 9/29/15, Pine W <wiki.pine(a)gmail.com> wrote:

...


Pine W

8:15 a.m.


Here's an example: https://commons.wikimedia.org/wiki/File:Common_crane_grus_grus.jpg

On Tue, Sep 29, 2015 at 7:30 PM, Brian Wolff <bawolff(a)gmail.com> wrote:

...


bawolff

6:54 p.m.


That file has the Exif Software field set to Adobe, but the field for camera model is blank. It could be that the camera didn't record Exif. Or maybe the original author intentionally removed the original metadata due to privacy concerns. Or perhaps it was unintentionally removed. I guess we can't really know without asking the original author.

--
-bawolff

On Wed, Sep 30, 2015 at 12:15 AM, Pine W <wiki.pine(a)gmail.com> wrote:

...
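[Editorial sketch] The check described in the message above (Software present, camera make/model absent) can be illustrated with a small reader for the first image file directory (IFD0) of an Exif/TIFF block. This is only a minimal sketch using the Python standard library: it assumes the caller has already extracted the TIFF part of the Exif data (in a JPEG, the bytes following the "Exif\0\0" prefix of the APP1 segment), it reads only ASCII-typed values, and the function names are made up for illustration. The tag numbers (0x010F Make, 0x0110 Model, 0x0131 Software) are the standard Exif/TIFF ones.

```python
import struct

# Standard Exif/TIFF tag IDs for the fields discussed in this thread.
TAGS = {0x010F: "Make", 0x0110: "Model", 0x0131: "Software"}


def read_ifd0_ascii(tiff: bytes) -> dict:
    """Return the ASCII-typed IFD0 tags listed in TAGS from a raw TIFF/Exif blob."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        raise ValueError("unknown byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("not a TIFF header")
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    found = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        tag, typ, n, value = struct.unpack(order + "HHI4s", tiff[start:start + 12])
        if tag not in TAGS or typ != 2:  # type 2 = ASCII
            continue
        if n > 4:
            # Values longer than 4 bytes are stored elsewhere; the field holds an offset.
            (offset,) = struct.unpack(order + "I", value)
            raw = tiff[offset:offset + n]
        else:
            raw = value[:n]
        found[TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace").strip()
    return found


def camera_info_missing(fields: dict) -> bool:
    """True when Software is recorded but neither Make nor Model is."""
    return bool(fields.get("Software")) and not (fields.get("Make") or fields.get("Model"))
```

A result of True from camera_info_missing only says the fields are absent; as noted above, it cannot tell whether they were never written, removed on purpose, or lost during processing.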


Daniel and Elizabeth Case

7:01 p.m.


My first ever contribution to this list after lurking for a couple of years:

...

> I see many images on Commons that have replaced info about the camera with info about the processing in Adobe tools. I don't know where exactly that camera metadata is getting stripped out, but I wish that it would be left intact after processing in Adobe tools and uploading to Commons.

...

From considerable experience uploading, it strikes me that this is most common when you use Adobe software to modify a file that has already been processed by Adobe software on another computer. I do not know if this is deliberate on Adobe’s part (or if so, what the reason would be) or a bug that no one’s cared to fix.

Daniel Case

Brian Wolff

1:44 a.m.


On 9/29/15, Jane Park <janepark(a)creativecommons.org> wrote:

...

Hi Jane.

We support both XMP and Exif (along with some other things like PNG iTXt chunks, the older non-XMP version of IPTC, etc.). To be specific, we only accept properties we are already aware of. Since XMP is an open standard that anyone can add to, we won't recognize any properties not on our whitelist.

In the interface, we sometimes present all file metadata under the name "Exif" regardless of where it comes from, so depending on who you ask, people might say we only support Exif, which is untrue. It's just how we communicate this data to the user.

I was actually the one who added the initial support for XMP in MediaWiki as part of a Google Summer of Code project in 2010, so I'm intimately familiar with that part of the code.

So on the subject of ingestion for license data: Sometimes there are properties across different standards with the same meaning. Sometimes we try to map them together, often letting one type of metadata overwrite another. We try to follow http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf

We generally only use the ingested metadata to display in a table on the image description page for extra information. We rely on the user to provide license info, and generally don't take that (or much else) from the file's metadata. The primary information about an image displayed by MediaWiki comes directly from the user, not the file's metadata.

One exception to that is UploadWizard. It suggests some values based on image Exif for date, author, and GPS location. It does not prefill the license at this time. I don't know much about how it works, or even whether it uses MediaWiki's file metadata extraction routines or its own thing.
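[Editorial sketch] The two ingestion rules described above (ignore properties that are not on a known list, and let one metadata type overwrite another when several carry the same meaning) can be sketched as follows. This is not MediaWiki's code: the ALLOWED table, the common field names, and the precedence order are all assumptions chosen for illustration, and the real rules follow the Metadata Working Group guidance and are more nuanced.

```python
# Hypothetical allowlist: (source, property) -> common display field.
ALLOWED = {
    ("exif", "Artist"): "author",
    ("exif", "Copyright"): "copyright",
    ("iptc", "Byline"): "author",
    ("xmp", "dc:creator"): "author",
    ("xmp", "dc:rights"): "copyright",
    ("xmp", "cc:license"): "license",
}

# Assumed precedence, lowest first; later sources overwrite earlier ones.
PRECEDENCE = ["iptc", "exif", "xmp"]


def merge_metadata(extracted: dict) -> dict:
    """Keep only allowlisted properties and merge them into common fields."""
    merged = {}
    for source in PRECEDENCE:
        for prop, value in extracted.get(source, {}).items():
            field = ALLOWED.get((source, prop))
            if field is not None and value:
                merged[field] = value  # unknown properties never get here
    return merged
```

With this shape, a property from an unknown XMP namespace is silently dropped, and an author recorded in several places resolves to whichever source ranks highest in PRECEDENCE.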
We recognize the following properties related to licensing/authorship:

Exif:
* Copyright
* Artist

PNG text chunks:
* Copyright
* Author
* Artist

Legacy IPTC:
* Copyright (2:116)
* Byline (2:80)
* Credit (2:110) [although that doesn't mean what you would normally think "credit" does]
* Contact (2:118)

XMP:
* XMP (using the http://creativecommons.org/ns# namespace):
** license
** morePermissions
** attributionURL
** attributionName
* XMP (using the http://ns.adobe.com/xap/1.0/rights/ namespace):
** Certificate
** Marked
** Owner
** UsageTerms
** WebStatement
* XMP (http://purl.org/dc/elements/1.1/):
** rights
** creator
** contributor
* Exif encoded as XMP (aka the http://ns.adobe.com/tiff/1.0/ namespace):
** Artist
** Copyright
* XMP (new IPTC, http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/):
** CreatorContactInfo

Note: IPTC-as-XMP has some properties related to copyright/authorship that we don't support, such as Licensor.
Note: We do not extract CC tags from SVGs.

It's also possible I missed something here.

As for adding metadata to images:
* We serve the original image exactly as is if the user asks for the image in the original size. This includes all original metadata, but we don't add any.
* If the user asks for a smaller image, we strip all metadata except the colour profile. We add a comment (JPEG: a JPEG COM segment; PNG: some sort of text chunk) saying "File source: <url to file>".

I personally think we should be adding copyright metadata to thumbnails (but I do not consider it a high priority), particularly for larger thumbs, where the overhead would be minimal.

XMP has the downside of not being that compact. XML is not very compact to begin with. Additionally, the official spec suggests that people add a lot of whitespace to allow in-place editing and not use compression, even if the format supports it, so that people can just scan the file. Most libraries that write it seem to do that, which is unfortunate from our perspective, where we are trying to minimize thumb size.
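[Editorial sketch] For readers unfamiliar with the JPEG COM segment mentioned above, here is a minimal standard-library sketch of how a "File source: ..." comment could be written into a JPEG byte string. It is not the code Wikimedia's thumbnailer uses. The function name is hypothetical, and placing the comment after any leading APPn segments is an assumption made so that existing JFIF/Exif segments stay first.

```python
def add_jpeg_comment(jpeg: bytes, text: str) -> bytes:
    """Insert a COM segment after the SOI marker and any leading APPn segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing JPEG SOI marker")
    payload = text.encode("utf-8")
    if len(payload) > 0xFFFF - 2:
        raise ValueError("comment too long for one COM segment")
    pos = 2
    # Skip APP0..APP15 segments (markers 0xFFE0-0xFFEF).
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xEF:
        seg_len = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        pos += 2 + seg_len
    # The segment length counts its own two bytes but not the marker.
    com = b"\xff\xfe" + (len(payload) + 2).to_bytes(2, "big") + payload
    return jpeg[:pos] + com + jpeg[pos:]
```

The overhead is four bytes plus the comment text, which is why a short source URL costs very little even on small thumbnails.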
If we were to start adding metadata to thumbs, I think we would start with Exif Artist and possibly an Exif Copyright field that has info on the license. I would be supportive of a system where if the thumbnail is super small (say < 5 KB), we put nothing; if it is medium size (5-300 KB), we put those two Exif fields I mentioned; and if it's > 300 KB, we put author, copyright, GPS, and Creative Commons XMP tags in it. But others might feel that reducing thumb bandwidth is more important than the metadata, so it's something that would probably have to be discussed (not to mention, so far nobody has actually volunteered to code it...).

That said, I do not see what Creative Commons stands to gain from adding more metadata standards. The existing XMP fields seem good for people who want fine-grained info about the copyright of their image. It seems like new Exif fields would take a long time to propagate to implementations.

As for what standards are widely used: anecdotally, there are probably a lot more people using Exif than anything else, but that's due to automatic support in digital cameras. It seems very few people explicitly mark their images with metadata.

I hope that helps.

--
-Brian
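[Editorial sketch] The size-tiered proposal in the message above can be written down as a small decision function. This is only an illustration of the thresholds as stated (under 5 KB, 5-300 KB, over 300 KB). The function name and the field labels are hypothetical, and treating "KB" as 1024 bytes is an assumption.

```python
KB = 1024  # assumption: "KB" in the message means 1024 bytes


def fields_for_thumbnail(size_bytes: int) -> list:
    """Return which metadata groups to embed for a thumbnail of the given size."""
    if size_bytes < 5 * KB:
        return []  # tiny thumbnails carry no metadata
    if size_bytes <= 300 * KB:
        return ["exif:Artist", "exif:Copyright"]
    return ["exif:Artist", "exif:Copyright", "gps", "xmp:cc"]
```

Keeping the policy in one function would make the thresholds easy to change if the bandwidth-versus-metadata discussion lands on different numbers.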

Bartosz Dziewoński

1:09 p.m.


On Wed, 30 Sep 2015 01:44:27 +0200, Brian Wolff <bawolff(a)gmail.com> wrote:

...

One exception to that, is uploadWizard. It suggests some values based on image exif for date, author, gps location. It does not prefil license at this time. I don't know much about how it works or even if it uses MediaWiki's file metadata extraction routines, or its own thing.

Yes, it does. We use the 'imageinfo' property returned by the 'action=upload' API when uploading the file to the stash, and prefill some of the information before showing the user the form to complete the upload. UploadWizard attempts to prefill the creation date, author, description and location fields. It ignores the license, like you said.

--
Bartosz Dziewoński
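[Editorial sketch] The prefill step described above can be pictured roughly as follows. This is not UploadWizard's code (which is JavaScript). It assumes the extracted metadata arrives as a list of name/value pairs, and the form field names and the choice of source entries are hypothetical.

```python
# Hypothetical mapping from metadata entry names to upload form fields.
PREFILL_SOURCES = {
    "date": ["DateTimeOriginal", "DateTime"],
    "author": ["Artist"],
    "description": ["ImageDescription"],
}


def prefill_form(metadata: list) -> dict:
    """Suggest form values from a list of {"name": ..., "value": ...} entries."""
    values = {entry["name"]: entry["value"] for entry in metadata}
    form = {}
    for field, names in PREFILL_SOURCES.items():
        for name in names:  # first non-empty candidate wins
            if values.get(name):
                form[field] = values[name]
                break
    if "GPSLatitude" in values and "GPSLongitude" in values:
        form["location"] = (values["GPSLatitude"], values["GPSLongitude"])
    # The license is deliberately left for the uploader to choose.
    return form
```

Any Copyright entry in the metadata is ignored here, matching the behaviour described above where the license is never prefilled.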

Yongmin Hong

9:58 a.m.


If you want the Commons community's opinion, you'd be better off asking on commons-l. (CC'd)

--
revi
https://revi.me
-- Sent from Android --

On 30 Sep 2015 at 6:19 AM, "Jane Park" <janepark(a)creativecommons.org> wrote:

...

Maarten Brinkerink

10:01 a.m.


Dear Jane,

Exciting stuff. Be sure to contact the people from http://commonsmachinery.se/about-us/ to ask about their experience.

Best,
Maarten

On 30 Sep 2015, at 09:58, Yongmin Hong <lists(a)revi.pe.kr> wrote:

...

Federico Leva (Nemo)

1 Oct 2015

9:22 p.m.


The main issue with this effort, on Wikimedia and elsewhere, will be that there is no guarantee we have any metadata about file attribution and copyright status. See also https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive Even if you made a <link rel="license">-like thing and collapsed all the PD tags under PDM's URL, it would be quite a mess. And Flickr is not so consistent in how it marks public domain. And Google is dropping support for license statements altogether. Etc. Nemo

Brian Wolff

10:33 p.m.


In terms of adding metadata: even if we don't have the information for all files, that doesn't mean we don't have it for most files. Doing it for the 90% of files we do know about is better than doing it for 0% of files (IMHO).

...

And Google is dropping support for license statements altogether. Etc.

That's sad. I didn't know that. -- -bawolff

Federico Leva (Nemo)

2 Oct 2015

7:26 a.m.


Brian Wolff, 01/10/2015 22:33:

...

> And Google is dropping support for license statements altogether. Etc.

That's sad. I didn't know that.

Note, I mean on their own photo hosting: Google+ doesn't allow marking photos with licenses, unlike Picasa. https://plus.google.com/111275098680181852126/posts/ZBD3vSWyvLw (And Panoramio was closed, and Commons is no longer shown on Gmaps.) Nemo

Gergo Tisza

3 Oct 2015

12:40 a.m.


On Thu, Oct 1, 2015 at 12:22 PM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:

...

As that page shows, we have machine-readable metadata for the license at least for 99% of Commons files and 99% of all files. The number probably gets much higher when weighted by number of views. I would certainly not consider missing metadata in 1% of our files the main issue.

Federico Leva (Nemo)

10:30 p.m.


Ah, forgot to say: Jane, this initiative seems to very much overlap with the DPLA/Europeana ongoing "standardization" effort: https://docs.google.com/document/d/1H6TWxGARqUMxJrc2sXjaBlOsg7UkUTb27rvtS8a…

I hope you join forces for a combined outcome. CC's practical approach (for metadata, machine-readability etc.) can be very useful to counter some rather speculative discussions found elsewhere; and DPLA and Europeana have a lot of hands-on experience with very messy copyright information, which CC almost certainly lacks.

Gergo Tisza, 03/10/2015 00:40:

...


Numbers are not everything; there's much more in that page. The coverage is so good because we cheat ;-): public domain and all rights reserved files can be marked in the same way! https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive/How_to_fix_meta… And most of our tags on Commons have no meaning for the external world, for instance last time I checked we didn't use the public domain mark for public domain files. Nemo

Jean-Frédéric

4 Oct 2015

2:23 a.m.


...

And most of our tags on Commons have no meaning for the external world, for instance last time I checked we didn't use the public domain mark for public domain files.

We actually do, on about 2.6M files :) [1] (Transclusion of this on our PD templates is probably not consistent though, and maybe not justified in some cases.) [1] <https://commons.wikimedia.org/wiki/Category:CC-PD-Mark> -- Jean-Frédéric

Federico Leva (Nemo)

4 Feb 2016

4:53 p.m.


Federico Leva (Nemo), 03/10/2015 22:30:

...

Ah, forgot to say: Jane, this initiative seems to very much overlap the DPLA/Europeana ongoing "standardization" effort: https://docs.google.com/document/d/1H6TWxGARqUMxJrc2sXjaBlOsg7UkUTb27rvtS8a…

Looks like they published something: http://rightsstatements.org/ Nemo

Jane Park

17 Feb 2016

7:06 p.m.


Hey everyone,

We’ve done some initial thinking on what it would take to standardize the copyright field in Exif to contain CC license info. We’d like you to take a look and provide any initial comments and input.

The draft is on GitHub, along with a command line tool that demonstrates the concept. The tool is essentially a proof of concept for a platform like Flickr, where one could imagine a user passing in a JPEG image and adding some info about creator, title, and license; the tool would then update the specified Exif field(s) in the JPEG file with the details.

Here’s the draft proposal: https://github.com/creativecommons/exif
Here’s the command line proof of concept for developers: https://github.com/creativecommons/exif/blob/master/ccexif.pl
Here’s the visual depicting how the process works generally: https://cloud.githubusercontent.com/assets/33296/13119028/a8995792-d56c-11e…

You can leave feedback as an issue in the GitHub repo, on this discussion list, or directly to me.

I’m also inviting feedback from potential platform partners. The goal is to get some version of this widely adopted, so interest and feedback are essential. I’ve listed below some of the platforms I am talking to. If you have any others you’d like to connect me with at this point, please get in touch!

Best,
Jane, Matt, and Rob

Platforms to discuss EXIF standard with:
1. Wikimedia Commons
2. Flickr
3. 500px
4. Internet Archive
5. Imgur
6. Tumblr
7. Twitter
8. Buzzfeed
9. Google images/docs
10. Facebook

On Thu, Feb 4, 2016 at 7:53 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:

...


-- Jane Park Director of Platforms and Partnerships Creative Commons | Los Angeles @janedaily https://stateof.creativecommons.org/ Get updates: http://bit.ly/commonsnews


participants (11)

Bartosz Dziewoński
bawolff
Brian Wolff
Daniel and Elizabeth Case
Federico Leva (Nemo)
Gergo Tisza
Jane Park
Jean-Frédéric
Maarten Brinkerink
Pine W
Yongmin Hong