On 9/29/15, Jane Park <janepark(a)creativecommons.org> wrote:
Hi everyone,
I lead platform work
<https://github.com/creativecommons/platform-initiative> at Creative
Commons. As part of that work, we are exploring the potential of a standard
field in EXIF that could make attribution and license info more sticky
across the web. We are currently in the research phase -- talking to major
image hosting platforms (and platforms that read and ingest images) about
what kinds of image metadata they read and retain. Zhou and his engineering
team at Wikimedia directed me to this list as I am seeking feedback from
the Wikimedia community.
Ultimately, we want to make it easier for platforms to display provenance
and license info -- increase the likelihood that when a user lands on an
image, they know who created it and what license to use it under. For
example, images from Wikimedia Commons may get tweeted, but the image
metadata is not retained in tweets. How can we work with platforms to use
the same metadata standard so that info can be retained across them?
Since we are just in the research phase now, I welcome your thoughts on
Wikimedia Commons' and Wikipedia's own uses of image metadata.
Specifically:
1. The most common image metadata standards we know about are EXIF and
XMP. Which does Wikimedia primarily read and retain? Are there others
that are more widely used?
2. Which standard does Wikimedia prefer? What would be easiest to
implement, both for Wikimedia and for the platforms that Wikimedia
interfaces with? In other words, what are the pros and cons of each?
Lastly, I welcome any general thoughts about the feasibility and need for
such a project.
Best,
Jane
Jane Park
@janedaily
Creative Commons | Los Angeles
Make a donation to support CC in 2015:
http://bit.ly/supportcc2015
Hi Jane.
We support both XMP and Exif (along with some other things like PNG
iTXt chunks, the older non-XMP version of IPTC, etc). To be specific,
we only accept properties we are already aware of. Since XMP is an
open standard that anyone can add to, we won't recognize any
properties not on our whitelist.
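
To illustrate the whitelist idea, here is a minimal Python sketch. It is not MediaWiki's code: the data structure, the function name `filter_known`, and the `http://example.org/custom#` namespace are assumptions made up for illustration. Only the namespace and property names in the whitelist itself come from the list further down in this message.

```python
# Hypothetical whitelist of (namespace URI, property name) pairs.
# The pairs are a small subset of the properties listed in this message;
# the structure itself is an assumption, not MediaWiki's implementation.
KNOWN_XMP_PROPERTIES = {
    ("http://creativecommons.org/ns#", "license"),
    ("http://creativecommons.org/ns#", "attributionName"),
    ("http://purl.org/dc/elements/1.1/", "rights"),
    ("http://purl.org/dc/elements/1.1/", "creator"),
}


def filter_known(properties):
    """Keep only properties whose (namespace, name) key is on the whitelist."""
    return {
        key: value
        for key, value in properties.items()
        if key in KNOWN_XMP_PROPERTIES
    }


# Example input: one recognized property and one made-up custom property.
extracted = {
    ("http://creativecommons.org/ns#", "license"):
        "http://creativecommons.org/licenses/by/4.0/",
    ("http://example.org/custom#", "mood"): "sunny",
}
kept = filter_known(extracted)
```

In this sketch the custom `mood` property is silently dropped, which mirrors the behaviour described above: anything not already known is ignored rather than stored.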
In the interface, we sometimes present all file metadata under the
name "Exif" regardless of where it comes from, so depending on who you
ask, people might say we only support Exif, which is untrue. It's just
how we communicate this data to the user.
I was actually the one who added the initial support for XMP in
MediaWiki as part of a Google Summer of Code project in 2010, so I'm
intimately familiar with that part of the code.
So on the subject of ingestion for license data:
Sometimes there are properties across different standards with the
same meaning. Sometimes we try to map them together, often letting one
type of metadata overwrite another. We try to follow
http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
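
As a rough sketch of what "letting one type of metadata overwrite another" can look like, the Python below merges per-standard values into common fields. The precedence order, the field names, and the `dc:creator` style keys are all assumptions chosen for illustration. They are not the order MediaWiki uses, nor a statement of what the MWG guidance prescribes.

```python
# Assumed precedence, highest first. Illustrative only.
PRECEDENCE = ["xmp", "iptc", "exif"]

# Hypothetical mapping from (source, native property) to a common field.
FIELD_MAP = {
    ("exif", "Artist"): "author",
    ("exif", "Copyright"): "copyright",
    ("iptc", "Byline"): "author",
    ("iptc", "Copyright"): "copyright",
    ("xmp", "dc:creator"): "author",
    ("xmp", "dc:rights"): "copyright",
}


def merge_metadata(by_source):
    """Merge metadata from several standards into one dict of common fields.

    Sources are applied from lowest to highest precedence, so a value from
    a higher-precedence source overwrites an equivalent lower one.
    """
    merged = {}
    for source in reversed(PRECEDENCE):
        for prop, value in by_source.get(source, {}).items():
            field = FIELD_MAP.get((source, prop))
            if field is not None:  # unmapped properties are ignored
                merged[field] = value
    return merged


example = merge_metadata({
    "exif": {"Artist": "A. Photographer", "Copyright": "CC BY 4.0"},
    "xmp": {"dc:creator": "Alex Photographer"},
})
```

Here the XMP creator replaces the Exif Artist, while the Exif Copyright survives because no higher-precedence source supplied an equivalent value.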
We generally only use the ingested metadata to display in a table on
the image description page for extra information. We rely on the user
to provide license info, and generally don't take that (or much else)
from the file's metadata. The primary information about an image
displayed by MediaWiki is directly from the user, not the file's
metadata.
One exception to that is UploadWizard. It suggests some values for
date, author, and GPS location based on the image's Exif data. It does
not prefill the license at this time. I don't know much about how it
works, or even whether it uses MediaWiki's file metadata extraction
routines or its own thing.
We recognize the following properties related to licensing/authorship:
Exif:
* Copyright
* Artist
PNG text chunks:
* Copyright
* Author
* Artist
Legacy IPTC:
* Copyright (2:116)
* Byline (2:80)
* Credit (2:110) [Although that doesn't mean what you normally think
credit does]
* Contact (2:118)
XMP:
* XMP (using the http://creativecommons.org/ns# namespace):
** license
** morePermissions
** attributionURL
** attributionName
* XMP (using the http://ns.adobe.com/xap/1.0/rights/ namespace):
** Certificate
** Marked
** Owner
** UsageTerms
** WebStatement
* XMP (using the http://purl.org/dc/elements/1.1/ namespace):
** rights
** creator
** contributor
* Exif encoded as XMP (aka the http://ns.adobe.com/tiff/1.0/ namespace):
** Artist
** Copyright
* XMP (new IPTC, http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/):
** CreatorContactInfo
Note: IPTC-as-XMP has some properties related to copyright/authorship
that we don't support, such as Licensor.
Note: We do not extract CC tags from SVGs.
It's also possible I missed something here.
As for adding metadata to images:
* We serve the original image exactly as is, if the user asks for the
image in the original size. This would include all original metadata,
but we don't add any.
* If the user asks for a small image, we strip all metadata except
colour profile. We add a comment (JPEG: a JPEG COM segment. PNG: some
sort of text chunk) saying "File source: <url to file>".
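
For readers unfamiliar with JPEG COM segments, the following Python sketch shows one way to add such a comment to a JPEG byte string. It is an illustration only, not the code MediaWiki or its thumbnailer runs. The function name `add_jpeg_comment` and the example URL are hypothetical. The sketch relies on the JPEG segment layout: a COM segment is the marker `0xFFFE`, followed by a two-byte big-endian length that counts itself, followed by the comment bytes. It places the comment after any leading APPn segments so that those stay first in the file.

```python
import struct


def add_jpeg_comment(data: bytes, comment: str) -> bytes:
    """Return a copy of a JPEG byte string with a COM segment added."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")

    payload = comment.encode("utf-8")
    if len(payload) > 65533:
        raise ValueError("comment too long for a single COM segment")

    # Skip leading APPn segments (markers 0xFFE0 to 0xFFEF).
    pos = 2
    while (pos + 4 <= len(data)
           and data[pos] == 0xFF
           and 0xE0 <= data[pos + 1] <= 0xEF):
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length  # marker (2 bytes) + length field and payload

    # The length field counts its own two bytes plus the payload.
    segment = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload
    return data[:pos] + segment + data[pos:]


# Tiny synthetic input: SOI, one APP0 segment with 2 payload bytes, EOI.
# This is only enough structure to exercise the function, not a real image.
sample = b"\xff\xd8" + b"\xff\xe0\x00\x04ab" + b"\xff\xd9"
tagged = add_jpeg_comment(
    sample, "File source: https://example.org/File:Example.jpg")
```

The overhead is four bytes plus the length of the comment, which is why a single source-URL comment is cheap even on very small thumbnails.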
I personally think we should be adding copyright metadata to
thumbnails (but I do not consider it a high priority). Particularly
for larger thumbs, where the overhead would be minimal.
XMP has the downside of not being very compact. XML is not compact
to begin with. Additionally, the official spec suggests that writers
add a lot of whitespace to allow in-place editing, and that they not
use compression, even if the format supports it, so that people can
just scan the file. Most libraries that write it seem to do that, which
is unfortunate from our perspective, where we are trying to minimize
thumb size.
If we were to start adding metadata to thumbs, I think we would start
with Exif Artist and possibly an Exif Copyright field that has info on
the license. I would be supportive of a system where if the thumbnail
is super small (say < 5 kb), we put nothing; if it is medium size
(5-300 kb), we put those two Exif fields I mentioned; and if it's
> 300 kb, we put author, copyright, GPS, and Creative Commons XMP tags
in it. But others might feel that reducing thumb bandwidth is more
important than the metadata, so it's something that would probably have
to be discussed. (Not to mention, so far nobody has actually volunteered
to code it...)
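
The size tiers proposed above could be sketched roughly as follows in Python. This is a hypothetical illustration of the idea, not existing MediaWiki code. The function name, the field labels, and the choice of 1024 bytes per kb are assumptions. The thresholds are the ones suggested in this message.

```python
KB = 1024  # assumption: the proposal does not say whether kb is 1000 or 1024

# Hypothetical labels for the metadata groups named in the proposal.
BASIC_FIELDS = ["exif:Artist", "exif:Copyright"]
FULL_FIELDS = BASIC_FIELDS + ["gps", "creative-commons-xmp"]


def metadata_for_thumbnail(size_bytes: int) -> list:
    """Pick which metadata groups to embed, based on thumbnail size."""
    if size_bytes < 5 * KB:
        return []                   # tiny thumb: add nothing
    if size_bytes <= 300 * KB:
        return list(BASIC_FIELDS)   # medium thumb: the two Exif fields
    return list(FULL_FIELDS)        # large thumb: author, copyright, GPS, CC XMP


tiny = metadata_for_thumbnail(2 * KB)
medium = metadata_for_thumbnail(40 * KB)
large = metadata_for_thumbnail(800 * KB)
```

Keeping the thresholds in one function would make it easy to adjust them if the discussion about bandwidth versus metadata lands somewhere else.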
That said, I do not see what Creative Commons stands to gain from
adding more metadata standards. The existing XMP fields seem good for
people who want fine-grained info about the copyright of their image.
It seems like new Exif fields would take a long time to propagate to
implementations.
As for what standards are widely used: Anecdotally, there are probably a
lot more people using Exif than anything else, but that's due to
automatic support in digital cameras. It seems very few people
explicitly mark their images with metadata.
I hope that helps
--
-Brian