On 13/10/2014 13:03, Daniel Kinzler wrote:
On 13.10.2014 00:17, Jane Darnell wrote:
I think the place for all data about an image
should be Wikidata.
Do you really mean *any* image?
E.g., if we have a scan of an old book with 50 engravings, do you want to make a
wikidata item for each engraving? Or just for the book? Engravings are often
simple illustrations, not notable in and of themselves, and there is frequently
very little we can say about them, except for which book they were published in.
It seems to me that it makes more sense to just model the book on Wikidata, not
each illustration (or even every page, including the text-only ones, in case
they are extracted to a png file or something).
Thinking about books of engravings, eg a set like this:
https://commons.wikimedia.org/wiki/Category:Views_of_the_Seats_of_Noblemen_…
There is a fair amount one can say about each of these engravings: what
the subject is; and where that location is; who was the artist, and who
was the engraver; when the engraving was first published (which may or
may not be the same as the date at which it was first collected).
We probably also want to identify the *edition* of the book it was taken
from, and probably also the scan-set -- each with a page number or
sequence number, so the set can be easily retrieved and displayed in the
right order.
In terms of items required, at the moment membership of a scan-set or an
edition of the book might be handled by membership of a category. It's
not clear how such categories and their memberships are intended to be
represented in the new structured approach. Does one associate the
scanset item directly with a category? Or is the scanset item its own
thing, that one maps the category onto? And is the scanset an item on
Wikidata, or an item somewhere else?
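
To make the question concrete, here is a minimal sketch of the "scanset as
its own thing" option, assuming each member file carries a sequence number.
The class and field names (ScanSet, edition, members) are made up for
illustration; they are not part of any existing Commons or Wikidata schema.

```python
from dataclasses import dataclass, field


@dataclass
class ScanSet:
    """Hypothetical scan-set record: one edition plus its ordered member files."""
    label: str
    edition: str
    # Maps a sequence (or page) number to a file name.
    members: dict = field(default_factory=dict)

    def add(self, seq, filename):
        """Register a file at the given position in the set."""
        self.members[seq] = filename

    def in_order(self):
        """Return the member file names sorted by sequence number."""
        return [self.members[seq] for seq in sorted(self.members)]
```

Whether such a record would live on Wikidata, on Commons, or be mapped onto
a category is exactly the open question; the sketch only shows what the
record would need to hold.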
A further issue arises when we have more than one copy of the same
engraving.
eg:
https://commons.wikimedia.org/wiki/File:Neale%281818%29_p6.190_-_Fleurs,_Ro…
https://www.wikidata.org/wiki/File:MA%281829%29_p.340_-_Fleurs_-_John_Prest…
At the moment on Commons one can make a gallery of "other versions" on
the filepage, each with a short footer to explain what that version is.
So it probably makes sense to be able to record that we have multiple
representations or versions of the same basic thing, which presumably
means some kind of object to represent that basic thing - here an engraving.
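
As a rough sketch of what that might look like, assuming each file simply
points at an identifier for the engraving it represents (the file names and
the identifiers below are hypothetical placeholders, not real ones):

```python
# Hypothetical mapping from file name to the "basic thing" it represents.
represents = {
    "copy_A.jpg": "engraving:example-1",
    "copy_B.jpg": "engraving:example-1",
    "unrelated.jpg": "engraving:example-2",
}


def other_versions(filename, represents):
    """Return the other files that represent the same engraving as `filename`."""
    work = represents[filename]
    return sorted(
        name for name, w in represents.items()
        if w == work and name != filename
    )
```

With something like this, the "other versions" gallery would be derived from
the shared object rather than maintained by hand on each filepage.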
Turning to Gergo's model of "squashing" all of the information onto a
limited number of nodes (ie an item per file, plus some floating items
on Wikidata), and just making information into properties of those
items, I think there is a problem.
The specific thing is that we want to associate various properties
together, as all being tied to a particular stage of development of the
work -- ie a distinguishable "work" entity, in the language of the draft
"Multimedia data model" API at
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…
In particular, in the case of rights information, we need to carefully
associate the rights information with the other fields it relates to:
the author, the date, the nature of the contribution, the act of
licensing or release or assessment.
This is tricky because there may be multiple "stages of development"
associated with a single file, each with its own
author/date/contribution/license information. Yet there may
nevertheless only be the one file on Commons.
Even if the image has been 'restored' by a Commons user, this will not
necessarily generate a separate file -- standard practice for many
restorers (myself included) is to upload the restored version over the
previous version, so the reader can easily compare the two by looking at
the file history (and access an earlier version to download, if they so
wish).
(Another example could be where we may want to associate a particular
music file of a piece of classical music with a particular modern
edition of the score, even if the piece was originally from the 18th
century. Even if the only file we have is the recording, we still need
to be able to reflect the rights in the score.)
Another important class of data is date information. There may be
multiple dates associated with an image -- and we may want to sort, or
filter, or order by any of them. But to be meaningful, we don't
really want to associate the dates with the image, but rather with a
stage of development in the derivative chain that has led to the image.
So again, the idea of what the API calls the "work" comes forward, but again
there cannot be presumed to be a bi-directionally unique 1 <-> 1
identity between a "work" in this sense and any image on Commons, nor
(unless decreed otherwise) an item on Wikidata.
I don't know the right way to go forward, which is why I started this
thread.
On the one hand, I'd like to avoid if possible a vast multiplication of
items on Wikidata, for all the reasons I brought up a couple of months
ago, when I wondered whether there should be an item created on Wikidata
for every present Commons Category -- something which made me uneasy.
But on the other hand, there is a huge virtue in consistency -- in there
being a particular place where you know a particular piece of
information will be (if it exists); rather than there being a complexity
of multiple places it could be, depending on whether this has an item or
not, or that has an item or not, or the other.
I think something we definitely do need is worked-through examples of
how data might be stored for some quite complicated cases, for people to
be able to discuss and critique, rather than only the most simple type
of cases discussed so far.
So, for example, suppose we had as a particular test-case the following:
An image that has been enhanced & overwritten by a User in 2014 -- based
on a scan from a set made and released by an Institution in 2012 -- of
an engraving published in an 1850s book -- but created and first
published in the 1830s, by an engraver after a sketching artist -- after
an oil painting (since destroyed) painted by an important painter in the
1540s.
How in detail do we think that might be stored, identifying the
different contributors and dates and contributions, so one could sort by
* (a) contributor and the nature of their contribution
-- eg best surviving representation of every known painting associated
with Holbein.
* (b) date and the nature of the contribution
-- eg best surviving representation of every known painting made in the
1540s
-- eg engravings first published in the 1830s
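
One possible shape for this, offered only as a sketch to critique and not as
a proposal for the actual data model: treat each stage of development as its
own record, linked back to the stage it was derived from, and let the file
point at the latest stage. All names here (Stage, chain, find_files,
"Example.jpg") are hypothetical. Decade-only dates are stored as the first
year of the decade, which is a simplification, and unknown values are None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    """One stage of development in a derivative chain."""
    kind: str                    # nature of the contribution
    contributor: Optional[str]   # None where the test case does not say
    year: Optional[int]          # first year of the decade if only a decade is known
    derived_from: Optional["Stage"] = None


def chain(stage):
    """List the stages from the given one back to the original work."""
    stages = []
    while stage is not None:
        stages.append(stage)
        stage = stage.derived_from
    return stages


# The test case above, newest stage last.
painting = Stage("oil painting", "Painter", 1540)
sketch = Stage("sketch", "Sketching artist", None, painting)
engraving = Stage("engraving", "Engraver", 1830, sketch)
book = Stage("book publication", None, 1850, engraving)
scan = Stage("scan", "Institution", 2012, book)
enhanced = Stage("enhancement", "User", 2014, scan)

# One file on Commons, six stages behind it.
files = {"Example.jpg": enhanced}


def find_files(files, kind=None, contributor=None, decade=None):
    """Return files with at least one stage matching all the given criteria."""
    matches = []
    for name, latest in files.items():
        for s in chain(latest):
            if kind is not None and s.kind != kind:
                continue
            if contributor is not None and s.contributor != contributor:
                continue
            if decade is not None and (s.year is None or s.year // 10 * 10 != decade):
                continue
            matches.append(name)
            break
    return sorted(matches)
```

So find_files(files, kind="oil painting", decade=1540) would cover case (b),
and find_files(files, kind="oil painting", contributor="Painter") case (a).
Note the sketch says nothing about picking the *best* surviving
representation, nor about where the rights information for each stage
would hang.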
I don't know what the way forward is, but I think this is the kind of
information we ought to be able to represent; and the kind of sort we
ought to be able to do.
-- James.