On 13/10/2014 13:03, Daniel Kinzler wrote:
On 13.10.2014 00:17, Jane Darnell wrote:
I think the place for all data about an image should be Wikidata.
Do you really mean *any* image?
E.g., if we have a scan of an old book with 50 engravings, do you want to make a Wikidata item for each engraving? Or just for the book? Engravings are often simple illustrations, not notable in and of themselves, and there is frequently very little we can say about them, except for which book they were published in.
It seems to me that it makes more sense to just model the book on Wikidata, not each illustration (or even every page, including the text-only ones, in case they are extracted to a png file or something).
Thinking about books of engravings, eg a set like this: https://commons.wikimedia.org/wiki/Category:Views_of_the_Seats_of_Noblemen_a...
There is a fair amount one can say about each of these engravings: what the subject is; and where that location is; who was the artist, and who was the engraver; when the engraving was first published (which may or may not be the same as the date at which it was first collected).
We probably also want to identify the *edition* of the book it was taken from, and probably also the scan-set -- each with a page number or sequence number, so the set can be easily retrieved and displayed in the right order.
In terms of items required, at the moment membership of a scan-set or an edition of the book might be handled by membership of a category. It's not clear how such categories and their memberships are intended to be represented in the new structured approach. Does one associate the scanset item directly with a category? Or is the scanset item its own thing, that one maps the category onto? And is the scanset an item on Wikidata, or an item somewhere else?
A further issue arises when we have more than one copy of the same engraving.
eg:
https://commons.wikimedia.org/wiki/File:Neale%281818%29_p6.190_-_Fleurs,_Rox...
https://www.wikidata.org/wiki/File:MA%281829%29_p.340_-_Fleurs_-_John_Presto...
At the moment on Commons one can make a gallery of "other versions" on the filepage, each with a short footer to explain what that version is.
So it probably makes sense to be able to record that we have multiple representations or versions of the same basic thing, which presumably means some kind of object to represent that basic thing - here an engraving.
Turning to Gergo's model of "squashing" all of the information onto a limited number of nodes (ie an item per file, plus some floating items on Wikidata), and just making information into properties of those items, I think there is a problem.
The specific thing is that we want to associate various properties together, as all being tied to a particular stage of development of the work -- ie a distinguishable "work" entity, in the language of the draft "Multimedia data model" API at https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVjm...
In particular, in the case of rights information, we need to carefully associate the rights information with the other fields it relates to: the author, the date, the nature of the contribution, the act of licensing or release or assessment.
This is tricky because there may be multiple "stages of development" associated with a single file, each with its own author/date/contribution/license information. Yet there may nevertheless only be the one file on Commons.
Even if the image has been 'restored' by a Commons user, this will not necessarily generate a separate file -- standard practice for many restorers (myself included) is to upload the restored version over the previous version, so the reader can easily compare the two by looking at the file history (and access an earlier version to download, if they so wish).
(Another example could be where we may want to associate a particular music file of a piece of classical music with a particular modern edition of the score, even if the piece was originally from the 18th century. Even if the only file we have is the recording, we still need to be able to reflect the rights in the score.)
Another important class of data is date information. There may be multiple dates associated with an image -- and we may want to sort, or filter, or order by any of them. But to be meaningful, we don't really want to associate the dates with the image, but rather with a stage of development in the derivative chain that has led to the image. So again, the idea of what the API calls the "work" comes forward, but again there cannot be presumed to be a bi-directionally unique 1 <-> 1 identity between a "work" in this sense and any image on Commons, nor (unless decreed otherwise) an item on Wikidata.
I don't know the right way to go forward, which is why I started this thread.
On the one hand, I'd like to avoid if possible a vast multiplication of items on Wikidata, for all the reasons I brought up a couple of months ago, when I wondered whether there should be an item created on Wikidata for every present Commons Category -- something which made me uneasy.
But on the other hand, there is a huge virtue in consistency -- on there being a particular place where you know a particular piece of information will be (if it exists); rather than there being a complexity of multiple places it could be, depending on whether this has an item or not, or that has an item or not, or the other.
I think something we definitely do need is worked-through examples of how data might be stored for some quite complicated cases, for people to be able to discuss and critique, rather than only the most simple type of cases discussed so far.
So, for example, suppose we had as a particular test-case the following:
An image that has been enhanced & overwritten by a User in 2014 -- based on a scan from a set made and released by an Institution in 2012 -- of an engraving published in an 1850s book -- but created and first published in the 1830s, by an engraver after a sketching artist -- after an oil painting (since destroyed) painted by an important painter in the 1540s.
How in detail do we think that might be stored, identifying the different contributors and dates and contributions, so one could sort by
* (a) contributor and the nature of their contribution
  -- eg best surviving representation of every known painting associated with Holbein.
* (b) date and the nature of the contribution
  -- eg best surviving representation of every known painting made in the 1540s
  -- eg engravings first published in the 1830s
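To make the test-case concrete enough to critique, here is a minimal sketch in Python of one possible shape for it. This is not the draft API's model or anything Wikidata actually does: the names Stage, FileRecord and find are hypothetical, the contributor labels and file title are placeholders, and storing a decade-only date as the first year of the decade is an assumption made purely for illustration. The point it tries to show is that dates and contributors hang off stages of the derivative chain, while there is still only the one file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """One stage of development in a derivative chain (hypothetical model)."""
    kind: str                    # nature of the contribution
    contributor: Optional[str]   # placeholder label, not real data
    year: Optional[int]          # decade-only dates stored as the decade's first year
    survives: bool = True        # False if this stage's object no longer exists


@dataclass
class FileRecord:
    """A single file on Commons, carrying its whole chain, oldest stage first."""
    title: str
    chain: List[Stage]


def find(records, kind=None, contributor=None, decade=None):
    """Return titles of files where one stage matches every criterion given."""
    hits = []
    for record in records:
        for stage in record.chain:
            if kind is not None and stage.kind != kind:
                continue
            if contributor is not None and stage.contributor != contributor:
                continue
            if decade is not None and (
                stage.year is None or stage.year // 10 * 10 != decade
            ):
                continue
            hits.append(record.title)
            break  # one matching stage is enough for this file
    return hits


# The test-case above, with placeholder names throughout.
test_case = FileRecord(
    title="Enhanced scan (placeholder title)",
    chain=[
        Stage("painting", "Painter", 1540, survives=False),  # since destroyed
        Stage("sketch", "Sketching artist", None),           # date not stated
        Stage("engraving", "Engraver", 1830),
        Stage("book publication", None, 1850),
        Stage("scan", "Institution", 2012),
        Stage("enhancement", "User", 2014),
    ],
)

print(find([test_case], kind="painting", decade=1540))
print(find([test_case], kind="engraving", decade=1830))
print(find([test_case], kind="painting", decade=1830))
```

The first two queries find the file and the third does not, because the kind and the date have to match on the same stage. The sketch does not attempt to choose the "best surviving representation", and it leaves open where such stages would be stored (Wikidata items, file properties, or something else).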
I don't know what the way forward is, but I think this is the kind of information we ought to be able to represent, and the kind of sort we ought to be able to do.
-- James.