Re: [Wikidata-l] [Multimedia] Inclusion criteria for Wikidata items for paintings, engravings, illustrations, manuscript folios, photographs, old postcards, etc ?

16 Oct 2014

On 13/10/2014 13:03, Daniel Kinzler wrote:
...
  On 13.10.2014 00:17, Jane Darnell wrote:
  I think the place for all data about an image should be Wikidata.
 Do you really mean *any* image?

 E.g., if we have a scan of an old book with 50 engravings, do you want to make a
 wikidata item for each engraving? Or just for the book? Engravings are often
 simple illustrations, not notable of and by themselves, and there is frequently
 very little we can say about them, except for which book they were published in.

 It seems to me that it makes more sense to just model the book on Wikidata, not
 each illustration (or even every page, including the text-only ones, in case
 they are extracted to a png file or something).

Thinking about books of engravings, eg a set like this:
https://commons.wikimedia.org/wiki/Category:Views_of_the_Seats_of_Noblemen_…

There is a fair amount one can say about each of these engravings: what 
the subject is; and where that location is; who was the artist, and who 
was the engraver; when the engraving was first published (which may or 
may not be the same as the date at which it was first collected).

We probably also want to identify the *edition* of the book it was taken 
from, and probably also the scan-set -- each with a page number or 
sequence number, so the set can be easily retrieved and displayed in the 
right order.

In terms of items required, at the moment membership of a scan-set or an 
edition of the book might be handled by membership of a category.  It's 
not clear how such categories and their memberships are meant to be 
represented in the new structured approach.  Does one associate the 
scan-set item directly with a category?  Or is the scan-set item its own 
thing, that one maps the category onto?  And is the scan-set an item on 
Wikidata, or an item somewhere else?

A further issue arises when we have more than one copy of the same 
engraving.

eg:

https://commons.wikimedia.org/wiki/File:Neale%281818%29_p6.190_-_Fleurs,_Ro…
https://www.wikidata.org/wiki/File:MA%281829%29_p.340_-_Fleurs_-_John_Prest…

At the moment on Commons one can make a gallery of "other versions" on 
the filepage, each with a short footer to explain what that version is.

So it probably makes sense to be able to record that we have multiple 
representations or versions of the same basic thing, which presumably 
means some kind of object to represent that basic thing - here an engraving.
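
To make that concrete, here is a rough sketch in Python of one object standing for the engraving, with each file hanging off it as a version. Every name in it (Engraving, Version, the placeholder file names) is made up for illustration; none of it comes from the draft data model or from Wikibase, and the source labels are just read off the two file names above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    """One file that depicts the engraving (hypothetical record)."""
    file_name: str   # placeholder, not a real Commons file name
    source: str      # publication the copy was taken from
    page_label: str  # page label as given in the file name, uninterpreted
    note: str = ""   # short footer, like the "other versions" gallery


@dataclass
class Engraving:
    """The 'basic thing' that several files represent (hypothetical)."""
    title: str
    versions: List[Version] = field(default_factory=list)

    def other_versions(self, file_name: str) -> List[Version]:
        # Everything except the file currently being viewed.
        return [v for v in self.versions if v.file_name != file_name]


fleurs = Engraving(title="Fleurs")
fleurs.versions.append(Version("copy_a.jpg", "Neale (1818)", "p6.190"))
fleurs.versions.append(Version("copy_b.jpg", "MA (1829)", "p.340"))

for v in fleurs.other_versions("copy_a.jpg"):
    print(v.source, v.page_label, v.note)
```

The point of the sketch is only that the "other versions" gallery becomes a query on the shared object, rather than something maintained by hand on each filepage.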

Turning to Gergo's model of "squashing" all of the information onto a 
limited number of nodes (ie an item per file, plus some floating items 
on Wikidata), and just making information into properties of those 
items, I think there is a problem.

The specific thing is that we want to associate various properties 
together, as all being tied to a particular stage of development of the 
work -- ie a distinguishable "work" entity, in the language of the draft 
"Multimedia data model" API at
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…

In particular, in the case of rights information, we need to carefully 
associate the rights information with the other fields it relates to: 
the author, the date, the nature of the contribution, the act of 
licensing or release or assessment.

This is tricky because there may be multiple "stages of development" 
associated with a single file, each with its own 
author/date/contribution/license information.  Yet there may 
nevertheless only be the one file on Commons.

Even if the image has been 'restored' by a Commons user, this will not 
necessarily generate a separate file -- standard practice for many 
restorers (myself included) is to upload the restored version over the 
previous version, so the reader can easily compare the two by looking at 
the file history (and access an earlier version to download, if they so 
wish).

(Another example could be where we may want to associate a particular 
music file of a piece of classical music with a particular modern 
edition of the score, even if the piece was originally from the 18th 
century.  Even if the only file we have is the recording, we still need 
to be able to reflect the rights in the score.)

Another important class of data is date information.  There may be 
multiple dates associated with an image -- and we may want to sort, or 
filter, or order by any of them.  But to be meaningful, we don't 
really want to associate the dates with the image, but rather with a 
stage of development in the derivative chain that has led to the image. 
So again, the idea of what the API calls the "work" comes forward, but 
again there cannot be presumed to be a bi-directionally unique 1 <-> 1 
identity between a "work" in this sense and any image on Commons, nor 
(unless decreed otherwise) an item on Wikidata.

I don't know the right way to go forward, which is why I started this 
thread.

On the one hand, I'd like to avoid if possible a vast multiplication of 
items on Wikidata, for all the reasons I brought up a couple of months 
ago, when I wondered whether there should be an item created on Wikidata 
for every present Commons Category -- something which made me uneasy.

But on the other hand, there is a huge virtue in consistency -- on there 
being a particular place where you know a particular piece of 
information will be (if it exists); rather than there being a complexity 
of multiple places it could be, depending on whether this has an item or 
not, or that has an item or not, or the other.

I think something we definitely do need is worked-through examples of 
how data might be stored for some quite complicated cases, for people to 
be able to discuss and critique, rather than only the most simple type 
of cases discussed so far.

So, for example, suppose we had as a particular test-case the following:

An image that has been enhanced & overwritten by a User in 2014 -- based 
on a scan from a set made and released by an Institution in 2012 -- of 
an engraving published in an 1850s book -- but created and first 
published in the 1830s, by an engraver after a sketching artist -- after 
an oil painting (since destroyed) painted by an important painter in the 
1540s.

How in detail do we think that might be stored, identifying the 
different contributors and dates and contributions, so one could sort by

* (a) contributor and the nature of their contribution
-- eg best surviving representation of every known painting associated 
with Holbein.
* (b) date and the nature of the contribution
-- eg best surviving representation of every known painting made in the 
1540s
-- eg engravings first published in the 1830s
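
As a starting point for discussion, the test case above could be sketched in Python roughly as follows. This is only one possible shape, not a proposal for the actual schema: the Stage and FileRecord classes, the role names, the file name and the find() helper are all invented here, and decades are stored as year ranges purely for convenience. The date of the intermediate sketch is left empty because the test case does not give one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """One stage of development in the derivative chain (hypothetical)."""
    contributor: str
    role: str                       # nature of the contribution
    earliest: Optional[int] = None  # first possible year, if known
    latest: Optional[int] = None    # last possible year, if known


@dataclass
class FileRecord:
    name: str
    chain: List[Stage]  # ordered from the original work to the current file


def in_range(stage, start, end):
    # A stage matches when its year range overlaps [start, end].
    if start is None and end is None:
        return True
    if stage.earliest is None or stage.latest is None:
        return False
    if start is not None and stage.latest < start:
        return False
    if end is not None and stage.earliest > end:
        return False
    return True


def find(files, role=None, contributor=None, start=None, end=None):
    """Names of files with at least one stage meeting all the criteria."""
    hits = []
    for f in files:
        for s in f.chain:
            if role is not None and s.role != role:
                continue
            if contributor is not None and s.contributor != contributor:
                continue
            if in_range(s, start, end):
                hits.append(f.name)
                break
    return hits


files = [FileRecord(name="Example.jpg", chain=[
    Stage("Important painter", "painter", 1540, 1549),  # painting destroyed
    Stage("Sketching artist", "sketch artist"),         # date not stated
    Stage("Engraver", "engraver", 1830, 1839),          # first publication
    Stage("Book publisher", "publisher", 1850, 1859),
    Stage("Institution", "scanner", 2012, 2012),
    Stage("User", "restorer", 2014, 2014),
])]

print(find(files, role="painter", start=1540, end=1549))
print(find(files, role="engraver", start=1830, end=1839))
```

Both queries return Example.jpg even though there is only the one file, because the criteria are tested against a single stage at a time rather than against the file as a whole. Rights information could presumably hang off each Stage in the same way.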

I don't know what the way forward is, but I think this is the kind of 
information we ought to be able to represent, and the kind of sort we 
ought to be able to do.

   -- James.
