New subject: [Wikidata-l] Inclusion criteria for Wikidata items for paintings, engravings, illustrations, manuscript folios, photographs, old postcards, etc ?

30 Sep 2014


Hi everybody,
With the Structured Data for Commons project about to move into high 
gear, it seems to me that there's something the Wikidata community needs 
to have a serious discussion about, before APIs start getting designed 
and set in stone.
Specifically: when should an object have an item with its own Q-number 
created for it on Wikidata?  What are the limits?  (Are there any limits?)
The position so far seems to be essentially that a Wikidata item has 
only been created when an object either already has a fully-fledged 
Wikipedia article written for it, or reasonably could have.
So objects that aren't particularly notable typically have not had 
Wikidata items made for them.
Indeed, practically the first message Lydia sent to me when I started 
trying to work on Commons and Wikidata was to underline to me that 
Wikidata objects should generally not be created for individual Commons 
files.
But, if I'm reading the initial plans and API thoughts of the Multimedia 
team correctly, eg
https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Sli...
and
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVjm...
there seems to be the key assumption that, for any image that contains 
information relating to something beyond the immediate photograph or 
scan, there will be some kind of 'original work' item on main Wikidata 
that the file page will be able to reference, such that the 'original 
work' Wikidata item will be able to act as a place to locate any 
information specifically relating to the original work.
Now in many ways this is a very clean division to be able to make.  It 
removes any question of having to judge "notability"; and it removes any 
ambiguity or diversity of where information might be located -- if the 
information relates to the original work, then it will be stored on 
Wikidata.
But it would appear to imply a potentially *huge* widening of the 
inclusion criteria for Wikidata, and a corresponding increase in the 
number of Wikidata items potentially creatable.
So it seems appropriate that the Wikidata community should discuss and 
sign off just what should and should not be considered appropriate, 
before things get much further.
For example, a year ago the British Library released 1 million 
illustrations from out-of-copyright books, which increasingly have been 
uploaded to Commons.  Recently the Internet Archive has announced plans 
to release a further 12 million, with more images either already 
uploading or to follow from other major repositories including eg the 
NYPL, the Smithsonian, the Wellcome Foundation, etc, etc.
How many of these images, all scanned from old originals, are going to 
need new Q-numbers for those originals?  Is this okay?  Or are some of 
them too much?
For example, for maps, cf this data schema
https://docs.google.com/spreadsheets/d/1Hn8VQ1rBgXj3avkUktjychEhluLQQJl5v6WR... 
, each map sheet will have its own Northernmost, Southernmost, 
Easternmost and Westernmost bounding co-ordinates.  Does that mean each map 
sheet should have its own Wikidata item?
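
To make the map case concrete, here is a minimal sketch in Python of the 
per-sheet data in question.  The class and field names are my own 
hypothetical choices, not taken from the schema linked above, and the 
co-ordinates are made up; it also ignores sheets that cross the 
antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Decimal degrees; hypothetical field names, not from the linked schema.
    north: float
    south: float
    east: float
    west: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple check that assumes the box does not cross the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass(frozen=True)
class MapSheet:
    # One record per sheet: this is the unit that might need its own item.
    label: str
    bbox: BoundingBox


# Made-up example sheet, purely for illustration.
sheet = MapSheet(
    label="hypothetical sheet 1",
    bbox=BoundingBox(north=52.0, south=51.0, east=0.5, west=-0.5),
)
```

Each sheet carries four values that belong to the sheet itself rather 
than to the map series or to any one scan, which is what raises the 
question of where those values should live.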
For book illustrations, perhaps it would be enough just to reference 
the edition of the book.  But if individual illustrations have their own 
artist and engraver details, does that mean the illustration needs to 
have its own Wikidata item?  Similarly, if the same engraving has 
appeared in many books, is that also a sign that it should have its own 
Wikidata item?
Similarly, what about old photographs, or old postcards?  When should 
these have their own Wikidata item?  If they have their own known 
creator, and creation date, then is it simplest just to give them a 
Wikidata item, so that such information about an original underlying 
work is always looked for on Wikidata?  What if multiple copies of the 
same postcard or photograph are known, published or re-published at 
different times?  But the potential number of old postcards and 
photographs, like the potential number of old engravings, is *huge*.
What if an engraving was re-issued in different "states" (eg a 
re-issued engraving of a place might have been modified if a tower had 
been built)?  When should these get different items?
At
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts#Wikidata...
where I raised some of these issues a couple of weeks ago, there has 
even been the suggestion that particular individual impressions of an 
engraving might deserve their own separate items; or even everything 
with a separate accession number, so if a museum had three copies of an 
engraving, we would make three separate items, each carrying its own 
accession number, so that a particular File could be tied to the 
specific copy it depicts.
(See also other sections at 
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts for 
further relevant discussions on how to represent often quite complicated 
relations with Wikidata properties).
With enough items, we could re-create and represent essentially the 
entire FRBR tree.
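
As a rough illustration of how the item count grows with depth, here is 
a minimal Python sketch.  The four level names follow FRBR's Work / 
Expression / Manifestation / Item; everything else (the Node class, the 
count_items helper, the example engraving, and the choice to treat a 
"state" as an expression) is a hypothetical assumption of mine, not an 
agreed data model.

```python
from dataclasses import dataclass, field
from typing import List

# FRBR Group 1 levels, from most abstract to most concrete.
LEVELS = ["work", "expression", "manifestation", "item"]


@dataclass
class Node:
    level: str  # one of LEVELS
    label: str
    children: List["Node"] = field(default_factory=list)


def count_items(node: Node, deepest_level: str) -> int:
    """Count nodes that would get a Q-number if we stop at deepest_level."""
    if LEVELS.index(node.level) > LEVELS.index(deepest_level):
        return 0
    return 1 + sum(count_items(child, deepest_level) for child in node.children)


# Made-up engraving: two states, one printing each, four known impressions.
engraving = Node("work", "hypothetical engraving", [
    Node("expression", "first state", [
        Node("manifestation", "first printing", [
            Node("item", "impression A"),
            Node("item", "impression B"),
            Node("item", "impression C"),
        ]),
    ]),
    Node("expression", "second state (tower added)", [
        Node("manifestation", "re-issue", [
            Node("item", "impression D"),
        ]),
    ]),
])

for level in LEVELS:
    print(level, count_items(engraving, level))
```

For this one made-up engraving the count goes from 1 item (work only) to 
3, 5 and then 9 as each further level is admitted, which is the kind of 
multiplier that would apply across millions of scans.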
We could do this.  We may even need to do this, if the Multimedia team's 
outline for Commons is to be implemented in its apparent current form.
But it seems to me that we shouldn't just sleepwalk into it.
It seems to me that this represents (at least potentially) a 
*very* large expansion in the number of items, and widening of the 
inclusion criteria, for what Wikidata is going to encompass.
I'm not saying it isn't the right thing to do, but given the potential 
scale of the implications, I do think it is something we need to work 
through properly as a community, and confirm that it is indeed 
what we *want* to do.
All best,
James.
(Note that this is a slightly different discussion, though related, to 
the one I raised a few weeks ago as to whether Commons categories -- eg 
for particular sets of scans -- should necessarily have their own 
Q-number on Wikidata.  Or whether some -- eg some intersection 
categories -- should just have an item on Commons data.   But it's 
clearly related: is the simplest thing just to put items for everything 
on Wikidata?  Or does one try to keep Wikidata lean, and no larger than 
it absolutely needs to be; albeit then having to cope with the 
complexity that some categories would have a Q-number, and some would not.)