Re: [Wikidata-l] Commons Categories again (was Re: Commons Wikibase)

3 Sep 2014

@Joe Filceolaire

Fair enough.  I had misread the rules.  I thought it was the Commons Cat 
that needed to have a sitelink to some other page on any Wikimedia 
Project, rather than the requirement just being that a Wikidata item 
needed to have a sitelink to e.g. a Commons Cat.

So per the current rules, these Commons Cats could all have Wikidata 
items (though I still think that would be a mistake).

...
 In fact I believe nearly every Commons Category has a corresponding
 wikidata category item.

That is not correct.

There are currently 3,338,000 categories on Commons (excluding redirects)

About 250,000 category-like items on Wikidata have links to Commons (the 
number is similar either counting sitelinks, or property P373.)

About 688,000 article-like items on Wikidata have links to Commons 
categories using property P373.

So between 2,400,000 and 2,650,000 categories on Commons are currently 
pointed to by neither a category-like item nor an article-like item.
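
For anyone who wants to check that range, here is a minimal sketch of the arithmetic.  The variable names are mine, and it assumes (as the rounded figures above do) that each linking item points to one distinct Commons category within its own group; the only unknown is how far the two groups overlap.

```python
# Rounded figures quoted in this message (September 2014).
commons_categories = 3_338_000   # Commons categories, excluding redirects
category_like_items = 250_000    # category-like items linking to Commons
article_like_items = 688_000     # article-like items linking via P373

# Lower bound: no overlap, so the two groups cover different categories.
uncovered_min = commons_categories - (category_like_items + article_like_items)

# Upper bound: full overlap, so every category reached by a category-like
# item is also reached by an article-like item.
uncovered_max = commons_categories - max(category_like_items, article_like_items)

print(f"{uncovered_min:,} to {uncovered_max:,} categories with no item")
```

The true figure sits somewhere between the two bounds, depending on how many categories are linked from both kinds of item.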

In my view that should continue to be the case.

We're setting up a separate database or namespace for Commons files 
anyway; so doesn't it make more sense for entities like Commons 
categories that really only relate to Commons to have items held in that 
database or namespace, rather than in main Wikidata?

What are the advantages of adding two and a half million items of 
wiki-junk to Wikidata?

Yes, like other items on CommonsData, the properties of such C-items 
would normally point to Q-items on main Wikidata.

Looking at the modelling of the two categories in more detail:

First, Category:Images released by British Library Images Online

* It's not clear that BL Images Online would actually have its own 
Q-item.  The British Library certainly does.  Images Online is one of 
many parts of the BL.

But even if we create Images Online as a useful thing to link to, that's 
not really the point.  This category (despite its title) is really for a 
specific release of images from BL Images Online.  If there were another 
release, that would have a new different (sub-)category.

Yes, we could perhaps capture the set with a query specifying the source 
and the date.  But as a distinctive set, it's useful to have a (C-)item 
that can represent it, (i) acting as a container for the query, and any 
other information about the set that might be relevant; and (ii) acting 
as a target for searches, so the set can be retrieved directly with a 
simple search, rather than requiring a complex search combining multiple 
properties.

Secondly, Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd

Again, the important thing is that (despite its title) what this 
category really represents is a particular set of *scans*.

There are already titles where we have multiple sets of scans for a 
single book, from different sources, often with different image 
characteristics.

In the jargon, these scan-sets are called "manifestations" of the work.
On main Wikidata, current guidance is to have Q-items for works, and 
Q-items for editions, but not Q-items for manifestations of editions. 
So on current sourcing guidance, again, this category should not have a 
Q-item.

But it does make sense for it to have an item for operational reasons on 
Commons, so (IMO) it makes sense for it to have a C-item on CommonsData.

The C-item would reference the Q-item on Wikidata about the edition; but 
would also contain information specific to the C-item -- for example, 
that the source for these scans was a particular copy of the book 
scanned and released as part of the Mechanical Curator collection.

Scans of other copies of the same edition of the same book might have 
separately been released as part of the Mechanical Curator collection, 
part of the Wellcome collection, part of a release by the NYPL, or part 
of the Internet Archive Book Images collection (which in itself can 
contain multiple releases of the same book, from different libraries).

This source information can be quite detailed, along with credit-line 
information, and specific link-back information.  So (IMO) it makes 
sense to be able to hold it as a single item for the set, rather than 
only be able to extract it as a query from the individual images.

Furthermore, this is information that one wants to be able to display on 
the Commons category page.  It doesn't make sense to have to run a query 
over the images (which images? all of them?) in the category, just to be 
able to display header information on the category page.
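
To make the idea concrete, here is a rough sketch of what such a C-item might hold.  Everything in it is hypothetical: CommonsData does not exist yet, so the class, the field names, the statement labels and the placeholder edition ID are all my own inventions for illustration, not a proposed schema.

```python
from dataclasses import dataclass, field


@dataclass
class CItem:
    """Hypothetical CommonsData item describing one Commons category."""
    commons_category: str
    edition_item: str  # ID of the Q-item for the edition on main Wikidata
    local_statements: dict = field(default_factory=dict)


def header_info(item: CItem) -> dict:
    """Collect category-page header data from the C-item alone,
    without running any query over the files in the category."""
    info = {"category": item.commons_category, "edition": item.edition_item}
    info.update(item.local_statements)
    return info


scan_set = CItem(
    commons_category=(
        "Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd"
    ),
    edition_item="Q_EDITION_PLACEHOLDER",  # placeholder, not a real Q-ID
    local_statements={
        # Label is illustrative only; no such property is defined anywhere.
        "source collection": "Mechanical Curator collection",
    },
)

print(header_info(scan_set))
```

The point of the sketch is only that the set-level information lives in one place, with a pointer back to the Q-item for the edition, so the category page can be headed from that single item.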

   -- James.

On 01/09/2014 17:43, Joe Filceolaire wrote:
...
  James
 I think the problem is not as difficult as you have described.

 If we look at http://www.wikidata.org/wiki/Wikidata:Notability then you
 will see that each wikimedia commons page can have a corresponding item.
 The comment that "a sitelink to a category page in Wikimedia Commons is
 *not* allowed on main article items" means that Commons Category pages
 should link to Category items and not to items linked to wikipedia
 articles. It does not mean items linked to Commons Categories are not
 allowed. In fact I believe nearly every Commons Category has a corresponding
 wikidata category item.

 Notability Criterion 3. reads "(An item is acceptable if) It fulfills *some
 structural need*, for example: it is needed to make statements made in
 other items more useful.". I believe that this allows the creation of items
 for institutions, photographers, books etc as required to describe Commons
 files. Considering the two examples you identified:

 Category:Images released by British Library Images Online
 Each of these images can have the statement 'Source:British Library Images
 Online'. This statement requires a CommonsData Property "Source" and a
 wikidata item "British Library Images Online". As this wikidata item is
 needed to complete this statement, it meets wikidata notability criterion 3.

 Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd
 Again wikidata items can be created for the book "Metropolitan
 Improvements" and for the author "Thomas Hosmer Shepherd" and for the
 book's publisher (if known). All of these are clearly considered notable
 under https://www.wikidata.org/wiki/Help:Sources. These wikidata items can
 then be linked to from statements in CommonsData describing each of the
 images.

 Note that this all works without needing to link to the Category Qitems.

 In practice this means that if a Commons file is in a certain category then
 we can know that certain statements will apply to that file. Later,
 eventually, we can find those files by searching for files to which those
 statements apply and ignore the categorisation since all the information
 inherent in membership of that Category has been included in the form of
 statements. We do not need a "container for structured information
 associated with each commonscat". This structured information can just
 be included in CommonsData, without any separate 'container'.

 Eventually, when the information inherent in the categorisation system has
 been translated into structured data, and the query system is a lot more
 useful than today, and the Categories based on idiosyncratic selection
 criteria have been transitioned into Galleries, where they should have been
 all along, then Categories may no longer be needed.

 But perhaps we will keep them anyway.

 Joe

 _______________________________________________
 Wikidata-l mailing list
 Wikidata-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l
