Fair enough. I had misread the rules. I thought it was the Commons Cat
that needed to have a sitelink to some other page on any Wikimedia
Project, rather than the requirement just being that a Wikidata item
needed to have a sitelink to e.g. a Commons Cat.
So per the current rules, these Commons Cats could all have Wikidata
items (though I still think that would be a mistake).
In fact I believe nearly every Commons Category has a corresponding
wikidata category item.
That is not correct.
There are currently 3,338,000 categories on Commons (excluding redirects).
About 250,000 category-like items on Wikidata have links to Commons (the
number is similar either counting sitelinks, or property P373.)
About 688,000 article-like items on Wikidata have links to Commons
categories using property P373.
So between 2,400,000 and 2,650,000 categories on Commons are currently
pointed to by neither a category-like item, nor an article-like item.
In my view that should continue to be the case.
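For anyone wanting to check the range above, here is a minimal sketch of the
arithmetic. The function name is my own invention, and the only assumption is
that the two sets of linked categories may overlap anywhere between not at
all and completely.

```python
def unlinked_range(total, category_like, article_like):
    """Return (lower, upper) bounds on categories with no linking item.

    Hypothetical helper: the inputs are the approximate counts quoted above.
    """
    # Lower bound: the two sets of linked categories do not overlap at all,
    # so the largest possible number of categories is already covered.
    lower = total - (category_like + article_like)
    # Upper bound: the smaller set is entirely contained in the larger one,
    # so only the larger set counts towards coverage.
    upper = total - max(category_like, article_like)
    return lower, upper


low, high = unlinked_range(3_338_000, 250_000, 688_000)
print(f"between {low:,} and {high:,}")
```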
We're setting up a separate database or namespace for Commons files
anyway; so doesn't it make more sense for entities like Commons
categories that really only relate to Commons to have items held in that
database or namespace, rather than in main Wikidata?
What are the advantages of adding two and a half million items of
wiki-junk to Wikidata?
Yes, like other items on CommonsData, the properties of such C-items
would normally point to Q-items on main Wikidata.
Looking at the modelling of the two categories in more detail:
First, Category:Images released by British Library Images Online
* It's not clear that BL Images Online would actually have its own
Q-item. The British Library certainly does. Images Online is one of
many parts of the BL.
But even if we create Images Online as a useful thing to link to, that's
not really the point. This category (despite its title) is really for a
specific release of images from BL Images Online. If there were another
release, that would have a new different (sub-)category.
Yes, we could perhaps capture the set with a query specifying the source
and the date. But as a distinctive set, it's useful to have a (C-)item
that can represent it, (i) acting as a container for the query, and any
other information about the set that might be relevant; and (ii) acting
as a target for searches, so the set can be retrieved directly with a
simple search, rather than requiring a complex search combining multiple
Secondly, Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd
Again, the important thing is that (despite its title) what this
category really represents is a particular set of *scans*.
There are already titles where we have multiple sets of scans for a
single book, from different sources, often with different image
In the jargon, these scan-sets are called "manifestations" of the work.
On main Wikidata, current guidance is to have Q-items for works, and
Q-items for editions, but not Q-items for manifestations of editions.
So on current sourcing guidance, again, this category should not have a
But it does make sense for it to have an item for operational reasons on
Commons, so (IMO) it makes sense for it to have a C-item on CommonsData.
The C-item would reference the Q-item on Wikidata about the edition; but
would also contain information specific to the C-item -- for example,
that the source for these scans was a particular copy of the book
scanned and released as part of the Mechanical Curator collection.
Scans of other copies of the same edition of the same book might have
separately been released as part of the Mechanical Curator collection,
part of the Wellcome collection, part of a release by the NYPL, or part
of the Internet Archive Book Images collection (which in itself can
contain multiple releases of the same book, from different libraries).
This source information can be quite detailed, along with credit-line
information, and specific link-back information. So (IMO) it makes
sense to be able to hold it as a single item for the set, rather than
only be able to extract it as a query from the individual images.
Furthermore, this is information that one wants to be able to display on
the Commons category page. It doesn't make sense to have to run a query
over the images (which images? all of them?) in the category, just to be
able to display header information on the category page.
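To make the idea concrete, here is a rough sketch of what such a C-item
might hold, and how a category page header could be built from it without
touching the individual files. Everything here is hypothetical: CommonsData
does not exist yet, the field names are mine, and the edition identifier is
a placeholder rather than a real Q-number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CItem:
    """Hypothetical C-item describing one scan-set (one Commons category)."""
    commons_category: str              # the category the item stands for
    edition_qid: str                   # Q-item for the edition (placeholder)
    source_collection: str             # release the scans came from
    credit_line: Optional[str] = None  # set-level credit, if any
    link_back: Optional[str] = None    # set-level link to the source, if any


def category_header(item: CItem) -> str:
    """Build header text from the C-item alone, with no query over files."""
    lines = [
        f"Scans of edition {item.edition_qid}",
        f"Source: {item.source_collection}",
    ]
    if item.credit_line:
        lines.append(f"Credit: {item.credit_line}")
    if item.link_back:
        lines.append(f"More information: {item.link_back}")
    return "\n".join(lines)


shepherd_scans = CItem(
    commons_category=(
        "Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd"
    ),
    edition_qid="Q_EDITION_PLACEHOLDER",  # placeholder, not a real item
    source_collection="Mechanical Curator collection",
)
print(category_header(shepherd_scans))
```

The point of the sketch is only that the set-level information lives in one
place; a second scan-set of the same edition would be a second C-item
pointing at the same edition Q-item.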
On 01/09/2014 17:43, Joe Filceolaire wrote:
I think the problem is not as difficult as you have described.
If we look at http://www.wikidata.org/wiki/Wikidata:Notability
we will see that each Wikimedia Commons page can have a corresponding item.
The comment that "a sitelink to a category page in Wikimedia Commons is
*not* allowed on main article items" means that Commons Category pages
should link to Category items and not to items linked to wikipedia
articles. It does not mean items linked to Commons Categories are not
allowed. In fact I believe nearly every Commons Category has a corresponding
wikidata category item.
Notability Criterion 3. reads "(An item is acceptable if) It fulfills *some
structural need*, for example: it is needed to make statements made in
other items more useful.". I believe that this allows the creation of items
for institutions, photographers, books etc as required to describe Commons
files. Considering the two examples you identified:
Category:Images released by British Library Images Online
Each of these images can have the statement 'Source:British Library Images
Online'. This statement requires a CommonsData Property "Source" and a
wikidata item "British Library Images Online". As this wikidata item is
needed to complete this statement, it meets Wikidata notability criterion 3.
Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd
Again wikidata items can be created for the book "Metropolitan
Improvements" and for the author "Thomas Hosmer Shepherd" and for the
publisher (if known). All of these are clearly considered notable.
These wikidata items can
then be linked to from statements in CommonsData describing each of the
Note that this all works without needing to link to the Category Qitems.
In practice this means that if a Commons file is in a certain category then
we can know that certain statements will apply to that file. Later,
eventually, we can find those files by searching for files to which those
statements apply and ignore the categorisation since all the information
inherent in membership of that Category has been included in the form of
statements. We do not need a "container for structured information
associated with each commonscat". This
structured information can just be included in CommonsData, without any
Eventually, when the information inherent in the categorisation system has
been translated into structured data, and the query system is a lot more
useful than today, and the Categories based on idiosyncratic selection
criteria have been transitioned into Galleries where they should have been
all along, then Categories may no longer be needed.
But perhaps we will keep them anyway.
Wikidata-l mailing list