@Joe Filceolaire
Fair enough. I had misread the rules. I thought it was the Commons Cat that needed to have a sitelink to some other page on any Wikimedia project, rather than the requirement just being that a Wikidata item needed to have a sitelink to, e.g., a Commons Cat.
So per the current rules, these Commons Cats could all have Wikidata items (though I still think that would be a mistake).
In fact I believe nearly every Commons Category has a corresponding wikidata category item.
That is not correct.
There are currently 3,338,000 categories on Commons (excluding redirects)
About 250,000 category-like items on Wikidata have links to Commons (the number is similar either counting sitelinks, or property P373.)
About 688,000 article-like items on Wikidata have links to Commons categories using property P373.
So between 2,400,000 and 2,650,000 categories on Commons are currently pointed to by neither a category-like item nor an article-like item.
In my view that should continue to be the case.
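For anyone who wants to check the range above, here is a minimal sketch of the arithmetic. It assumes the three counts are as quoted, and that within each kind of item no two items point to the same Commons category; the only unknown is how far the two kinds overlap. The variable names are just for illustration.

```python
# Rounded counts as quoted in this message.
total_commons_categories = 3_338_000
category_like_items = 250_000   # category-like Wikidata items linking to Commons
article_like_items = 688_000    # article-like Wikidata items using P373

# Lower bound: no Commons category is pointed to by both kinds of item,
# so the two sets of covered categories simply add up.
uncovered_min = total_commons_categories - category_like_items - article_like_items

# Upper bound: maximum overlap, i.e. every category reached by a
# category-like item is also reached by an article-like item.
uncovered_max = total_commons_categories - max(category_like_items, article_like_items)

print(uncovered_min, uncovered_max)
```

The true figure lies somewhere between the two, depending on how many categories are linked from both a category-like and an article-like item.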
We're setting up a separate database or namespace for Commons files anyway; so doesn't it make more sense for entities like Commons categories that really only relate to Commons to have items held in that database or namespace, rather than in main Wikidata?
What are the advantages of adding two and a half million items of wiki-junk to Wikidata?
Yes, like other items on CommonsData, the properties of such C-items would normally point to Q-items on main Wikidata.
Looking at the modelling of the two categories in more detail:
First, Category:Images released by British Library Images Online
* It's not clear that BL Images Online would actually have its own Q-item. The British Library certainly does. Images Online is one of many parts of the BL.
But even if we create Images Online as a useful thing to link to, that's not really the point. This category (despite its title) is really for a specific release of images from BL Images Online. If there were another release, that would have a new different (sub-)category.
Yes, we could perhaps capture the set with a query specifying the source and the date. But as a distinctive set, it's useful to have a (C-)item that can represent it, (i) acting as a container for the query, and any other information about the set that might be relevant; and (ii) acting as a target for searches, so the set can be retrieved directly with a simple search, rather than requiring a complex search combining multiple properties.
Secondly, Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd
Again, the important thing is that (despite its title) what this category really represents is a particular set of *scans*.
There are already titles where we have multiple sets of scans for a single book, from different sources, often with different image characteristics.
In the jargon, these scan-sets are called "manifestations" of the work. On main Wikidata, current guidance is to have Q-items for works, and Q-items for editions, but not Q-items for manifestations of editions. So on current sourcing guidance, again, this category should not have a Q-item.
But it does make sense for it to have an item for operational reasons on Commons, so (IMO) it makes sense for it to have a C-item on CommonsData.
The C-item would reference the Q-item on Wikidata about the edition; but would also contain information specific to the C-item -- for example, that the source for these scans was a particular copy of the book scanned and released as part of the Mechanical Curator collection.
Scans of other copies of the same edition of the same book might have separately been released as part of the Mechanical Curator collection, part of the Wellcome collection, part of a release by the NYPL, or part of the Internet Archive Book Images collection (which in itself can contain multiple releases of the same book, from different libraries).
This source information can be quite detailed, along with credit-line information, and specific link-back information. So (IMO) it makes sense to be able to hold it as a single item for the set, rather than only being able to extract it as a query from the individual images.
Furthermore, this is information that one wants to be able to display on the Commons category page. It doesn't make sense to have to run a query over the images (which images? all of them?) in the category, just to be able to display header information on the category page.
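To make the idea concrete, here is a rough sketch of what such a C-item might hold and how a category header could be built from it directly. Everything in it is hypothetical: CommonsData has no such schema yet, and the class name, field names, and the identifiers "C-EXAMPLE-1" and "Q-EDITION-PLACEHOLDER" are placeholders, not real items.

```python
from dataclasses import dataclass


@dataclass
class CItem:
    """Hypothetical CommonsData item describing one set of scans."""
    c_id: str               # placeholder identifier for the C-item
    commons_category: str   # the Commons category this item represents
    edition_qid: str        # Q-item for the edition on main Wikidata
    source_collection: str  # which release the scans came from
    credit_line: str        # credit text to show with the set


def category_header(item: CItem) -> str:
    """Build header text from the C-item alone, without querying any files."""
    return (
        f"{item.commons_category}\n"
        f"Edition: {item.edition_qid}\n"
        f"Source: {item.source_collection}\n"
        f"Credit: {item.credit_line}"
    )


scan_set = CItem(
    c_id="C-EXAMPLE-1",                      # placeholder, not a real ID
    commons_category="Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd",
    edition_qid="Q-EDITION-PLACEHOLDER",     # placeholder, not a real Q-item
    source_collection="Mechanical Curator collection",
    credit_line="placeholder credit line",
)

print(category_header(scan_set))
```

The point of the sketch is only that the header comes from one record for the set, so nothing has to be aggregated from the individual images.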
-- James.
On 01/09/2014 17:43, Joe Filceolaire wrote:
James, I think the problem is not as difficult as you have described.
If we look at http://www.wikidata.org/wiki/Wikidata:Notability then you will see that each Wikimedia Commons page can have a corresponding item. The comment that "a sitelink to a category page in Wikimedia Commons is *not* allowed on main article items" means that Commons Category pages should link to Category items and not to items linked to Wikipedia articles. It does not mean items linked to Commons Categories are not allowed. In fact I believe nearly every Commons Category has a corresponding wikidata category item.
Notability Criterion 3. reads "(An item is acceptable if) It fulfills *some structural need*, for example: it is needed to make statements made in other items more useful.". I believe that this allows the creation of items for institutions, photographers, books etc as required to describe Commons files. Considering the two examples you identified:
Category:Images released by British Library Images Online
Each of these images can have the statement 'Source:British Library Images Online'. This statement requires a CommonsData Property "Source" and a wikidata item "British Library Images Online". As this wikidata item is needed to complete the statement, it meets wikidata notability criterion 3.
Category:Metropolitan Improvements (1828) Thomas Hosmer Shepherd
Again wikidata items can be created for the book "Metropolitan Improvements", for the author "Thomas Hosmer Shepherd", and for the book's publisher (if known). All of these are clearly considered notable under https://www.wikidata.org/wiki/Help:Sources. These wikidata items can then be linked to from statements in CommonsData describing each of the images.
Note that this all works without needing to link to the Category Qitems.
In practice this means that if a Commons file is in a certain category then we can know that certain statements will apply to that file. Later, eventually, we can find those files by searching for files to which those statements apply and ignore the categorisation, since all the information inherent in membership of that Category has been included in the form of statements. We do not need a "container for structured information associated with each commonscat". This structured information can just be included in CommonsData, without any separate 'container'.
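As a rough illustration of finding files by their statements instead of by category, here is a small sketch. The file titles, the property names ("source", "work") and the `find_files` helper are all made up for the example; nothing here reflects an actual CommonsData schema or query interface.

```python
from typing import Dict, List

# Hypothetical per-file statements, as they might be stored in CommonsData.
files: List[Dict[str, str]] = [
    {"title": "File:Example scan 001.jpg",
     "source": "British Library Images Online",
     "work": "Metropolitan Improvements"},
    {"title": "File:Example scan 002.jpg",
     "source": "Internet Archive Book Images",
     "work": "Metropolitan Improvements"},
    {"title": "File:Example photo 003.jpg",
     "source": "British Library Images Online",
     "work": "Some other work"},
]


def find_files(records: List[Dict[str, str]], **wanted: str) -> List[str]:
    """Return titles of files whose statements match every wanted value."""
    return [
        record["title"]
        for record in records
        if all(record.get(prop) == value for prop, value in wanted.items())
    ]


# Everything from one source, regardless of category membership.
print(find_files(files, source="British Library Images Online"))

# Narrowing by a second statement plays the role of a sub-category.
print(find_files(files, source="British Library Images Online",
                 work="Metropolitan Improvements"))
```

Each extra statement in the search narrows the result, which is the sense in which the statements could stand in for category membership.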
Eventually, when the information inherent in the categorisation system has been translated into structured data, the query system is a lot more useful than today, and the Categories based on idiosyncratic selection criteria have been transitioned into Galleries (where they should have been all along), then Categories may no longer be needed.
But perhaps we will keep them anyway.
Joe
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l