Hello,
Pardon me for any hint of ranting.
Please consider a namespace other than the Category namespace for
defining TYPEs of things; specifically, I am proposing a standard MW "Type"
namespace. There are several benefits to this refinement.
1. Easier for users - today categories are a mixture of plurals and
singulars, a complete mess. Users simply don't think that the -type- of a
page is a plural concept. They just don't. I have seen time and again their
confusion. They wonder why a type of page is "Persons" not "Person".
2. Easier for admins - today the category namespace is open to all users, as
it should be to accommodate "folksonomies". Wikidata though will introduce
types of pages and surely would like to have some control over the
controlled vocabulary for the wiki. What does one do? I guess lock down
individual category pages. I don't know if an admin can list pages that are
'controlled' without resorting to yet another category. A separate namespace
for controlled vocabularies can have more efficient namespace-level
security.
3. Easier for the system - today an SQL index holds the categories attached to
pages; it's probably the most heavily referenced index in the system. If every
page in a wiki has a category, and there are say 4 standard/special
attributes per page, then there is a minimum of 5 triples per page. For 10K
pages, that is 50K triples minimum before a single infobox factoid has
been tagged. That's a burden that may not scale. Allocating a separate index
for the Type namespace is smart because a global (wikifarm-wide) Type namespace
could be referenced rather than a local Category namespace.
4. Semantic Forms - today forms are displayed in part by referencing the MW
namespace. As an admin I'd like to allow people to create their own forms,
but that would require the MW namespace -- a utility namespace -- to
be open to the general public. Forms should instead be associated with
page-types via the Type namespace, an unambiguous design that puts MW: back
under admin control.
5. Ontological justification - are (owl) classes really just souped-up
categories? A takeaway I have from a conversation with Ward Cunningham is
that the Category namespace is simply for lists of things (it was even at
first called "List" I believe). Yes, classes and lists have the common
notion of a set of things, but really, when did rdfs:Class become a subclass
of rdf:Bag? It never was. And in this regard note that rdf:type doesn't
reference a resource whose rdf:type is "Type", as one (not in the club) might
naively think. It seems strange to most people when a singular name designates
a set of things (as owl:Class does, which should be called owl:Classes I
guess). Typing category pages is pretty problematic too, given the distinction
between metaclasses and annotation properties versus classes and class
properties.
6. Common sense -- categories are LISTS of things, they should not be used
for types of things. Types of things are singular in nature (with
exceptions) while categories pretty much ALWAYS have plural names so as to
be consistent with the definition that a category is just a list of pages.
IOW, categories should not be used for types of things nor for subject
headings. I am not seeking the perfect as another writer forewarns us all. I
am seeking to learn from mistakes, not to burn them into the next generation
of MW software. In short, I'd like to see separate namespaces for subjects,
nouns, adjectives, adverbs, participles, etc - a complete dictionary of
common words & phrases. Frankly I see it as harmful to the whole community to
throw all these into one namespace -- category -- as it results in an
unmanageable design and wildly unpredictable contents.
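
To make the arithmetic in point 3 concrete, here is a minimal Python sketch of the lower bound. It assumes, as above, one category and 4 standard/special attributes per page; the function name and its defaults are purely illustrative and not part of any MW or Wikidata code.

```python
def minimum_triples(pages, categories_per_page=1, standard_attributes_per_page=4):
    """Lower bound on stored triples before any infobox fact is tagged.

    The defaults mirror the assumption in point 3: one category plus
    4 standard/special attributes per page (hypothetical figures).
    """
    return pages * (categories_per_page + standard_attributes_per_page)


if __name__ == "__main__":
    # Show how the minimum grows with the size of the wiki.
    for pages in (10_000, 100_000, 1_000_000):
        print(pages, "pages ->", minimum_triples(pages), "triples minimum")
```

The bound grows linearly with the page count, so the same 5-triples-per-page overhead applies to every wiki in a farm that keeps its own local index.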
In summary I'd like to see a LEXICAL SEMANTIC design for Wikidata. Again,
this note does *not* seek perfection, it is seeking to identify and to learn
from our experiences. My experience is that the Category namespace has been
functionally overloaded to the detriment of 'good' system design.
John McClure