Hello, and pardon me for any hint of ranting.
Please consider a namespace other than the Category namespace for defining TYPEs of things; specifically, I am proposing a standard MW "Type" namespace. There are several benefits to this refinement.
1. Easier for users - today categories are a mixture of plurals and singulars, a complete mess. Users simply don't think of the type of a page as a plural concept. I have seen their confusion time and again: they wonder why the type of a page is "Persons" rather than "Person".
2. Easier for admins - today the category namespace is open to all users, as it should be to accommodate "folksonomies". Wikidata, though, will introduce types of pages, and its admins will surely want some control over the wiki's controlled vocabulary. What does one do? Presumably lock down individual category pages, but I don't know whether an admin can list the pages that are 'controlled' without resorting to yet another category. A separate namespace for controlled vocabularies could be secured more efficiently, at the namespace level.
3. Easier for the system - today an SQL index holds the categories attached to pages; it's probably the most heavily referenced index in the system. If every page in a wiki has a category, and there are, say, 4 standard/special attributes per page, then there is a minimum of 5 triples per page. For 10K pages, that is at least 50K triples before a single infobox factoid has been tagged. That's a burden that may not scale. Allocating a separate index for the Type namespace is smart because a global (wikifarm-wide) Type namespace could then be referenced rather than a local Category namespace.
4. Semantic Forms - today forms are displayed in part by referencing the MW namespace. As an admin I'd like to allow people to create their own forms, but that means the MW namespace, which is a utility namespace, would need to be open to the general public. Forms should instead be associated with page types via the Type namespace, an unambiguous design that puts MW: back under admin control.
5. Ontological justification - are (OWL) classes really just souped-up categories? A takeaway I have from a conversation with Ward Cunningham is that the Category namespace is simply for lists of things (it was even called "List" at first, I believe). Yes, classes and lists share the notion of a set of things, but really, when did rdfs:Class become a subclass of rdf:Bag? It never was. In this regard, note that rdf:type doesn't reference a resource whose rdf:type is "Type", as one (not in the club) might naively think. Most people find it strange when a singular name designates a set of things (as owl:Class does; I guess it should be called owl:Classes). Typing category pages is pretty problematic too, given the distinction between metaclasses and annotation properties versus classes and class properties.
6. Common sense - categories are LISTS of things; they should not be used for types of things. Types of things are singular in nature (with exceptions), while categories pretty much ALWAYS have plural names, consistent with the definition of a category as just a list of pages.
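The triple count in point 3 is simple arithmetic, and a minimal sketch makes it easy to rerun with other figures. The function name and its parameters are hypothetical, chosen only for illustration; the inputs (one category and 4 standard attributes per page, 10K pages) are the ones given in point 3.

```python
def minimum_triples(pages: int, categories_per_page: int = 1,
                    standard_attributes: int = 4) -> int:
    """Lower bound on stored triples before any infobox facts are tagged.

    Assumes each category membership and each standard attribute
    counts as exactly one triple, as in point 3.
    """
    return pages * (categories_per_page + standard_attributes)


# 10K pages, one category and four standard attributes each.
print(minimum_triples(10_000))
```

With the figures from point 3 this gives 5 triples per page and 50,000 for the wiki; the count grows linearly with the number of pages.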
IOW, categories should not be used for types of things nor for subject headings. I am not seeking the perfect, which another writer warns us all against. I am seeking to learn from mistakes, not to burn them into the next generation of MW software. In short, I'd like to see separate namespaces for subjects, nouns, adjectives, adverbs, participles, etc. - a complete dictionary of common words & phrases. Frankly, I see it as harmful to the whole community to throw all of these into one namespace, Category, as it results in an unmanageable design and wildly unpredictable contents.
In summary, I'd like to see a LEXICAL SEMANTIC design for Wikidata. Again, this note does *not* seek perfection; it seeks to identify and learn from our experiences. My experience is that the Category namespace has been functionally overloaded, to the detriment of 'good' system design.
John McClure
Hi John,
If you check out the data model [1], you will see that we do not plan to use the category system for classification.
I hope this answers your concerns. Cheers, Denny
[1] https://meta.wikimedia.org/wiki/Wikidata/Data_model
2012/4/4 John McClure jmcclure@hypergrove.com
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l