On 8 May 2013 18:26, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
> Recently a lot of people have been talking about what's possible and what's necessary regarding MediaWiki, CatScan-like tools, and real category intersection; this mail has some pointers.
>
> The long-term solution is a sparkly query for, e.g., people with aspects novelist + Singaporean, and it would be great if Wikidata could be the data-source. Generally people don't really want to search using hierarchical categories; they want tags and they want AND. But MediaWiki's current power users do use hierarchical labels, so any change would have to deal with current users' expectations. Also my head hurts just thinking of the "but my intuitively obvious ontology is better than yours" arguments.
To put a nice clear stake in the ground, a magic-world-of-loveliness sparkly proposal for 2015* might be:
* Categories are implemented in Wikidata
  -> They're in whatever language the user wants (so fr:Chat and en:Cat and nl:kat and zh-hant:貓 …)
  -> They're properly queryable
  -> They're shared between wikis (pooled expertise)
* Pages are implicitly in the parent categories of their explicit categories
  -> Pages in <Politicians from the Netherlands> are in <People from the Netherlands by profession> (its first parent) and <People from the Netherlands> (its first parent's parent) and <Politicians> (its second parent) and <People> (its second parent's parent) and …
  -> Yes, this poses issues given the sometimes cyclic nature of categories' hierarchies, but this is relatively trivial to code around
* Readers can search, querying across categories regardless of whether they're implicit or explicit
  -> A search for the intersection of <People from the Netherlands> with <Politicians> will effectively return results for <Politicians from the Netherlands> (and the user doesn't need to know or care whether this category actually exists)
  -> Searches might be more than just intersections, e.g. "<Painters from the United Kingdom> AND <Living people> NOT <Members of the Royal Academy>" or whatever.
  -> Such queries might be cached (and, indeed, the intersections that people search for might be used to suggest new categorisation schemata that wikis had not previously considered, e.g. <British politicians> & <People with pet cats> & <People who died in hot-air ballooning accidents>)
* Editors can tag articles with leaf or branch categories, potentially overlapping, and the system will rationalise the categories on save to the minimally-spanning subset (or whatever is most useful for users, the database, or both)
  -> Editors don't need to know the hierarchy of categories *a priori* when adding pages to them (yay, less difficulty)
  -> Power editors don't need to type in loads of different categories if they have a very specific one in mind (yay, still flexible)
  -> Categories shown to readers aren't necessarily the categories saved in the database, at editorial judgement (otherwise, would a page not be in just a single category, namely the intersection of all its tagged categories?)
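To make the mechanics a bit more concrete, here is a minimal sketch in Python. It assumes a toy in-memory category graph; the `PARENTS` and `PAGES` dicts, the page titles, and the helper names are all hypothetical and are not MediaWiki or Wikidata APIs. It shows three things: computing a page's implicit parent categories with a visited set so that cyclic hierarchies still terminate, answering AND/NOT queries over explicit plus implicit categories, and reducing a set of tagged categories to its most specific members (just one possible reading of "minimally-spanning subset").

```python
from collections import deque

# Hypothetical toy data: child category -> list of its parent categories.
PARENTS = {
    "Politicians from the Netherlands": [
        "People from the Netherlands by profession",
        "Politicians",
    ],
    "People from the Netherlands by profession": ["People from the Netherlands"],
    "People from the Netherlands": ["People"],
    "Politicians": ["People"],
    "People": [],
}

# Hypothetical pages mapped to their explicitly tagged categories.
PAGES = {
    "Page A": {"Politicians from the Netherlands"},
    "Page B": {"People from the Netherlands"},
    "Page C": {"Politicians"},
}


def ancestors(category, parents):
    """All categories reachable upward from `category`, excluding itself.
    The `seen` set stops the walk from looping forever on cyclic hierarchies."""
    seen = set()
    queue = deque(parents.get(category, []))
    while queue:
        cat = queue.popleft()
        if cat in seen:
            continue
        seen.add(cat)
        queue.extend(parents.get(cat, []))
    seen.discard(category)  # a cycle could lead back to the start
    return seen


def effective_categories(explicit, parents):
    """Explicit categories plus every implicit (ancestor) category."""
    result = set(explicit)
    for cat in explicit:
        result |= ancestors(cat, parents)
    return result


def query(pages, parents, all_of=(), none_of=()):
    """Titles whose effective categories contain everything in `all_of`
    (AND) and nothing in `none_of` (NOT), sorted alphabetically."""
    required, excluded = set(all_of), set(none_of)
    hits = []
    for title, explicit in pages.items():
        cats = effective_categories(explicit, parents)
        if required <= cats and not (excluded & cats):
            hits.append(title)
    return sorted(hits)


def rationalise(tagged, parents):
    """Keep only the most specific tags: drop a tag if another tag lies
    strictly below it. Tags that are mutual ancestors (a cycle) are both kept."""
    tagged = set(tagged)
    anc = {cat: ancestors(cat, parents) for cat in tagged}
    keep = set()
    for cat in tagged:
        redundant = any(
            cat in anc[other] and other not in anc[cat]
            for other in tagged
            if other != cat
        )
        if not redundant:
            keep.add(cat)
    return keep


if __name__ == "__main__":
    # Intersection over implicit categories finds the Dutch politician.
    print(query(PAGES, PARENTS, all_of=["People from the Netherlands", "Politicians"]))
    # AND/NOT query.
    print(query(PAGES, PARENTS, all_of=["People"], none_of=["Politicians"]))
    # Over-tagging collapses to the single most specific category.
    print(sorted(rationalise(
        {"Politicians", "Politicians from the Netherlands", "People"}, PARENTS)))
```

Run as-is, the three calls should give `['Page A']`, `['Page B']` and `['Politicians from the Netherlands']`. Walking the graph on every query is only for illustration; a real system would presumably precompute or cache the closures, much as suggested for caching popular intersections.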
Apart from the time and resources needed to make this happen and operational, does this sound like something we'd want to do? It feels like this, or something like it, would serve our editors and readers the best from their perspective, if not our sysadmins. :-)
[Snip]
> I think the best place to pursue this topic is probably in https://meta.wikimedia.org/wiki/Talk:Beyond_categories . It's unlikely Wikimedia Foundation will be able to make engineers available to work on this anytime soon, but I would not be surprised if the Wikidata developer community or volunteers found this interesting enough to work on.
I guess I should post this there too, maybe once someone's told me whether it's madcap. ;-)
J.