Re: [Wikitech-l] category intersection conversations

8 May 2013


On 8 May 2013 18:26, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
...
Recently a lot of people have been talking about what's possible and
what's necessary regarding MediaWiki, CatScan-like tools, and real
category intersection; this mail has some pointers.
The long-term solution is a sparkly query for, e.g., people with aspects
novelist + Singaporean, and it would be great if Wikidata could be the
data-source.  Generally people don't really want to search using
hierarchical categories; they want tags and they want AND. But
MediaWiki's current power users do use hierarchical labels, so any
change would have to deal with current users' expectations.  Also my
head hurts just thinking of the "but my intuitively obvious ontology is
better than yours" arguments.
To put a nice clear stake in the ground, a magic-world-of-loveliness
sparkly proposal for 2015* might be:
* Categories are implemented in Wikidata
* -> They're in whatever language the user wants (so fr:Chat and en:Cat and
nl:kat and zh-hant:貓 …)
* -> They're properly queryable
* -> They're shared between wikis (pooled expertise)
* Pages are implicitly in the parent categories of their explicit categories
* -> Pages in <Politicians from the Netherlands> are in <People from the
Netherlands by profession> (its first parent) and <People from the
Netherlands> (its first parent's parent) and <Politicians> (its second
parent) and <People> (its second parent's parent) and …
* -> Yes, this poses issues given the sometimes cyclic nature of
categories' hierarchies, but this is relatively trivial to code around
* Readers can search, querying across categories regardless of whether
they're implicit or explicit
* -> A search for the intersection of <People from the Netherlands> with
<Politicians> will effectively return results for <Politicians from the
Netherlands> (and the user doesn't need to know or care whether that
category actually exists)
* -> Searches might be more than just intersections, e.g. "<Painters from
the United Kingdom> AND <Living people> NOT <Members of the Royal Academy>"
or whatever.
* -> Such queries might be cached (and, indeed, the intersections that
people search for might be used to suggest new categorisation schemata that
wikis had not previously considered, e.g. <British politicians> & <People
with pet cats> & <People who died in hot-air ballooning accidents>)
* Editors can tag articles with leaf or branch categories, potentially
overlapping, and the system will rationalise the categories on save to the
minimally-spanning subset (or whatever is most useful for users, the
database, or both)
* -> Editors don't need to know the hierarchy of categories *a priori* when
adding pages to them (yay, less difficulty)
* -> Power editors don't need to type in loads of different categories if
they have a very specific one in mind (yay, still flexible)
* -> Categories shown to readers aren't necessarily the categories saved in
the database; which ones to show is a matter of editorial judgement
(otherwise, would a page not be in just a single category, namely the
intersection of all its tagged categories?)
Apart from the time and resources needed to build this and keep it
running, does this sound like something we'd want to do? It feels like
this, or something like it, would serve our editors and readers best
from their perspective, if not our sysadmins'. :-)
[Snip]

...
I think the best place to pursue this topic is probably in
https://meta.wikimedia.org/wiki/Talk:Beyond_categories .  It's unlikely
Wikimedia Foundation will be able to make engineers available to work on
this anytime soon, but I would not be surprised if the Wikidata
developer community or volunteers found this interesting enough to work on.
I guess I should post this there too, maybe once someone's told me whether
it's madcap. ;-)
J.
-- 
James D. Forrester
Product Manager, VisualEditor
Wikimedia Foundation, Inc.

jforrester@wikimedia.org | @jdforrester
