[I worry we're talking about operational details, which should be a wider
discussion, rather than a technology/feasibility conversation to which this
list is more suited. Perhaps moving this on-wiki would be best?]
On 9 May 2013 09:28, Brad Jorsch <bjorsch(a)wikimedia.org> wrote:
On Wed, May 8, 2013 at 10:47 PM, James Forrester wrote:
* Pages are implicitly in the parent categories of their explicit categories
* -> Pages in <Politicians from the Netherlands> are in <People from the
Netherlands by profession> (its first parent) and <People from the
Netherlands> (its first parent's parent) and <Politicians> (its second
parent) and <People> (its second parent's parent) and …
* -> Yes, this poses issues given the sometimes cyclic nature of
categories' hierarchies, but this is relatively trivial to code around
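
To make the "code around cycles" point concrete, here is a minimal sketch,
assuming a plain in-memory map from each category to its direct parents.
PARENTS and implicit_categories are hypothetical names for illustration and
not anything in MediaWiki, and the <People> -> <Politicians> link is an
artificial cycle added only to show that the walk still terminates.

```python
from collections import deque

# Hypothetical parent map: category -> its direct parent categories.
PARENTS = {
    "Politicians from the Netherlands": [
        "People from the Netherlands by profession",
        "Politicians",
    ],
    "People from the Netherlands by profession": ["People from the Netherlands"],
    "Politicians": ["People"],
    "People": ["Politicians"],  # artificial cycle, for illustration only
}


def implicit_categories(explicit, parents):
    """Return every category reachable from the explicit ones via parent links."""
    seen = set(explicit)
    queue = deque(explicit)
    while queue:
        category = queue.popleft()
        for parent in parents.get(category, ()):
            # The visited set is what makes cycles harmless: a category
            # is expanded at most once.
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - set(explicit)


print(sorted(implicit_categories(["Politicians from the Netherlands"], PARENTS)))
```

The visited set is the whole trick; the walk is linear in the number of
category links regardless of how tangled the hierarchy is.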
Category cycles are the least of it. The fact that the existing
category hierarchy isn't based on any sensible-for-inference ontology
is a bigger problem.
Let's consider what would happen to one of my favorite examples on enwiki:
* The article for Romania is in <Black Sea countries>. Ok.
* And that category is in <Black Sea>, so Romania is in that too.
Which is a little strange, but not too bad.
* And <Black Sea> is in <Seas of Russia> and <Landforms of Ukraine>.
Huh? Romania doesn't belong in either of those, despite that being
equivalent to your example where pages in <Politicians from the
Netherlands> also end up in <People> via <Politicians>.
And it gets worse the further up you go. You would have Romania in
<Liquids> a few more levels up.
For this to work, each wiki would have to redo its category hierarchy
as a real ontology based on is-a relationships, rather than the
current is-somehow-related-to. Or we would have to introduce some
magic word or something to tell MediaWiki that <Politicians> is-a
<People> is a valid inference while <Black Sea countries> is-a <Black
Sea> is not.
In other words, code-wise adding "tags" to an article is the same as
categories with inference and querying. But trying to use the existing
category setup as it exists on something like enwiki as "tags" for
inference (or querying, to a lesser extent) seems like GIGO.
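
As a rough illustration of the "magic word" idea above, here is a sketch
that assumes each category link carries a relation label and that inference
only follows links labelled "is-a". The labels, the EDGES data and the
inferred function are all hypothetical; MediaWiki has no such marker today,
and which links count as is-a would be for each wiki to decide.

```python
# Hypothetical labelled links: (child, parent, relation).
EDGES = [
    ("Politicians from the Netherlands", "Politicians", "is-a"),
    ("Politicians", "People", "is-a"),
    ("Black Sea countries", "Black Sea", "related"),
    ("Black Sea", "Seas of Russia", "is-a"),
    ("Black Sea", "Landforms of Ukraine", "is-a"),
]


def inferred(start, edges, follow=("is-a",)):
    """Return the categories reachable from start using only the given relations."""
    parents = {}
    for child, parent, relation in edges:
        if relation in follow:
            parents.setdefault(child, []).append(parent)

    seen = set()
    stack = [start]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Following only is-a links, <Black Sea countries> infers nothing.
print(inferred("Black Sea countries", EDGES))
# Following every link reproduces the Romania problem.
print(sorted(inferred("Black Sea countries", EDGES, follow=("is-a", "related"))))
```

With every link followed, anything in <Black Sea countries> lands in <Seas
of Russia>; with only is-a links followed, the chain stops at the first
merely-related link.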
Quite - hence the bit of my proposal where the categories would get created
on Wikidata from scratch, as a synthesis of the needs of the editing community.
Implicitly, these would have clear semantics about the correctness of
their usage, governed by something analogous to how Wikidata's community are
managing the roll-out of statements on the system. In terms of tools to
prevent this becoming an issue, Wikidata's nature means we could easily
make sure that the domain of a category would be limited (e.g. "Fluids"
maps to "substances", not "instances of substances").
* Readers can search, querying across categories regardless of whether
they're implicit or explicit
* -> A search for the intersection of <People from the Netherlands> with
<Politicians> will effectively return results for <Politicians from the
Netherlands> (and the user doesn't need to know or care that this is an
extant or non-extant category)
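
The intersection search described above could be sketched as follows,
assuming each page already has its full set of explicit and implicit
categories computed. PAGE_CATEGORIES, the page names and search_intersection
are hypothetical placeholders, not a real search API.

```python
# Hypothetical data: page -> all of its categories (explicit plus implicit).
PAGE_CATEGORIES = {
    "Page A": {
        "Politicians from the Netherlands",
        "People from the Netherlands",
        "Politicians",
        "People",
    },
    "Page B": {"People from the Netherlands", "People"},
    "Page C": {"Politicians", "People"},
}


def search_intersection(wanted, page_categories):
    """Return the pages that are in every one of the wanted categories."""
    wanted = set(wanted)
    return sorted(
        page for page, categories in page_categories.items() if wanted <= categories
    )


print(search_intersection(["People from the Netherlands", "Politicians"], PAGE_CATEGORIES))
```

Note that the query never mentions <Politicians from the Netherlands>; the
result depends only on the two categories the reader asked for.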
A person who is originally from the Netherlands but moved to Germany
and became a politician there would be in <People from the
Netherlands> and <Politicians>, but maybe should not be in
<Politicians from the Netherlands> depending on how exactly you define
Indeed; I deliberately chose to use <Politicians from the Netherlands>
rather than <Politicians of the Netherlands> or <Politicians in the
Netherlands> which are distinct categories with entirely different
semantics, but you're right that semantics would need to be clear.
James D. Forrester
Product Manager, VisualEditor
Wikimedia Foundation, Inc.
jforrester(a)wikimedia.org | @jdforrester