[Labs-l] Separating "content" categories from organizational categories

Jaime Crespo jcrespo at wikimedia.org
Tue Mar 15 08:00:03 UTC 2016


> 2016-03-14 22:46 GMT+00:00 Huji Lee <huji.huji at gmail.com>:
>>
>> An article titled [[Willie Trombone]] can be in two categories:
>> [[Category:People born in Neverhood]] and [[Category:Articles missing date
>> of birth]]
>>
>> In my opinion, the first one is a content category, as it is categorizing
>> the subject of the article. The latter, in contrast, is an organizational
>> category, as it is about the article itself and not about its subject.
>>
>> Is there a reliable way to distinguish these categories from each other,
>> e.g. using Wikidata and its hierarchies? I am not looking for perfection;
>> anything that does most of the job is good enough, especially if it is
>> something that can be queried via SQL.

Some communities have solved this by creating pseudo-namespaces: not
real namespaces, only title prefixes, as on eswiki [0]. There,
categories that are project-only have names starting with
"Wikipedia:".
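
On a wiki that follows this convention, the split reduces to a prefix
check on the category title. Below is a minimal sketch, assuming titles
come as stored in the database (underscores instead of spaces, with or
without the "Categoría:" namespace name). The function name and the
example titles are hypothetical. In SQL the same test would roughly be
`cl_to LIKE 'Wikipedia:%'` on the categorylinks table, assuming the
classic schema where cl_to holds the category title without namespace.

```python
def is_project_category(title: str,
                        prefix: str = "Wikipedia:",
                        namespace_name: str = "Categoría:") -> bool:
    """Return True if a category title uses the project-only pseudo-namespace.

    Hypothetical helper: assumes the eswiki convention of prefixing
    organizational categories with "Wikipedia:".
    """
    # Drop the real namespace name if the caller passed a full page title.
    if title.startswith(namespace_name):
        title = title[len(namespace_name):]
    # Everything else is a plain prefix test on the remaining title.
    return title.startswith(prefix)


if __name__ == "__main__":
    for t in ["Wikipedia:Mantenimiento",
              "Categoría:Wikipedia:Mantenimiento",
              "Personas_de_Madrid"]:
        print(t, is_project_category(t))
```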

There is no such convention on enwiki, but you could do a recursive
search starting from categories such as Wikipedia maintenance [1].
Recursive queries are not an option in MySQL, but they can be emulated
in code. The number of subcategories there should not be too large, so
the full set could probably even be pre-computed.
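
Emulating the recursion in code amounts to a breadth-first walk over the
subcategory graph, keeping a "seen" set because category graphs can
contain cycles. Below is a minimal sketch. The `get_subcats` callback is
an assumption: against the replicas it might run something like
`SELECT page_title FROM categorylinks JOIN page ON page_id = cl_from
WHERE cl_to = %s AND cl_type = 'subcat'` (classic schema, which may
differ on newer MediaWiki versions); here it reads a hypothetical toy
graph instead. The resulting set is what you would pre-compute and then
use to split an article's categories.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Set, Tuple


def collect_subcategories(root: str,
                          get_subcats: Callable[[str], Iterable[str]],
                          max_depth: Optional[int] = None) -> Set[str]:
    """Return root plus every category reachable below it (BFS, cycle-safe)."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        # Optionally stop expanding once the depth limit is reached.
        if max_depth is not None and depth >= max_depth:
            continue
        for sub in get_subcats(cat):
            if sub not in seen:  # guards against cycles in the graph
                seen.add(sub)
                queue.append((sub, depth + 1))
    return seen


def split_categories(article_cats: List[str],
                     organizational: Set[str]) -> Tuple[List[str], List[str]]:
    """Split an article's categories into (content, organizational)."""
    content = [c for c in article_cats if c not in organizational]
    org = [c for c in article_cats if c in organizational]
    return content, org


# Hypothetical toy graph, NOT real enwiki data; note the deliberate cycle.
TOY_SUBCATS = {
    "Wikipedia_maintenance": ["Articles_missing_date_of_birth", "Subcat_A"],
    "Subcat_A": ["Subcat_B"],
    "Subcat_B": ["Wikipedia_maintenance"],
}

if __name__ == "__main__":
    org = collect_subcategories("Wikipedia_maintenance",
                                lambda c: TOY_SUBCATS.get(c, []))
    print(split_categories(["People_born_in_Neverhood",
                            "Articles_missing_date_of_birth"], org))
```

The depth limit is optional; it can help if a maintenance tree turns out
to leak into content categories through some odd subcategory link.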

[0]<https://es.wikipedia.org/wiki/Categor%C3%ADa:Wikipedia:Mantenimiento>
[1]<https://en.wikipedia.org/wiki/Category:Wikipedia_maintenance>