Hi Thad,
One quick red flag - I'm not sure how familiar you are with the category system, but automatically parsing it without sanity-checking can very quickly lead you into a minefield. There are a substantial number of category trees which seem reasonable at first, but rapidly go in unexpected directions.
For example, if you start at "Category:Road transport" on enwiki and go two categories deep, you get "Category:Parking facilities" - so far, so good - but you also get "Category:The Hitchhiker's Guide to the Galaxy", "Category:Songs about buses", and "Category:Cycling journalists".
There are a few pairs of categories which contain each other directly, and a much larger number which contain each other once you go a few levels of subcategories deep, so a naive traversal ends up in endless loops. And, of course, a clean category tree in French might be a mess in German, or vice versa.
None of this is to say "don't do it", but rather "don't expect it to be clean and tidy" :-)
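As a rough illustration of the kind of sanity-checking meant here, a minimal sketch of a walk that caps the depth and never visits a category twice, so loops cannot trap it. The `get_subcats` callable and the toy graph are hypothetical stand-ins for whatever actually supplies subcategories; this is not an existing tool.

```python
from collections import deque


def walk_categories(root, get_subcats, max_depth=2):
    """Breadth-first walk that visits each category once and stops at max_depth.

    get_subcats is a hypothetical callable: category title -> iterable of
    subcategory titles. Returns a dict mapping each reached category to its
    depth below root.
    """
    depths = {root: 0}  # doubles as the "already seen" set
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        if depths[cat] >= max_depth:
            continue  # do not expand beyond the depth limit
        for sub in get_subcats(cat):
            if sub not in depths:  # skipping seen categories breaks loops
                depths[sub] = depths[cat] + 1
                queue.append(sub)
    return depths


# Toy graph, made up for illustration, containing a loop: A -> B -> C -> A
toy = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(walk_categories("A", lambda c: toy.get(c, []), max_depth=10))
```

This only keeps the walk finite. It does nothing about off-topic branches like the ones above, which still need a human look or some relevance filter.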
Andrew.
On 7 October 2016 at 02:08, Thad Guidry thadguidry@gmail.com wrote:
Thanks Stas,
So, hmm... we'd have to build our own parser or something, is what you're saying? Because Wikidata doesn't have those kinds of connections in its graph, and also doesn't yet have a SPARQL service against the Wikipedia API:Categorymembers https://www.mediawiki.org/wiki/API:Categorymembers to deduce those subcategories, right?
How difficult is it for someone to create a service like the LabelService, but using the WP Categorymembers API instead? Or do you have some other ideas?
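For reference, a minimal sketch of asking API:Categorymembers for the subcategories of one category, following continuation until the list is complete. The parameter names (`list=categorymembers`, `cmtitle`, `cmtype=subcat`, `cmlimit`, `cmcontinue`) come from the API page linked above. The `fetch` callable is a hypothetical stand-in for an HTTP GET against a wiki's `api.php` that returns the decoded JSON, and error handling is left out.

```python
def subcat_params(category, cmcontinue=None):
    """Query parameters for one page of API:Categorymembers results."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,   # e.g. "Category:Road transport"
        "cmtype": "subcat",    # subcategories only, no pages or files
        "cmlimit": "max",
        "format": "json",
    }
    if cmcontinue is not None:
        params["cmcontinue"] = cmcontinue
    return params


def list_subcats(category, fetch):
    """Collect all subcategory titles of one category.

    fetch is a hypothetical callable: params dict -> decoded JSON response.
    """
    titles, cont = [], None
    while True:
        data = fetch(subcat_params(category, cont))
        titles += [m["title"] for m in data["query"]["categorymembers"]]
        # The API returns a "continue" block while more results remain.
        cont = data.get("continue", {}).get("cmcontinue")
        if cont is None:
            return titles
```

This returns direct subcategories only. Going deeper means calling it once per category, which is where the loops mentioned at the top of the thread start to matter.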
On Thu, Oct 6, 2016 at 6:10 PM Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
Can it all be done in SPARQL against some services that already expose WP subcategories given a specific category? Or is there an API that does this already? Other tools that might expose WP categories?
I don't think the subcategory relationship is recorded in Wikidata. E.g. https://www.wikidata.org/wiki/Q7361750 contains https://www.wikidata.org/wiki/Q14436424 but neither has any indication of that. The problem, I guess, is that the category hierarchy is different on each wiki, so it's hard to have one property that expresses it on Wikidata.
You could go through "subclass of" and "category's main topic", but I'm not sure that'd capture everything. E.g.: http://tinyurl.com/h7qpcdn but that only captures one subcategory, since other items don't have the same hierarchy in Wikidata.
-- Stas Malyshev smalyshev@wikimedia.org
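A minimal sketch of what a query along the "subclass of" / "category's main topic" route might look like, built as a Python string. This is an assumed reconstruction of the idea, not the query behind the shortened link above: it takes a category item, follows P301 (category's main topic) to the topic, finds items that are P279 (subclass of) that topic, and returns the categories whose main topic is one of those items. The function name is hypothetical.

```python
def subtopic_categories_query(category_qid):
    """Build a SPARQL query string for the Wikidata Query Service."""
    return f"""
SELECT ?subcat ?subcatLabel WHERE {{
  wd:{category_qid} wdt:P301 ?topic .   # category's main topic
  ?subtopic wdt:P279 ?topic .           # subclass of
  ?subcat wdt:P301 ?subtopic .          # category about the subtopic
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""


print(subtopic_categories_query("Q7361750"))
```

As noted above, this only finds categories whose topics happen to be linked by "subclass of" in Wikidata, so it will likely miss most of what a wiki's own category tree contains.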
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata