What's the best way to tell if an article is in a general category? Specifically, given a page title, I want to know if that page is (transitively) in Category:Song. For example, I can get
http://en.wikipedia.org/w/api.php?action=query&prop=categories&title...
which gives me (in part)
<page pageid="41953" ns="0" title="Bohemian Rhapsody">
  <categories>
    <cl ns="14" title="Category:1975 singles" />
    <cl ns="14" title="Category:1976 singles" />
    <cl ns="14" title="Category:1991 singles" />
    <cl ns="14" title="Category:1992 singles" />
    <cl ns="14" title="Category:All articles with unsourced statements" />
    <cl ns="14" title="Category:Articles with hAudio microformats" />
    <cl ns="14" title="Category:Articles with unsourced statements from April 2010" />
    <cl ns="14" title="Category:BRIT Award for British Single" />
    <cl ns="14" title="Category:Christmas number-one singles in the United Kingdom" />
    <cl ns="14" title="Category:Dutch Top 40 number-one singles" />
    <cl ns="14" title="Category:Good articles" />
    <cl ns="14" title="Category:Grammy Hall of Fame Award recipients" />
    <cl ns="14" title="Category:Irish Singles Chart number-one singles" />
    <cl ns="14" title="Category:Music videos directed by Bruce Gowers" />
    <cl ns="14" title="Category:Number-one singles in Australia" />
    <cl ns="14" title="Category:Number-one singles in New Zealand" />
    <cl ns="14" title="Category:Number-one singles in Spain" />
    <cl ns="14" title="Category:Parlophone singles" />
    <cl ns="14" title="Category:Queen (band) songs" />
    <cl ns="14" title="Category:RPM Top Singles number-one singles" />
    <cl ns="14" title="Category:Rock ballads" />
    <cl ns="14" title="Category:Singlechart" />
    <cl ns="14" title="Category:Singlechart usages for Billboardhot100" />
    <cl ns="14" title="Category:Singlechart usages for Dutch100" />
    <cl ns="14" title="Category:Singlechart usages for New Zealand" />
    <cl ns="14" title="Category:Singlechart usages for UK" />
    <cl ns="14" title="Category:Songs written by Freddie Mercury" />
    <cl ns="14" title="Category:UK Singles Chart number-one singles" />
    <cl ns="14" title="Category:Use British English from August 2010" />
    <cl ns="14" title="Category:Use dmy dates from August 2010" />
  </categories>
</page>
What I really want to know is if the page is in Category:Song.
In fact, it is, because of the category chain: 1975 singles -> 1975 songs -> Songs by year -> Songs, but finding that path involved some intuition about likely paths to explore. Lacking such intuition, is there any better mechanism in the API other than an exhaustive search through the category tree?
-- Roy Smith roy@panix.com
On Sun, Oct 10, 2010 at 07:50:43PM -0400, Roy Smith wrote:
Lacking such intuition, is there any better mechanism in the API other than an exhaustive search through the category tree?
No. Note that enwiki has many *un*intuitive category chains as well. For example, you might not expect [[Death Valley]] to be in [[:Category:Bodies of water]], but it is:
* [[Death Valley]] is in [[:Category:Death Valley National Park]]
* ... which is in [[:Category:Northern Mojave-Mono Lake region]]
* ... which is in [[:Category:Watersheds of California]]
* ... which is in [[:Category:Watersheds of the United States]]
* ... which is in [[:Category:Watersheds]]
* ... which is in [[:Category:Rivers]]
* ... which is in [[:Category:Bodies of water]]
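The exhaustive search can be sketched as a breadth-first walk up the parent categories. This is only a minimal sketch under stated assumptions, not an API feature: find_category_chain, lookup and the PARENTS table are hypothetical names, and the table hard-codes just the chain listed above, where a real client would fetch each title's parents with prop=categories.

```python
from collections import deque

# Hypothetical in-memory stand-in for prop=categories lookups.
# It holds only the Death Valley chain quoted above.
PARENTS = {
    "Death Valley": ["Category:Death Valley National Park"],
    "Category:Death Valley National Park": ["Category:Northern Mojave-Mono Lake region"],
    "Category:Northern Mojave-Mono Lake region": ["Category:Watersheds of California"],
    "Category:Watersheds of California": ["Category:Watersheds of the United States"],
    "Category:Watersheds of the United States": ["Category:Watersheds"],
    "Category:Watersheds": ["Category:Rivers"],
    "Category:Rivers": ["Category:Bodies of water"],
}


def lookup(title):
    """Return the direct parent categories of a page or category."""
    return PARENTS.get(title, [])


def find_category_chain(title, target, get_parents):
    """Return the chain from title up to target, or None if there is none."""
    came_from = {title: None}  # doubles as the set of visited titles
    queue = deque([title])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the back-pointers to rebuild the chain.
            chain = []
            while current is not None:
                chain.append(current)
                current = came_from[current]
            return chain[::-1]
        for parent in get_parents(current):
            if parent not in came_from:  # skip already traversed categories
                came_from[parent] = current
                queue.append(parent)
    return None


chain = find_category_chain("Death Valley", "Category:Bodies of water", lookup)
print(" -> ".join(chain))
```

The visited set matters because category graphs can contain cycles and shared ancestors; without it the walk may never terminate.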
2010/10/11 Brad Jorsch b-jorsch@alum.northwestern.edu
On Sun, Oct 10, 2010 at 07:50:43PM -0400, Roy Smith wrote:
Lacking such intuition, is there any better mechanism in the API other than an exhaustive search through the category tree?
This is the output of my Python prompt, working via a bot on it:wikisource (where Poemi and Romanzi are two category names, list_in_cat is a simple Python routine that returns the list of articles in a category, and Georgiche is the name of a page, a poem):
"Georgiche" in list_in_cat("Poemi")
Getting [[Categoria:Poemi]]... True
"Georgiche" in list_in_cat("Romanzi")
Getting [[Categoria:Romanzi]]... False
Do you need something like this?
Alex
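Alex's routine itself was not posted. A rough sketch of what such a helper might look like, using list=categorymembers, follows. Everything here is an assumption for illustration: the endpoint URL, the User-Agent string, the api_get helper, the "Categoria:" prefix handling, and the use of the newer "continue" style of result paging (which postdates this thread). It returns direct members only and does not descend into subcategories.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for it:wikisource.
API = "https://it.wikisource.org/w/api.php"


def api_get(params):
    """Fetch one API response as a dict (performs a network request)."""
    url = API + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; use one that identifies your own tool.
    req = urllib.request.Request(url, headers={"User-Agent": "category-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_in_cat(category, fetch=api_get):
    """Return the titles of pages placed directly in Categoria:<category>."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Categoria:" + category,
        "cmtype": "page",   # leave out subcategories and files
        "cmlimit": "max",
        "format": "json",
    }
    titles = []
    while True:
        data = fetch(params)
        titles += [m["title"] for m in data["query"]["categorymembers"]]
        if "continue" not in data:
            return titles
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

With this in place, a check like "Georgiche" in list_in_cat("Poemi") would issue one or more requests and then test membership in the returned list. The fetch parameter exists only so the paging loop can be exercised without network access.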
On Oct 11, 2010, at 2:04 AM, Alex Brollo wrote:
This is the output of my Python prompt, working via a bot on it:wikisource (where Poemi and Romanzi are two category names, list_in_cat is a simple Python routine that returns the list of articles in a category, and Georgiche is the name of a page, a poem):
"Georgiche" in list_in_cat("Poemi")
Getting [[Categoria:Poemi]]... True
"Georgiche" in list_in_cat("Romanzi")
Getting [[Categoria:Romanzi]]... False
Do you need something like this?
Yes, that's exactly what I'm looking for!
-- Roy Smith roy@panix.com
Roy Smith wrote:
What's the best way to tell if an article is in a general category? Specifically, given a page title, I want to know if that page is (transitively) in Category:Song. For example, I can get
http://en.wikipedia.org/w/api.php?action=query&prop=categories&titles=bohemian Rhapsody&cllimit=50
which gives me (in part)
(...)
What I really want to know is if the page is in Category:Song. In fact, it is, because of the category chain: 1975 singles -> 1975 songs -> Songs by year -> Songs, but finding that path involved some intuition about likely paths to explore. Lacking such intuition, is there any better mechanism in the API other than an exhaustive search through the category tree?
-- Roy Smith roy@panix.com
You could implement some heuristic, like "follow categories which contain the target category name in them", which would lead you to Category:Songs via Category:Songs written by Freddie Mercury -> Category:Songs by songwriter -> Category:Songs. But in order to determine that a page is NOT in that category, you will need an exhaustive search (skipping already traversed categories).
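One way to combine the heuristic with the exhaustive fallback is to explore promising categories first and everything else afterwards. The following is a minimal sketch under stated assumptions: is_in_category, lookup and the PARENTS table are hypothetical names, and the table contains only the links quoted in this thread (the other categories are left without parents), where a real client would query prop=categories for each title.

```python
from collections import deque

# Hypothetical in-memory stand-in for prop=categories lookups,
# limited to links mentioned in this thread.
PARENTS = {
    "Bohemian Rhapsody": [
        "Category:1975 singles",
        "Category:Good articles",
        "Category:Rock ballads",
        "Category:Songs written by Freddie Mercury",
    ],
    "Category:1975 singles": ["Category:1975 songs"],
    "Category:1975 songs": ["Category:Songs by year"],
    "Category:Songs by year": ["Category:Songs"],
    "Category:Songs written by Freddie Mercury": ["Category:Songs by songwriter"],
    "Category:Songs by songwriter": ["Category:Songs"],
}


def lookup(title):
    """Return the direct parent categories of a page or category."""
    return PARENTS.get(title, [])


def is_in_category(title, target, get_parents, keyword):
    """Search upward from title, trying categories that mention keyword first."""
    seen = {title}
    queue = deque([title])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        for parent in get_parents(current):
            if parent in seen:  # skip already traversed categories
                continue
            seen.add(parent)
            if keyword.lower() in parent.lower():
                queue.appendleft(parent)  # promising: explore next
            else:
                queue.append(parent)      # unpromising: explore later
    # Every reachable category was visited without finding the target.
    return False


print(is_in_category("Bohemian Rhapsody", "Category:Songs", lookup, "songs"))
print(is_in_category("Bohemian Rhapsody", "Category:Bodies of water", lookup, "water"))
```

The heuristic only changes the order of exploration. A positive answer can arrive after a few lookups, while a negative answer still requires visiting every reachable category.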
If you want to test all articles for the same category, it would be easier to build the graph of child categories of Category:Song, and you could then directly test for membership in any of them (or make the article list while you are browsing the categories).
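The precomputed approach might be sketched like this: walk downward once from the root category to collect every subcategory, then test each article's direct categories against that set. The names collect_subcategories, in_tree, subcats_of and the SUBCATS table are hypothetical, and the table holds only the chains quoted in this thread; a real client would presumably list subcategories with list=categorymembers and cmtype=subcat, and the full tree on enwiki is likely to be far larger.

```python
# Hypothetical in-memory stand-in for subcategory lookups,
# limited to the chains mentioned in this thread.
SUBCATS = {
    "Category:Songs": ["Category:Songs by year", "Category:Songs by songwriter"],
    "Category:Songs by year": ["Category:1975 songs"],
    "Category:1975 songs": ["Category:1975 singles"],
    "Category:Songs by songwriter": ["Category:Songs written by Freddie Mercury"],
}


def subcats_of(category):
    """Return the direct subcategories of a category."""
    return SUBCATS.get(category, [])


def collect_subcategories(root, get_subcats):
    """Return the set containing root and all of its transitive subcategories."""
    found = {root}
    stack = [root]
    while stack:
        for sub in get_subcats(stack.pop()):
            if sub not in found:  # guards against cycles and repeats
                found.add(sub)
                stack.append(sub)
    return found


def in_tree(direct_categories, tree):
    """True if any of a page's direct categories belongs to the tree."""
    return not tree.isdisjoint(direct_categories)


song_tree = collect_subcategories("Category:Songs", subcats_of)
print(in_tree(["Category:1975 singles", "Category:Good articles"], song_tree))
print(in_tree(["Category:Good articles"], song_tree))
```

The cost of building the set is paid once; after that each article needs only its own prop=categories result and a set intersection test.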
mediawiki-api@lists.wikimedia.org