On 12 August 2012 09:58, Pavan Kumar pavankumarstudent@yahoo.com wrote:
Hi, I know Wikipedia has information on a lot of actors. Is there a way I can write a MediaWiki API query to get the list of all actors in Wikipedia and their corresponding links... I am asking for a search in Wikipedia :-)))) Is it available...
Please let me know.
Well, it's not as simple as you were probably hoping.
There is only one "MediaWiki API", and it is the same on every project related to Wikipedia. This means it is totally unaware of the content: it knows nothing about encyclopedias or how one might be formatted to fit in a MediaWiki wiki.
It only knows about the generic concepts and operations of a wiki.
This means you can't query for "all actors".
Fortunately, one of the generic concepts of a wiki is that of "categories", and you can use the API to investigate what's in a category:
http://www.mediawiki.org/wiki/API:Categorymembers
You can query Category:Actors like so: http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&...
This returns not just the pages about actors, though, but anything people have put in the category, some of which may surprise you. Here's an example response in XML:
<?xml version="1.0"?>
<api>
  <query>
    <categorymembers>
      <cm pageid="35149376" ns="0" title="Tany Youne" />
      <cm pageid="35963938" ns="0" title="Donna Wyant" />
      <cm pageid="35778902" ns="14" title="Category:Actors by award" />
    </categorymembers>
  </query>
  <query-continue>
    <categorymembers cmcontinue="..." />
  </query-continue>
</api>
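If it helps, here is a rough Python sketch of building that kind of query and reading a response shaped like the one above. The function names and the tuple layout are my own invention for illustration. cmtitle, cmlimit, cmcontinue and format are parameters of list=categorymembers, but the continuation handling assumes the query-continue element shown in this example, and other MediaWiki versions may report continuation differently. Nothing here touches the network; you would fetch the URL with whatever HTTP client you prefer.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
API_URL = "http://en.wikipedia.org/w/api.php"


def build_categorymembers_url(category, cmcontinue=None, limit=500):
    """Build a list=categorymembers query URL for one category (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,      # e.g. "Category:Actors"
        "cmlimit": str(limit),
        "format": "xml",
    }
    if cmcontinue:
        # Token from the previous response, used to fetch the next batch.
        params["cmcontinue"] = cmcontinue
    return API_URL + "?" + urlencode(params)


def parse_categorymembers(xml_text):
    """Return ([(pageid, ns, title), ...], next_cmcontinue_or_None).

    Assumes the <query-continue> response shape shown in the example above.
    """
    root = ET.fromstring(xml_text)
    members = [
        (int(cm.get("pageid")), int(cm.get("ns")), cm.get("title"))
        for cm in root.iter("cm")
    ]
    cont = root.find("query-continue/categorymembers")
    next_token = cont.get("cmcontinue") if cont is not None else None
    return members, next_token
```

You would call build_categorymembers_url("Category:Actors"), fetch it, pass the body to parse_categorymembers, and repeat with the returned token until it comes back as None. Entries with ns 14 are subcategories; entries with ns 0 are ordinary pages.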
Most importantly, a category will usually contain subcategories, so to build up a list of all the actors in Wikipedia you will need to descend recursively into some of those subcategories.
The biggest problem here is that, contrary to what many people expect, the categories in Wikipedia are not arranged into a strict hierarchy. There is not a "tree" of categories but a "graph", connected in all kinds of whimsical ways.
So you will need either to analyse the subcategories of the actor category yourself and make a hard-coded list of which ones to include, or to design some clever heuristics to decide which subcategory paths to follow and which to ignore.
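To make the graph problem concrete, here is a minimal Python sketch of such a traversal. It is only one way to do it: the names collect_articles, fetch_members and should_follow are hypothetical, fetch_members stands in for whatever code actually calls the API and returns (ns, title) pairs, and should_follow is where your hard-coded list or heuristic would go. The "seen" set is what stops the walk from looping forever when categories link back to each other, and max_depth is an extra safety limit I have assumed.

```python
from collections import deque

ARTICLE_NS = 0    # ordinary pages, ns="0" in the API response
CATEGORY_NS = 14  # subcategories, ns="14" in the API response


def collect_articles(root, fetch_members, should_follow, max_depth=3):
    """Breadth-first walk of a category graph, returning a set of article titles.

    fetch_members(category) -> iterable of (ns, title) pairs (caller-supplied).
    should_follow(title) -> bool, deciding which subcategories to descend into.
    """
    seen = {root}                 # categories already queued, guards against cycles
    queue = deque([(root, 0)])
    articles = set()
    while queue:
        category, depth = queue.popleft()
        for ns, title in fetch_members(category):
            if ns == ARTICLE_NS:
                articles.add(title)
            elif ns == CATEGORY_NS and depth < max_depth:
                if title not in seen and should_follow(title):
                    seen.add(title)
                    queue.append((title, depth + 1))
    return articles
```

With a hard-coded list, should_follow could be as simple as lambda title: title in allowed, where allowed is the set of subcategory titles you picked by hand.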
Some of this work may or may not have already been turned into an "ontology" that you can query using SPARQL in DBpedia, which is data mined from Wikipedia:
Good luck.
Andrew Dunbar (hippietrail)