[Mediawiki-api] parse wikipedia

Roan Kattouw roan.kattouw at home.nl
Sun Feb 22 19:40:01 UTC 2009


marco tanzi wrote:
> Hi folks,
> 
> I am trying to work with the Wikipedia API and I am having a few small
> problems :-)
> 
> I can fetch the main description of the topic I am looking for using:
> 
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=0&format=json&pageids=wiki_id
> 
> I receive a correct JSON object, but the content of the revision is
> full of markup I do not need, like {{....}}, [[...]], etc. I would like
> to get only the clean description, text only (like the one visible on
> the wiki website).
> 
> How can I do that? Is there a parser to clean my JSON object?
> 
> Hope someone can help me out!

As far as I know, the API has no built-in way to return plain text. You
can either get the wikitext content (which is what you're doing now) or
the rendered HTML version through action=parse. Your best bet would
probably be to strip the {{}} and [[]] markup yourself using regexes or
something similar.
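
For illustration, the regex approach to the {{}} and [[]] markup could
look roughly like the sketch below. It assumes Python, and
strip_wikitext is a hypothetical helper name, not part of the MediaWiki
API. It only covers simple cases (templates, plain and piped links,
bold/italic quotes); tables, <ref> tags and image captions containing
nested links are not handled.

```python
import re

# Innermost template: "{{ ... }}" with no braces inside it.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")
# File, image and category links, which carry no readable text.
# Assumption: no nested [[...]] inside them (captions may break this).
MEDIA = re.compile(r"\[\[(?:File|Image|Category):[^\[\]]*\]\]", re.IGNORECASE)
# [[target|label]] or [[target]]: capture the label if present, else the target.
LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def strip_wikitext(text):
    """Return a rough plain-text version of a wikitext string."""
    # Remove templates repeatedly so nested ones disappear inside-out.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub("", text)
    text = MEDIA.sub("", text)
    text = LINK.sub(r"\1", text)
    # Drop the ''italic'' and '''bold''' quote markers.
    text = re.sub(r"'{2,}", "", text)
    # Collapse runs of spaces and tabs left behind by the removals.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


sample = (
    "{{Infobox|name={{lang|en|Foo}}}}'''Foo''' is a "
    "[[programming language|language]] from [[Italy]]. [[Category:Things]]"
)
print(strip_wikitext(sample))
```

The input would be the revision content string taken from the JSON
response. Real articles contain much more markup than this handles, so
treat it as a starting point only.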

Roan Kattouw (Catrope)


