Hi folks,
I am trying to work with the Wikipedia API and I am having a few small problems :-)
I can fetch the main description of the topic I am looking for using:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop...
I receive a valid JSON object, but the content of the revision is full of markup I do not need, like {{....}}, [[...]], etc. I would like to get only the clean description as plain text (like what is visible on the wiki website).
How can I do that? Is there a parser that can clean up my JSON object?
I hope someone can help me out!
Kind regards, Marco
There's nothing ready-made for this AFAIK. You can either get the wikitext content (which is what you're doing now) or the HTML version through action=parse. Your best bet would probably be to handle the {{}} and [[]] stuff yourself using regexes or something.
Roan Kattouw (Catrope)
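For what it's worth, the regex route suggested above could look roughly like the sketch below. It is only a rough illustration, not a real wikitext parser: the function name strip_wikitext and both patterns are made up for this example, and it assumes templates are simply dropped while links are reduced to their visible label. Tables, <ref> tags and multi-pipe links such as image links are left untouched.

```python
import re

# Hypothetical patterns for illustration only; real wikitext has many more cases.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                      # innermost {{...}}
LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")    # [[target]] or [[target|label]]


def strip_wikitext(text):
    """Very rough wikitext-to-plain-text cleanup (a sketch, not a full parser)."""
    # Remove innermost templates repeatedly so nested templates vanish as well.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub("", text)
    # Keep only the visible part of a link: the label if present, else the target.
    text = LINK.sub(r"\1", text)
    # Drop the quote runs used for ''italic'' and '''bold'''.
    text = re.sub(r"'{2,}", "", text)
    # Collapse runs of spaces or tabs left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


sample = "{{Infobox|a={{b}}}}'''Rome''' is the [[capital city|capital]] of [[Italy]]."
print(strip_wikitext(sample))
```

Anything beyond simple articles will probably need more patterns, which is why the action=parse route may end up being less work.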
On Fri, Feb 20, 2009 at 6:45 PM, marco tanzi tanzi.marco@gmail.com wrote:
I receive a valid JSON object, but the content of the revision is full of markup I do not need, like {{....}}, [[...]], etc. I would like to get only the clean description as plain text (like what is visible on the wiki website).
Run the parsed text (action=parse) through an HTML parser that strips all the tags.
Bryan
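A minimal sketch of that approach, using only the Python standard library. The names TextExtractor, html_to_text and build_parse_url are invented for this example, and the page and format=json parameters in the URL are assumptions about how the action=parse request would be made; fetching the page and pulling the HTML string out of the JSON response is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint from the original question


def build_parse_url(title):
    """Build an action=parse request URL (parameter choice is an assumption)."""
    return API_URL + "?" + urlencode({"action": "parse", "page": title, "format": "json"})


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Strip all tags from an HTML fragment and normalise whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


fragment = '<p><b>Rome</b> is the <a href="/wiki/Capital">capital</a> of Italy &amp; a city.</p>'
print(html_to_text(fragment))
```

Note that this keeps every text node, so things like infobox contents and edit-section labels in the rendered HTML would still need to be filtered out separately.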
It would be really nice if we could get HTML output from the API query ... this would avoid issuing dozens of separate action=parse requests.
It appears to be mentioned pretty regularly... does anyone know if a bug to that end has been filed? I will plop one in there if none exists (I did not find any in my quick search).
--michael
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api