Hi guys,
I am writing a Ruby application to retrieve Wikipedia data: the main description and the main image (the one in the box on the left side of the page). As a parameter I have the curid of the wiki page, so I call the wiki API to get the data. This is where the problems start:
- Main description: I call the following link to retrieve the JSON object with the data for the main description:
http://en.wikipedia.org/w/api.php?action=query&pageids=52780&prop=re...
The object is well formed, but the text is in Wikipedia markup format.
How can I convert it to plain text (without {{ }}, [ ] and <ref>)? Is it possible to get plain text directly?
- Main image (if present): my second problem is finding the right image to show after a search.
I have tried to fetch the main image of a wiki page using the following link: http://en.wikipedia.org/w/api.php?action=query&pageids=52780&prop=im...
but the object I receive contains all the images on the page, without specifying where these images are used.
How can I tell exactly which image is used in the left box of the wiki page?
Can anyone help me?
Kind regards, Marco
marco tanzi wrote:
Hi guys,
I am writing a Ruby application to retrieve Wikipedia data: the main description and the main image (the one in the box on the left side of the page). As a parameter I have the curid of the wiki page, so I call the wiki API to get the data. This is where the problems start:
You're probably better off asking for http://en.wikipedia.org/w/index.php?title=Foo&action=render or http://en.wikipedia.org/w/index.php?oldid=52780&action=render
Treat the first image as the one you want. You can get plain text by removing anything between < > (and undoing entities).
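A minimal Ruby sketch of the tag-stripping suggestion above, assuming the rendered HTML has already been fetched into a string. The helper name html_to_plain_text is hypothetical, and the regex is a naive approximation rather than a real HTML parser. CGI.unescapeHTML decodes numeric entities and a handful of basic named ones such as &amp;, &lt;, &gt; and &quot;, so other named entities would be left untouched.

```ruby
require "cgi"

# Hypothetical helper: reduce a rendered HTML fragment to plain text.
def html_to_plain_text(html)
  text = html.gsub(/<[^>]*>/, " ")   # remove anything between < and >
  text = CGI.unescapeHTML(text)      # undo entities such as &amp; and &#39;
  text.gsub(/\s+/, " ").strip        # collapse the leftover whitespace
end

puts html_to_plain_text("<p>Italy &amp; <b>Rome</b></p>")
```

Tags are replaced with a space rather than an empty string so that words separated only by markup do not run together.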
Hi Platonides,
Many thanks for your fast response.
About the text: what I am looking for is a JSON response containing only the main description, not the whole page without CSS.
About the image: I tried using the first one as the main image, but the service returns the images in alphabetical order, so sometimes the first image is very far from the topic you searched for. (E.g. if I look up Italy, I'd like to see the flag of the nation and not the face of some politician.)
Regards, Marco
On 22 Feb 2009, at 16:05, Platonides wrote:
You're probably better off asking for http://en.wikipedia.org/w/index.php?title=Foo&action=render or http://en.wikipedia.org/w/index.php?oldid=52780&action=render
Treat the first image as the one you want. You can get plain text by removing anything between < > (and undoing entities).
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
marco tanzi wrote:
Hi Platonides,
Many thanks for your fast response.
About the text: what I am looking for is a JSON response containing only the main description, not the whole page without CSS.
It's the rendered page content. The CSS is simply not being applied because it goes in the (missing) <head>.
I think that's the best you can get.
About the image: I tried using the first one as the main image, but the service returns the images in alphabetical order, so sometimes the first image is very far from the topic you searched for. (E.g. if I look up Italy, I'd like to see the flag of the nation and not the face of some politician.)
I meant on the rendered page.
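A rough Ruby sketch of that idea, assuming the action=render output is already in a string: it returns the src of the first <img> in document order. The helper name first_image_src and the sample URLs are made up for illustration, and a regex is only a heuristic here; a proper HTML parser would be more robust, and the first image on some pages may be an icon rather than the one wanted.

```ruby
# Hypothetical helper: return the src of the first <img> tag, or nil if none.
def first_image_src(html)
  match = html.match(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i)
  match && match[1]
end

# Sample fragment with made-up URLs, standing in for the rendered page.
sample = '<div><a href="/wiki/File:Flag.svg">' \
         '<img alt="Flag" src="//upload.example.org/flag.png" width="125"></a>' \
         '<img src="//upload.example.org/other.png"></div>'

puts first_image_src(sample)
```

Because the rendered page keeps the images in the order they appear in the article, this avoids the alphabetical ordering returned by the API image list.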