Hi folks,
I would like to know whether it is possible to retrieve only the main description of a Wikipedia page and, if one is available, the image. For example, I would like to get only the main description and the image of the band U2 from http://en.wikipedia.org/wiki/index.html?curid=52780 . How can I do this? I looked at the Wikipedia API (http://en.wikipedia.org/w/api.php ) but I haven't found anything that fits my needs. It would be great if there were a web service that returns an XML/JSON object with this data.
Hope to hear from you soon!
regards, Marco
What could work is:
http://en.wikipedia.org/w/api.php?action=query&titles=U2&prop=revisi...
This simply returns everything before the first == Section == header.
As for the image: there's no such thing as *the* image belonging to a page. You can get all images on a page, of course, using
http://en.wikipedia.org/w/api.php?action=query&titles=U2&prop=images
(Note that these two queries can also be combined by using prop=revisions|images .)
Roan Kattouw (Catrope)
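For illustration, the combined query could be assembled in Python roughly as follows. This is a minimal sketch, not necessarily the exact request from the message: the first URL is truncated in the archive, so rvprop=content and rvsection=0 (existing API parameters that ask for the wikitext of the lead section) are assumptions about what matches the description, and build_query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"  # endpoint quoted in the thread


def build_query_url(title, props=("revisions", "images")):
    """Build an action=query URL for one page title (hypothetical helper)."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "|".join(props),  # several props are combined with "|"
        "format": "json",
    }
    if "revisions" in props:
        # Assumption: request revision content, limited to section 0 (the lead).
        params["rvprop"] = "content"
        params["rvsection"] = "0"
    return API_ENDPOINT + "?" + urlencode(params)


url = build_query_url("U2")
print(url)
```

Note that urlencode percent-encodes the "|" separator as %7C, which the API accepts in the same way as a literal "|".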
Thanks, Roan, for your fast reply.
My problem now is that I have only the curid and not the title of the page. How can I get the page title from a curid?
regards, Marco
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
Hi Roan,
thanks for your reply, it was really helpful!
May I ask for another suggestion?
If I search Wikipedia for 'George Bush', I see the following results:
George Bush may refer to:
* George Bush (biblical scholar), 19th century biblical scholar and preacher
* George Washington Bush (1779–1863), first black settler in what is now the state of Washington
* .........
How can I fetch the list of short descriptions using the API?
kind regards Marco
marco tanzi wrote:
Thanks, Roan, for your fast reply.
My problem now is that I have only the curid and not the title of the page. How can I get the page title from a curid?
Instead of titles=U2, use pageids=52780 in those queries I mentioned.
Roan Kattouw (Catrope)
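A sketch of that lookup in Python, assuming the default JSON layout in which query.pages is keyed by page ID. The sample_response below is hand-written to illustrate that shape, not captured API output, and both helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "http://en.wikipedia.org/w/api.php"  # endpoint quoted in the thread


def build_pageid_url(page_id):
    """Build an action=query URL that addresses a page by ID instead of title."""
    params = {"action": "query", "pageids": str(page_id), "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def title_from_response(response, page_id):
    """Return the title for page_id, or None if the page is not in the response."""
    page = response.get("query", {}).get("pages", {}).get(str(page_id), {})
    return page.get("title")


# Hand-written sample illustrating the assumed response shape.
sample_response = json.loads(
    '{"query": {"pages": {"52780": {"pageid": 52780, "ns": 0, "title": "U2"}}}}'
)

print(build_pageid_url(52780))
print(title_from_response(sample_response, 52780))
```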
The page [[George Bush]] is a disambiguation page, that is, a page that links to the pages about the different George Bushes. There's nothing special about a disambiguation page or its descriptions, so you'll just have to get its content with:
api.php?action=query&titles=George_Bush&prop=revisions&rvprop=content
Roan Kattouw (Catrope)
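Once the wikitext has been fetched, the entries can be pulled out of the bullet list on the client side. The following is a rough sketch that assumes each entry is a line starting with "*" followed by a [[...]] link. The sample_wikitext string is made up from the entries quoted in this thread, and real disambiguation pages may format their lines differently.

```python
import re

# One bullet line: "* [[Target|optional label]], description"
ENTRY_RE = re.compile(r"^\*+\s*\[\[([^\]|]+)(?:\|[^\]]*)?\]\],?\s*(.*)$")


def parse_disambiguation(wikitext):
    """Return (link target, description) pairs for bullet lines that start with a link."""
    entries = []
    for line in wikitext.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            entries.append((match.group(1).strip(), match.group(2).strip()))
    return entries


# Illustrative input only, not the actual page content.
sample_wikitext = """'''George Bush''' may refer to:
* [[George Bush (biblical scholar)]], 19th century biblical scholar and preacher
* [[George Washington Bush]] (1779–1863), first black settler in what is now the state of Washington
"""

entries = parse_disambiguation(sample_wikitext)
for target, description in entries:
    print(target, "->", description)
```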
Hi Roan,
first, I want to thank you once more for your helpful suggestions.
Now I can fetch the main description of the topic I am looking for using:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop...
I receive a correct JSON object, but the content of the revision is full of markup I do not need, such as {{....}}, [[...]], etc. I would like to get only the clean description as plain text (like the one visible on the wiki website). I have been trying to find a regex pattern for this, but without good results.
How can I do that? Is there a parser that can clean up the content in my JSON object?
I hope someone can help me out!
kind regards Marco
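A single regex tends to struggle here because templates can nest. As a rough illustration of the cleanup being asked about, the sketch below removes {{...}} templates by counting braces, then unwraps [[...]] links and quote markup. It is a naive approximation, not a full wikitext parser (tables, references, and file links are not handled), the helper names are hypothetical, and the sample string is made up. An alternative is to let the server do the rendering: the API also has an action=parse module that returns the page as HTML.

```python
import re


def strip_templates(text):
    """Drop {{...}} templates, including nested ones, by tracking brace depth."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])  # keep only characters outside templates
            i += 1
    return "".join(out)


def wikitext_to_plain(text):
    """Naive wikitext cleanup: templates, internal links, bold/italic markup."""
    text = strip_templates(text)
    # [[Target|label]] -> label, [[Target]] -> Target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove ''italic'' and '''bold''' quote runs
    text = re.sub(r"'{2,}", "", text)
    return text.strip()


# Made-up sample, for illustration only.
sample = "{{Infobox band|name={{nowrap|U2}}}}'''U2''' are a [[rock music|rock]] band from [[Dublin]]."
print(wikitext_to_plain(sample))
```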