Hello
I have URLs of Wikipedia pages in my DB for many capital cities around the world (for example: http://en.wikipedia.org/wiki/London).
I would like to retrieve the short description that every city page has before the "Contents" table. I need it as plain text or HTML.
How should I do it?
I have tried fetching the page like this: http://en.wikipedia.org/w/index.php?action=render&title=London
But is there a way to retrieve only the short description?
If you use Python, you could try Andreas Eisele's Wikipedia text extraction code: http://github.com/AndreasEisele/wpTextExtractor
It is based on mwlib and uses wikipydia, a Python module I'm working on that acts as a simple interface to the Wikipedia API.
Chris Callison-Burch Johns Hopkins University
On Feb 3, 2010, at 3:22 PM, IgalSt wrote:
> But is there a way to retrieve only the short description?
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
2010/2/3 IgalSt xoxsoft@gmail.com:
> But is there a way to retrieve only the short description?
You can get the wikitext of the text before the first section with:
http://en.wikipedia.org/w/index.php?action=raw&title=London&section=0
Getting just the HTML of a section is not supported, although you can easily look for <h2> (or another one of h1-h6) and cut off there.
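
The two approaches above could be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names (`build_url`, `cut_before_first_heading`, `fetch`), the `https` scheme, and the User-Agent string are assumptions made for the example, and the exact HTML that `action=render` returns may differ from what the regular expression expects.

```python
import re
import urllib.parse
import urllib.request

# Assumed base URL; the thread uses the http:// form of the same endpoint.
INDEX_PHP = "https://en.wikipedia.org/w/index.php"

# Matches the opening tag of any heading from <h1> to <h6>.
HEADING_RE = re.compile(r"<h[1-6][\s>]", re.IGNORECASE)


def build_url(title, action, section=None):
    """Build an index.php URL, e.g. action='raw' with section=0 for the lead wikitext."""
    params = {"action": action, "title": title}
    if section is not None:
        params["section"] = str(section)
    return INDEX_PHP + "?" + urllib.parse.urlencode(params)


def cut_before_first_heading(html):
    """Return the HTML that precedes the first heading tag, or all of it if none is found."""
    match = HEADING_RE.search(html)
    return html if match is None else html[:match.start()]


def fetch(url):
    """Download a URL and decode the body as UTF-8 (hypothetical User-Agent value)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "lead-section-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With these helpers, `fetch(build_url("London", "raw", 0))` would request the wikitext of the lead section, and `cut_before_first_heading(fetch(build_url("London", "render")))` would request the rendered page and keep only the part before the first heading.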
Roan Kattouw (Catrope)