At http://download.wikimedia.org/enwiki/latest/ you can find a file called "enwiki-latest-abstract.xml", which has the abstracts of all the English Wikipedia pages.

You can find more information about this at http://en.wikipedia.org/wiki/Wikipedia_database
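
As a rough sketch of how the abstract could be looked up in that file from Java: the code below streams the XML with StAX, so the whole dump never has to fit in memory. It assumes the dump is a sequence of <doc> elements, each holding a <title> of the form "Wikipedia: Page name" followed by an <abstract>. Please check that against the actual file before relying on it. The class and method names are only illustrative.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any Wikimedia library.
class AbstractDumpLookup {

    /**
     * Scans the abstract dump and returns the abstract of the given page,
     * or null if no matching title is found. The caller owns the stream.
     */
    static String findAbstract(InputStream dump, String pageTitle) {
        // Assumption: titles in the dump carry a "Wikipedia: " prefix.
        String wanted = "Wikipedia: " + pageTitle;
        boolean matched = false;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(dump);
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if (name.equals("title")) {
                    // Remember whether the current <doc> is the one we want.
                    matched = wanted.equals(reader.getElementText().trim());
                } else if (name.equals("abstract") && matched) {
                    return reader.getElementText().trim();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the abstract dump", e);
        }
    }
}
```

You would call it with something like AbstractDumpLookup.findAbstract(new FileInputStream("enwiki-latest-abstract.xml"), "Eiffel Tower"). A single lookup still reads the file from the start, so for many titles it is probably better to load the abstracts into a database or index once.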


Daniel Hasan Dalip

On Wed, Jan 27, 2010 at 3:34 PM, aditya srinivas <usaditya86@gmail.com> wrote:

I am writing a Java program to extract the abstract of a Wikipedia page given the page's title. I have done some research and found out that the abstract will be in rvsection=0.

So, for example, if I want the abstract of the "Eiffel Tower" wiki page, I query the API in the following way:


I then parse the XML data that comes back and take the wikitext inside the <rev xml:space="preserve"> tag, which represents the abstract of the Wikipedia page. But this wikitext also contains the infobox data, which I do not need. I would like to know if there is any way to remove the infobox data and keep only the wikitext related to the page's abstract, or if there is an alternative method by which I can get the abstract of the page directly.
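
One possible way to drop the infobox, as a minimal sketch only: an infobox is a {{...}} template, and templates can be nested, so a simple brace-depth counter can skip them. The class and method names below are hypothetical. The sketch removes every template, not just infoboxes, and leaves all other wiki markup (links, bold quotes, references) untouched.

```java
// Hypothetical helper for illustration; not a full wikitext parser.
class WikitextTemplates {

    /** Removes every {{...}} template, including nested ones, from the wikitext. */
    static String stripTemplates(String wikitext) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // how many templates are currently open
        int i = 0;
        while (i < wikitext.length()) {
            if (wikitext.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && wikitext.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                // Keep a character only when it is outside all templates.
                if (depth == 0) {
                    out.append(wikitext.charAt(i));
                }
                i++;
            }
        }
        return out.toString().trim();
    }
}
```

Because inline templates in the lead paragraph are removed as well, some words or numbers produced by those templates may go missing from the result.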

Looking forward to your help.

Thanks in Advance
Aditya Uppu 

Mediawiki-api mailing list