On Fri, Aug 6, 2010 at 10:06 AM, Léa Massiot <lea.massiot@ign.fr> wrote:
A colleague told me about that... so we had a look at it. Unfortunately, abstracts are not correct most of the time...
Example (in French):
<title>Wikipédia : Arabie saoudite</title>
<url>http://fr.wikipedia.org/wiki/Arabie_saoudite</url>
<abstract>| lien_villes=Villes d'Arabie saoudite</abstract>
<links> [...]
-----------------------------------------------------------------
Cheers,
Lmhelp
In that case I recommend the parser in mwlib for fast extraction of the text in the first paragraph: http://code.pediapress.com/wiki/wiki/mwlib
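
For illustration only: the bad abstract quoted above looks like a parameter line from the article's infobox template ("| lien_villes=...") rather than article text. The sketch below is not mwlib and does not use its API; it is a rough standard-library approximation of the idea, assuming templates are delimited by {{ and }} and paragraphs are separated by blank lines. The function name first_paragraph and the sample wikitext are made up for this example.

```python
def first_paragraph(wikitext: str) -> str:
    """Return the first non-empty paragraph found outside {{...}} templates.

    Rough sketch only: it ignores tables, comments, <ref> tags and link
    markup, all of which a real parser such as mwlib would handle.
    """
    depth = 0          # current nesting level of {{ ... }}
    kept = []          # characters that lie outside any template
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                kept.append(wikitext[i])
            i += 1

    # Paragraphs are assumed to be separated by blank lines.
    for block in "".join(kept).split("\n\n"):
        block = block.strip()
        if block:
            return block
    return ""


# Hypothetical sample: an infobox followed by two paragraphs of text.
sample = (
    "{{Infobox Pays\n"
    "| nom=Arabie saoudite\n"
    "| lien_villes=Villes d'Arabie saoudite\n"
    "}}\n"
    "Saudi Arabia is a country.\n"
    "\n"
    "Second paragraph."
)

result = first_paragraph(sample)
print(result)
```

On the sample, the infobox parameters are skipped and only the first line of running text is returned. For real dumps a proper parser is still the safer choice, since wikitext has many constructs this sketch does not cover.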