On Fri, Aug 6, 2010 at 10:18 AM, Léa Massiot <lea.massiot(a)ign.fr> wrote:
> Are you sure this will be able to extract the
> introductory paragraph (only) which is not in any
> section... (because it is not trivial).
> There is only one example I could find at
> http://code.pediapress.com/wiki/wiki/mwlib
> ... which is not so easy to understand by the way...
> Cheers,
> --
> Lmhelp

mwlib is a fairly full-featured parser for wikitext. It is not documented,
but with Python's built-in dir() and help() functions you can easily explore
its methods. It creates a parse tree from which you can reconstruct the plain
text. Once you have the plain text, extracting paragraphs is straightforward.
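
To illustrate that last step only, here is a rough sketch. It does not call
mwlib at all. It assumes you already have the plain text as a string and that
paragraphs are separated by blank lines. The function names and the sample
text are made up for the example, so adjust them to whatever your
reconstruction actually produces.

```python
import re


def split_paragraphs(plain_text):
    """Split plain text into paragraphs at blank lines (an assumed convention)."""
    blocks = re.split(r"\n\s*\n", plain_text.strip())
    # Collapse line breaks inside a paragraph so each one is a single line.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def intro_paragraph(plain_text):
    """Return the first paragraph, or an empty string if there is none."""
    paragraphs = split_paragraphs(plain_text)
    return paragraphs[0] if paragraphs else ""


# Hypothetical plain text, standing in for what you rebuilt from the parse tree.
sample = (
    "Paris is the capital of France.\n"
    "It lies on the Seine.\n"
    "\n"
    "History\n"
    "\n"
    "The city was founded long ago.\n"
)

print(intro_paragraph(sample))
```

If your plain text keeps section titles on their own lines, as in the sample,
the first paragraph is the introductory one as long as the article does not
open with a heading.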