[Mediawiki-l] Wikitext grammar

BPJ bpj at melroch.se
Mon Aug 9 18:02:51 UTC 2010


2010-08-07 20:24, lmhelp wrote:
>
>> So why not use the "real" parser?
>
> Exactly. Where can it be found, please?
>
> Thanks and all the best,
> --
> Lmhelp

Fetch the HTML from wikipedia.org with something like wget
(playing nicely and using delays!) and then extract the
first <p> element with something which parses the HTML
into a tree. I've done that using Perl with HTML::Tree.
A regular expression like /<p\b.+?<\/p>/s might do the
extraction just as well, and cheaper and faster, if
you're just after the first <p> element! (The /s modifier
is needed so that . also matches newlines inside the
paragraph.)
Really cheap, I know!
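
For illustration only, here is a minimal sketch of both approaches in
Python rather than Perl (a swapped-in language, using only the standard
library, not the HTML::Tree code mentioned above). The names fetch_html,
first_paragraph_regex, first_paragraph_text, the User-Agent string and
the one-second delay are all assumptions made for this example.
re.DOTALL plays the role of Perl's /s modifier.

```python
import re
import time
import urllib.request
from html.parser import HTMLParser

# Python counterpart of the Perl regex /<p\b.+?<\/p>/s.
# re.DOTALL lets "." match newlines; \b keeps "<pre>" from matching.
FIRST_P = re.compile(r"<p\b.+?</p>", re.DOTALL | re.IGNORECASE)


def fetch_html(url, delay_seconds=1.0):
    """Hypothetical helper: wait, then download one page as text."""
    time.sleep(delay_seconds)  # play nicely: pause before each request
    request = urllib.request.Request(
        url, headers={"User-Agent": "first-p-sketch/0.1 (example only)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def first_paragraph_regex(html):
    """Cheap route: return the first <p>...</p> span as raw HTML, or None."""
    match = FIRST_P.search(html)
    return match.group(0) if match else None


class FirstParagraphText(HTMLParser):
    """Parser route: collect the text content of the first <p> element."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done and not self.in_p:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True  # ignore every later paragraph

    def handle_data(self, data):
        if self.in_p:
            self.parts.append(data)


def first_paragraph_text(html):
    """Return the text of the first <p> element with inner tags removed."""
    parser = FirstParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

The regex version returns the paragraph with its markup intact, while
the parser version returns only the text, so which one fits depends on
what you want to do with the paragraph afterwards. Neither function
touches the network; fetch_html is kept separate so the delay stays
under your control.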

/BP


