Parsing article text outside MediaWiki? - Wikitech-l

25 Jan 2006

I'm looking to parse the text contents ("old_text" field) of articles in the
"text" table from a pages_articles.xml dump into MySQL, without using
MediaWiki.

How exactly would I do this? I tried extracting functions from parser.php, but
this proved fairly complicated given the dependencies involved (includes and
global variables), and I'm not sure how the functions work anyway.

What I need is something that can ignore markup such as Image and Wikiquote
links, replace the headers with h1/h2/h3/etc. tags, and maybe convert external
and internal links to HTML anchors.

In other words, I'm looking for something that takes the text field and
converts it to basic, readable HTML output without any bells and whistles.
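
To make the requirement concrete, here is a rough sketch in Python of the kind
of function meant. It is not MediaWiki's parser and covers only a small subset
of wikitext. The name wikitext_to_html, the "/wiki/" link prefix, and all of
the regular expressions are assumptions made up for illustration. Tables,
lists, <ref> tags and anything else not named below pass through as plain
(escaped) text.

```python
import html
import re
from urllib.parse import quote

# All names and patterns here are hypothetical; this is not MediaWiki code.
WIKI_PREFIX = "/wiki/"  # assumed base path for internal links

TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")  # innermost {{...}} only
# [[Image:...]] or [[File:...]], allowing plain [[links]] inside the caption
IMAGE_RE = re.compile(
    r"\[\[(?:Image|File):[^\[\]]*(?:\[\[[^\[\]]*\]\][^\[\]]*)*\]\]",
    re.IGNORECASE,
)
QUOTES_RE = re.compile(r"'{2,}")  # bold/italic markers
INTERNAL_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")
EXTERNAL_RE = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]+))?\]")
HEADING_RE = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")


def _internal_link(m):
    # The text was HTML-escaped earlier, so unescape the target before
    # building the URL; the label stays escaped.
    target = html.unescape(m.group(1)).strip().replace(" ", "_")
    label = m.group(2) or m.group(1)
    return '<a href="%s%s">%s</a>' % (WIKI_PREFIX, quote(target, safe="/:"), label)


def _external_link(m):
    url = m.group(1)
    return '<a href="%s">%s</a>' % (url, m.group(2) or url)


def wikitext_to_html(text):
    """Convert a small subset of wikitext to basic HTML."""
    # Drop templates; repeat so nested templates vanish from the inside out.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE_RE.sub("", text)
    text = IMAGE_RE.sub("", text)   # ignore images entirely
    text = QUOTES_RE.sub("", text)  # discard bold/italic markers
    text = html.escape(text)        # escape <, >, & and quotes
    text = INTERNAL_RE.sub(_internal_link, text)
    text = EXTERNAL_RE.sub(_external_link, text)

    out, para = [], []

    def flush():
        if para:
            out.append("<p>" + " ".join(para) + "</p>")
            para.clear()

    for line in text.splitlines():
        line = line.strip()
        heading = HEADING_RE.match(line)
        if heading:
            flush()
            level = len(heading.group(1))  # "==" gives h2, "===" gives h3
            out.append("<h%d>%s</h%d>" % (level, heading.group(2), level))
        elif not line:
            flush()  # a blank line ends the current paragraph
        else:
            para.append(line)
    flush()
    return "\n".join(out)


if __name__ == "__main__":
    print(wikitext_to_html("== History ==\nSee [[Main Page|the main page]]."))
```

Regular expressions like these will misread deeply nested or malformed markup,
so this only suits the "no bells and whistles" case described above.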

Is there any function like this available anywhere? Or some guide on the specs
of such a function?

Saqib