Nick Jenkins wrote:
Can I please ask an unusual question: Is there some way to get MediaWiki to render a page from just the wiki source, and no database?
If you do a bunch of hacking into the internals, probably...
and without having to delve deeply into the internals of how MediaWiki works.
d'oh! ;)
I'm happy for all links to either be edit links or not (i.e. no reason to check whether a page exists or not). Also happy for any templates or other inclusions in wiki strings to be ignored, so it's a straight string transformation, and so that there should be no reason to connect to or use a database.
The reason I'm asking is I'm trying to scope out whether it's possible to pass in the wiki source of an article, convert it to HTML, and then validate that HTML output, so as to find invalid HTML and misuses of HTML in wiki articles.
You can't guarantee that without doing template inclusions, though, since for instance template inclusions can be embedded in HTML attribute values (yyyuuuccckkkkk!) and mistakes there are a likely source of broken HTML output.
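To illustrate the point about attribute values, here is a rough Python sketch. It is not MediaWiki code: `expand_templates`, the `{{colour}}` template, and its contents are all made up for the example, and the expansion is a naive string substitution. It only shows that a check which ignores templates sees harmless markup, while the expanded text can carry a stray quote.

```python
import re

def expand_templates(wikitext, templates):
    """Naively replace {{name}} with its text (hypothetical helper, not MediaWiki's parser).

    Templates missing from the dict expand to an empty string, which is
    what "ignoring templates" amounts to here.
    """
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: templates.get(m.group(1), ""),
                  wikitext)

source = '<span style="color: {{colour}}">text</span>'

# Templates ignored: the attribute value is empty but properly quoted.
ignored = expand_templates(source, {})

# Template contains a stray double quote: the attribute is now malformed.
expanded = expand_templates(source, {"colour": 'red"'})

print(ignored)   # <span style="color: ">text</span>
print(expanded)  # <span style="color: red"">text</span>
```

So validating only the template-free rendering would pass this page even though the real output is broken.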
Strictly speaking, any invalid HTML in output is a bug in the wiki; we _try_ to guarantee validating XHTML 1.0 Transitional output, but we don't try as hard as we should. In particular there's no validation of attribute _values_ for those attributes which are allowed, some of which may be restricted by the DTD, and the check for proper nesting is extremely bad and ugly code which was hacked together a couple of years ago and which no one has dared to replace. For now, on Wikipedia we run parsed output through the HTML Tidy library for an additional cleanup pass; this is optional in MediaWiki and requires either the tidy executable or the PHP extension form.
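For the kind of external checking being scoped out, a minimal nesting check can be sketched with Python's standard `html.parser` module. This is an assumption-laden illustration, not MediaWiki's nesting code and not a DTD validator: `NestingChecker`, `check_nesting`, and the `VOID` list are names invented here, and it only reports end tags that don't match the innermost open element, plus elements left unclosed.

```python
from html.parser import HTMLParser

# Elements that never take an end tag; an illustrative list, not taken from the DTD.
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col"}

class NestingChecker(HTMLParser):
    """Track open elements on a stack and record mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # End tag does not close the innermost open element.
            self.errors.append(f"unexpected </{tag}>")

def check_nesting(html):
    """Return a list of nesting problems found in an HTML fragment."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.stack]

print(check_nesting("<b><i>ok</i></b>"))   # []
print(check_nesting("<b><i>bad</b></i>"))  # ['unexpected </b>', 'unclosed <b>']
```

A real validator would also need to check attribute values and content models against the DTD, which this sketch does not attempt.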
-- brion vibber (brion @ pobox.com)