Nick Jenkins wrote:
Can I please ask an unusual question: Is there some way to get
MediaWiki to render a page from just the wiki source, and no database?
If you do a bunch of hacking into the internals, probably...
and without having to delve deeply into the internals of how MediaWiki
works.
d'oh! ;)
I'm happy for all links to either be edit links or not (i.e. no reason
to check whether a page exists or not). Also happy for any templates
or other inclusions in wiki strings to be ignored, so it's a straight
string transformation, and so that there should be no reason to
connect to or use a database.
The reason I'm asking is I'm trying to scope out if it's possible to
pass in the wiki source of an article, convert it to HTML, and then
HTML validate that output, so as to find invalid HTML and mis-uses of
HTML in wiki articles.
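
For the "HTML validate that output" step, here is a rough Python sketch of the cheapest possible check. It assumes the rendered HTML is already in hand as a string, and it only tests XML well-formedness (tags closed and nested properly), not validity against the XHTML DTD. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_wellformedness_error(fragment):
    """Return None if the fragment parses as XML, else the parser's message."""
    try:
        # Wrap in one root element so several top-level tags are accepted.
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(find_wellformedness_error("<p>fine</p>"))
    print(find_wellformedness_error("<b><i>crossed</b></i>"))
```

A plain XML parser also rejects named entities such as &nbsp; unless they are declared, so a real checker would need the XHTML DTD or a proper validator on top of this.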
You can't guarantee that without doing template inclusions, though, as
for instance template inclusions can be embedded in HTML attribute
values (yyyuuuccckkkkk!) and mistakes there are a likely source of
borken HTML output.
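
To make the attribute-value problem concrete, here is a toy Python sketch. The expansion function is a stand-in invented for this example and has nothing to do with MediaWiki's real parser or sanitizer; the template name "boxstyle" is likewise hypothetical. It only shows that dropping templates can hide exactly the breakage you are looking for.

```python
import re
import xml.etree.ElementTree as ET


def expand_templates(wikitext, templates):
    # Toy expansion: {{name}} becomes its text; unknown templates vanish.
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: templates.get(m.group(1), ""),
                  wikitext)


def is_well_formed(fragment):
    # Well-formedness only, not DTD validation.
    try:
        ET.fromstring("<div>" + fragment + "</div>")
        return True
    except ET.ParseError:
        return False


source = '<span style="{{boxstyle}}">text</span>'
good = expand_templates(source, {"boxstyle": "color: red"})
bad = expand_templates(source, {"boxstyle": 'color: red" <b'})
ignored = expand_templates(source, {})  # templates skipped entirely

if __name__ == "__main__":
    for label, html in (("good", good), ("bad", bad), ("ignored", ignored)):
        print(label, is_well_formed(html), html)
```

The "ignored" case passes the check even though the page, once its template is really expanded, may not.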
Strictly speaking, any invalid HTML in output is a bug in the wiki; we
_try_ to guarantee validating XHTML 1.0 transitional output, but we
don't try as hard as we should. In particular there's no validation of
attribute _values_ for those attributes which are allowed, some of which
may have some limits in the DTD, and the check for proper nesting is
extremely bad and ugly code which was hacked together a couple years ago
and no one's dared to replace it. For now we run parsed output through
the HTML Tidy library for an additional cleanup pass on Wikipedia; this
is optional in MediaWiki and requires either the tidy executable or the
PHP extension form.
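
As an illustration of the missing attribute-value check, here is a small Python sketch that flags values outside the enumerated sets the XHTML 1.0 Transitional DTD gives for a few attributes. The table is a tiny hand-copied subset chosen for the example, and none of these names correspond to MediaWiki code.

```python
from html.parser import HTMLParser

# (tag, attribute) -> allowed values; a small subset for illustration only.
ALLOWED_VALUES = {
    ("p", "align"): {"left", "center", "right", "justify"},
    ("div", "align"): {"left", "center", "right", "justify"},
    ("br", "clear"): {"left", "all", "right", "none"},
}


class AttributeValueChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        for name, value in attrs:
            allowed = ALLOWED_VALUES.get((tag, name))
            if allowed is not None and value not in allowed:
                self.problems.append((tag, name, value))


def check_attribute_values(html):
    checker = AttributeValueChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    print(check_attribute_values('<p align="middle">x</p><br clear="all" />'))
```

Values are compared case-sensitively here, since XHTML expects these enumerated values in lowercase.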
-- brion vibber (brion @ pobox.com)