> > Can I please ask an unusual question: Is there some way to get
> > MediaWiki to render a page from just the wiki source, and no database?
>
> If you do a bunch of hacking into the internals, probably...
>
> > and without having to delve deeply into the internals of how MediaWiki
> > works.
>
> d'oh! ;)
I actually did something like this with MediaWiki 1.4 (gross hacks on
the internals to get a wiki-string to HTML-string conversion without
requiring database access), but I did not enjoy the experience. In
particular, the use of globals, the dependency tree (one file would
include another file or two, each of which would include others, and
so on), and getting the initialization order right were all
non-trivial. Honestly, it never worked reliably either - from memory
it worked sometimes but not always, almost certainly due to something
I stuffed up in one of the hacks.
I was hoping it might have changed in 1.5 :-(
In case anyone ever feels tempted to repeat the experiment, I started
from an RC of MediaWiki 1.4, and needed these files:
./DefaultSettings.php
./languages
./languages/LanguageUtf8.php
./languages/Language.php
./languages/Names.php
./languages/LanguageEn.php
./Parser.php
./includes
./includes/User.php
./includes/Utf8Case.php
./includes/WatchedItem.php
./includes/Skin.php
./includes/SkinStandard.php
./includes/Image.php
./includes/Feed.php
./includes/RecentChange.php
./includes/SkinPHPTal.php
./includes/LogPage.php
./includes/GlobalFunctions.php
./includes/DatabaseFunctions.php
./includes/UpdateClasses.php
./includes/Database.php
./includes/CacheManager.php
./includes/Title.php
./includes/UserUpdate.php
./includes/ViewCountUpdate.php
./includes/SiteStatsUpdate.php
./includes/LinksUpdate.php
./includes/SearchUpdate.php
./includes/UserTalkUpdate.php
./includes/SquidUpdate.php
./includes/Namespace.php
./includes/MagicWord.php
./includes/LinkCache.php
./includes/Article.php
I also modified some of the above (sorry, I can't easily provide a
diff). Every time I ran into an error about an undefined variable, I
either bypassed the code, hard-coded the value, wrapped it in a
conditional isset, or added another include, and I kept repeating this
until eventually the errors stopped. (Judging from the last-modified
dates on the files, the modified/hacked files were probably:
Parser.php, Title.php, Skin.php, Language.php, SkinPHPTal.php,
Namespace.php, GlobalFunctions.php, and DatabaseFunctions.php.)
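Most of those hacks were variations on a few patterns, roughly like
this (illustrative only - the names below are made up, and this is not
an actual diff):

<?php
// Pattern 1: an undefined global? Guard it with a conditional isset:
if ( !isset( $wgSomeGlobal ) ) {
    $wgSomeGlobal = false;
}
// Pattern 2: a value that normally comes from the database? Hard-code it:
$someDatabaseDerivedValue = 0;
// Pattern 3: a code path I didn't need? Stub it out entirely:
function someDatabaseDependentFunction() {
    return null;
}
?>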
To tie it all together, I needed a file like the following, which
initializes things in the right order, supplies dummy functions to cut
out things I didn't need, includes the required files, and so forth (a
sample run follows the script):
ludo:~nickj/wiki/HTML-validation# cat master.php
<?php
// report errors, warnings and parse errors (but not notices)
error_reporting (E_ERROR | E_WARNING | E_PARSE | E_CORE_ERROR);
/*
** @desc: FakeMemCachedClient imitates the API of memcached-client v. 0.1.2.
** It acts as a memcached server with no RAM, that is, all objects are
** cleared the moment they are set. All set operations succeed and all
** get operations return null.
*/
class FakeMemCachedClient {
    function add ($key, $val, $exp = 0) { return true; }
    function decr ($key, $amt=1) { return null; }
    function delete ($key, $time = 0) { return false; }
    function disconnect_all () { }
    function enable_compress ($enable) { }
    function forget_dead_hosts () { }
    function get ($key) { return null; }
    function get_multi ($keys) { return array_pad(array(), count($keys), null); }
    function incr ($key, $amt=1) { return null; }
    function replace ($key, $value, $exp=0) { return false; }
    function run_command ($sock, $cmd) { return null; }
    function set ($key, $value, $exp=0){ return true; }
    function set_compress_threshold ($thresh){ }
    function set_debug ($dbg) { }
    function set_servers ($list) { }
}
// we don't want any kind of profiling
function wfProfileIn( $fn = '' ) {}
function wfProfileOut( $fn = '' ) {}
function wfGetProfilingOutput( $s, $e ) {}
function wfProfileClose() {}
// Debian woody doesn't have a high enough version of LIBXML to enable
// the XML extension, which means we have no utf8_encode(), so stub it
// out (guarded, so this still runs where the real function exists):
if (!function_exists('utf8_encode')) {
    function utf8_encode($x) { return $x; }
}
// define the MEDIAWIKI constant, required for the include files to work OK.
define("MEDIAWIKI",true);
// initialize the IP global.
$IP = "";
define( "DB_READ", -1 ); # Read from the slave (or only server)
define( "DB_LAST", -3 ); # Whatever database was used last
// include default settings
require_once ("DefaultSettings.php");
// initialize $wgMemc global (needed by languages).
$wgMemc = new FakeMemCachedClient();
// include MagicWord, needed for Parser.php to work OK. Should come
// before Language.php to avoid errors.
$wgMagicWords = array();
require_once("includes/MagicWord.php");
// include Namespace, needed for Parser.php to work OK. Should come
// before Language.php to avoid errors.
require_once("includes/Namespace.php");
// Set up languages; needed to get us the $wgLang global.
require_once("languages/Language.php");
require_once("languages/LanguageUtf8.php");
$wgLangClass = 'LanguageUtf8';
$wgLang = new LanguageUtf8();
require_once("includes/GlobalFunctions.php");
// include Skin, needed for the User.php to work OK.
require_once("includes/Skin.php");
// include User, needed for the Parser.php to work OK.
require_once("includes/User.php");
// include LinkCache, needed for the Parser.php to work OK.
require_once("includes/LinkCache.php");
$wgLinkCache = new LinkCache();
// include Article, needed for the Parser.php to work OK.
require_once("includes/Article.php");
// include the parser, and set up the parser options
require_once("Parser.php");
$parserOptions = new ParserOptions();
$mParserOptions = $parserOptions->newFromUser( $temp = NULL );
// create a Parser object
$parser = new Parser();
// supply a blank title
$title = NULL;
// make up some text for test purposes
$text = "A [[test]] ''blah''";
// Generate some output, but as an object.
$parserOutput = $parser->parse( $text, $title, $mParserOptions );
// convert the output of the parser to a string.
$output = $parserOutput->mText;
print $output;
?>
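Running that with the PHP command-line interpreter then produced
something like this (the exact HTML below is from memory, so treat it
as illustrative only):

ludo:~nickj/wiki/HTML-validation# php master.php
<p>A <a href="..." class="new">test</a> <i>blah</i>
</p>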
> You can't guarantee that without doing template inclusions, though,
> as template inclusions can, for instance, be embedded in HTML
> attribute values (yyyuuuccckkkkk!) and mistakes there are a likely
> source of borken HTML output.
That's true, but I just wanted something simple to catch most HTML
cock-ups, and was willing to accept a small percentage of false
positives.
> For now we run parsed output through
> the HTML Tidy library for an additional cleanup pass on Wikipedia; this
> is optional in MediaWiki and requires either the tidy executable or the
> PHP extension form.
Ah, OK, interesting. I too was experimenting with using the PHP tidy
extension to do the above checking, but I wanted the errors and their
solutions in order to make a list of them (like the lists used in Wiki
Syntax) so that interested people could fix the data, whereas I
presume you folks just throw the errors away.
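For what it's worth, the error-collecting half of my experiment looked
roughly like this (a reconstruction from memory, assuming the PHP 5
form of the tidy extension; the sample input is made up):

<?php
// Parse some suspect markup and keep tidy's complaints, rather than
// throwing them away.
$html = '<p>A <a href="/wiki/Test">test</a> <i>blah</p>'; // unclosed <i>
$tidy = tidy_parse_string($html, array('show-body-only' => true), 'utf8');
$tidy->cleanRepair();
// errorBuffer holds one "line X column Y - Warning: ..." entry per
// problem - exactly the messages you'd aggregate into a fix-list.
if (!empty($tidy->errorBuffer)) {
    foreach (explode("\n", $tidy->errorBuffer) as $error) {
        print $error . "\n";
    }
}
print tidy_get_output($tidy); // the repaired markup
?>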
On one hand, running the output through tidy as you currently do is
good: tidy can be updated as required to detect and fix new errors, it
means web browsers will get nice clean output (but only if you're
using MediaWiki to transform the wiki string into HTML), and tidy
seems fairly quick. On the other hand, maybe it's slightly bad,
because it's a run-time solution with added overhead for something
that could be fixed once in the data.
However, that doesn't really matter from my perspective. Basically you
folks are already fixing this problem automatically, which means I
don't have to concern myself with this problem any more. To quote
Keith Packard: "this problem is now being fixed by my favourite person
- someone else!" :-)
All the best,
Nick.