Can I please ask an unusual question: is there some way to get MediaWiki to render a page from just the wiki source, and no database?
If you do a bunch of hacking into the internals, probably...
and without having to delve deeply into the internals of how MediaWiki works.
d'oh! ;)
I actually did something like this with MediaWiki 1.4 (gross hacks on the internals to get a wiki-string to HTML-string conversion without requiring database access), but I did not enjoy the experience. In particular there was the use of globals, the dependency tree (a file would include another file or two, which would include another, etc.), and the initialization order, which was non-trivial to get right. Honestly, it never worked reliably either (from memory it worked sometimes but not always, almost certainly due to something I stuffed up with a hack).
I was hoping it might have changed in 1.5 :-(
In case anyone ever feels tempted to repeat the experiment, I started from an RC of MediaWiki 1.4, and needed these files:
./DefaultSettings.php
./languages
./languages/LanguageUtf8.php
./languages/Language.php
./languages/Names.php
./languages/LanguageEn.php
./Parser.php
./includes
./includes/User.php
./includes/Utf8Case.php
./includes/WatchedItem.php
./includes/Skin.php
./includes/SkinStandard.php
./includes/Image.php
./includes/Feed.php
./includes/RecentChange.php
./includes/SkinPHPTal.php
./includes/LogPage.php
./includes/GlobalFunctions.php
./includes/DatabaseFunctions.php
./includes/UpdateClasses.php
./includes/Database.php
./includes/CacheManager.php
./includes/Title.php
./includes/UserUpdate.php
./includes/ViewCountUpdate.php
./includes/SiteStatsUpdate.php
./includes/LinksUpdate.php
./includes/SearchUpdate.php
./includes/UserTalkUpdate.php
./includes/SquidUpdate.php
./includes/Namespace.php
./includes/MagicWord.php
./includes/LinkCache.php
./includes/Article.php
I also modified some of the above files (sorry, I can't easily provide a diff; I just remember that every time I ran into an error about an undefined variable, I either bypassed it, hard-coded it, guarded it with an isset() conditional, or added another include, and I kept repeating this until eventually the errors stopped). Judging from the last-modified dates on the files, the modified/hacked files were probably Parser.php, Title.php, Skin.php, Language.php, SkinPHPTal.php, Namespace.php, GlobalFunctions.php, and DatabaseFunctions.php.
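Rather than fixing undefined-variable errors one run at a time, one could collect every warning and notice raised during a single run with PHP's set_error_handler() and then work through the whole list. A minimal sketch, assuming PHP 7 or later; collectErrors() is a hypothetical helper, not part of MediaWiki. Note that fatal errors (for example instantiating a class whose file was never included) are not delivered to the handler and still abort the run.

```php
<?php
// Hypothetical helper: run $fn once and return every PHP warning/notice it
// raised, instead of stopping to fix each one in turn.
function collectErrors(callable $fn): array
{
    $errors = array();
    set_error_handler(function ($errno, $errstr, $errfile, $errline) use (&$errors) {
        $errors[] = sprintf('%s:%d: %s', basename($errfile), $errline, $errstr);
        return true; // handled here; skip PHP's default error output
    });
    try {
        $fn();
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
    return $errors;
}

// Usage: a deliberately broken snippet that reads an undefined variable.
$found = collectErrors(function () {
    return $undefinedSetting . "x";
});
foreach ($found as $line) {
    echo $line, "\n";
}
```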
To tie it all together I needed a file like this which would initialize things in the right order, supply dummy functions to cut out things I didn't need, include required files, and so forth:

ludo:~nickj/wiki/HTML-validation# cat master.php
<?php
// report errors, warnings and parse errors (notices are left out)
error_reporting (E_ERROR | E_WARNING | E_PARSE | E_CORE_ERROR);
/*
** @desc: FakeMemCachedClient imitates the API of memcached-client v. 0.1.2.
** It acts as a memcached server with no RAM, that is, all objects are
** cleared the moment they are set. All set operations succeed and all
** get operations return null.
*/
class FakeMemCachedClient {
    function add ($key, $val, $exp = 0) { return true; }
    function decr ($key, $amt=1) { return null; }
    function delete ($key, $time = 0) { return false; }
    function disconnect_all () { }
    function enable_compress ($enable) { }
    function forget_dead_hosts () { }
    function get ($key) { return null; }
    function get_multi ($keys) { return array_pad(array(), count($keys), null); }
    function incr ($key, $amt=1) { return null; }
    function replace ($key, $value, $exp=0) { return false; }
    function run_command ($sock, $cmd) { return null; }
    function set ($key, $value, $exp=0){ return true; }
    function set_compress_threshold ($thresh){ }
    function set_debug ($dbg) { }
    function set_servers ($list) { }
}

// we don't want any kind of profiling
function wfProfileIn( $fn = '' ) {}
function wfProfileOut( $fn = '' ) {}
function wfGetProfilingOutput( $s, $e ) {}
function wfProfileClose() {}

// because Debian woody doesn't have a high enough version of LIBXML to enable XML, which means we have no 'utf8_encode'...
function utf8_encode($x) { return $x; }

// define required for include files to work OK.
define("MEDIAWIKI",true);

// initialize the IP global.
$IP = "";

define( "DB_READ", -1 );  # Read from the slave (or only server)
define( "DB_LAST", -3 );  # Whatever database was used last

// include default settings
require_once ("DefaultSettings.php");

// initialize $wgMemc global (needed by languages).
$wgMemc = new FakeMemCachedClient();

// include MagicWord, needed for the Parser.php to work OK. Should come before language.php to avoid errors.
$wgMagicWords = array();
require_once("includes/MagicWord.php");

// include Namespace, needed for the Parser.php to work OK. Should come before language.php to avoid errors.
require_once("includes/Namespace.php");

// Setup languages, needed to get us the $wgLang global, which we need.
require_once("languages/Language.php");
require_once("languages/LanguageUtf8.php");
$wgLangClass = 'LanguageUtf8';
$wgLang = new LanguageUtf8();

require_once("includes/GlobalFunctions.php");

// include Skin, needed for the User.php to work OK.
require_once("includes/Skin.php");

// include User, needed for the Parser.php to work OK.
require_once("includes/User.php");

// include LinkCache, needed for the Parser.php to work OK.
require_once("includes/LinkCache.php");
$wgLinkCache = new LinkCache();

// include Article, needed for the Parser.php to work OK.
require_once("includes/Article.php");
// set up the parser options
require_once("Parser.php");
$parserOptions = new ParserOptions();
$mParserOptions = $parserOptions->newFromUser( $temp = NULL );
// create a Parser object
$parser = new Parser();

// supply a blank title
$title = NULL;

// make up some text for test purposes
$text = "A [[test]] ''blah''";

// Generate some output, but as an object.
$parserOutput = $parser->parse( $text, $title, $mParserOptions );
// convert the output of the parser to a string.
$output = $parserOutput->mText;
print $output;
?>
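One variation on the dummy-function approach in master.php: guard each stub with function_exists() or class_exists(), so the same bootstrap works whether or not the real definition is already loaded. Unguarded, a stub such as the utf8_encode() one causes a fatal "Cannot redeclare" error on any system where PHP already provides that function. A minimal sketch, assuming PHP 7 or later; NullCache is a hypothetical name standing in for something like FakeMemCachedClient, not a MediaWiki class.

```php
<?php
// Profiling stubs, defined only if the real profiler functions are not loaded.
if (!function_exists('wfProfileIn')) {
    function wfProfileIn($fn = '') {}
    function wfProfileOut($fn = '') {}
}

// Hypothetical null cache: set() always "succeeds", get() always misses.
if (!class_exists('NullCache')) {
    class NullCache
    {
        public function set($key, $value, $exp = 0) { return true; }
        public function get($key) { return null; }
        public function delete($key, $time = 0) { return false; }
    }
}

wfProfileIn('main');               // no-op
$cache = new NullCache();
$cache->set('greeting', 'hello');  // nothing is actually stored
var_dump($cache->get('greeting')); // NULL
wfProfileOut('main');              // no-op
```

Because the declarations sit inside if blocks, PHP only defines them at run time when the condition holds, which is what lets a real definition loaded earlier take precedence.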
You can't guarantee that without doing template inclusions, though, since template inclusions can, for instance, be embedded in HTML attribute values (yyyuuuccckkkkk!), and mistakes there are a likely source of broken HTML output.
That's true, but I just wanted something simple to catch most HTML cock-ups, and was willing to accept a small percentage of false-positives.
For now we run parsed output through the HTML Tidy library for an additional cleanup pass on Wikipedia; this is optional in MediaWiki and requires either the tidy executable or the PHP tidy extension.
Ah, OK, interesting. I too was experimenting with using the PHP tidy extension to do the above checking, but I wanted the errors and their solutions in order to make a list of them (like the lists used in Wiki Syntax) so that interested people could fix the data, whereas I presume you folks just throw the errors away.
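For keeping the errors rather than throwing them away, the PHP tidy extension exposes Tidy's diagnostics as a text buffer (the errorBuffer property of the object returned by tidy_parse_string()). A minimal sketch that turns such a buffer into records that could be grouped into a list; it assumes Tidy's usual "line L column C - Type: message" layout, and parseTidyErrors() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: split a Tidy error buffer into line/column/type/message
// records. Rows not in the assumed "line L column C - Type: message" layout are skipped.
function parseTidyErrors(string $buffer): array
{
    $records = array();
    foreach (preg_split('/\R/', $buffer) as $row) {
        if (preg_match('/^line (\d+) column (\d+) - (\w+): (.*)$/', $row, $m)) {
            $records[] = array(
                'line'    => (int) $m[1],
                'column'  => (int) $m[2],
                'type'    => $m[3],   // e.g. "Warning" or "Error"
                'message' => $m[4],
            );
        }
    }
    return $records;
}

// Canned sample; replaced by real diagnostics when the extension is installed.
$sample = "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
        . "line 1 column 7 - Warning: missing </b> before </p>";
if (extension_loaded('tidy')) {
    $tidy = tidy_parse_string('<p>A <b>test</p>', array(), 'utf8');
    $sample = (string) $tidy->errorBuffer;
}
foreach (parseTidyErrors($sample) as $e) {
    printf("%d:%d %s: %s\n", $e['line'], $e['column'], $e['type'], $e['message']);
}
```

Grouping the records by message (ignoring line and column) would then give one entry per kind of problem, which is closer to the Wiki Syntax style of list.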
On one hand, running the output through tidy as you currently do is good: tidy can be updated as required to detect and fix new errors, web browsers get nice clean output (but only if you're using MediaWiki to transform the wiki string into HTML), and tidy seems fairly quick. On the other hand, maybe it's slightly bad, because it's a run-time solution with added overhead for something that could be fixed once in the data.
However, that doesn't really matter from my perspective. Basically you folks are already fixing this problem automatically, which means I don't have to concern myself with this problem any more. To quote Keith Packard: "this problem is now being fixed by my favourite person - someone else!" :-)
All the best, Nick.
wikitech-l@lists.wikimedia.org