On 5/9/05, Brion Vibber brion@pobox.com wrote:
> There are a couple possibilities here. One is to rewrite MediaWiki in some other language entirely, say Python or C# or whathaveyou.
> ...
> In particular, PHP tends to impose an architecture where each request is served by an entirely new script invocation: you have to build any information up from scratch on each hit, and sharing things like localization tables between invocations is kind of hard. In another language it might be easier to run MediaWiki as a standalone server, which can keep shared data in memory and use it transparently from each thread.
Blaming the language is rarely a productive way to fix such a problem. In particular, Python usually uses the same initialize-and-run model you complain about PHP using, and mod_mono doesn't seem to be widely used. (And I assume Wikimedia has no interest in switching to Windows servers.)
Most reasonably mature languages are fast enough that the performance bottlenecks usually lie in application code rather than in the language itself. And at least one of the PHP problems--the lack of a JIT--will be solved when PHP-on-Parrot is available. (There's at least one project to do this; the Parrot interpreter itself is already quite fast and has JITs for several platforms, although much of the PHP compiler still has to be written.)
> The other is to keep refactoring the PHP codebase (and it's been much changed since you left it, Lee) and, optionally, rewrite particular hotspots in another language.
I like the idea of porting hotspots, but keep in mind that we want people to be able to use this even if they don't have access to a C compiler.
> A possible middle road is to rewrite the core wiki engine to a separate daemon, and adapt the existing PHP user interface to call into it for much of the backend work that actually touches data.
I like this idea, although it carries its own costs. In particular, communicating between processes has inherent overhead; we'd have to be reasonably sure that the gains from caching would outweigh that cost. Note too that IPC and process-management mechanisms vary across operating systems; we might lose Windows support, for example.
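The frontend/daemon split and its per-call IPC cost can be sketched like this: the "frontend" role sends a request over a socket and the "daemon" role answers from data it keeps in memory. The protocol and names are invented purely for illustration; a real design would need framing, error handling, and a way to manage the daemon's lifetime.

```python
# Toy frontend/daemon split over a socket pair. Every page view costs at
# least one round trip, which is the overhead mentioned above.
import socket
import threading

def daemon(sock):
    cache = {"Main_Page": "Welcome to the wiki!"}   # data kept in memory
    while True:
        title = sock.recv(1024).decode()
        if title == "QUIT":
            break
        sock.sendall(cache.get(title, "(missing)").encode())

frontend, backend = socket.socketpair()   # stand-in for a Unix-domain socket
t = threading.Thread(target=daemon, args=(backend,))
t.start()

frontend.sendall(b"Main_Page")            # one round trip per request
reply = frontend.recv(1024).decode()
frontend.sendall(b"QUIT")
t.join()
print(reply)
```

Note that `socket.socketpair()` is one of the OS-dependent mechanisms alluded to above; portability to Windows is exactly the sort of thing the split would put at risk.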
One advantage of this split is that we could rewrite one of the components in another language if we want, without affecting the other one. If the front end becomes little more than a shell around a daemon, we could provide a version written in C as an Apache module, which is about as fast as it gets.
On the back end, Perl 6 is specced to have one of the most powerful pattern-matching engines ever shipped with a language; it should be able to eat wikicode for breakfast. It's also designed to allow easy interoperability with other languages, so just the wikicode parser could be written in it while the rest is left as PHP. Implementation is moving quickly, with the backend Parrot Grammar Engine in progress and the Pugs (Perl 6 in Haskell) compiler just starting on the syntax. (And, of course, you'd have a very happy Perl hacker here.)
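For a taste of what a pattern-matching-heavy wikicode parser does, here's a deliberately tiny sketch using Python's `re` module rather than Perl 6 rules; it handles only two constructs and is nothing like a full grammar.

```python
# Toy wikicode-to-HTML conversion via pattern matching. Covers only
# '''bold''' and [[links]]; purely illustrative.
import re

def render(wikitext):
    # '''bold''' -> <b>bold</b>
    wikitext = re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)
    # [[Page]] -> <a href="/wiki/Page">Page</a>
    wikitext = re.sub(r"\[\[([^]|]+)\]\]",
                      r'<a href="/wiki/\1">\1</a>', wikitext)
    return wikitext

print(render("'''MediaWiki''' runs [[Wikipedia]]."))
# → <b>MediaWiki</b> runs <a href="/wiki/Wikipedia">Wikipedia</a>.
```

The appeal of Perl 6 rules over this kind of regex soup is that a full wikicode grammar could be written as one declarative, composable grammar rather than a pile of ordered substitutions.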