David A. Desrosiers wrote:
The other is to
keep refactoring the PHP codebase (and it's been
much changed since you left it, Lee) and, optionally, rewrite
particular hotspots in another language.
The downside of this is for people like myself who run an
external (though not public) version of MediaWiki exclusively for use
in taking parts of it to convert to other formats (in my case, mobile
and handheld formats). If you change the core language MediaWiki is
driven by, you further burden external contributors and supporters of
MediaWiki as a whole.
...unless you're talking about a rewrite _exclusively_ for use
on the Wikipedia/etc. servers, and not for the main
SF.net project as
distributed to the community. I suspect you're not talking about this
approach because that means maintaining two separate (and gradually
diverging) codebases.
MediaWiki is primarily targeted at Wikipedia and Wikimedia's other
projects (and other similar large-scale sites with people running their
own servers), secondarily at people running local instances to work with
data from our sites, and only incidentally at anyone else.
If, to serve our primary target users, we have to do something that cuts
out the guy running on a hyper-limited cheap hosting account, sorry, but
we may have to do that. Someone running their own installation on their
own box should always be able to obtain the necessary tools, however;
we're committed to always being able to run on a pure free software stack.
The old 'pure PHP' MediaWiki would continue to exist even if we go in a
different direction, and it could be maintained separately for that user
segment if there's interest.
In particular,
PHP tends to impose an architecture where each
request is served by an entirely new script invocation: you have to
build any information up from scratch on each hit, and sharing
things like localization tables between invocations is kind of hard.
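To make that concrete, a rough sketch of the pattern (the file and
variable names here are invented, not our actual code):

    <?php
    // Each hit starts a fresh interpreter, so a big structure like the
    // UI message table gets rebuilt from scratch on every request:
    $messages = require 'Messages.php';  // a file ending in: return array( 'key' => 'text', ... );

    echo $messages['welcome'];

    // A long-lived process (a Java servlet, say) would parse this once at
    // startup and keep the live array in memory for all later requests on
    // that server; a plain PHP setup gives you nowhere obvious to put it.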
(The comments below are paraphrased from a conversation I had
about an hour ago with Rasmus Lerdorf):
Isn't this exactly what ICU[1] was developed to solve? ICU
automatically puts its data in shared memory in order to
optimize itself across different processes on the same server.
I was primarily thinking of the localized user interface messages there,
but yes, the Unicode normalization tables also need to be loaded when
they're needed (though just from source, as they're not user-editable!).
I'm not entirely sure what you're getting at, but we do have an
experimental PHP extension for using ICU to do Unicode normalization. It
needs some more thorough testing before we take it live on our own
servers, though. (When I last tried it, it failed completely, returning
empty strings for everything. This may have been an ICU library version
mismatch; I haven't had a chance to fiddle with it again.)
To be truly scalable, nothing should prevent subsequent requests from
being handled by different physical web servers.
Subsequent requests are virtually always handled by different physical
web servers, and we would always expect this to be so. Server-local
retained data thus either needs to be for things that don't change (such
as Unicode normalization tables!) or things that can easily be updated
when necessary (such as caches of the localized UI messages).
Otherwise we have and use the cluster-wide memcached cloud; this
involves some network latency and serialization/deserialization.
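Roughly this pattern, for the curious (a sketch with the stock Memcache
extension; our real client code and key names differ):

    <?php
    // Cluster-wide cache: any web server can read what another one wrote,
    // but each get/set crosses the network and serializes/unserializes.
    $mc = new Memcache();
    $mc->connect( 'memcached.example.internal', 11211 );

    $messages = $mc->get( 'messages-en' );
    if ( $messages === false ) {
        $messages = loadMessagesFromDb( 'en' );        // hypothetical loader
        $mc->set( 'messages-en', $messages, 0, 3600 ); // serialized on the way in
    }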
And no, it is not hard to use shared memory from PHP.
PHP has a shared memory extension, but IIRC it basically entails copying
a binary string into and out of a shared memory segment, and uses
serializing/deserializing to store arrays and objects. It works, sure,
but if we're _trying_ to avoid constantly copying around large chunks of
data, that's only a limited help over what we're already doing.
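For instance, with PHP's sysvshm functions it looks roughly like this
(the segment key and size are made up):

    <?php
    // Attach to (or create) a System V shared memory segment.
    $shm = shm_attach( ftok( __FILE__, 'm' ), 1024 * 1024 );

    // Storing an array serializes a copy of it into the segment...
    shm_put_var( $shm, 1, $messages );

    // ...and reading it back unserializes another fresh copy into
    // whichever process asks for it.
    $copy = shm_get_var( $shm, 1 );

    shm_detach( $shm );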
Compare this with a multithreaded Java or C# app which can simply refer
to the live object or array in memory, in a synchronization block if
necessary.
-- brion vibber (brion @ pobox.com)