On Mon, 2005-05-09 at 12:43 -0700, Brion Vibber wrote:
> A possible middle road is to rewrite the core wiki engine as a
> separate daemon, and adapt the existing PHP user interface to call
> into it for much of the backend work that actually touches data.
That's the approach I'd favor. The existing PHP code represents a lot
of good user-interface work for which PHP is perfectly suited. The
underlying stuff could easily be split up into multiple daemons (say,
one for wikitext, one for images, one for equations,...) that could
feed the PHP front-end.
This would also allow incremental development, since each of the daemons
could be written and attached individually without disturbing the rest
of the codebase.
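
To make that concrete, here's a rough sketch (daemon, port, and wire
protocol all hypothetical) of the PHP front-end handing wikitext to a
rendering daemon and falling back to the in-process parser if the
daemon is down:

  <?php
  # Hypothetical front-end hook: ship wikitext to a rendering daemon
  # on a local port. The framing here is a toy; a real protocol would
  # need proper request framing, versioning, and error reporting.
  function renderViaDaemon( $wikitext ) {
      $sock = fsockopen( '127.0.0.1', 8100, $errno, $errstr, 2 );
      if ( !$sock ) {
          return false;   # caller falls back to the in-process parser
      }
      fwrite( $sock, strlen( $wikitext ) . "\n" . $wikitext );
      $html = '';
      while ( !feof( $sock ) ) {
          $html .= fread( $sock, 8192 );
      }
      fclose( $sock );
      return $html;
  }
  ?>

The same shape would work for the image and equation daemons; each
one is just another socket the UI code can try.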
> Our biggest, nastiest burden is with internal communications: the
> database changes too much. We have to wait on things getting sent,
> received, applied, and copied around, and lagging databases send
> everything into the toilet fast.
Yep. That's why I think the bulk of the text should be stored in a
plain filesystem, where those problems are already well-known and
solved, for the most part. That would reduce the database to just
metadata, which would be much smaller and more efficient.
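
As a sketch of what that might look like (paths and hashing purely
illustrative, PHP 5 functions assumed), revision text could be stored
content-addressed on disk, with the database keeping only the hash
and the metadata:

  <?php
  # Illustrative only: content-addressed text storage. The database
  # row for a revision keeps just this hash plus metadata; the bytes
  # live on an ordinary filesystem.

  function textPath( $hash ) {
      # Fan out into subdirectories so no one directory grows huge.
      return '/var/wikitext/' . substr( $hash, 0, 2 ) . '/' .
             substr( $hash, 2, 2 ) . '/' . $hash;
  }

  function saveText( $text ) {
      $hash = md5( $text );
      $path = textPath( $hash );
      if ( !file_exists( $path ) ) {
          @mkdir( dirname( $path ), 0755, true );   # recursive mkdir
          file_put_contents( $path, $text );
      }
      return $hash;   # the key that goes into the metadata table
  }

  function loadText( $hash ) {
      return file_get_contents( textPath( $hash ) );
  }
  ?>

Identical texts dedupe for free, and copying text between servers
becomes a filesystem problem (rsync and friends) rather than a
database replication problem.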
> ...Better distinguishing between requests that need to be absolutely
> current and requests where it's ok to load a page that's 30 seconds
> out of date could make much better use of the slave servers.
Yes! There's only one tricky part for which we may have to consider
creative implementations. I tried as much as possible to take style
markup (especially skin-specific markup) out of the rendered wikitext
so that it could be cached, but one case is still a problem: red
links (i.e., links to non-existent pages). Users shouted at me that
this was a sine qua non feature, so I had to leave it in. But it
makes caching rendered wikitext hard, and it slows down rendering.
One alternative is simply to tolerate them being out of date for the
life of the cache. Another is to update the cache in some cheaper
way. Yet another is to optimize the hell out of discovering the
simple existence of a page, so that it's not a bottleneck in
rendering (say, by having a daemon that keeps a one-bit field for
every page using a spell-checker data structure).
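
To make that last idea concrete: the classic spell-checker structure
here would be a Bloom filter, where a miss is authoritative and a hit
is merely probable. A rough sketch, with the size and hashing chosen
purely for illustration:

  <?php
  # Sketch of a page-existence Bloom filter the daemon could keep in
  # memory. A "no" answer is definite, so red-link checks never touch
  # the database; a "yes" is only probable and could be confirmed
  # against the database. Size and hash count are illustrative.

  $filterBits = 1 << 24;                          # 16M bits = 2MB
  $filter = str_repeat( "\0", $filterBits >> 3 );

  function bitPositions( $title, $filterBits ) {
      # Two positions carved out of one md5; a tuned filter would
      # pick the hash count based on the expected number of pages.
      $h = md5( $title );
      return array( hexdec( substr( $h, 0, 7 ) ) % $filterBits,
                    hexdec( substr( $h, 7, 7 ) ) % $filterBits );
  }

  function addPage( &$filter, $filterBits, $title ) {
      foreach ( bitPositions( $title, $filterBits ) as $p ) {
          $filter[$p >> 3] =
              chr( ord( $filter[$p >> 3] ) | ( 1 << ( $p & 7 ) ) );
      }
  }

  function pageMightExist( $filter, $filterBits, $title ) {
      foreach ( bitPositions( $title, $filterBits ) as $p ) {
          if ( !( ord( $filter[$p >> 3] ) & ( 1 << ( $p & 7 ) ) ) ) {
              return false;   # definitely a red link
          }
      }
      return true;            # almost certainly a blue link
  }
  ?>

The daemon would set a bit on page creation and rebuild the filter
when pages are deleted (individual bits can't safely be cleared),
which is cheap when the whole thing is a couple of megabytes.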
> If there's a really strong reason for a rewrite, then we should
> start planning a (cleanly implemented, *compatible*, complete)
> rewrite, making use of the existing parser tests and other tests in
> existence and yet to be written to make sure it'll really be able to
> take over. If we do, we should probably target the end of this year
> or early next year for taking it live. (No need to rush, though; if
> we're going to rewrite, the point is to plan ahead and do it right.)
>
> I'm not totally convinced that there is a really strong reason, though.
I'm all for your method, and I agree it's not an urgent need. But I
think we can slip the timeline even more. The existing codebase will
eventually be a liability, but I think we can throw hardware at it for
a year or two. Also, if we go the route of making independent daemons
linked into the existing UI code, we don't have to deploy all at once.
We could, for example, make and deploy the math daemon as a proof of
concept, work out the bugs with that, then do the others afterward.
Another thing to consider: at least some of the Wikipedia-driven
development will be totally unnecessary for MediaWiki as a general-
purpose open source project. We may want to decouple those projects
at some point.
--
Lee Daniel Crocker <lee at piclab.com> <http://www.piclab.com/lee/>
<http://creativecommons.org/licenses/publicdomain/>