Brion, you're missing my point. I agree with you entirely that things need
to "get done". My suggestion to have a public discussion was to find out
which things we can get done reasonably quickly (because, realistically,
we all have other things to do) with substantial impact; to figure out the
server situation, which features should be disabled, who might contribute
which piece of code, etc. If we can sort these things out in the next few
days via mail, fine. I'm no IRC junkie. But we need to implement at least
some reasonable emergency fixes, and think about a mid-term strategy.
As for code, this is one thing I'd like to talk about: If we have the
Nupedia Foundation set up, we can collect donations. It would be stupid
not to use some of that money for funding development. I don't care who is
funded, but I think this could greatly speed things up. If we can't get
the NF set up reasonably quickly, we should collect donations regardless,
tax-deductible or not.
> * Page viewing is still kinda inefficient. Rendering everything on every
> view is not so good...
Why? It's just PHP stuff. Our bottleneck is the database server. Fetching
stuff from CUR and converting it into HTML is not an issue. 20 pass
parser? Add another zero. Until I see evidence that this has any impact on
performance, I don't care. Turn off link checking and all pages are
rendered lightning fast.
What would be useful is to maintain a persistent (across several sessions)
index of all existing and nonexistent pages in memory for the link
checking. A file on a ramdisk, maybe? I think it would be worth at least
giving it a try, and it would not be a lot of work.
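To make the idea concrete, here is a rough sketch of such a title index, written in Python rather than PHP only to keep it short. Everything in it is an assumption for illustration: the class name `TitleIndex`, the one-title-per-line file format, and the idea that the file would live on a ramdisk are not taken from the existing code. It stores only the titles that exist, so any title not found is treated as nonexistent.

```python
import os


class TitleIndex:
    """Hypothetical in-memory set of existing page titles.

    The set is backed by a plain text file (one title per line), which
    could be placed on a ramdisk so that it survives across requests.
    """

    def __init__(self, path):
        self.path = path
        self.titles = set()
        # Reload whatever an earlier session left behind.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.titles = {line.rstrip("\n") for line in f if line.strip()}

    def exists(self, title):
        # Set membership test: no database query involved.
        return title in self.titles

    def add(self, title):
        # Call this when a page is created; appends to the backing file.
        if title not in self.titles:
            self.titles.add(title)
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(title + "\n")

    def classify_links(self, links):
        # Split the links found on a page into existing and missing targets.
        existing = [t for t in links if t in self.titles]
        missing = [t for t in links if t not in self.titles]
        return existing, missing
```

The sketch leaves out deletions, renames and concurrent writers, which a real version would have to handle before it could replace the database lookups.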
> We need to tell which pages are or aren't cacheable (not a diff, not a
> special page, not a history revision, not a user with really weird
> display options -- or on the other hand, maybe we _could_ cache those,
> if only we can distinguish them), we need to be able to generate and
> save the cached material appropriately, we need to make sure it's
> invalidated properly, and we need to be able to do mass invalidation
> when, for instance, the software is upgraded. Cached pages may be kept
> in files, rather than the database.
Wasted effort, IMHO. Cache improvements have added little measurable
performance benefit, and there are many, many different situations to
test here (different browsers, different browser cache settings, etc.).
Meanwhile, our real bottlenecks (search, special pages, out-of-control
queries) remain in place.
> * The page saving code is rather inefficient, particularly with how it
> deals with the link tables (and potentially buggy -- sometimes pages end
> up with their link table entries missing, possibly due to the system
> timing out between the main save chunk and the link table update). If
> someone would like to work on this, it would be very welcome. Nothing
> that needs to be _discussed_, it just needs to be _done_ and changes
> checked in.
I doubt that a *relatively* rare activity like that makes much of an
impact, but I'll be happy to be proven wrong. Bugs are annoying, but I'm
writing this for one reason: we need to make Wikipedia usable again on a
regular basis. There are countless small problems that need to be fixed.
This is not the issue here.
> * Various special pages are so slow they've been disabled. Most of them
> could be made much more efficient with better queries and/or by
> maintaining summary tables. Some remaining ones are also pretty
> inefficient, like the Watchlist. Someone needs to look into these and
> make the necessary adjustments to the code.
Caching special pages seems like a reasonable approach. Watchlists could
definitely be improved, but I haven't seen a good way to do this yet. It
could be done on page save, but with a much-watched page, this again would
add a severe drain, with possibly no overall benefit. Improve the SQL and
indexes? Maybe, but I'm no SQL guru.
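As a rough illustration of the "improve the SQL and indexes" option, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The two tables are a heavily simplified, hypothetical version of a page table and a watchlist table, and the column names, the index name and the timestamp format are all assumptions. The point is only that a composite index on (user, title) lets the database find one user's watched titles without scanning the whole watchlist.

```python
import sqlite3


def build_demo_db():
    # Hypothetical, simplified schema; not the real table layout.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE cur (cur_title TEXT PRIMARY KEY, cur_timestamp TEXT);
        CREATE TABLE watchlist (wl_user INTEGER, wl_title TEXT);
        -- Composite index: look up by user, then join on title.
        CREATE INDEX wl_user_title ON watchlist (wl_user, wl_title);
    """)
    db.executemany("INSERT INTO cur VALUES (?, ?)", [
        ("Physics", "20030105000000"),
        ("Chemistry", "20030101000000"),
        ("Biology", "20030107000000"),
    ])
    db.executemany("INSERT INTO watchlist VALUES (?, ?)", [
        (1, "Physics"), (1, "Chemistry"), (2, "Biology"),
    ])
    return db


def recent_watched_changes(db, user_id, since):
    # Pages watched by user_id that changed at or after `since`, newest first.
    # Timestamps are fixed-width digit strings, so string comparison works.
    return db.execute(
        "SELECT cur_title, cur_timestamp FROM watchlist "
        "JOIN cur ON cur_title = wl_title "
        "WHERE wl_user = ? AND cur_timestamp >= ? "
        "ORDER BY cur_timestamp DESC",
        (user_id, since),
    ).fetchall()
```

Whether this helps on the live server would have to be measured there; the sketch only shows the shape of the query, not its cost under load.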
> * Can MySQL 4 handle fulltext searches better under load? Is boolean
> mode faster or slower? Someone needs to test this (Lee has a test rig
> with mysql4 already, but as far as I know hasn't tested the fulltext
> search with boolean mode yet), and if it's good news, we need to make an
> upgrade a high priority.
Sounds good to me. If safe enough, we should upgrade in any case; it is my
understanding that MySQL 4 (as of 4.1) has support for subqueries which
could, if we know what we're doing, potentially be used to write
significantly more efficient queries.
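To show what a subquery buys in practice, here is a small sketch of a "wanted pages" style report: link targets that do not exist, ranked by how often they are linked. It again uses Python's sqlite3 module as a stand-in for MySQL, and the `cur` and `links` tables with their column names are simplified, hypothetical layouts chosen for the example.

```python
import sqlite3


def build_links_db():
    # Hypothetical, simplified schema for illustration only.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE cur (cur_title TEXT PRIMARY KEY);
        CREATE TABLE links (l_from TEXT, l_to TEXT);
    """)
    db.executemany("INSERT INTO cur VALUES (?)", [("A",), ("B",)])
    db.executemany("INSERT INTO links VALUES (?, ?)", [
        ("A", "C"), ("B", "C"), ("A", "B"), ("B", "D"),
    ])
    return db


def wanted_pages(db):
    # The subquery selects every existing title; the outer query keeps only
    # links whose target is not among them and counts them per target.
    return db.execute(
        "SELECT l_to, COUNT(*) AS n FROM links "
        "WHERE l_to NOT IN (SELECT cur_title FROM cur) "
        "GROUP BY l_to ORDER BY n DESC, l_to"
    ).fetchall()
```

Without subqueries the same report needs a LEFT JOIN with an IS NULL test or a separate lookup per link. Which form is actually faster depends on the query optimizer, so it would need testing on the MySQL 4 rig before relying on it.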
Regards,
Erik