Brion, you're missing my point. I agree with you entirely that things need to "get done". My suggestion to have a public discussion was to find out which things we can get done reasonably quickly (because, realistically, we all have other things to do) with substantial impact; to figure out the server situation, which features should be disabled, who might contribute which piece of code etc. If we can sort these things out in the next few days via mail, fine. I'm no IRC junkie. But we need to implement at least some reasonable emergency fixes, and think about a mid-term strategy.
As for code, this is one thing I'd like to talk about: If we have the Nupedia Foundation set up, we can collect donations. It would be stupid not to use some of that money for funding development. I don't care who is funded, but I think this could greatly speed things up. If we can't get the NF set up reasonably quickly, we should collect donations regardless, tax-deductible or not.
- Page viewing is still kinda inefficient. Rendering everything on every view is not so good...
Why? It's just PHP stuff. Our bottleneck is the database server. Fetching stuff from CUR and converting it into HTML is not an issue. 20 pass parser? Add another zero. Until I see evidence that this has any impact on performance, I don't care. Turn off link checking and all pages are rendered lightning fast.
What would be useful is to maintain a persistent (across several sessions) in-memory index of all existing and nonexistent pages for the link checking. A file on a ramdisk, maybe? I think it would be worth at least giving it a try, and it's not a lot of work.
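To make the idea concrete, here is a minimal sketch of such an index, in Python rather than PHP purely for illustration. Everything in it is an assumption: the `TitleIndex` class, the one-title-per-line file format, and the idea that the file would sit on a ramdisk. Only existing titles are stored; a title that is absent from the set counts as nonexistent.

```python
import os


class TitleIndex:
    """Hypothetical index of existing page titles, backed by a plain file.

    The file is assumed to live on a ramdisk so that it survives across
    requests without touching the database.
    """

    def __init__(self, path):
        self.path = path
        self.titles = set()
        # Load whatever a previous session left behind.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.titles = {line.rstrip("\n") for line in f if line.strip()}

    def exists(self, title):
        # Set lookup; no database query needed.
        return title in self.titles

    def add(self, title):
        # Called when a new page is saved; appends to the backing file.
        if title not in self.titles:
            self.titles.add(title)
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(title + "\n")

    def check_links(self, links):
        # Map each linked title to True (exists) or False (red link).
        return {title: title in self.titles for title in links}
```

The renderer would then call `check_links()` once per page instead of querying the database for every link. Page deletions and renames would need matching removal logic, which this sketch leaves out.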
We need to tell which pages are or aren't cacheable (not a diff, not a special page, not a history revision, not a user with really weird display options -- or on the other hand, maybe we _could_ cache those, if only we can distinguish them), we need to be able to generate and save the cached material appropriately, we need to make sure it's invalidated properly, and we need to be able to do mass invalidation when, for instance, the software is upgraded. Cached pages may be kept in files, rather than the database.
Wasted effort, IMHO. Cache improvements have brought little measurable performance benefit, and there are many, many different situations to test here (different browsers, different browser cache settings etc.). Meanwhile, our real bottlenecks (search, special pages, out-of-control queries) remain in place.
- The page saving code is rather inefficient, particularly with how it deals with the link tables (and potentially buggy -- sometimes pages end up with their link table entries missing, possibly due to the system timing out between the main save chunk and the link table update). If someone would like to work on this, it would be very welcome. Nothing that needs to be _discussed_, it just needs to be _done_ and changes checked in.
I doubt that a *relatively* rare activity like that makes much of an impact, but I'll be happy to be proven wrong. Bugs are annoying, but I'm writing this for one reason: we need to make Wikipedia usable again on a regular basis. There are countless small problems that need to be fixed. This is not the issue here.
- Various special pages are so slow they've been disabled. Most of them could be made much more efficient with better queries and/or by maintaining summary tables. Some remaining ones are also pretty inefficient, like the Watchlist. Someone needs to look into these and make the necessary adjustments to the code.
Caching special pages seems like a reasonable approach. Watchlists could definitely be improved, though I haven't seen a good way to do this yet. The work could be done on page save, but for a much-watched page this would again add a severe drain, possibly with no overall benefit. Improve the SQL and indexes? Maybe, but I'm no SQL guru.
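As a rough illustration of what caching special pages could look like, here is a small time-based cache sketch in Python. The `SpecialPageCache` class, the `ttl_seconds` parameter and the page names are hypothetical and not part of the existing code; the point is only that an expensive query runs at most once per interval, no matter how many people request the page.

```python
import time


class SpecialPageCache:
    """Hypothetical cache: recompute a special page at most once per TTL."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self.store = {}     # page name -> (timestamp, result)

    def get(self, name, compute):
        now = self.clock()
        hit = self.store.get(name)
        # Serve the stored result while it is younger than the TTL.
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        # Otherwise run the expensive query once and remember the result.
        result = compute()
        self.store[name] = (now, result)
        return result
```

A call such as `cache.get("Wantedpages", run_wanted_pages_query)` would then hit the database only when the stored copy has expired. The trade-off is that the page can be stale by up to one TTL, which seems acceptable for statistics-style pages but probably not for watchlists.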
- Can MySQL 4 handle fulltext searches better under load? Is boolean mode faster or slower? Someone needs to test this (Lee has a test rig with mysql4 already, but as far as I know hasn't tested the fulltext search with boolean mode yet), and if it's good news, we need to make an upgrade a high priority.
Sounds good to me. If it is safe enough, we should upgrade in any case; it is my understanding that MySQL4 has support for subqueries, which could, if we know what we're doing, potentially be used to write significantly more efficient queries.
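To show the kind of query I mean, here is a sketch that finds the most-wanted nonexistent pages with a single subquery. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the two tables are a deliberately simplified, hypothetical schema, not our real one. As far as I know, subquery support belongs to the 4.1 series rather than 4.0, so the exact version would need checking before relying on this.

```python
import sqlite3

# In-memory database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cur (cur_title TEXT PRIMARY KEY);
CREATE TABLE links (l_from TEXT, l_to TEXT);
INSERT INTO cur VALUES ('Main Page'), ('PHP');
INSERT INTO links VALUES
    ('Main Page', 'PHP'),
    ('Main Page', 'MySQL'),
    ('PHP', 'MySQL'),
    ('PHP', 'Perl');
""")

# One statement: count links whose target has no row in cur.
# Without subqueries this needs a join or a second round trip.
wanted = conn.execute("""
    SELECT l_to, COUNT(*) AS n
    FROM links
    WHERE l_to NOT IN (SELECT cur_title FROM cur)
    GROUP BY l_to
    ORDER BY n DESC, l_to
""").fetchall()

for title, count in wanted:
    print(title, count)
```

Whether MySQL's optimizer handles `NOT IN (SELECT ...)` better than the equivalent LEFT JOIN is something we would have to measure on Lee's test rig; the subquery form is mainly easier to read and write.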
Regards,
Erik