On Fri, 21 Feb 2003, Lee Daniel Crocker wrote:
> Can we estimate how long we'll be able to limp along with the current code, adding performance hacks and hardware to keep us going? If it's a year, that will give us certain opportunities and guide some choices; if it's only a month or two, that will constrain a lot of those choices.
The immediate crisis is over. Now that we're on the track of proper indexing, performance should no longer significantly degrade with increased size.
The special pages that are currently disabled just need to be rewritten to use appropriate indexes or summary tables. Performance hacks? Sure.
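Roughly, that means either an index so the query can seek instead of scanning the whole table, or a summary table that a periodic job rebuilds so the special page just reads precomputed rows. Something like this, purely as a sketch (table and column names here are made up for illustration, not the actual schema):

    -- Illustrative only: hypothetical table/column names, not the live schema.
    -- An index so title lookups stop scanning the whole table:
    CREATE INDEX idx_page_title ON page (namespace, title);

    -- A summary table rebuilt by a periodic job, so the special page reads
    -- cheap precomputed rows instead of running the heavy query live:
    CREATE TABLE wantedpages_summary (
        target_title VARCHAR(255) NOT NULL,
        link_count   INT NOT NULL,
        PRIMARY KEY (target_title)
    );
    INSERT INTO wantedpages_summary (target_title, link_count)
        SELECT target_title, COUNT(*)
        FROM brokenlinks            -- hypothetical link-tracking table
        GROUP BY target_title;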
We're planning to move the database and web server to two separate machines, which should help quite a bit as well, and there's still a lot of optimization to be done in the common path. (Caching HTML would save trips to the database as well as rendering time, though it's not the biggest priority yet.)
I'd feel quite confident giving us another year with the current codebase.
> - Suggestion 1: The test suite.
AMEN BROTHER!
> I'd even like to revisit the decision of using a database at all. After all, a good file system like ReiserFS (or to a lesser extent, ext3) is itself a pretty well-optimized database for storing pieces of free-form text, and there are good tools available for text indexing, etc. Plus it's easier to maintain and port.
Really though, our text _isn't_ free-form. It's tagged with metadata that either needs to be tucked into a filesystem (non-portably) or a structured file format (XML?). And now we have to worry about locking multiple files for consistency, which likely means separate lockfiles... and we quickly find we've reinvented the database, just using more file descriptors. ;)
The great advantage of the database, though, is the ability to perform ad-hoc queries. Obviously our regular operations have to be optimized, and special queries have to be set up so that they don't bog down the general functioning of the wiki, but in general the coolest thing about the phase II/III PediaWiki is the SQL query ability: savvy (and responsible) users can cook up their own queries to do useful little things such as:
* looking up new user accounts who haven't yet been greeted
* checking for "orphan" talk pages
* listing the most frequent contributors
etc., without downloading a 4-gigabyte database to their home machines or begging the developers to write a special-purpose script.
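For instance, a "most frequent contributors" query might look something like this (table and column names are illustrative guesses, not the live schema):

    -- Illustrative only: assumes hypothetical revision(user_id) and
    -- user(user_id, user_name) tables, not the actual schema.
    SELECT u.user_name, COUNT(*) AS edits
    FROM revision AS r
    JOIN user AS u ON u.user_id = r.user_id
    GROUP BY u.user_name
    ORDER BY edits DESC
    LIMIT 20;

Run against a replica or with sensible limits, of course, so it doesn't bog down the live wiki.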
Now, it may well be that it would make sense to store the rendered HTML in files which could be rapidly spit out on request, but that's supplementary to what the database does for us.
> For example, we could probably make it easier to cache page requests if we made most of the article content HTML not dependent on skin by tagging elements well and using CSS appropriately.
You mean, like we had in phase II before you rewrote it? ;)
-- brion vibber (brion @ pobox.com)