On Fri, 21 Feb 2003, Lee Daniel Crocker wrote:
> Can we estimate how long we'll be able to limp along with the current code, adding performance hacks and hardware to keep us going? If it's a year, that will give us certain opportunities and guide some choices; if it's only a month or two, that will constrain a lot of those choices.
The immediate crisis is over. Now that we're on the track of proper indexing, performance should no longer significantly degrade with increased size.
The special pages that are currently disabled just need to be rewritten to use appropriate indexes or summary tables. Performance hacks? Sure.
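Roughly, that means either an index so the query can seek instead of scanning the whole table, or a summary table that a periodic job rebuilds so the special page just reads precomputed rows. Something like this, purely as a sketch (table and column names here are made up for illustration, not the actual schema):

    -- Illustrative only: hypothetical table/column names, not the live schema.
    -- An index so title lookups stop scanning the whole table:
    CREATE INDEX idx_page_title ON page (namespace, title);

    -- A summary table rebuilt by a periodic job, so the special page reads
    -- cheap precomputed rows instead of running the heavy query live:
    CREATE TABLE wantedpages_summary (
        target_title VARCHAR(255) NOT NULL,
        link_count   INT NOT NULL,
        PRIMARY KEY (target_title)
    );
    INSERT INTO wantedpages_summary (target_title, link_count)
        SELECT target_title, COUNT(*)
        FROM brokenlinks            -- hypothetical link-tracking table
        GROUP BY target_title;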
We're planning to move the database and web server to two separate machines, which should help quite a bit as well, and there's still a lot of optimization to be done in the common path. (Caching HTML would save trips to the database as well as rendering time, though it's not the biggest priority yet.)
I'd feel quite confident giving us another year with the current codebase.
> - Suggestion 1: The test suite.
AMEN BROTHER!
> I'd even like to revisit the decision of using a database at all. After all, a good file system like ReiserFS (or to a lesser extent, ext3) is itself a pretty well-optimized database for storing pieces of free-form text, and there are good tools available for text indexing, etc. Plus it's easier to maintain and port.
Really though, our text _isn't_ free-form. It's tagged with metadata that either needs to be tucked into a filesystem (non-portably) or a structured file format (XML?). And now we have to worry about locking multiple files for consistency, which likely means separate lockfiles... and we quickly find we've reinvented the database, just using more file descriptors. ;)
The great advantage of the database, though, is the ability to perform ad-hoc queries. Obviously our regular operations have to be optimized, and special queries have to be set up so that they don't bog down the general functioning of the wiki, but in general the coolest thing about the phase II/III PediaWiki is the SQL query ability: savvy (and responsible) users can cook up their own queries to do useful little things such as:
* looking up new user accounts who haven't yet been greeted
* checking for "orphan" talk pages
* listing the most frequent contributors
etc., without downloading a 4-gigabyte database to their home machines or begging the developers to write a special-purpose script.
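For instance, a "most frequent contributors" query might look something like this (table and column names are illustrative guesses, not the live schema):

    -- Illustrative only: assumes hypothetical revision(user_id) and
    -- user(user_id, user_name) tables, not the actual schema.
    SELECT u.user_name, COUNT(*) AS edits
    FROM revision AS r
    JOIN user AS u ON u.user_id = r.user_id
    GROUP BY u.user_name
    ORDER BY edits DESC
    LIMIT 20;

Run against a replica or with sensible limits, of course, so it doesn't bog down the live wiki.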
Now, it may well be that it would make sense to store the rendered HTML in files which could be rapidly spit out on request, but that's supplementary to what the database does for us.
> For example, we could probably make it easier to cache page requests if we made most of the article content HTML not dependent on skin by tagging elements well and using CSS appropriately.
You mean, like we had in phase II before you rewrote it? ;)
-- brion vibber (brion @ pobox.com)