After getting back into wikiland, catching up with wikipedia-l was pretty easy, but catching up with the wikitech list took a little longer. It seems you guys have had interesting times lately (in the Chinese-curse sense). Sorry I abandoned you, but you do seem to have risen to the challenge.
Magnus did a great service by giving us code with features that made Wikipedia usable and popular. When that code bogged down to the point where the wiki became nearly unusable, there wasn't much time to sit down and properly architect and develop a solution, so I just reorganized the existing architecture for better performance and hacked all the code. This got us over the immediate crisis, but now my code is bogging down, and we are having to remove useful features to keep performance up.
I think it's time for Phase IV. We need to sit down and design an architecture that will allow us to grow without constantly putting out fires, and that can become a stable base for a fast, reliable Wikipedia in years to come. I'm now available and equipped to help in this, but I thought I'd start out by asking a few questions here and making a few suggestions.
* Question 1: How much time do we have?
Can we estimate how long we'll be able to limp along with the current code, adding performance hacks and hardware to keep us going? If it's a year, that will give us certain opportunities and guide some choices; if it's only a month or two, that will constrain a lot of those choices.
* Suggestion 1: The test suite.
I think the most critical piece of code to develop right now is a comprehensive test suite. This will enable lots of things. For example, if we have a performance question, I can set up one set of wiki code on my test server, run the suite to get timing data, tweak the code, then run the suite again to get new timings. Whether the suite still passes will tell us if anything broke, and the timings will tell us if we're on the right track. This will be useful even during the limp-along-with-the-current-code phase. I have a three-machine network at home, one machine of which I plan to dedicate 100% to wiki code testing, plus my test server in San Antonio that we can use. This will also allow us to safely refactor code. I'd like to use something like Latka for the suite (see http://jakarta.apache.org/commons/latka/index.html ).
* Question 2: How wedded are we to the current tools?
Apache/MySQL/PHP seems a good combo, and it probably would be possible to scale them up further, but there certainly are other options. Also, are we willing to take chances on semi-production quality versions like Apache 2.X and MySQL 4.X? I'd even like to revisit the decision of using a database at all. After all, a good file system like ReiserFS (or to a lesser extent, ext3) is itself a pretty well-optimized database for storing pieces of free-form text, and there are good tools available for text indexing, etc. Plus it's easier to maintain and port.
* Suggestion 2: Use the current code for testing features.
In re-architecting the codebase, we will almost certainly find minor feature changes that we think will make a big performance difference without hurting usability, or features that we want to implement anyway. For example, we could probably make it easier to cache page requests if we made most of the article content HTML not dependent on skin by tagging elements well and using CSS appropriately. Also, we probably want to eventually render valid XHTML. I propose that while we are building the phase IV code, we add little features like this to the existing code to gauge things like user reactions and visual impact.
Other suggestions/questions/answers humbly requested (including "Are you nuts? Let's stick with Phase III!" if you have that opinion).
Lee,
I don't think we should completely redesign things from scratch. See http://www.joelonsoftware.com/articles/fog0000000069.html about rewriting in general.
We are getting to the point where we know what the performance bottlenecks are, and we are fixing them. Brion has built some basic profiling into the code, and we've checked the slow query log. We still haven't fully understood everything, but I think we are pretty certain about the following:
1) PHP is not and has never been a problem. Virtually all our performance problems have been related to specific SQL queries (either a very high number of them, or complex ones). I do not see any reason at all to stop using PHP.
2) Getting MySQL to perform properly largely depends on using indexes the right way. This means providing composite indexes where needed. In the case of timestamps, we had to add a reverse timestamp column so results can be index-sorted quickly; this is a hack, but a needed one until MySQL 4.
It is specifically complex SQL queries which require ordering the whole result set that create headaches. These have been disabled for the moment, but I believe we can fix them. None of them are mission critical.
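Roughly, the reverse-timestamp trick looks like this in SQL (just a sketch; don't hold me to the exact table and column names, I'm writing them from memory):

    -- Sketch only: table/column names are illustrative, not necessarily the live schema.
    -- Current MySQL won't sort "newest first" off the index efficiently, so we store a
    -- reversed timestamp and let "newest first" become an ascending index scan.
    ALTER TABLE cur ADD COLUMN inverse_timestamp CHAR(14) NOT NULL DEFAULT '';
    UPDATE cur SET inverse_timestamp = 99999999999999 - cur_timestamp;
    CREATE INDEX cur_ns_inv_ts ON cur (cur_namespace, inverse_timestamp);

    -- "Newest first" is then an index-ordered scan instead of a filesort:
    SELECT cur_title
    FROM cur
    WHERE cur_namespace = 0
    ORDER BY inverse_timestamp
    LIMIT 50;

That is the extra column that should become unnecessary with MySQL 4.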
As for MySQL4, I support trying it out on our test server, test.wikipedia.org, and possibly on meta.wikipedia.org as well. We shouldn't switch the main site(s) until these two have run on it for a while.
We also need to keep in mind that we are growing very fast. We now have several highly active Wikipedias, all of them residing on the same server. While I think our server still has some room, at some point we will have to upgrade and no amount of hacking will prevent that. Separating web and database server, as is planned, should help, but I don't know how much.
I think our priorities should be this:
1) Get some of the other language Wikipedias up that people are waiting for. If there are motivated users who want to start a Wikipedia in their language, we should not let them wait.
2) Fix known bugs and try to improve the speed in case of remaining bottlenecks.
3) Implement suggested improvements.
 - Improve search + redirect handling
 - Finish Magnus' interlanguage links redesign
 - Fix Recent changes layout
 - Redesign image pages
 - Redesign talk pages
 - Improved edit conflict handling (CVS-style merge)
 - Backends (SVG, Lilypond), syntax improvements etc.
Aside from this, an entirely new project is the dedicated Wikipedia client, for offline reading and, hopefully, ultimately for editing as well. Magnus has started working on this.
What I understand to be "Phase IV" is, then, a point where we have finished all the important fixes and improvements and then decide to move on to "nice to have" stuff. Among this is the much requested multilanguage portal for Wikipedia, with a multilanguage search, RC etc., and possibly merging the databases of the different Wikipedias (at least the user data). I do think the current software can and should be used as a basis for the next phase(s).
Regards,
Erik
(Erik Moeller erik_moeller@gmx.de):
> Lee,
> I don't think we should completely redesign things from scratch. See http://www.joelonsoftware.com/articles/fog0000000069.html about rewriting in general.
I'm well aware of refactoring; that's the main reason I want the test suite first. But this is a case a little different from what Joel is describing: we're starting with a complete feature set and (at least initially) not making any changes at all to that. That in a sense makes it a refactor job even if we do replace the actual code. If we decide that Apache/PHP/MySQL is the tool of choice, we will of course not throw away code at all but just refactor all the way.
> - PHP is not and has never been a problem. Virtually all our performance problems have been related to specific SQL queries (either a very high number of them, or complex ones). I do not see any reason at all to stop using PHP.
I can believe that.
> - Getting MySQL to perform properly largely depends on using indexes the right way. This means providing composite indexes where needed. In the case of timestamps, we had to add a reverse timestamp column so results can be index-sorted quickly; this is a hack, but a needed one until MySQL 4...
I'm still concerned, though, that even if we optimize all the indexing, we'll never achieve a speedup of more than 2-3x. I don't know if that will be enough in the long run. After the test suite is done, I can do some head-to-head testing of things like indexes and MySQL 4.X.
> I think our priorities should be this:
> - Get some of the other language Wikipedias up that people are waiting for. If there are motivated users who want to start a Wikipedia in their language, we should not let them wait.
I agree that this should be a priority for the project. I'm not as convinced that it's the best use of /my/ time, and since my absence I'm more committed to ensuring that I don't burn out again by spending my own time on things I'm not best suited to. I think it should be up to motivated foreign users to migrate their own wiki. The presence of someone skilled enough to do that should be evidence of the level of desire.
> - Fix known bugs and try to improve the speed in case of remaining bottlenecks.
Agreed. This can also be done in parallel with new development if needed, and if the speedups are dramatic enough, perhaps it will show that new development isn't needed after all.
> - Implement suggested improvements.
> - Improve search + redirect handling
> - Finish Magnus' interlanguage links redesign
> - Fix Recent changes layout
That's one of my concerns too: I spent about three weeks trying to do this, but it just wasn't possible to get the features I wanted with the current architecture and acceptable performance.
> - Redesign image pages
> - Redesign talk pages
> - Improved edit conflict handling (CVS-style merge)
Hmm. I'm not sure about that one.
> - Backends (SVG, Lilypond), syntax improvements etc.
> Aside from this, an entirely new project is the dedicated Wikipedia client, for offline reading and, hopefully, ultimately for editing as well. Magnus has started working on this.
That's great. That's probably better suited to his talents.
> What I understand to be "Phase IV" is, then, a point where we have finished all the important fixes and improvements and then decide to move on to "nice to have" stuff. Among this is the much requested multilanguage portal for Wikipedia, with a multilanguage search, RC etc., and possibly merging the databases of the different Wikipedias (at least the user data). I do think the current software can and should be used as a basis for the next phase(s).
Cross-language stuff is a big issue too. I confess that I ignored that issue 100% in the present design. If we can add those features without looking like a hack, I'm all for it, but I suspect a new architecture will help there more than anywhere.
> I think our priorities should be this:
> 3) Implement suggested improvements.
>  - Improve search + redirect handling
>  - Finish Magnus' interlanguage links redesign
>  - Fix Recent changes layout
>  - Redesign image pages
>  - Redesign talk pages
>  - Improved edit conflict handling (CVS-style merge)
>  - Backends (SVG, Lilypond), syntax improvements etc.
Improvements in the management of deleted pages. Right now, it's *very* difficult to find a page which has been deleted, and then to check it to decide whether it should be restored or not. It would be great if this page could at least be sorted by date of deletion, deleting user, namespace, and alphabetical order. Does anybody else have trouble with this log?
> Improvements in the management of deleted pages. Right now, it's *very* difficult to find a page which has been deleted, and then to check it to decide whether it should be restored or not. It would be great if this page could at least be sorted by date of deletion, deleting user, namespace, and alphabetical order. Does anybody else have trouble with this log?
Okay, the noise of comments hurt my ears, so I'll try this another time.
Here's our deletion log http://fr.wikipedia.org/wiki/Wikip%E9dia:Deletion_log
Maybe 100 pages were deleted in the past four days.
I think that, though we of course all trust one another, an *error* is always possible.
Especially since the "votes for deletion" page is not used at all, so any sysop basically deletes whatever pages he wants to delete, whenever he decides to (yup, me too; when in Rome, do as the Romans do).
We rely on *peer pressure* when creating/editing articles, with peers being given the same tools as the editors.
Right now, we have *very* few tools for *checking* our fellow sysops' deletions (you know, *sample checking*...).
It takes about 15 seconds to delete an article, and about 5 minutes to find it (when we find it at all) in the page of deleted articles.
Look at that huge deletion log!
OK, if nobody wants to help me on that, could someone just tell me... a sort of SQL query... which would allow me to get a list of the last deletions, with direct access to the articles in the bin (something like the deletion log, but with access to the deleted articles)?
--------
Or imagine the worst: a sysop goes mad and starts deleting articles very unwisely. It's going to be a mess to recover everything quickly...
On Mon, 24 Feb 2003, Anthere wrote:
> OK, if nobody wants to help me on that, could someone just tell me... a sort of SQL query... which would allow me to get a list of the last deletions, with direct access to the articles in the bin (something like the deletion log, but with access to the deleted articles)?
The archive table (which holds deleted pages) does not keep track of when articles were deleted. Currently the only way to see when a page was deleted is to look in the deletion log, which provides no connection to the undelete system.
Yeah, it sucks.
What I've suggested before is to have a deletion log _table_ (or perhaps a general 'event log' which also keeps track of bans, protections, unbans, unprotections, creation of user accounts, sysopization, etc., from which we can extract just the deletions for purposes of showing a deletion log). If I have time I'll try to implement it, but it'd be lovely if someone else got to it, because I'm stretched a little thin right now.
This could then be easily sorted by timestamp or by deleting user, and for sysops a link to the undelete could be made instantly available.
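To make that a bit more concrete, something along these lines (names and types here are placeholders, not a worked-out schema):

    -- Placeholder sketch of an event/deletion log table; not a final schema.
    CREATE TABLE log (
      log_timestamp CHAR(14) NOT NULL,      -- when the action happened
      log_user INT UNSIGNED NOT NULL,       -- user id of the sysop who did it
      log_action VARCHAR(16) NOT NULL,      -- 'delete', 'protect', 'ban', ...
      log_namespace INT NOT NULL,
      log_title VARCHAR(255) NOT NULL,
      log_comment TINYBLOB,
      INDEX (log_timestamp),
      INDEX (log_user, log_timestamp),
      INDEX (log_action, log_timestamp)
    );

    -- The "recent deletions" list Anthere is asking for, newest first;
    -- a sysop-only page could link each row straight to the undelete form.
    SELECT log_timestamp, log_user, log_namespace, log_title
    FROM log
    WHERE log_action = 'delete'
    ORDER BY log_timestamp DESC
    LIMIT 100;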
-- brion vibber (brion @ pobox.com)
On Fri, 21 Feb 2003, Lee Daniel Crocker wrote:
> Can we estimate how long we'll be able to limp along with the current code, adding performance hacks and hardware to keep us going? If it's a year, that will give us certain opportunities and guide some choices; if it's only a month or two, that will constrain a lot of those choices.
The immediate crisis is over. Now that we're on the track of proper indexing, performance should no longer significantly degrade with increased size.
The special pages that are currently disabled just need to be rewritten to have and use appropriate indexes or summary tables. Performance hacks? Sure.
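(By a summary table I mean something like the following, purely as an illustration; the brokenlinks table and all the names here are assumptions on my part, not the actual schema, and a real version would handle namespaces properly.)

    -- Illustration only: assumes a brokenlinks table of (bl_from, bl_to) pairs
    -- recording links to pages that don't exist yet.
    CREATE TABLE wantedpages (
      wp_title VARCHAR(255) NOT NULL,
      wp_links INT UNSIGNED NOT NULL,   -- how many pages link to the missing page
      INDEX (wp_links)
    );

    -- Refreshed from a cron job every so often, instead of aggregating on every view:
    DELETE FROM wantedpages;
    INSERT INTO wantedpages (wp_title, wp_links)
    SELECT bl_to, COUNT(*) FROM brokenlinks GROUP BY bl_to;

    -- The special page itself then becomes a cheap indexed read:
    SELECT wp_title, wp_links FROM wantedpages ORDER BY wp_links DESC LIMIT 50;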
We're planning to move the database and web server to two separate machines, which should help quite a bit as well, and there's still a lot of optimization to be done in the common path. (Caching HTML would save trips to the database as well as rendering time, though it's not the biggest priority yet.)
I'd feel quite confident giving us another year with the current codebase.
> - Suggestion 1: The test suite.
AMEN BROTHER!
> I'd even like to revisit the decision of using a database at all. After all, a good file system like ReiserFS (or to a lesser extent, ext3) is itself a pretty well-optimized database for storing pieces of free-form text, and there are good tools available for text indexing, etc. Plus it's easier to maintain and port.
Really though, our text _isn't_ free-form. It's tagged with metadata that either needs to be tucked into a filesystem (non-portably) or a structured file format (XML?). And now we have to worry about locking multiple files for consistency, which likely means separate lockfiles... and we quickly find we've reinvented the database, just using more file descriptors. ;)
The great advantage of the database though is the ability to perform ad-hoc queries. Obviously our regular operations have to be optimized, and special queries have to be set up such that they don't bog down the general functioning of the wiki, but in general the coolest thing about the phase II/III PediaWiki is the SQL query ability: savvy (and responsible) users can cook up their own queries to do useful little things such as:
* looking up new user accounts who haven't yet been greeted
* checking for "orphan" talk pages
* most frequent contributors
etc, without downloading a 4-gigabyte database to their home machines or begging the developers to write a special-purpose script.
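Just to illustrate the kind of thing I mean, an "orphan talk pages" check might look something like this, assuming the usual cur table layout (column names approximate):

    -- Illustrative only: assumes talk pages are namespace 1 and articles namespace 0,
    -- with matching titles; cur/cur_namespace/cur_title are assumed names.
    SELECT talk.cur_title
    FROM cur AS talk
    LEFT JOIN cur AS subject
      ON subject.cur_namespace = 0
     AND subject.cur_title = talk.cur_title
    WHERE talk.cur_namespace = 1
      AND subject.cur_title IS NULL;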
Now, it may well be that it would make sense to store the rendered HTML in files which could be rapidly spit out on request, but that's supplementary to what the database does for us.
> For example, we could probably make it easier to cache page requests if we made most of the article content HTML not dependent on skin by tagging elements well and using CSS appropriately.
You mean, like we had in phase II before you rewrote it? ;)
-- brion vibber (brion @ pobox.com)
> The great advantage of the database though is the ability to perform ad-hoc queries.
Yep, that's a big one, no doubt about it. That's why I'm mostly just brainstorming at this point. I _suspect_ that the database costs us a lot in things like unnecessary locking and indexing, but that may well be offset by the gains.
> > For example, we could probably make it easier to cache page requests if we made most of the article content HTML not dependent on skin by tagging elements well and using CSS appropriately.
> You mean, like we had in phase II before you rewrote it? ;)
Now, waitaminnit, that's not true at all. Magnus had all kinds of dynamic nonsense that I removed--his code changed the actual HTML of links depending on the user's preference for link color, for example. I removed a lot of those, but I don't know if I caught all of them. I know for sure that the sidebars are fully dynamic and likely uncacheable, but the article content should be mostly cacheable now or close to it.
On Fri, 21 Feb 2003, Lee Daniel Crocker wrote:
> > You mean, like we had in phase II before you rewrote it? ;)
> Now, waitaminnit, that's not true at all. Magnus had all kinds of dynamic nonsense that I removed--his code changed the actual HTML of links depending on the user's preference for link color, for example.
You must have been working from an older version of the codebase, because I know for a fact that I replaced those with stylesheets, which is how we had caching of rendered HTML in phase II.
-- brion vibber (brion @ pobox.com)
(Brion Vibber vibber@aludra.usc.edu):
> On Fri, 21 Feb 2003, Lee Daniel Crocker wrote:
> > > You mean, like we had in phase II before you rewrote it? ;)
> > Now, waitaminnit, that's not true at all. Magnus had all kinds of dynamic nonsense that I removed--his code changed the actual HTML of links depending on the user's preference for link color, for example.
> You must have been working from an older version of the codebase, because I know for a fact that I replaced those with stylesheets, which is how we had caching of rendered HTML in phase II.
That's possible, I suppose, but I'm sure I didn't _add_ any dynamic HTML except outside the article content. If I did, then mea maxima culpa.
Yes, it is necessary to eliminate dynamically rendered article content to make caching effective. I don't think that's where the biggest bang for the buck will be, though. The things I personally think would be the biggest wins are (1) fine-tuning the hell out of the queries needed to do RecentChanges, and (2) making the link cache persistent across requests so we don't have to look up every page that's linked to when rendering.
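For the link cache, the idea is roughly this (schema names assumed, just to show the shape of it): instead of checking each link's existence with its own query while rendering, collect the titles on the page, check them in one batched query, and keep the answers around between requests.

    -- Sketch only: cur/cur_namespace/cur_title are assumed names.
    -- One existence check per rendered page instead of one per link:
    SELECT cur_namespace, cur_title
    FROM cur
    WHERE cur_namespace = 0
      AND cur_title IN ('Foo', 'Bar', 'Baz');

Keep those results in a persistent cache keyed by title, and most renders never have to hit the database for link existence at all.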
Could a User contributions page also give a "diff" link for "top" edits?
thus:
18:45 Feb 22, 2003 2000s http://www.wikipedia.org/wiki/2000s (top) [diff] [rollback http://www.wikipedia.org/w/wiki.phtml?title=2000s&action=rollback]
On Fri, Feb 21, 2003 at 05:31:51PM -0600, Lee Daniel Crocker wrote:
> Other suggestions/questions/answers humbly requested (including "Are you nuts? Let's stick with Phase III!" if you have that opinion).
My suggestion:
* first, add all the most important features we need (SVG, more rendering engines, XHTML, PS/PDF output, what else?) and move everything to UTF-8 and some nice skin like Cologne Blue without link underlining, with a minimal amount of changes
* then, make Phase IV
If we do step 2 before step 1 we will have to redesign it into Phase V soon.