On Dec 12, 2003, at 20:22, Lars Aronsson wrote:
> You can still store a copy of each text (cur) in the database, and
> use that for searching.
We do this anyway, since InnoDB tables don't support fulltext search,
and the search text has to be pre-processed to strip markup and fix up
encoding. Further decoupling would not really change the search system.
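For illustration, that pre-processing step amounts to something like
the following sketch; the patterns here are simplified stand-ins, not
MediaWiki's actual rules:

    import re

    def strip_markup(wikitext):
        # Reduce wiki markup to plain words for the search index.
        # Illustrative patterns only.
        text = re.sub(r"'{2,}", "", wikitext)            # ''italic'' / '''bold'''
        text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]",
                      r"\1", text)                       # [[Page|label]] -> label
        text = re.sub(r"<[^>]+>", " ", text)             # strip inline HTML tags
        return text.lower()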
As for atomic operations: if we were to use the filesystem to store
page text, the safest, simplest thing would be to name the files based
on the unique revision identifiers (which we don't have yet due to the
way the cur/old split works). The textual content of a given revision
should never change (save perhaps being compressed), and the metadata
(title, user name, comment) can still be easily worked with in the
database. Rename and deletion operations would not actually have to
touch the files.
The trick would be making sure that the numbers really stay unique; you
need to add a row to the table to get its ID number back, and then
ensure that the data actually gets written to the filesystem before
anyone asks for it.
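In pseudocode terms the ordering would look roughly like this, using
sqlite3 as a stand-in for MySQL; the table layout and file paths are
made up for the sketch:

    import os, sqlite3

    db = sqlite3.connect("revisions.db")
    db.execute("""CREATE TABLE IF NOT EXISTS revision
                  (rev_id  INTEGER PRIMARY KEY AUTOINCREMENT,
                   title   TEXT,
                   user    TEXT,
                   comment TEXT)""")

    def save_revision(title, user, comment, text, text_dir="text"):
        os.makedirs(text_dir, exist_ok=True)
        # 1. Add the metadata row first; the database hands back a
        #    unique revision ID to name the file after.
        cur = db.execute(
            "INSERT INTO revision (title, user, comment) VALUES (?, ?, ?)",
            (title, user, comment))
        rev_id = cur.lastrowid
        # 2. Write the text under that ID and fsync it, so the data
        #    is really on disk before anyone can ask for it.
        path = os.path.join(text_dir, "%d.txt" % rev_id)
        with open(path, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        # 3. Only now commit, so no reader ever sees a revision ID
        #    whose text file is missing.
        db.commit()
        return rev_id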
Relying on both a database and a filesystem for persistent storage
means you need to maintain two ways to connect if you're going to have
multiple web servers, of course. Also, this leaves us with a couple
million relatively small files, which the filesystem ought to be tuned
for (small block size).
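Back-of-envelope, assuming an average of half a block wasted per file,
the block size matters quite a bit at that scale:

    files = 2_000_000
    for block_size in (1024, 4096):
        wasted = files * block_size // 2   # ~half a block lost per file
        print("block=%4dB: ~%.1f GB lost to partial blocks"
              % (block_size, wasted / 1e9))

That's roughly 1 GB of internal fragmentation at 1K blocks versus 4 GB
at the common 4K default, hence the tuning.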
> The vast amount of I/O over the database client-server socket is when
> every page view has to read the blob from the database to the (PHP)
> application, through the socket where the bandwidth might be limited.
The majority of page views should be cache hits which pull the output
HTML data from the local filesystem, checking the DB just enough for
cache validation. (I don't have exact figures at the moment, but we
should probably check.) We could make better use of filesystem or
memory-based caching than we do and decrease the DB load further.
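The validation check itself can be a single tiny query, roughly like
this; names such as the page.touched column and cache_dir are invented
for the sketch:

    import os

    def serve_page(title, db, render, cache_dir="cache"):
        # Rendered-HTML cache: serve from the local filesystem when
        # fresh, touching the DB only for the validation check.
        path = os.path.join(cache_dir, title.replace("/", "_") + ".html")
        # One small query instead of pulling the whole text blob;
        # 'touched' is assumed to be a unix timestamp of last change.
        row = db.execute(
            "SELECT touched FROM page WHERE title = ?", (title,)).fetchone()
        touched = row[0] if row else 0
        try:
            if os.path.getmtime(path) >= touched:
                with open(path) as f:        # cache hit
                    return f.read()
        except OSError:
            pass                             # no cached copy yet
        os.makedirs(cache_dir, exist_ok=True)
        html = render(title)                 # cache miss: full parse
        with open(path, "w") as f:
            f.write(html)
        return html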
-- brion vibber (brion @ pobox.com)