On Dec 12, 2003, at 20:22, Lars Aronsson wrote:
> You can still store a copy of each text (cur) in the database, and
> use that for searching.
We do this anyway, since InnoDB tables don't support fulltext search,
and the search text has to be pre-processed to strip markup and fix up
encoding. Further decoupling would not really change the search system.
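For illustration, that pre-processing step amounts to something like
the following sketch; the patterns here are simplified stand-ins, not
MediaWiki's actual rules:

    import re

    def strip_markup(wikitext):
        # Reduce wiki markup to plain words for the search index.
        # Illustrative patterns only.
        text = re.sub(r"'{2,}", "", wikitext)            # ''italic'' / '''bold'''
        text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]",
                      r"\1", text)                       # [[Page|label]] -> label
        text = re.sub(r"<[^>]+>", " ", text)             # strip inline HTML tags
        return text.lower()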
As for atomic operations: if we were to use the filesystem to store
page text, the safest, simplest thing would be to name the files based
on the unique revision identifiers (which we don't have yet due to the
way the cur/old split works). The textual content of a given revision
should never change (save perhaps being compressed), and the metadata
(title, user name, comment) can still be easily worked with in the
database. Rename and deletion operations would not actually have to
touch the files.
The trick would be making sure that the numbers really stay unique; you
need to add a row to the table to get its ID number back, and then
ensure that the data actually gets written to the filesystem before
anyone asks for it.
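In pseudocode terms the ordering would look roughly like this, using
sqlite3 as a stand-in for MySQL; the table layout and file paths are
made up for the sketch:

    import os, sqlite3

    db = sqlite3.connect("revisions.db")
    db.execute("""CREATE TABLE IF NOT EXISTS revision
                  (rev_id  INTEGER PRIMARY KEY AUTOINCREMENT,
                   title   TEXT,
                   user    TEXT,
                   comment TEXT)""")

    def save_revision(title, user, comment, text, text_dir="text"):
        os.makedirs(text_dir, exist_ok=True)
        # 1. Add the metadata row first; the database hands back a
        #    unique revision ID to name the file after.
        cur = db.execute(
            "INSERT INTO revision (title, user, comment) VALUES (?, ?, ?)",
            (title, user, comment))
        rev_id = cur.lastrowid
        # 2. Write the text under that ID and fsync it, so the data
        #    is really on disk before anyone can ask for it.
        path = os.path.join(text_dir, "%d.txt" % rev_id)
        with open(path, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        # 3. Only now commit, so no reader ever sees a revision ID
        #    whose text file is missing.
        db.commit()
        return rev_id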
Relying on both a database and a filesystem for persistent storage
means you need to maintain two ways to connect if you're going to have
multiple web servers, of course. Also, this leaves us with a couple
million relatively small files, which the filesystem ought to be tuned
for (small block size).
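Back-of-envelope, assuming an average of half a block wasted per file,
the block size matters quite a bit at that scale:

    files = 2_000_000
    for block_size in (1024, 4096):
        wasted = files * block_size // 2   # ~half a block lost per file
        print("block=%4dB: ~%.1f GB lost to partial blocks"
              % (block_size, wasted / 1e9))

That's roughly 1 GB of internal fragmentation at 1K blocks versus 4 GB
at the common 4K default, hence the tuning.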
> The vast amount of I/O over the database client-server socket is when
> every page view has to read the blob from the database to the (PHP)
> application, through the socket where the bandwidth might be limited.
The majority of page views should be cache hits which pull the output
HTML data from the local filesystem, checking the DB just enough for
cache validation. (I don't have exact figures at the moment, but we
should probably check.) We could make better use of filesystem or
memory-based caching than we do and decrease the DB load further.
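The validation check itself can be a single tiny query, roughly like
this; names such as the page.touched column and cache_dir are invented
for the sketch:

    import os

    def serve_page(title, db, render, cache_dir="cache"):
        # Rendered-HTML cache: serve from the local filesystem when
        # fresh, touching the DB only for the validation check.
        path = os.path.join(cache_dir, title.replace("/", "_") + ".html")
        # One small query instead of pulling the whole text blob;
        # 'touched' is assumed to be a unix timestamp of last change.
        row = db.execute(
            "SELECT touched FROM page WHERE title = ?", (title,)).fetchone()
        touched = row[0] if row else 0
        try:
            if os.path.getmtime(path) >= touched:
                with open(path) as f:        # cache hit
                    return f.read()
        except OSError:
            pass                             # no cached copy yet
        os.makedirs(cache_dir, exist_ok=True)
        html = render(title)                 # cache miss: full parse
        with open(path, "w") as f:
            f.write(html)
        return html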
-- brion vibber (brion @ pobox.com)