Timwi wrote:
> First of all, there is no reason to believe that reading from one file
> out of thousands is any faster than reading one record in the DB out of
I agree that this is how it should be, but I have seen some real
systems where file I/O bandwidth was considerably higher than database
I/O bandwidth, even after tuning all buffer sizes and kernel
parameters. I don't know if that is the case with Wikipedia's system
using PHP + MySQL in 2003, but it was the case with one system using
ASP/VB + MS SQL Server in 1999 and with another system using Java +
Oracle 9i in 2002. In those years I made my entire living from
telling people they should avoid blobs, and everybody was happy with
the results. The fact is that Wikipedia, using blob I/O extensively,
has been slow at times. This could of course be a coincidence.
Much of the suspicion in those cases fell on the implementations of the
ODBC/JDBC/equivalent drivers used for database communication, which are
not open source in the case of Oracle and Microsoft. That gives some
reason to believe that MySQL + PHP would do better than the others.
> Secondly, it makes atomic transactions impossible. It makes backing up
> a consistent state of the database/file-system mix virtually impossible.
> It is too difficult to move data around without violating the
> consistency of the construct.
Correct, although not entirely relevant. Atomic transactions are not
something you rely on with MySQL anyway, data seldom moves, and for some
uses you can move the filename and let the file stay where it is.
I agree that these are drawbacks, but they are not necessarily worse
than the I/O bandwidth limitation.
"Database backup" is in general more complex than "file backup", and
the smaller the database, the easier it is.
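A minimal sketch of the "move the filename, not the file" idea, assuming a hypothetical `pages` table that maps each title to a path on disk. SQLite from Python's standard library stands in for MySQL here. The table, column, and function names are illustrative only and are not Wikipedia's schema.

```python
import os
import sqlite3
import tempfile
import uuid

workdir = tempfile.mkdtemp()  # stands in for the directory holding article files
db = sqlite3.connect(":memory:")
# Hypothetical schema: the database maps each title to a file; the text is not stored here.
db.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, path TEXT NOT NULL)")

def save_page(title, text):
    # Write the text to its own file, then record the path under the title.
    path = os.path.join(workdir, uuid.uuid4().hex + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with db:  # commits the INSERT
        db.execute("INSERT INTO pages (title, path) VALUES (?, ?)", (title, path))
    return path

def rename_page(old_title, new_title):
    # "Moving" a page changes one row; the file stays where it is.
    with db:
        db.execute("UPDATE pages SET title = ? WHERE title = ?", (new_title, old_title))

def read_page(title):
    row = db.execute("SELECT path FROM pages WHERE title = ?", (title,)).fetchone()
    if row is None:
        raise KeyError(title)
    with open(row[0], encoding="utf-8") as f:
        return f.read()

path = save_page("Stockholm", "Stockholm is the capital of Sweden.")
rename_page("Stockholm", "Stockholm (city)")
print(read_page("Stockholm (city)"))  # Stockholm is the capital of Sweden.
```

Note that the file write and the row insert in `save_page` are not one atomic transaction. If the process dies between them, an orphan file is left behind, which is exactly the consistency drawback raised above.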
> And lastly, it makes creating a full-text search index quite a bit
> harder.
You can still store a copy of each text (cur) in the database and use
that for searching. The bulk of the I/O over the database client-server
socket comes from every page view having to read the blob from the
database into the (PHP) application, through a socket whose bandwidth
might be limited.
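A rough sketch of that split, assuming a hypothetical `pages` table that keeps both the file path (for page views) and a copy of the current text, `cur_text` (for searching). SQLite's `LIKE` is used as a crude stand-in for a real full-text index. All names here are illustrative.

```python
import os
import sqlite3
import tempfile
import uuid

workdir = tempfile.mkdtemp()
db = sqlite3.connect(":memory:")
# Hypothetical schema: path serves page views, cur_text is a copy used only for search.
db.execute(
    "CREATE TABLE pages (title TEXT PRIMARY KEY, path TEXT NOT NULL, cur_text TEXT NOT NULL)"
)

def save_page(title, text):
    path = os.path.join(workdir, uuid.uuid4().hex + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with db:
        db.execute(
            "INSERT OR REPLACE INTO pages (title, path, cur_text) VALUES (?, ?, ?)",
            (title, path, text),
        )

def view_page(title):
    # Page view: fetch only the short path from the database, then read the text from disk.
    row = db.execute("SELECT path FROM pages WHERE title = ?", (title,)).fetchone()
    if row is None:
        raise KeyError(title)
    with open(row[0], encoding="utf-8") as f:
        return f.read()

def search(term):
    # Search: scan the in-database copy (no escaping of % or _ in this sketch).
    rows = db.execute(
        "SELECT title FROM pages WHERE cur_text LIKE ? ORDER BY title", (f"%{term}%",)
    )
    return [title for (title,) in rows]

save_page("Stockholm", "Stockholm is the capital of Sweden.")
save_page("Oslo", "Oslo is the capital of Norway.")
print(search("capital"))  # ['Oslo', 'Stockholm']
print(view_page("Oslo"))  # Oslo is the capital of Norway.
```

The price is storing each text twice. In exchange, the large reads that every page view performs never pass through the database connection.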
> Convinced yet? ;-)
No, sorry, only a real test would convince me. Right now Wikipedia is
fast, so nobody is going to test this.
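A starting point for such a test might look like the sketch below, which times random reads of one file out of many against one BLOB row out of many. The sizes and counts are arbitrary assumptions. It uses SQLite, which runs inside the process, so it does not exercise the MySQL client-server socket suspected above. Because everything was just written, both methods will likely be reading from a warm OS cache. A meaningful test would need the real PHP + MySQL stack under realistic load.

```python
import os
import random
import sqlite3
import tempfile
import time

N = 1000         # number of stored articles (arbitrary)
SIZE = 8 * 1024  # bytes per article (arbitrary)
READS = 500      # random reads per storage method

def time_reads(read, ids, expected):
    # Time a sequence of reads and check that each one returns the stored bytes.
    start = time.perf_counter()
    for i in ids:
        if read(i) != expected:
            raise ValueError(f"wrong data for id {i}")
    return time.perf_counter() - start

def run_benchmark(seed=0):
    payload = os.urandom(SIZE)
    rng = random.Random(seed)
    ids = [rng.randrange(N) for _ in range(READS)]
    with tempfile.TemporaryDirectory() as workdir:
        # One file per article.
        for i in range(N):
            with open(os.path.join(workdir, f"{i}.bin"), "wb") as f:
                f.write(payload)
        # One row per article, with the body stored as a BLOB.
        db = sqlite3.connect(os.path.join(workdir, "pages.db"))
        try:
            db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")
            with db:
                db.executemany(
                    "INSERT INTO pages (id, body) VALUES (?, ?)",
                    ((i, payload) for i in range(N)),
                )

            def read_file(i):
                with open(os.path.join(workdir, f"{i}.bin"), "rb") as f:
                    return f.read()

            def read_blob(i):
                return db.execute("SELECT body FROM pages WHERE id = ?", (i,)).fetchone()[0]

            t_file = time_reads(read_file, ids, payload)
            t_blob = time_reads(read_blob, ids, payload)
        finally:
            db.close()
    return t_file, t_blob

t_file, t_blob = run_benchmark()
print(f"{READS} random reads: files {t_file:.4f} s, blobs {t_blob:.4f} s")
```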
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se/