On Friday, 27 February 2004, at 14:42, Erik Moeller wrote:
Ivo-
I think the bottleneck is using a database
management system (here MySQL).
So my question is: What is the advantage of using it? As I understand the
topic, a DBMS is useful for very structured, redundant data.
This is an eternal debate. You may want to look at MoinMoin, which stores
all its pages in plaintext and still manages to be very powerful:
Wouldn't it make more sense to store the pages in plaintext and just index
them in the DB? What is the advantage of storing them in the DB?
Using a database allows us to run a set of highly
complex queries to
examine the relationships between articles and the metadata about
articles. In fact we can even examine the relationship between articles
and users (watchlist), or between the metadata of different articles
(redirect status in "What links here").
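As an illustration of the kind of query described above, here is a minimal sketch of a "What links here" lookup that also returns each linking page's redirect status. This uses sqlite3 for self-containedness; the table and column names are invented for the example and are not MediaWiki's actual schema.

```python
import sqlite3

# Toy schema: a page table with a redirect flag, and a link table.
# All names and data here are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page  (title TEXT PRIMARY KEY, is_redirect INTEGER);
CREATE TABLE links (src TEXT, dst TEXT);
INSERT INTO page  VALUES ('Berlin', 0), ('Capital of Germany', 1), ('Germany', 0);
INSERT INTO links VALUES ('Germany', 'Berlin'), ('Capital of Germany', 'Berlin');
""")

# "What links here" for Berlin, joining in each linker's redirect status:
rows = conn.execute("""
    SELECT p.title, p.is_redirect
    FROM links l JOIN page p ON p.title = l.src
    WHERE l.dst = 'Berlin'
    ORDER BY p.title
""").fetchall()
print(rows)
```

The point being made is that such cross-table relationships come essentially for free once the data lives in a relational database.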
This can also be done without a database. Of course, in that case you have to
implement the needed data structures yourself and cannot use SQL queries. That
takes more time to develop, but you are free to code whatever data structure
you think is best. Trying to emulate it with a DB schema can fail, and
even if it does not fail, you are slower, because you pay for all the
overhead the database incurs by trying to be as general
as possible.
All that would only be possible using ugly hacks
without a database -- you
You only get ugly hacks if you don't think about your code before implementing
something.
would effectively have to build your own mini-DB to
store this metadata,
and most text-based wikis do this. They get extremely slow when it comes
to very basic stuff such as listing backlinks or doing a fulltext search.
That's just a matter of using the right data structure, independent of whether
you use a DB or not.
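A minimal sketch of what "the right data structure" could mean here: an in-memory inverted index that maps each page to the pages linking to it, so a backlink lookup is a single dictionary access rather than a scan over every page. The data and names are illustrative, not taken from any real wiki.

```python
from collections import defaultdict

# links_from: which pages each page links to (illustrative data only)
links_from = {
    "Germany": ["Berlin", "Europe"],
    "Capital of Germany": ["Berlin"],
}

# Build the inverse index once; afterwards each backlink lookup is a
# constant-time dict access instead of scanning all pages.
backlinks = defaultdict(list)
for src, targets in links_from.items():
    for dst in targets:
        backlinks[dst].append(src)

print(sorted(backlinks["Berlin"]))  # the "What links here" list for Berlin
```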
Not on a small wiki, but on one of the size of
Wikipedia. I have imported
large datasets into TWiki etc., and many operations became practically
unusable.
But the reason is not that they use no database; they just don't have the
right data structure for those operations.
The MediaWiki FT search has locking issues. It's
very fast for a normal
wiki, but for a huge one like Wikipedia it tends to get in a deadlock
state. That's why we have to disable it on Wikipedia. Some other queries
could also be made faster. Many of our queries have been optimized now and
every single one should eventually return results with a response time of
milliseconds.
Here I have a question: Do we use SQL for queries, or do we use the relational
operations directly (join, semijoin, projection, etc. [relational algebra])?
Is it possible to use those?
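For illustration of the question: the relational-algebra operations map directly onto what SQL expresses, so one could in principle implement them by hand. A toy sketch in plain Python over lists of dicts, with invented names and data:

```python
# Toy relational algebra over lists of dicts (rows); purely illustrative.
def selection(rel, pred):          # sigma: keep rows matching a predicate
    return [r for r in rel if pred(r)]

def projection(rel, cols):         # pi: keep only the named columns
    return [{c: r[c] for c in cols} for r in rel]

def join(r1, r2, col):             # natural join on one shared column
    return [{**a, **b} for a in r1 for b in r2 if a[col] == b[col]]

page  = [{"title": "Berlin", "len": 100}, {"title": "Hamburg", "len": 50}]
links = [{"title": "Berlin", "src": "Germany"}]

# Equivalent of: SELECT src FROM page JOIN links USING (title) WHERE len > 60
result = projection(selection(join(page, links, "title"),
                              lambda r: r["len"] > 60), ["src"])
print(result)
```

In practice the DBMS's query planner does exactly this composition, plus index selection and optimization, which is part of the "thousands of man hours" argument below.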
It is a fallacy that a database makes things more
complicated. It makes
I don't say that. It makes coding easier. But easy code is always slow...
them much easier because you don't have to worry
where to store something
before implementing it. How do you want to store your user prefs, the
That is the only real argument for using a DBMS. But then we have to ask
ourselves: Do we want something slow and easy, or something fast but more
complicated?
redirect info, the recent changes table, the page
history, always keeping
quick access in mind? Usemod, Twiki etc. all use ugly hacks for that kind
of stuff.
Yeah, the features are implemented quickly and consequently run slowly.
In short, a DB is faster, cleaner, and more
convenient. We benefit from
the thousands of man hours that the MySQL developers have invested in
building a scalable, fast, powerful multi-user database instead of having
to reinvent the wheel whenever we want to do something that goes beyond
"show me that file".
Reinventing the wheel is a danger that always exists. But bloat from trying
to be too general is another one. I still think we do not really need all the
features of the DB. I think we need very few, and those could be implemented
directly and thus be much faster... But on the other hand, it may be too late
to change that now...
--Ivo Köthnig