On Wed, 2003-01-29 at 06:05, Takuya Murata wrote:
> Oh, I see. But are you only talking about searching? I don't
> think MySQL can be the bottleneck of simply displaying pages.
Displaying a Wikipedia page is far from simple. The pages are stored in
the table as wikitext, not as HTML, and are rendered dynamically.
Rendering a Wikipedia article requires, for example, looking up all the
links it contains and determining whether those pages exist.
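That link-existence step can be batched into a single query rather than one query per link. A minimal sketch, assuming a hypothetical one-column `page` table and using sqlite3 as a stand-in for MySQL (the table name, column name, and link syntax handling are illustrative, not MediaWiki's actual schema or parser):

```python
import re
import sqlite3

# Stand-in database: sqlite3 instead of MySQL, with a hypothetical
# one-column "page" table listing existing article titles.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page (title TEXT PRIMARY KEY)")
con.executemany("INSERT INTO page VALUES (?)", [("Physics",), ("Chemistry",)])

def existing_links(wikitext):
    """Extract [[...]] link targets and check them all in ONE query."""
    titles = re.findall(r"\[\[([^\]|]+)", wikitext)
    if not titles:
        return set()
    placeholders = ",".join("?" * len(titles))
    rows = con.execute(
        f"SELECT title FROM page WHERE title IN ({placeholders})", titles)
    return {t for (t,) in rows}

text = "See [[Physics]] and [[Alchemy]] for details."
print(existing_links(text))  # only Physics exists, so Alchemy renders red
```

The point of the batch `IN (...)` query is that an article with 200 links costs one round-trip to the database instead of 200.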
> Without simply extending server capacity, is it possible to
> sustain the increase of traffic?
Actually, we had a *decrease* in traffic in the last month due to the
Google hiccups. We should be able to cope with much higher traffic if we
optimize our queries. Note that pure bandwidth is not a problem; the
database tarball downloads are very fast. Ask Brion for the server specs
and be impressed.
> I am not sure I understand you. Whatever algorithm is too
> stupid, the increase of capacity always makes the site fast,
1) If the algorithm doesn't scale linearly, expanding your servers
linearly will gain you almost nothing. If an increase in edits by a
factor of 10 leads to a factor-of-100 decrease in performance, you need
to stop buying hardware and look at what you're doing wrong. Of course,
if you keep optimizing and you don't gain anything, you need to think
about buying hardware, but that is not the case here.
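The 10x-edits, 100x-slowdown figure above is exactly what a quadratic algorithm produces. A quick illustrative calculation (the cost model is hypothetical, not a measured Wikipedia query):

```python
def quadratic_cost(n):
    # Hypothetical O(n^2) cost model: work grows with the square
    # of the input size (e.g. edits or articles).
    return n * n

base = quadratic_cost(1000)
scaled = quadratic_cost(10000)
print(scaled / base)         # 100.0 -- 10x the input, 100x the work
print(scaled / (base * 10))  # 10.0  -- even with 10x the servers,
                             #          you are still 10x slower
```

This is why fixing the queries comes before buying hardware: linear hardware growth cannot keep up with superlinear work growth.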
2) You cite Google as an example of a huge centralized database, which
is untrue. Google is actually an example of a highly distributed
database, using >10,000 Linux servers. When the index is updated, it
takes a while for the updates to propagate to all those servers,
even though they're essentially in the same building and using
high-bandwidth connections. The "Google dance":
http://www.wikipedia.org/wiki/Google
With a distributed architecture that is actually hosted in different
locations with different bandwidth, with updates coming not from the
inside but from the outside *all the time*, and with our highly complex
queries on top, this kind of sync operation would be virtually
impossible to do properly, unless you move most of the data to a central
server, in which case you gain very little by distributing.
Trust me, wiki is *very* hard to decentralize. It's a nice idea, but it
will take years until it happens. You need an architecture like Freenet
(http://freenetproject.org), only scalable (which Freenet is not),
plus SQL-like query support.
Regards,
Erik
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de