On Wed, 2003-01-29 at 06:05, Takuya Murata wrote:
Oh, I see. But are you only talking about searching? I don't think MySQL can be the bottleneck for simply displaying pages.
Displaying a Wikipedia page is far from simple. The pages are stored in the table as wikitext, not as HTML, and are rendered dynamically. Rendering a Wikipedia article requires, for example, looking up every link it contains and determining whether the target pages exist.
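To make that concrete, here is a minimal Python sketch of the link-existence problem. This is not MediaWiki's actual code; the table and column names (cur, cur_title) are assumptions loosely based on the old schema, and cursor stands for any DB-API cursor. The point is the difference between issuing one query per link and one batched query per article:

# Minimal sketch (not MediaWiki's actual code) of the link-existence check
# that rendering needs: every [[wiki link]] has to be looked up so it can be
# shown either as an existing page or as a missing one.
# Assumed names: a DB-API cursor, and a table cur with a column cur_title,
# loosely modeled on the old MediaWiki schema.

def links_existing_naive(cursor, titles):
    """One query per link: cost grows with the number of links on the page."""
    existing = set()
    for title in titles:
        cursor.execute("SELECT 1 FROM cur WHERE cur_title = %s", (title,))
        if cursor.fetchone():
            existing.add(title)
    return existing

def links_existing_batched(cursor, titles):
    """A single IN (...) query for the whole article."""
    if not titles:
        return set()
    placeholders = ", ".join(["%s"] * len(titles))
    query = "SELECT cur_title FROM cur WHERE cur_title IN ({})".format(placeholders)
    cursor.execute(query, tuple(titles))
    return {row[0] for row in cursor.fetchall()}

A long article can easily contain hundreds of links, so the naive version turns a single page view into hundreds of round trips to MySQL, while the batched version keeps it to one.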
Is it possible to sustain the increase in traffic without simply extending server capacity?
Actually, we had a *decrease* in traffic in the last month due to the Google hiccups. We should be able to cope with much higher traffic if we optimize our queries. Note that pure bandwidth is not a problem; the database tarball downloads are very fast. Ask Brion for the server specs and be impressed.
I am not sure I understand you. However stupid the algorithm is, an increase in capacity always makes the site faster.
1) If the algorithm doesn't scale linearly, expanding your servers linearly will gain you almost nothing. If increasing the number of edits by a factor of 10 leads to a factor-of-100 drop in performance, you need to stop buying hardware and look at what you're doing wrong (see the toy sketch below). Of course, if you keep optimizing and don't gain anything, then you need to think about buying hardware, but that is not the case here.
2) You cite Google as an example of a huge centralized database, which is untrue. Google is actually an example of a highly distributed database, using >10,000 Linux servers. When the index is updated, it takes a while for the index updates to propagate to all those servers, even though they're essentially in the same building and connected with high-bandwidth links. This is the "Google dance": http://www.wikipedia.org/wiki/Google
With a distributed architecture that is actually hosted in different locations with different bandwidth, with updates coming not from the inside but from the outside *all the time*, and with our highly complex queries on top of that, this kind of sync operation would be virtually impossible to do properly, unless you moved much of the work to a central server, in which case you would gain very little by distributing.
Trust me, wiki is *very* hard to decentralize. It's a nice idea, but it will take years before it happens. You need an architecture like Freenet ( http://freenetproject.org ), only scalable (which Freenet is not), plus SQL-like query support.
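To put toy numbers on point 1) (pure arithmetic, not a measurement of our servers): if the total work grows quadratically with the number of edits, then ten times the edits means a hundred times the work, and no realistic amount of extra hardware absorbs that.

# Toy illustration of point 1): linear vs. quadratic growth of total work.
# The numbers are hypothetical, not measurements of the Wikipedia servers.

def work_linear(edits):
    return edits              # well-behaved algorithm: O(n)

def work_quadratic(edits):
    return edits * edits      # badly scaling algorithm: O(n^2)

for edits in (1000, 10000, 100000):
    print("%7d edits: linear %12d, quadratic %18d"
          % (edits, work_linear(edits), work_quadratic(edits)))

Going from 10,000 to 100,000 edits multiplies the quadratic column by 100 while the linear one grows only by 10; that gap is what hardware alone cannot close.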
Regards,
Erik