Hi Taku,
we *know* that the site is slow. And we *know* that it's not because our
server is too small. Our problems are database/lock related, and putting
stuff on an even bigger server will not help much when dealing with
O(n^x) problems. What we need to figure out is:
- When are our tables/rows locked and why (this behavior has changed
  drastically with the recent update to InnoDB).
- When are our queries using indices and when are they not, and why
  (MySQL index behavior can be very hard to predict) -- see the rough
  sketch below this list.
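As a rough illustration of both points -- the table and column names here
are only examples, not necessarily our actual schema -- EXPLAIN tells us
whether MySQL picks an index for a given query, and SHOW INNODB STATUS
dumps InnoDB's current lock information:

  -- Does MySQL use an index here, or scan the whole table?
  -- (type = ALL in the EXPLAIN output means a full table scan.)
  EXPLAIN SELECT cur_text
  FROM cur
  WHERE cur_namespace = 0 AND cur_title = 'Foo';

  -- Which locks is InnoDB currently holding or waiting on?
  SHOW INNODB STATUS;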
Solving these two problems should make Wikipedia very fast. If we cannot
optimize some queries, we need to think about making them simpler, or
caching them. (Also note that MySQL now supports subqueries, which we
don't use yet.) We are dealing with *particular* queries in *particular*
situations that make Wikipedia slow.
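To make the subquery point a bit more concrete -- assuming, purely for
illustration, a cur table of articles and a links table with an l_to
column, which may not match our real schema -- something like Orphaned
pages could become a single query instead of several passes:

  -- Articles in the main namespace that no other page links to.
  SELECT cur_title
  FROM cur
  WHERE cur_namespace = 0
    AND cur_title NOT IN (SELECT l_to FROM links);

Whether that is actually faster than what we do now is exactly the kind
of thing EXPLAIN would have to tell us.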
Oh, I see. But are you only talking about searching? I don't think MySQL
can be the bottleneck for simply displaying pages. Is it possible to
sustain the increase in traffic without simply extending server capacity?
(Fortunately or unfortunately, Wikipedia seems to still be growing.)
Not practical. Too many queries require access to a single centralized
article database; even an index alone won't suffice. Think of stuff like
Most wanted, Orphaned pages etc. Besides, it won't make things any
faster, because our problem is not a too-small server.
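To illustrate why -- again with made-up table and column names, and
ignoring the check that the target page does not exist yet -- a Most
wanted style query is essentially a count over the entire link table:

  -- Most frequently linked-to titles.  This needs every row of the
  -- link table, so it cannot be split across independent servers
  -- without shipping the partial counts somewhere and merging them.
  SELECT l_to, COUNT(*) AS refs
  FROM links
  GROUP BY l_to
  ORDER BY refs DESC
  LIMIT 50;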
I am not sure I understand you. However stupid the algorithm is,
increasing capacity always makes the site faster, if not fast. Think of a
brute-force algorithm. I am not talking about adding just 2 or 3 more
servers, but possibly hundreds of servers. Maybe I am wrong, because I am
still not sure how to implement my idea.
Surely we can't have the kind of servers that Google or Amazon have. The
strength of Wikipedia is its democratic structure. Why don't we employ it
for hosting too?
Maybe my idea is not practical. If so, can you tell me why?