Here's my take:

Split the web servers and DB servers.
1 master DB; only write queries go to this DB.
As needed, add slave servers; these only handle read queries.
Add web servers as needed.

This is the easiest way to go about it, using MySQL's built-in replication
feature. It makes the most sense too, in my book.
The only thing needed to make Wikipedia work like this is a DB connection
library that looks at an SQL statement and routes it to where it's supposed
to go. I wrote a DB library for MySQL in PHP once that did all this; it's
pretty cool, if I may say so. If you are interested, I'll send you the code.
It's part of a much bigger project, but I figure any decent PHP programmer
should be able to grasp the concept of it. It might not be super efficient,
since I wrote it while I was still learning, but it works. If anyone is
interested, let me know.
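The routing idea is easy to sketch. Here's a minimal illustration in Python rather than PHP, just to make the concept stand on its own; the host names are placeholders, not real servers, and a real library would of course manage actual MySQL connections:

```python
import itertools

# Placeholder addresses for illustration only.
MASTER = "db-master:3306"
SLAVES = ["db-slave1:3306", "db-slave2:3306"]

# Statement verbs that modify data and so must go to the master.
WRITE_VERBS = {"insert", "update", "delete", "replace", "create",
               "alter", "drop", "truncate", "lock"}

# Round-robin over the read-only slaves.
_slave_cycle = itertools.cycle(SLAVES)

def route(sql: str) -> str:
    """Return the server this statement should be sent to:
    writes go to the master, reads rotate over the slaves."""
    verb = sql.lstrip().split(None, 1)[0].lower()
    if verb in WRITE_VERBS:
        return MASTER
    return next(_slave_cycle)
```

So route("SELECT ...") picks one of the slaves, while route("UPDATE ...") always returns the master; adding read capacity is then just a matter of appending another entry to the slave list.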
Lightning
----- Original Message -----
From: "Nick Hill" <nick(a)nickhill.co.uk>
To: <wikitech-l(a)wikipedia.org>
Sent: Monday, November 25, 2002 4:53 PM
Subject: [Wikitech-l] Long term plans for scalability
I believe Wikipedia is being held back, in terms of how many people can
use it and how it can grow, by architectural constraints.
The current architecture, with one machine taking the entire burden of all
searches, updates and web page delivery, inherently limits the rate at
which Wikipedia can grow.
In order for Wikipedia to grow, it needs an architecture which can easily
devolve work to other servers. A main database is still required to
enforce administrative policy and maintain database consistency.
Work to improve the speed of the database and reduce lag will, in the
long run, be of only very limited benefit, perhaps reducing the lag users
experience for a few days or weeks.
A method of easily implementing mirror servers with live, real-time
updates is required. Each mirror server should provide all the
functionality users expect from Wikipedia, except for handling form
submissions of updates, which should be forwarded to the master wiki
server.
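The mirror's dispatch rule is simple: serve reads locally, hand writes upstream. A rough sketch, with hypothetical host names and a generic edit-submission check standing in for whatever the real wiki software uses:

```python
# Hypothetical hosts, for illustration only.
MASTER_WIKI = "master.wikipedia.example"
LOCAL_MIRROR = "mirror.example.org"

def dispatch(method: str, path: str) -> str:
    """Decide which host handles a request: edit submissions
    (POSTs, or requests to an edit-submit path) are forwarded to
    the master; everything else is served from the local mirror."""
    if method == "POST" or "action=submit" in path:
        return MASTER_WIKI
    return LOCAL_MIRROR
```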
The main database server should be relieved of the burden of serving web
pages and concentrate on running administrative code, and on processing
and posting database updates.
The update system can be achieved by either:
1) The main server creates SQL files of incremental changes, which are
emailed to the mirror servers, signed with a key pair and sequentially
numbered to ensure they are automatically processed in order. This way
the server can run asynchronously with the mirrors, which is better for
the reliability of the server. The server will not need to wait for
connection responses from the mirrors, and updates will be cached in
the mail system in the event that a mirror server is unavailable. (The
main server then only needs to create one email per update. The mail
system infrastructure will take care of sending the data to each
mirror. In fact, a system such as the Pipermail setup used on this list
would solve the problem wonderfully. Mirror admins simply subscribe to
the list to get all updates sent to their machine, and can manually
download any updates they are missing from the list archive!)
Or
2) The master server opens a connection directly to the SQL daemon on
each remote machine. In this case the server will need to track which
updates each mirror has and has not received, and will need to wait for
time-outs on non-operational mirrors. (This system may also open up
exploits on the server via the SQL interface.)
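The ordering guarantee in option 1 is the interesting part: mail can arrive out of order, so a mirror has to buffer early arrivals and apply updates strictly by sequence number. A minimal sketch of that bookkeeping (signature checking and the actual SQL execution are left out; the applied list simply stands in for running the statements against MySQL):

```python
class UpdateApplier:
    """Apply sequentially numbered update batches in order,
    buffering any that arrive early (e.g. reordered mail)."""

    def __init__(self):
        self.next_seq = 1
        self.pending = {}   # seq number -> SQL text, held until its turn
        self.applied = []   # stands in for executing against the local DB

    def receive(self, seq: int, sql: str):
        self.pending[seq] = sql
        # Apply every consecutive update we now have, then stop
        # and wait for the next gap to be filled.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

If update 2 arrives before update 1, it just sits in the buffer; as soon as update 1 shows up, both are applied in order. A missing number also tells the admin exactly which update to fetch from the list archive.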
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikipedia.org
http://www.wikipedia.org/mailman/listinfo/wikitech-l