I thought I would throw out some ideas for configurations for new Wikipedia systems:
Assuming we have a fast and reliable infrastructure for Wikipedia to operate on, I would hope and expect many more people to benefit from the great project. The new hardware configuration needs redundancy and reliability built in.
The DNS system is, by nature, distributed and designed to have many systems adding redundancy to the resolution service.
A single IP address will always be a single point of failure, as the routers leading up to the physical IP destination will take some time to propagate a new physical destination for the IP address.
It therefore makes sense to have redundancy switching at the DNS level, with systems located at different network locations offering equivalent service.
This brings issues of updates and authentication into question. The slave machines cannot authenticate or take database updates. A mechanism is needed to automatically nominate a machine as master or slave; the DNS system and each machine would respond to this.
Each wiki server, when booted, will, by default, be a slave. Each wiki server will periodically request master status from the arbitration server. The arbitration server could use an algorithm like this:
a) Each wiki server requests master status every 5 minutes.
b) The arbitration/master DNS server checks whether the wiki server making the request was the last server to have the request granted. If it was, the request is granted again.
c) If the server making the request is not the wiki server that last had the request granted, then if the last grant was >10 minutes ago, the request is granted. Otherwise, the request is denied.
d) If the master wiki server makes a request and does not receive a grant (including if it receives no answer at all), it demotes itself to slave.
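As a rough illustration, the grant logic on the arbitration server could be a short shell script along these lines. This is an untested sketch: the 5- and 10-minute figures come from the algorithm above, but the paths and the state-file layout are invented.

    #!/bin/sh
    # Sketch of the grant algorithm. Called with the requesting wiki
    # server's IP address in $1; exits 0 to grant master status, 1 to deny.
    STATE=/var/lib/arbiter/last-grant    # holds "<ip> <unix time>" of the last grant
    REQ_IP=$1
    NOW=`date +%s`

    read LAST_IP LAST_TIME < $STATE 2>/dev/null

    # b) same server as last time: renew the grant
    if [ "$REQ_IP" = "$LAST_IP" ]; then
        echo "$REQ_IP $NOW" > $STATE
        exit 0
    fi

    # c) different server: grant only if the last grant is >10 minutes old
    if [ `expr $NOW - ${LAST_TIME:-0}` -gt 600 ]; then
        echo "$REQ_IP $NOW" > $STATE
        exit 0
    fi
    exit 1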
A specific host name is used for all updates: the 'master hostname'. After each grant, the IP address of the master hostname is compared with the IP address of the machine which just received a grant. If it differs, the IP address of the master hostname is changed. This change is pushed to the slave DNS servers using the ordinary BIND8 protocol. The DNS TTL for the master wiki server's domain name would be set to 300 seconds. If a master went down for any reason, everyone should see the alternative master within 20 minutes.
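The compare-and-update step on the arbitration/DNS server might then look something like this (sketch only: the zone file path and record layout are made up, and the zone serial bump is glossed over):

    # Run after each grant, with the newly granted master's IP in $1.
    MASTER_HOST=updates.wikipedia.org
    NEW_IP=$1
    CUR_IP=`host $MASTER_HOST | awk '{print $NF}'`   # what the name resolves to now

    if [ "$NEW_IP" != "$CUR_IP" ]; then
        # rewrite the 300-second A record; a real script would also bump
        # the zone serial so the secondaries pick up the change
        sed -i "s/^updates.*IN A.*/updates 300 IN A $NEW_IP/" /etc/bind/wikipedia.org.zone
        ndc reload                                   # BIND8's control program
    fi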
If the arbitration server failed, all wikis would remain available for read access.
Redundancy switching: exactly the same system can take care of system redundancy. The arbitration server takes each request for master status as an indication that the wiki server is ready to handle queries. To complement the master hostname there is a 'slave hostname'. The slave hostname resolves to the IP address of any machine which is ready to handle a query; the DNS server may have multiple IP address entries for it. The query load will be spread evenly, round-robin, between all machines whose IP addresses are registered against the slave hostname.
If a machine fails to request master status for 10 minutes, its IP address entry is de-registered from the slave hostname and all DNS servers are updated using the normal BIND protocol. If a machine goes off-line, then after 15 minutes no more requests should be sent to it until it is working again.
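Keeping the round-robin pool current could reuse the same machinery. A sketch, with invented paths (the live-server list would be refreshed by the grant script on every successful request):

    #!/bin/sh
    # Regenerate the slave hostname's A records from the list of live servers.
    LIVE=/var/lib/arbiter/live-servers   # one IP per line
    FRAG=/etc/bind/slave-pool.inc        # $INCLUDEd from the zone file

    {
        echo '; auto-generated, do not edit'
        while read ip; do
            echo "slave 300 IN A $ip"    # one A record per live machine
        done < $LIVE
    } > $FRAG

    ndc reload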
This system provides totally automatic redundancy switching, master arbitration and master/slave database selection.
-- Server requirements for fine-grained databases: The database containing the text of each article is fine-grained, each article's text being a few K in size. The cost of putting this in memory would be small; the benefits of avoiding hard-disk head movements, large.
Hard drive head movements are expensive, both in terms of I/O time (and therefore server performance) and mechanical wear and tear / reliability. As the number of articles increases, the number of seeks across a disk surface increases. By copying the fine-grained database to a ramdisk when the machine boots, and using the ramdisk image for all queries, much load is removed from the hard drive. System performance should improve (at least when first powered on) by an order of magnitude. The ramdisk database must be replicated to another database located on the physical hard disk: 1) so that when the machine boots, a database image is available to load into the ramdisk, and 2) to keep a more solid copy than the ephemeral ramdisk image.
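A minimal boot-time sketch of the ramdisk arrangement (the mount point, paths and 15-minute copy interval are assumptions, not a tested recipe):

    #!/bin/sh
    # Load the fine-grained database into a ramdisk at boot, and copy it
    # back to the physical disk periodically.
    mount -t tmpfs -o size=6g tmpfs /mnt/ramdb
    cp -a /var/lib/db-disk/articles/. /mnt/ramdb/     # on-disk image -> ramdisk

    # ...point the database server's data directory at /mnt/ramdb, then:
    while true; do
        rsync -a /mnt/ramdb/ /var/lib/db-disk/articles/   # keep the solid copy fresh
        sleep 900
    done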
Dual 64 bit Opteron mainboards are available with 8 DIMM sockets capable of taking 16Gb. eg: http://www.tyan.com/products/html/thunderk8w_spec.html
If the system will be using >4Gb of memory, moving to 64-bit is a good idea; 4Gb is the addressable limit for 32-bit. Mainboard, $432: http://shopper.cnet.com/Tyan_Thunder_K8W_S2885ANRF___mainboard___extended_AT...
Memory throughput is probably more important than CPU MHz. 2x AMD Opteron 240 @ $220 each = $440.
8x 1Gb Registered ECC, $300 each: http://www.crucial.com/store/listModule.asp?module=DDR+PC2700&cat=RAM&am...
Mainboard and CPU solution: dual 64-bit Opteron with HyperTransport and 8Gb of RAM: US$3272. Eric card, $715: http://us.daxten.com/overview.cfm?prodID=37
Add ancillaries (a good PSU, two HDDs, and a CD-ROM drive with Knoppix left in for remote system repair via the Eric card) and the total is around $4500. Two of these to provide redundancy comes to around $9000.
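Itemised, the figures above work out roughly as follows (the ancillaries line is an estimate to reconcile the two totals):

    Mainboard (Tyan Thunder K8W)       $432
    2x AMD Opteron 240                 $440
    8x 1Gb Registered ECC              $2400
                                      -----
    Board, CPUs and RAM                $3272
    Eric card                          $715
    PSU, 2x HDD, CD-ROM (est.)        ~$500
                                      -----
    Per machine                       ~$4500   (x2 for redundancy = ~$9000)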
Nick Hill wrote:
A specific host name is used for all updates.
For example, updates.wikipedia.org would receive form data from any update. Which physical server / IP address this goes to depends upon the result of the negotiation system I previously described.
The 'master hostname': after each grant, the IP address of the master hostname is compared with the IP address of the machine which just received a grant.
To put it more clearly: if a different machine has just been granted master status, the IP address which updates.wikipedia.org resolves to will change to the IP address of that machine. BIND reloads and the change is pushed to the secondary DNS servers.
If it differs, the IP address of the master hostname is changed. This change is pushed to the slave DNS servers using the ordinary BIND8 protocol. The DNS TTL for the master wiki server's domain name would be set to 300 seconds. If a master went down for any reason, everyone should see the alternative master within 20 minutes.
Each wiki server has a cron job which periodically runs a shell script. This shell script first checks that the web server and database are running as expected by performing an HTTP GET and comparing the result with the expected result. If they match, the database and web server are considered working. If this fails, the script exits. If the script keeps exiting, the arbitration server will never receive notification that the machine is working, and will remove the IP address of the failing wiki server from the DNS record.
If the HTTP GET is successful, the shell script goes on to invoke SSH to connect to the arbitration server and run a script there. The return status of the script on the arbitration server is communicated as the return value of the ssh command to the script running as a cron job on the wiki server. This return value tells the wiki server whether it has been granted master status.
The script running on the remote machine updates the hosts file for the DNS server according to the algorithms previously described.
All this can be achieved using cron, shell scripting, perhaps a little Perl for editing the hosts file on the arbitration server (although this can be done in shell too), and SSH with public/private keys for communication between the machines.
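For instance, the cron-driven script on each wiki server might amount to something like this. The hostnames, URLs and file names are placeholders, not a working implementation:

    #!/bin/sh
    # Run from cron every 5 minutes on each wiki server.
    EXPECTED=/etc/wiki/healthcheck.expected

    # 1) verify web server and database with an HTTP GET
    GOT=`wget -q -O - http://localhost/wiki/healthcheck.php` || exit 1
    [ "$GOT" = "`cat $EXPECTED`" ] || exit 1   # broken: stay silent, get de-registered

    # 2) ask the arbitration server for master status; the remote script's
    #    exit status comes back as ssh's return value
    if ssh arbiter.wikipedia.org /usr/local/bin/request-master `hostname -i`; then
        touch /var/run/wiki-is-master          # granted: act as master
    else
        rm -f /var/run/wiki-is-master          # denied or no answer: demote to slave
    fi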
If the arbitration server failed, all wikis would remain available for read access only.
The arbitration server is the same machine as the master DNS server. The secondary DNS servers should be running a recent version of BIND, configured to accept push updates from the master.
Whenever a machine tries to obtain master status and fails, the IP address of the current master server is compared to the IP address of the current master MySQL server. If it differs, the config file is automatically updated and MySQL reloaded.
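That check might be as simple as the following sketch (the master-host option is as in MySQL 3.x/4.x config files; paths and hostnames are invented):

    # On a failed master request, repoint replication if the master
    # hostname now resolves to a different machine.
    NEW=`host updates.wikipedia.org | awk '{print $NF}'`
    CUR=`grep '^master-host' /etc/my.cnf | cut -d= -f2 | tr -d ' '`

    if [ "$NEW" != "$CUR" ]; then
        sed -i "s/^master-host.*/master-host = $NEW/" /etc/my.cnf
        /etc/init.d/mysql restart              # pick up the new master
    fi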
I have uploaded a diagram in PDF and OpenOffice Draw formats illustrating the redundant system idea.
http://www.nickhill.co.uk/wikipedia-redundant-system.pdf http://www.nickhill.co.uk/wikipedia-redundant-system.sxd
Nick Hill wrote:
I have uploaded a diagram in PDF and OpenOffice Draw formats illustrating the redundant system idea.
Hm. You said you were trying to eliminate single points of failure, but in this system it seems that what you call the "Arbitration Script" would be such a single point of failure?
I'm probably missing something. Sorry if I am.
Timwi
Timwi wrote:
Nick Hill wrote:
I have uploaded a diagram in PDF and OpenOffice Draw formats illustrating the redundant system idea.
Hm. You said you were trying to eliminate single points of failure, but in this system it seems that what you call the "Arbitration Script" would be such a single point of failure?
I'm probably missing something. Sorry if I am.
If the arbitration script failed, the master would cease to accept updates, but all the wikis would still be available for read-only access.
On Thu, Jan 01, 2004 at 02:16:21PM +0000, Nick Hill wrote:
I thought I would throw out some ideas for configurations for new Wikipedia systems:
Assuming we have a fast and reliable infrastructure for Wikipedia to operate on, I would hope and expect many more people to benefit from the great project. The new hardware configuration needs redundancy and reliability built in.
The DNS system is, by nature, distributed and designed to have many systems adding redundancy to the resolution service.
[....]
The idea of having a dedicated http server for updates relies on DNS entries with a very short time to live (TTL). Otherwise, DNS servers all over the world would cache the DNS entry for updates.wikipedia.org for a long time. Most DNS servers handle entries with a short TTL correctly, but several browsers don't: they cache IP addresses but don't honor the TTL. Mozilla is one of those. My home box has a dyndns.org hostname, and Mozilla caches the IP for several days despite the TTL being 5 minutes or so.
I think it's better to have the web servers load balanced by a tool like http://www.linuxvirtualserver.org/ and have MySQL database replication between two database servers.
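For the web tier that could be as simple as a round-robin LVS setup with ipvsadm (IPs hypothetical; just a sketch of the idea):

    # Virtual HTTP service on the balancer, round-robin over two real
    # servers using direct routing.
    ipvsadm -A -t 10.0.0.1:80 -s rr
    ipvsadm -a -t 10.0.0.1:80 -r 10.0.0.11 -g
    ipvsadm -a -t 10.0.0.1:80 -r 10.0.0.12 -g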
Regards,
JeLuF
Jens Frank wrote:
The idea of having a dedicated http server for updates relies on DNS entries with a very short time to live (TTL). Otherwise, DNS servers all over the world would cache the DNS entry for updates.wikipedia.org for a long time. Most DNS servers handle entries with a short TTL correctly, but several browsers don't: they cache IP addresses but don't honor the TTL. Mozilla is one of those. My home box has a dyndns.org hostname, and Mozilla caches the IP for several days despite the TTL being 5 minutes or so.
A way to overcome this problem is to add a random fourth-level domain name to the URLs pointing to the update server. This way, each time the update server is referenced, the IP address is refreshed.
e.g., where <anything> is replaced by a random string:
http://<anything>.update.wikipedia.org/ resolves via a wildcard DNS entry to the current IP address of update.wikipedia.org. This way, whenever an update link is pressed, the most up-to-date IP address is fetched from the DNS.
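Sketched out (one wildcard zone entry covers every random label; how the random string is generated doesn't matter much):

    # In the zone file:
    #   *.update 300 IN A <current master IP>
    #
    # Server-side, each rendered page would link to a fresh random
    # hostname, e.g.:
    RAND=`date +%s%N | md5sum | cut -c1-8`
    echo "http://$RAND.update.wikipedia.org/"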
What about sticking the entire cluster on private IPs and having a load-balancing firewall appliance handle traffic flow which would randomly hit the update box?
On Fri, Jan 02, 2004 at 12:59:14PM -0800, Tim Thorpe wrote:
What about sticking the entire cluster on private IPs and having a load-balancing firewall appliance handle traffic flow which would randomly hit the update box?
I think Nick is trying to have a global cluster, and you cannot achieve this with a simple load balancer. If one data center goes down, the load balancer would go down, too.
DNS has ways to handle the outage of a server, and Nick is trying to use them.
JeLuF
Jens Frank wrote:
What about sticking the entire cluster on private IPs and having a load-balancing firewall appliance handle traffic flow which would randomly hit the update box?
I think Nick is trying to have a global cluster, and you cannot achieve this with a simple load balancer. If one data center goes down, the load balancer would go down, too.
That's right. Nick's proposal, though, solves a problem that we don't actually have. It is of course possible for an entire data center to go down, but it's extremely rare and should be close to the bottom of our list of problems to solve.
The much more likely scenario -- one that we live with every day -- is that a single webserver will fall over. After we get that problem solved -- basically with a load-balancing firewall appliance, the linuxvirtualserver.org stuff -- then we can also consider such things as global clustering and locating servers all over the planet or whatever.
--Jimbo
Tim Thorpe wrote:
What about sticking the entire cluster on private IPs and having a load-balancing firewall appliance handle traffic flow which would randomly hit the update box?
There would be several points of failure with such a system.
To make a system really reliable, all single points of failure need to be removed.
E.g. building-specific hazards: power loss (UPSes sometimes fail), network cables cut, fire, burglary, landlord repossession, hosting company bankruptcy, malicious attack, human error, plane crash, etc.
Machine-specific hazards: any single machine in the system failing and bringing everything down, through either hardware failure or malicious attack.
Network hazards: any single segment of the network failing and bringing the system down, through hardware failure, human error or malicious attack.
I believe a design philosophy where the system is immune from any single element failing is both the most cost-effective and the most reliable. Rather than invest heavily for reliability in mission-critical systems, make no system mission critical. No system then needs to have mission-critical investment. The overall system will then be cheaper and more reliable.
To put it another way: all systems will fail. The probability of a single costly, reliable unit failing is still fairly high. The probability of many fairly reliable cheap units with no common point of failure breaking down simultaneously is much lower than the probability of a costly reliable unit failing.
If no single machine is critical and machines are widely separated, we would not even need to worry whether the machines are equipped with UPS or redundant supplies.
On Fri, 02 Jan 2004 23:34:14 +0000, Nick Hill wrote:
The probability of many fairly reliable cheap units with no common point of failure breaking down simultaneously is much lower than the probability of a costly reliable unit failing.
I agree with all you're saying and like the thought of having a global cluster with arbitration, but I have some doubts:
* What's the minimum hardware capable of running the databases, the webserver, the cache etc.? Is all this possible on a cheap unit while still being fast? I would expect a RAM requirement of at least 4Gb, but I might be wrong. This would certainly increase once more languages start to grow, so it might be necessary to have separate machines for separate languages.
* With the number of nodes increasing, replication traffic might be fairly high (imagine mass undoing somebody's changes replicated to ten machines)
* Encryption of replication traffic will drain the CPU; even a simple scp does this. Imagine the same for ten streams.
If no single machine is critical and machines are widely separated, we would not even need to worry whether the machines are equipped with UPS or redundant supplies.
If the switchover is quick, this would be perfect- no need for separate backups and so on.
To get an idea of the hardware requirements, it would be nice if somebody could install all of Wikipedia on a cheap box and do some load testing on it (if possible with replication).
Gabriel Wicke
Gabriel Wicke wrote:
I agree with all you're saying and like the thought of having a global cluster with arbitration, but I have some doubts:
- What's the minimum hardware capable of running the databases, the webserver, the cache etc.? Is all this possible on a cheap unit while still being fast? I would expect a RAM requirement of at least 4Gb, but I might be wrong. This would certainly increase once more languages start to grow, so it might be necessary to have separate machines for separate languages.
This depends on the size of the data sets. The busiest fine-grained data can be held in memory, e.g. on a ramdisk. A machine with dual 64-bit Opterons, 8Gb of RAM and an Eric remote administration card weighs in at around US$4500, and can be upgraded to 16Gb. 64-bit is necessary for machines with over 4Gb; 4Gb is the addressable limit for 32-bit.
I posted a possible hardware config to wikitech-l on 01/01/04 14:16.
- With the number of nodes increasing, replication traffic might be fairly high (imagine mass undoing somebody's changes replicated to ten machines).
- Encryption of replication traffic will drain the CPU; even a simple scp does this. Imagine the same for ten streams.
A compressed SCP connection using the blowfish cipher for English text between two AMD Athlon XP2200+ CPUs gives a throughput of 3.3 megabytes/sec. The majority of the load is on the sending machine. A dual Opteron 240 with a 64-bit-optimised cipher implementation can probably transfer 12 megabytes/sec or more.
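For anyone who wants to reproduce the measurement, roughly (standard OpenSSH flags; better to use a real article dump than zeros, which compress unrealistically well):

    dd if=/dev/zero of=/tmp/testfile bs=1M count=100   # stand-in test file
    time scp -C -c blowfish /tmp/testfile otherhost:/tmp/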
If no single machine is critical and machines are widely separated, we would not even need to worry whether the machines are equipped with UPS or redundant supplies.
If the switchover is quick, this would be perfect- no need for separate backups and so on.
To get an idea of the hardware requirements, it would be nice if somebody could install all of Wikipedia on a cheap box and do some load testing on it (if possible with replication).
I agree.
Running the fine-grained database (the article database, not media or graphics) from a ramdisk could eliminate I/O blocking and wear on hard drives, possibly increasing performance by a massive degree. We should really experiment with this.
Do we have a means to replay typical Wikipedia activity from a log file? I am thinking of a Perl script which reads a real Wikipedia common log file, replaying the same load pattern at definable speeds.
If someone has a pre-configured server image and some real log files, I can do this.
I have already written a simple Perl script which can be modified to replay server load in real time from log files.
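Not the Perl script itself, but the idea fits in a few lines of shell (target host in $1; the pacing here is crude, a real replayer would honour the log timestamps):

    #!/bin/sh
    # Replay request paths from a common-format log against a test server.
    TARGET=$1
    awk '{print $7}' access.log | while read path; do
        wget -q -O /dev/null "http://$TARGET$path" &
        sleep 0.1
    done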
On Sat, 03 Jan 2004 11:41:59 +0000, Nick Hill wrote:
Do we have a means to replay typical Wikipedia activity from a log file? I am thinking of a Perl script which reads a real Wikipedia common log file, replaying the same load pattern at definable speeds.
If someone has a pre-configured server image and some real log files, I can do this.
I have already written a simple Perl script which can be modified to replay server load in real time from log files.
http://www.cs.virginia.edu/~rz5b/software/logreplayer-manual.htm could be useful, or http://opensta.org/.
Even more useful would be a full incoming traffic dump and a full disk image of the start condition (it's mainly the editing that is interesting). tcpdump and tcpreplay might be good candidates for this.
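Roughly (interface names are placeholders; older tcpreplay versions take -i, newer ones use --intf1):

    tcpdump -i eth0 -s 0 -w edits.pcap 'tcp port 80'   # full packets on the live box
    tcpreplay -i eth1 edits.pcap                       # replay against the test box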
If it turns out that the minimum server for the current database would indeed be a 64-bit, 8Gb 'cheap' machine, then it might get pretty hard when the database grows (especially with multiple languages etc.). Just an import of http://www.meyers-konversationslexikon.de/ into the German wikimedia (not to think of its nice illustrations) would create problems, I guess.
A compressed SCP connection using the blowfish cipher for English text between two AMD Athlon XP2200+ CPUs gives a throughput of 3.3 megabytes/sec. The majority of the load is on the sending machine. A dual Opteron 240 with a 64-bit-optimised cipher implementation can probably transfer 12 megabytes/sec or more.
The sending machine would have to create compressed and encrypted streams for however many mirrors there are. It's not its main job, but it is something that takes CPU time away from the database work etc. How high was the CPU load of just the scp process on the Athlon?
Gabriel Wicke
The obstacle is the DB server; a 24/7 offsite dump of the DB server would effectively double the bandwidth used for each transaction. I work for an ISP; N+1 takes care of most DC-internal issues, and the DC being used is Verio, which is a VERY large hosting company -- I don't see them going down any time soon. Some advanced routers have a dial-up redundancy capability where a router can phone an off-site router on a separate land line to inform it that there is a network issue on one end and that all data needs to be re-routed to the secondary stack.
You can never eliminate all single points of failure, but I also see that some of us are losing our heads when it comes to solutions. Some suggested solutions would cost not in the tens of thousands but hundreds of thousands to implement.
Remember the golden rule of network engineering: KEEP IT SIMPLE, STUPID! ;).
On Sat, 03 Jan 2004 11:37:04 -0800, Tim Thorpe wrote:
The obstacle is the DB server; a 24/7 offsite dump of the DB server would effectively double the bandwidth used for each transaction. I work for an ISP; N+1 takes care of most DC-internal issues, and the DC being used is Verio, which is a VERY large hosting company -- I don't see them going down any time soon. Some advanced routers have a dial-up redundancy capability where a router can phone an off-site router on a separate land line to inform it that there is a network issue on one end and that all data needs to be re-routed to the secondary stack.
You can never eliminate all single points of failure, but I also see that some of us are losing our heads when it comes to solutions. Some suggested solutions would cost not in the tens of thousands but hundreds of thousands to implement.
Remember the golden rule of network engineering: KEEP IT SIMPLE, STUPID! ;).
I agree.
The idea of a WikiTella thing is fascinating but seems to be very hard to implement now.
Factors that would help WikiTella:
* hardware performance grows really fast
* bandwidth gets really cheap
If somebody managed to get a prototype of this working under load and with small bandwidth requirements I would be all for this solution, but I have some doubts that this will happen soon. A 'cheap computer' that would work in such a setup would most probably need to be quite a few times quicker than the current cheap ones, and Wikipedia's demand might grow quicker than cheap computers' performance. Mirrors could hardly be simple (old) boxes provided by a university or ISP; they would have to be brand-new machines bought by Wikipedia or sponsored. Or a similar setup to the current one, with multiple machines and the associated administration work (but this might buy more horsepower for the money).
Gabriel Wicke
Nick Hill wrote:
A single IP address will always be a single point of failure, as the routers leading up to the physical IP destination will take some time to propagate a new physical destination for the IP address.
It therefore makes sense to have redundancy switching at the DNS level, with systems located at different network locations offering equivalent service.
At some point, yes, I agree completely. But it's important that we focus our energy on solving problems that we actually have, rather than problems which are small and unlikely. We have no plans to host anywhere other than professional facilities with full redundancy of everything important, and so the chances of a failure of the type you are imagining impacting us are very small.
To improve reliability, it makes sense to focus our efforts on the sorts of things that are most likely to go wrong, first. And then, later, if we can afford it and if it makes economic sense, we can focus on the more esoteric risks.
Dual 64 bit Opteron mainboards are available with 8 DIMM sockets capable of taking 16Gb. eg: http://www.tyan.com/products/html/thunderk8w_spec.html
Right, we have one of those (though broken at the moment). Right now, our entire db should fit comfortably in 4 gig of RAM, which we have.
But I totally agree with you about the importance of avoiding the hard drive as much as possible.
--Jimbo
Dual 64 bit Opteron mainboards are available with 8 DIMM sockets capable of taking 16Gb. eg: http://www.tyan.com/products/html/thunderk8w_spec.html
Right, we have one of those (though broken at the moment). Right now, our entire db should fit comfortably in 4 gig of RAM, which we have.
--Jimbo
the broken mobo is a Tyan??
On Mon, 5 Jan 2004, Nikos-Optim wrote:
the broken mobo is a Tyan??
Where UPS is involved, nothing is safe. These things happen. (I'll assume someone checked Tyan's approved memory list to make sure it's not just something lame with the memory making it unusable on that MB.)
--Ricky
Ricky Beam wrote:
On Mon, 5 Jan 2004, Nikos-Optim wrote:
the broken mobo is a Tyan??
Where UPS is involved, nothing is safe. These things happen. (I'll assume someone checked Tyan's approved memory list to make sure it's not just something lame with the memory making it unusable on that MB.)
Actually, I'm not 100% sure what the mobo is, and Jason's not around right now for me to ask. My point was, we have a motherboard that can accept up to 16 gig of ram. We have 4 gig, and plenty of slots available for further expansion.
And yes, the RAM is perfectly fine, and Jason has tested it. What he discovered is that one slot on the mobo is bad. So Penguin is sending a new one for us.
--Jimbo
The MB is an Arima HDAMA: http://www.rioworks.com/Download/HDAMA.htm
Jason