Gabriel Wicke wrote:
I agree with all you're saying and like the thought of having a global cluster with arbitration, but I have some doubts:
- What's the minimum hardware capable of running the databases, the web server, the cache, etc.? Is all this possible on a cheap unit while still being fast? I would expect a RAM requirement of at least 4 GB, but I might be wrong. This would certainly increase once more languages start to grow, so it might be necessary to have separate machines for separate languages.
This depends on the size of the data sets. The busiest fine-grained data can be held in memory, e.g. on a ramdisk. A machine with dual 64-bit Opterons, 8 GB of RAM and an Eric remote administration card weighs in at around US$4500, and can be upgraded to 16 GB. A 64-bit CPU is necessary for machines with more than 4 GB of RAM, since 4 GB (2^32 bytes) is the addressing limit of a 32-bit processor.
I posted a possible hardware config to wikitech-l on 01/01/04 14:16.
- With the number of nodes increasing, replication traffic might be fairly high (imagine a mass undo of somebody's changes being replicated to ten machines).
- Encryption of replication traffic will drain the CPU; even a simple scp does this. Imagine the same for ten streams.
A compressed scp connection using the Blowfish cipher, transferring English text between two AMD Athlon XP 2200+ CPUs, gives a throughput of 3.3 MB/s. The majority of the load is on the sending machine. A dual Opteron 240 with a 64-bit-optimised cipher implementation could probably transfer 12 MB/s or more.
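For anyone who wants to repeat that measurement, here is a rough Perl timing sketch. The dump file name and destination host are placeholders, and -C / -c blowfish are just the standard OpenSSH compression and cipher switches:

#!/usr/bin/perl
# Rough throughput test for a compressed, Blowfish-encrypted copy.
# The dump file name and destination host below are placeholders.
use strict;
use warnings;
use Time::HiRes qw(time);

my $file = shift || 'cur_table_dump.sql';      # sample text dump to push
my $size = -s $file or die "cannot stat $file\n";

my $t0 = time;
system('scp', '-C', '-c', 'blowfish', $file, 'replica.example.org:/tmp/') == 0
    or die "scp failed\n";
my $elapsed = time - $t0;

printf "%.2f MB in %.1f s = %.2f MB/s\n",
    $size / 1e6, $elapsed, $size / $elapsed / 1e6;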
If no single machine is critical and the machines are widely separated, we would not even need to worry about whether they are equipped with UPSes or redundant power supplies.
If the switchover is quick, this would be perfect: no need for separate backups and so on.
To get an idea of the hardware requirements, it would be nice if somebody could install all of Wikipedia on a cheap box and do some load testing on it (if possible with replication).
I agree.
Running the fine-grained database (the article database, not media or graphics) from a ramdisk could eliminate I/O blocking and wear on the hard drives, possibly increasing performance massively. We should really experiment with this.
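As a very crude starting point, something like the following compares sequential read speed from the on-disk table files against a copy held on a ramdisk (/dev/shm or a mounted tmpfs). Both paths are placeholders, and the OS page cache will flatter the disk figure on repeat runs, so a proper test should use the real query mix rather than raw reads:

#!/usr/bin/perl
# Crude comparison of sequential read throughput: disk copy vs. ramdisk copy.
# Both paths are placeholders for wherever the table files actually live.
use strict;
use warnings;
use Time::HiRes qw(time);

sub read_speed {
    my ($path) = @_;
    my $size = -s $path or die "cannot stat $path\n";
    open my $fh, '<', $path or die "open $path: $!\n";
    my $buf;
    my $t0 = time;
    1 while read($fh, $buf, 1 << 20);      # read in 1 MB chunks
    close $fh;
    return $size / (time - $t0) / 1e6;     # MB/s
}

printf "disk:    %.1f MB/s\n", read_speed('/var/lib/mysql/wikipedia/cur.MYD');
printf "ramdisk: %.1f MB/s\n", read_speed('/dev/shm/cur.MYD');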
Do we have a means to replay typical Wikipedia activity from a log file? I am thinking of a Perl script which reads a real Wikipedia common log file and replays the same load pattern at definable speeds.
If someone has a pre-configured server image and some real log files, I can do this.
I have already written a simple Perl script which can be modified to replay server load in real time from log files.
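For anyone who wants to build on that idea, a minimal sketch of such a replayer is below. It assumes Apache common log format on standard input and a made-up test server URL (testbox.example.org), and it only replays GET requests; it is an illustration, not the existing script:

#!/usr/bin/perl
# Minimal log-replay sketch: reads a common-format access log on stdin and
# re-issues the GET requests at the original pace, scaled by a speed factor.
# The test server URL is a placeholder.
use strict;
use warnings;
use Time::Local qw(timegm);
use Time::HiRes qw(sleep);
use LWP::UserAgent;

my $speed = shift || 1;                        # 2 = replay twice as fast
my $base  = 'http://testbox.example.org';      # hypothetical test machine
my $ua    = LWP::UserAgent->new(timeout => 30);
my %mon;
@mon{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = 0 .. 11;

my $prev;
while (my $line = <>) {
    # e.g. 1.2.3.4 - - [01/Jan/2004:14:16:05 +0000] "GET /wiki/Main_Page HTTP/1.0" 200 ...
    next unless $line =~
        m{\[(\d+)/(\w+)/(\d+):(\d+):(\d+):(\d+) [^\]]*\] "GET (\S+)};
    my ($d, $m, $y, $H, $M, $S, $path) = ($1, $2, $3, $4, $5, $6, $7);
    my $t = timegm($S, $M, $H, $d, $mon{$m}, $y - 1900);
    sleep(($t - $prev) / $speed) if defined $prev && $t > $prev;
    $prev = $t;
    $ua->get($base . $path);                   # fire the same request at the test box
}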