On Tue, 10 Sep 2002 14:09:33 -0700
Ray Saintonge <saintonge(a)telus.net> wrote:
<information suggesting a non-linear growth curve for Wikipedia>
I have seen messages talking about changing the database engine from MySQL
to PostgreSQL to fix table-locking problems on a busy system.
I am concerned that this _type_ of engineering work may not be what is
really needed.
My contention is that Wikipedia load can grow at an exponential rate but
may be constrained by resource availability. There are many factors which
cause self-multiplication.
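One conventional way to state the difference between "can grow
exponentially" and "constrained by resource availability" is to contrast
plain exponential growth with the logistic model. This is offered as an
illustrative assumption, not a measured model of Wikipedia load: N is load,
r the self-multiplication rate, and K the load the available resources can
serve.

```latex
\frac{dN}{dt} = rN \quad\Rightarrow\quad N(t) = N_0 e^{rt}
\qquad \text{(unconstrained)}

\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \quad\Rightarrow\quad
N(t) = \frac{K}{1 + \frac{K - N_0}{N_0}\, e^{-rt}}
\qquad \text{(resource-constrained)}
```

While N is far below K the two curves are nearly identical; as N approaches
K the growth rate falls towards zero, so raising K (more servers, better
distribution of load) is what lets growth stay exponential.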
Decisions which need to be made:
1) Do we want Wikipedia to be _able_ to grow at an exponential rate?
If yes:
a) We need to consider a technical system which can be put in place to
distribute load such that no one system needs to handle all the load
b) Consider whether the current social system of regulation can scale to
meet demand and monitor this
c) Keep a conscious review open to ensure the quality of Wikipedia
under such exponential growth, and consider adding constraints
on growth if such a growth rate starts causing undesirable effects.
If no:
a) Consider how availability of the system will be limited in order to
prevent exponential growth, and at what rate, if any, availability is
extended.
b) Which parts of the system are best rationed to limit the growth rate?
i.e., should searches, page views or edits be limited?
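If rationing were chosen, one common technique (a token bucket, which the
discussion so far does not name) would let each kind of request be limited
separately. The sketch below is hypothetical: the operation names and the
rates in LIMITS are placeholders, not measurements, and a real deployment
would need per-client or per-server buckets rather than one global table.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows on average `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so an idle system can absorb a burst
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-operation budgets (placeholder numbers, not measurements).
LIMITS: Dict[str, TokenBucket] = {
    "page_view": TokenBucket(rate=50.0, capacity=100.0),
    "search": TokenBucket(rate=2.0, capacity=5.0),
    "edit": TokenBucket(rate=1.0, capacity=3.0),
}


def admit(operation: str) -> bool:
    """Return True if a request of this type may be served right now."""
    return LIMITS[operation].allow()
```

Choosing which of searches, page views or edits to ration then reduces to
choosing the rate for each bucket; an expensive operation such as search
can be throttled hard while page views stay generous.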
From my experience of using the system over the last few days, I perceive
that there is currently a technical constraint limiting the rate of growth.
This may be desirable, or it may be undesirable. Do we know which it is?
Has an explicit decision been made?
A scalable solution is to hand nearly all wiki functionality over to mirror
servers. Updates are posted directly to the main Wiki server, which in turn
posts the database updates to registered first-tier mirrors, which in turn
can post database updates to second-tier mirrors registered with them, and
so on. This way, all mirrors can be kept in sync in near real time with a
minimum of CPU, memory and network load. The main server then need do
nothing other than maintain database consistency and accept and post
updates.
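The tiered push scheme can be sketched in a few lines. This is a minimal,
in-memory illustration under stated assumptions: each update is a whole-page
replacement tagged with a sequence number assigned by the main server,
delivery is synchronous and in order, and the names Node, MainServer and
post_edit are hypothetical. A real system would push over the network and
need a way for a mirror that missed updates to catch up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed update format: (sequence number, page title, new page text).
Update = Tuple[int, str, str]


@dataclass
class Node:
    """A server in the replication tree: the main server or any mirror."""
    name: str
    pages: Dict[str, str] = field(default_factory=dict)
    last_seq: int = 0
    mirrors: List["Node"] = field(default_factory=list)

    def register(self, mirror: "Node") -> None:
        # A mirror registers with its parent tier; the parent pushes updates to it.
        self.mirrors.append(mirror)

    def apply(self, update: Update) -> None:
        seq, title, text = update
        # Skip updates this node has already applied (e.g. a replayed message).
        if seq <= self.last_seq:
            return
        self.pages[title] = text
        self.last_seq = seq
        # Forward to the next tier; each node only talks to its own mirrors.
        for mirror in self.mirrors:
            mirror.apply(update)


class MainServer(Node):
    """The only node that accepts edits; it orders them and starts propagation."""

    def post_edit(self, title: str, text: str) -> int:
        seq = self.last_seq + 1
        self.apply((seq, title, text))
        return seq
```

Usage: build the tree once, then every edit flows outward tier by tier.

```python
main = MainServer("main")
tier1 = Node("mirror-a")
tier2 = Node("mirror-a1")
main.register(tier1)
tier1.register(tier2)
main.post_edit("Wikipedia", "A free encyclopedia.")
```

The main server sends each update only to its first-tier mirrors, so its
outgoing traffic grows with the number of first-tier mirrors rather than
with the total number of mirrors; reads are served entirely by the mirrors.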