On Thu, 28 Nov 2002 03:24:43 +0100 (CET)
Lars Aronsson <lars(a)aronsson.se> wrote:
> Nick Hill wrote:
> > I envisage many wikipedia servers around the world, supported by
> > private individuals, companies and universities. Much like the system
> > of mirror FTP and mirror web sites. All these servers are updated in
> > real time from the core wikipedia server. From the user's perspective,
> > all are equivalent.
> My experience from situations like the one you describe tells me that
> the designed system can easily get more complex
Systems can always become complex in an unworkable sense. If the
implementation is carefully managed, the complexity can be kept under
control.
The system I suggested, in effect, distributes chunks of data. The issue is
whether these chunks of data will, at some point, become incompatible with
the systems which are supposed to read them. Is a degree of non-fatal
incompatibility allowed?

Example: as the definitions of tags change, the front end will interpret
them differently, making pages look different between newer and older
implementations.
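
One way to keep that incompatibility non-fatal, as a rough sketch in Python
(the chunk format, field names and version numbers are hypothetical, not
part of any existing Wikipedia format), is to stamp each distributed chunk
with the tag-definition version it was written against, so an older front
end can still show the text and merely warn about markup it does not know:

import json

FRONT_END_TAG_VERSION = 2  # hypothetical: newest tag definition this front end knows

def render_chunk(raw_chunk):
    """Render a distributed chunk, degrading gracefully when it was
    written against a newer tag definition than this front end knows."""
    chunk = json.loads(raw_chunk)
    if chunk["tag_version"] > FRONT_END_TAG_VERSION:
        # Non-fatal incompatibility: show the text, flag unknown markup.
        print("Warning: chunk uses tag definitions v%d, this front end "
              "only knows v%d." % (chunk["tag_version"], FRONT_END_TAG_VERSION))
    print(chunk["text"])

render_chunk('{"tag_version": 3, "text": "Article text with [[new-style]] tags."}')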
> and cause more overhead than the needed performance gain,
Technical or computing overhead? How do you convert technical overhead to
computing overhead? What is the needed performance gain?
> and that Moores law will give us the speed that we need in time when we
> need it.
Several variables in the Wikipedia system self-multiply. As the size of the
database multiplies, the demands on the database system grow. As the size
of the database grows, the system becomes more attractive, bringing more
people to Wikipedia. We may currently be in a situation where we have
latent demand which has been held back by system overload. Whilst the size
of Wikipedia may approximate Moore's law, the demand will probably exceed
it. The demand x size product is likely to far exceed Moore's law.

We need more analysis on these issues. We need forecasts which the
architecture can be moulded around.
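
As a rough illustration of why the compound product matters (the growth
rates below are made-up assumptions for illustration, not measurements),
compare capacity doubling every 18 months against a database and a demand
that each double every 12 months:

# Forecast sketch: hardware capacity vs. compound load.
# All growth rates are assumptions, chosen only to illustrate the argument.
YEARS = 5
capacity = 1.0   # Moore's law: assume a doubling every 18 months
db_size = 1.0    # assumed: database size doubles every year
demand = 1.0     # assumed: reader/editor demand doubles every year

for year in range(1, YEARS + 1):
    capacity *= 2 ** (12.0 / 18.0)
    db_size *= 2
    demand *= 2
    load = db_size * demand   # the "demand x size" product
    print("year %d: capacity x%.1f, load x%.1f" % (year, capacity, load))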
> Do you have any experience from designing systems like this? Would you
> write a prototype for this system that could be tested?
I have designed database systems. I have not designed a system of exactly
this sort. I don't think the system I proposed is complex. It uses known
database techniques combined with public-key cryptography and email, all of
which are well-matured and understood technologies with extensibility built
into the current free software code base. If it becomes clear to me that
no-one else is prepared to pick up the gauntlet, I will do so if I get
time. A lot of my time is being spent on the GNU project.
> The vision sounds like science fiction to me, but a prototype that I can
> run is not science fiction, so that would make all the difference.
'still haven't made the transporter! :-(
> Here is another vision: I envision a system where I can synchronize
> my laptop or PDA with a wiki, then go offline and use it, update it,
> and when I return to my office I can resynchronize the two again.
> I have no idea on how to implement this vision. I think it would be a
> lot of work. But I think the result could be really useful.
The system I mentioned would work for this purpose. The PDA could collect
the update emails then integrate them into the database. Porting Wiki and
the supporting technology to the PDA would be a lot of work.
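
As a very rough sketch of that offline integration step (the message fields
and helper names here are hypothetical, and the update format is assumed
rather than taken from any existing Wikipedia code), the PDA could hold
back any update mail that arrives out of order and only apply those whose
sequence number follows the last one it has already integrated:

# Sketch: integrate queued update mails into a local copy, strictly in order.
pages = {}            # local article store: title -> text
last_applied = 0      # sequence number of the last update integrated
pending = {}          # out-of-order updates held back until their turn

def receive_update(seq, title, text):
    """Queue one update mail; apply it, and any now-unblocked ones, in order."""
    global last_applied
    pending[seq] = (title, text)
    while last_applied + 1 in pending:
        last_applied += 1
        t, body = pending.pop(last_applied)
        pages[t] = body

# Updates may arrive out of order; number 3 waits until 2 has been applied.
receive_update(1, "GNU", "GNU is a free operating system project.")
receive_update(3, "GNU", "GNU stands for GNU's Not Unix.")
receive_update(2, "Wiki", "A wiki is a web site anyone can edit.")
print(pages)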
> I also see there are similarities between your vision and mine. The
> idea is to express the update activity as a series of transactions
> (update submits) that can be transfered to another instance or
> multiple instances and be applied there. In either case, one must
> take care of the case that the transmission of updates gets
> interrupted or delayed, and the potential "edit conflicts" that would
> result. It doesn't seem trivial to me.
The solution I proposed is:

1) To have edits serialised. They can only be applied in the specific order
they were generated.
2) The pipermail mailing list will give sysadmins the facility of
downloading missed updates.
3) Spoof edits would be filtered: the attachments from the main wiki server
would be signed using a private/public key pair and verified at the
receiving end.
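
As a minimal sketch of point 3 (using the Python "cryptography" package and
Ed25519 keys purely as a stand-in; the real system would more likely use
GnuPG/PGP signatures on the update mails, and key distribution is glossed
over here), the main server signs each serialised update and a mirror
discards anything that fails verification against the published public key:

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The main wiki server holds the private key; every mirror holds the public key.
server_private_key = Ed25519PrivateKey.generate()
server_public_key = server_private_key.public_key()

def sign_update(update_bytes):
    """Main server: sign one serialised update before mailing it out."""
    return server_private_key.sign(update_bytes)

def accept_update(update_bytes, signature):
    """Mirror: accept the update only if the signature verifies."""
    try:
        server_public_key.verify(signature, update_bytes)
    except InvalidSignature:
        return False  # spoofed or corrupted update: discard it
    return True

update = b"seq=42; page=GNU; text=GNU is a free operating system."
sig = sign_update(update)
print(accept_update(update, sig))           # True: genuine update from the main server
print(accept_update(b"forged edit", sig))   # False: signature does not match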