On Thu, 28 Nov 2002 03:24:43 +0100 (CET) Lars Aronsson <lars@aronsson.se> wrote:
> Nick Hill wrote:
> > I envisage many Wikipedia servers around the world, supported by private individuals, companies and universities, much like the system of mirror FTP and mirror web sites. All these servers are updated in real time from the core Wikipedia server. From the user's perspective, all are equivalent.
> My experience from situations like the one you describe tells me that the designed system can easily get more complex
Systems can always become complex to the point of being unworkable. If the implementation is carefully managed, the complexity can be kept under control.
The system I suggested, in effect, distributes chunks of data. The issue is whether these chunks will, at some point, become incompatible with the systems that are supposed to read them, and whether a degree of non-fatal incompatibility is acceptable.
Example: as the definition of the tags changes, the front end will interpret them differently, so the same pages will look different on newer and older implementations.
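To make that concrete, here is a rough Python sketch of how a mirror might handle a versioned update chunk. The chunk format, the field names and the version numbers are inventions for illustration only, not anything that exists in the current code; the idea is just that a major-version mismatch is fatal while a newer minor version is merely cosmetic.

SUPPORTED_MAJOR = 1   # this mirror understands schema 1.x
SUPPORTED_MINOR = 0

def apply_chunk(chunk, database):
    major, minor = chunk["schema_version"]
    if major != SUPPORTED_MAJOR:
        # Incompatible structure: refuse rather than risk corrupting the database.
        raise ValueError("chunk schema %d.%d not supported" % (major, minor))
    if minor > SUPPORTED_MINOR:
        # Newer minor version: unknown tag definitions may render differently,
        # but the update can still be stored (non-fatal incompatibility).
        print("warning: chunk uses newer tag definitions; pages may display differently")
    database[chunk["page"]] = chunk["text"]

# Example chunk as it might arrive from the core server:
apply_chunk({"schema_version": (1, 2), "page": "Sweden", "text": "''Sweden'' is ..."},
            database={})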
> and cause more overhead than the needed performance gain,
Technical or computing overhead? How do you convert technical overhead to computing overhead? What is the needed performance gain?
> and that Moore's law will give us the speed that we need in time when we need it.
Several variables in the Wikipedia system compound one another. As the database grows, the demands on the database system grow. As the database grows, the system also becomes more attractive, bringing more people to Wikipedia. We may currently have latent demand which has been held back by system overload. Whilst the size of Wikipedia may roughly track Moore's law, the demand will probably exceed it, and the demand x size product is likely to far exceed Moore's law.
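As a back-of-envelope illustration, with growth rates invented purely for the sake of the example (they are not measurements of anything):

moore_doubling_months = 18       # assumed: hardware capability doubles every 18 months
size_growth_per_month = 1.08     # assumed: the database grows 8% a month
demand_growth_per_month = 1.10   # assumed: reader demand grows 10% a month

months = 36
hardware = 2 ** (months / moore_doubling_months)                    # Moore's law factor
load = (size_growth_per_month * demand_growth_per_month) ** months  # demand x size product

print("after %d months: hardware x%.0f, load x%.0f" % (months, hardware, load))
# With these figures the load factor outruns the hardware factor by roughly two
# orders of magnitude; the exact rates are guesses, but that is the shape of the problem.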
We need more analysis of these issues. We need forecasts that the architecture can be moulded around.
> Do you have any experience of designing systems like this? Would you write a prototype for this system that could be tested?
I have designed database systems, though not a system of exactly this sort. I don't think the system I proposed is complex. It uses known database techniques combined with public key cryptography and email, all of which are mature, well-understood technologies with extensibility built into the current free software code base. If it becomes clear to me that no-one else is prepared to pick up the gauntlet, I will do so if I get time. A lot of my time is being spent on the GNU project.
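For the sending side, a minimal sketch of what I mean by combining public key and email could look like this. It assumes GnuPG is installed on the core server with a signing key already set up; the addresses, subject format and file names are placeholders, not real ones.

import smtplib
import subprocess
from email.mime.text import MIMEText

def sign_update(update_path):
    # Write a detached, ASCII-armoured signature to update_path + ".asc".
    subprocess.run(["gpg", "--batch", "--yes", "--armor", "--detach-sign", update_path],
                   check=True)
    return update_path + ".asc"

def mail_update(update_path, sig_path, serial):
    # The armoured signature block at the end doubles as a delimiter.
    body = open(update_path).read() + "\n" + open(sig_path).read()
    msg = MIMEText(body)
    msg["Subject"] = "wikipedia update %06d" % serial   # serial number gives the ordering
    msg["From"] = "updates@example.org"                 # placeholder addresses
    msg["To"] = "wiki-mirrors@example.org"
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

# mail_update("update-000123.sql", sign_update("update-000123.sql"), 123)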
> The vision sounds like science fiction to me, but a prototype that I can run is not science fiction, so that would make all the difference.
'still haven't made the transporter! :-(
> Here is another vision: I envision a system where I can synchronize my laptop or PDA with a wiki, then go offline and use it, update it, and when I return to my office I can resynchronize the two again. I have no idea how to implement this vision. I think it would be a lot of work. But I think the result could be really useful.
The system I mentioned would work for this purpose. The PDA could collect the update emails, then integrate them into the database. Porting the wiki software and its supporting technology to the PDA would be a lot of work.
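A sketch of that offline side, assuming the update mails were fetched into a local mbox file while connected and carry the same placeholder subject line as in the earlier sketch (signature checking and missed-update handling are shown in the sketch further down):

import mailbox
import re

def pending_updates(mbox_path):
    """Return (serial, body) pairs for the queued update mails, lowest serial first."""
    updates = []
    for msg in mailbox.mbox(mbox_path):
        match = re.match(r"wikipedia update (\d+)", msg["Subject"] or "")
        if match:
            updates.append((int(match.group(1)), msg.get_payload()))
    return sorted(updates)

def integrate(mbox_path, database):
    # Apply everything collected while offline, strictly in serial order.
    for serial, body in pending_updates(mbox_path):
        database[serial] = body      # stand-in for the real "apply this edit" step
    return database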
> I also see there are similarities between your vision and mine. The idea is to express the update activity as a series of transactions (update submits) that can be transferred to another instance, or to multiple instances, and be applied there. In either case, one must take care of the case where the transmission of updates gets interrupted or delayed, and the potential "edit conflicts" that would result. It doesn't seem trivial to me.
The solution I proposed is:
1) Edits are serialised: they can only be applied in the specific order in which they were generated.
2) The Pipermail mailing-list archive gives sysadmins the facility to download missed updates.
3) Spoof edits are filtered out: the attachments from the main wiki server are signed using a private/public key pair and verified at the receiving end.
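A rough sketch of the receiving end under those three rules; the archive URL is a placeholder, and the core server's public key is assumed to be in the mirror's GnuPG keyring already.

import subprocess

ARCHIVE_URL = "http://example.org/pipermail/wiki-updates/"   # placeholder, not the real archive

def verify(update_path, sig_path):
    # gpg exits non-zero when the detached signature does not match the file.
    return subprocess.run(["gpg", "--verify", sig_path, update_path]).returncode == 0

def apply_in_order(updates, last_applied):
    """updates: list of (serial, update_path, sig_path) tuples already saved to disk."""
    for serial, update_path, sig_path in sorted(updates):
        if serial != last_applied + 1:
            # Rule 2: a gap in the serial numbers, so stop and fetch the missing
            # mails from the list archive before applying anything later.
            print("missing update %d; fetch it from %s" % (last_applied + 1, ARCHIVE_URL))
            break
        if not verify(update_path, sig_path):
            # Rule 3: a spoofed or corrupted edit. Rule 1 forbids skipping ahead,
            # so stop until a clean copy has been fetched.
            print("bad signature on update %d; stopping" % serial)
            break
        # ... apply the edit in update_path to the local database here ...
        last_applied = serial
    return last_applied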