On Mar 10, 2005 5:20 PM, Stirling Newberry <stirling.newberry@xigenics.net> wrote:
> More than you are aware of, in that so far Wikipedia has only had to deal with the "natural" challenges of software, social organization, and server load. Wait until you start having to deal with organized attempts to extract value from Wikipedia: if the experience of e-tailers like eBay and Amazon is any indication, they will generate server loads and social problems two to four orders of magnitude above current peak loads. And I am not kidding.
Wikipedia is different from Amazon and eBay in that its entire content can be downloaded as a single compressed bundle (or a small set of them, one per wiki). Presumably some way can be found to section off the bandwidth consumed by these bulk downloads from the bandwidth used for reading and editing the live site (i.e. have a host of mirrors dedicated to the downloads).
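To make that concrete, here is a rough sketch in Python of fetching such a bundle. The dump URL and file name below are illustrative, not a pointer to a specific real file; check the downloads site for the current ones:

    import urllib.request

    # Illustrative dump URL -- actual hosts and file names vary over time.
    DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
                "enwiki-latest-pages-articles.xml.bz2")

    def fetch_dump(url: str, dest: str) -> None:
        """Stream the compressed bundle to disk in chunks so memory use stays flat."""
        with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
            while chunk := resp.read(1 << 20):  # 1 MiB per read
                out.write(chunk)

    if __name__ == "__main__":
        fetch_dump(DUMP_URL, "enwiki-latest-pages-articles.xml.bz2")

Streaming in fixed-size chunks matters here because the bundles run to gigabytes.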
Data miners and others trying to extract value from reading Wikipedia have a strong incentive to download a bundle and run their data-mining scripts locally, rather than hitting the live Wikipedia page by page over the Net. It's faster, easier to code, and less likely to give offense.
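For example, assuming the bundle is an XML pages dump (the exact dump format has changed over time, so treat this as a sketch rather than a recipe), a miner can stream-parse it locally without ever touching the live servers:

    import bz2
    import xml.etree.ElementTree as ET

    def iter_page_titles(dump_path: str):
        """Stream page titles out of a compressed XML dump without loading it whole."""
        with bz2.open(dump_path, "rb") as f:
            for _event, elem in ET.iterparse(f):
                # Dump elements carry an XML namespace; match on the local name.
                if elem.tag.rsplit("}", 1)[-1] == "title":
                    yield elem.text
                elem.clear()  # discard parsed elements to keep memory bounded

    if __name__ == "__main__":
        titles = iter_page_titles("enwiki-latest-pages-articles.xml.bz2")
        for i, title in enumerate(titles):
            print(title)
            if i == 9:  # peek at the first ten titles only
                break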
So what we mostly need to worry about is people doing wide-scale editing with bots for nefarious purposes (e.g. planting links to boost a site's PageRank).
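As a toy illustration of what detection might look like (a hypothetical heuristic of my own, not anything Wikipedia actually runs): flag external-link hosts that a single account keeps adding across many pages.

    import re
    from collections import Counter

    URL_RE = re.compile(r"https?://([^/\s\]]+)")  # capture just the host part

    def added_link_hosts(old_text: str, new_text: str) -> Counter:
        """Hosts of external links present in the new revision but not the old."""
        return Counter(URL_RE.findall(new_text)) - Counter(URL_RE.findall(old_text))

    def suspicious_hosts(edits, threshold: int = 20) -> set:
        """Flag hosts one account adds across many pages: a crude signal of
        bot-driven link planting, not a real anti-abuse system."""
        totals = Counter()
        for old_text, new_text in edits:
            totals.update(added_link_hosts(old_text, new_text))
        return {host for host, n in totals.items() if n >= threshold}

    # Toy usage: two edits by one account, each planting the same link.
    edits = [
        ("Some article text.", "Some article text. http://spam.example/buy"),
        ("Another page.", "Another page. http://spam.example/buy"),
    ]
    print(suspicious_hosts(edits, threshold=2))  # -> {'spam.example'}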
Steve