On Mar 10, 2005 5:20 PM, Stirling Newberry
<stirling.newberry(a)xigenics.net> wrote:
More than you are aware of, in that so far Wikipedia has only had to
deal with the "natural" challenges of software, social organization, and
server load factor. Wait until you start having to deal with the
problems of organized attempts to extract value from Wikipedia; they will
generate server loads and social problems which are, if the experience
of e-tailers like eBay and Amazon is any indication, two to four orders
of magnitude above current peak loads. And I am not kidding.
Wikipedia is different from Amazon and eBay in that it is possible to
download its entire content in a single compressed bundle (or a small
set of these, one for each wiki). Presumably some way can be found to
section off the bandwidth consumed by downloading these bundles from
the bandwidth used for reading and editing the online version (i.e.
have a host of mirrors only for these downloads).
Data miners and others trying to extract value from reading Wikipedia
have a strong incentive to download a bundle and run their datamining
scripts locally, rather than accessing the current Wikipedia
page-by-page over the Net. It's faster, easier to code, and less
likely to give offense.
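To illustrate the point about local processing: a minimal sketch of streaming through a downloaded dump with Python's standard library, rather than fetching pages over the Net. The XML layout here is a simplified stand-in, not the actual dump schema, and the word-count step is just a placeholder for whatever mining the script does.

```python
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for a downloaded dump bundle; the real dump's
# schema differs, but the streaming approach is the same.
SAMPLE_DUMP = """<mediawiki>
  <page><title>Alpha</title><text>First article body.</text></page>
  <page><title>Beta</title><text>Second article body.</text></page>
</mediawiki>"""

def iter_pages(xml_file):
    """Stream (title, text) pairs from a dump-style XML file,
    clearing each <page> element after use to keep memory flat."""
    title, text = None, None
    for event, elem in ET.iterparse(xml_file):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "text":
            text = elem.text
        elif elem.tag == "page":
            yield title, text
            elem.clear()  # free memory before moving to the next page

pages = list(iter_pages(io.StringIO(SAMPLE_DUMP)))
# Example "datamining" done entirely locally: words per article.
word_counts = {title: len(text.split()) for title, text in pages}
```

Because iterparse discards each page after it is handled, even a multi-gigabyte bundle can be processed on a modest machine, with zero load on the live site.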
So, what we mostly need to worry about is people doing wide-scale
editing with bots for nefarious purposes (e.g. boosting a site's
PageRank).
Steve