On Wed, Mar 11, 2009 at 2:09 AM, Brian <Brian.Mingus@colorado.edu> wrote:
Sure - creating a Lucene index of the entire revision history of all Wikipedias, for a WikiBlame extension.

More realistically (although I would like to do the above), a natural-language parse of the current revision of the English Wikipedia. Based on the supposed availability of this hardware, I'd say it could be done in less than a week.
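To give a sense of what "a Lucene index of the entire revision history" involves, here is a minimal sketch in Python — a toy in-memory inverted index standing in for Lucene (which is a Java library and does far more: on-disk segments, scoring, analyzers), with hypothetical field names (`page`, `rev_id`, `text`):

```python
from collections import defaultdict

class RevisionIndex:
    """Toy inverted index over (page, rev_id) documents.

    Stands in for Lucene only conceptually; a real Lucene index
    would also store term positions and norms, and write segments
    to disk so the full revision history needn't fit in RAM.
    """
    def __init__(self):
        # term -> set of (page, rev_id) pairs containing it
        self.postings = defaultdict(set)

    def add_revision(self, page, rev_id, text):
        for term in text.lower().split():
            self.postings[term].add((page, rev_id))

    def search(self, term):
        return sorted(self.postings[term.lower()])

idx = RevisionIndex()
idx.add_revision("Colorado", 1, "A state in the United States")
idx.add_revision("Colorado", 2, "A state in the western United States")
print(idx.search("western"))  # only revision 2 introduced this term
```

Because every revision is a separate document, a search can answer "which revision first contained this phrase" — which is exactly the query a WikiBlame-style tool needs.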
https://wiki.toolserver.org/view/Servers
I have to say the toolserver has grown a lot from that first donated server ^_^
I will confess that this server list is considerably more impressive than I expected based on my recollections.
To answer River's question, I would basically agree with Brian. The starting point is making full-text history available; once you have that, there are a number of projects (like WikiBlame) that would want to pull and process every revision in some way. Some of the code I've worked with would probably take weeks to run single-threaded against enwiki, but that can be made practical if one is willing to throw enough cores at the problem.

From an exterior point of view it often seems like the toolserver is significantly lagged or tools are going down, and from that I have generally assumed that it operates relatively close to capacity a lot of the time. Perhaps that is a bad assumption, and there is in fact plenty of spare capacity?
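The "throw cores at it" step is embarrassingly parallel, since revisions can be processed independently. A rough sketch using Python's standard-library executor — the per-revision work and the revision source are placeholders, not any actual toolserver code:

```python
from concurrent.futures import ThreadPoolExecutor

def process_revision(rev):
    # Placeholder for real per-revision work (parsing, diffing,
    # feeding an indexer); here we just count words in the text.
    return len(rev["text"].split())

# Placeholder data; a real run would stream revisions from a dump.
revisions = [{"rev_id": i, "text": "word " * i} for i in range(1, 5)]

# For CPU-bound work one would use ProcessPoolExecutor (one worker
# per core); threads keep this sketch simple and self-contained.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_revision, revisions))
```

A single-threaded job that takes weeks shrinks roughly linearly with worker count, until the bottleneck moves to I/O on the dump files.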
-Robert Rohde