Robert Rohde:
The starting point is providing full-text history availability, and once you have that there are a number of different projects (like wikiblame) that would want to pull and process every revision in some way.
okay, so full text access has been a 'would be nice' thing for a while. i added an item to this year's shopping list for it.
it seems more useful to provide the text in uncompressed form, instead of the MediaWiki internal form that's almost impossible to work with. does that seem reasonable?
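to give a sense of what "almost impossible to work with" means: even reading a single row of the internal `text` table means dealing with compression flags, serialized PHP wrapper objects, and external-storage pointers. a rough, untested python sketch of that (the flag names are MediaWiki's; the handling here is simplified):

    import zlib

    def decode_text_row(old_text: bytes, old_flags: str) -> str:
        """Simplified decoder for one row of MediaWiki's `text` table.

        Flags seen in practice include gzip, utf-8, object and external;
        this sketch only handles the easy cases.
        """
        flags = set(old_flags.split(",")) if old_flags else set()

        if "external" in flags:
            # old_text is a pointer (e.g. "DB://cluster5/12345") into the
            # external storage clusters; resolving it needs access to them.
            raise NotImplementedError("external storage pointer: %r" % old_text)

        if "object" in flags:
            # A serialized PHP object (e.g. historical concatenated-diff
            # storage); unpacking it needs a PHP-serialization parser.
            raise NotImplementedError("serialized PHP object")

        if "gzip" in flags:
            # PHP's gzdeflate() produces raw DEFLATE data with no zlib
            # header, hence the negative window-bits argument.
            old_text = zlib.decompress(old_text, -zlib.MAX_WBITS)

        # Older rows may not actually be UTF-8; replace rather than fail.
        return old_text.decode("utf-8", errors="replace")

uncompressed dumps sidestep all of that for downstream consumers.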
Some of the code I've worked with would probably take weeks to run single-threaded against enwiki, but that can be made practical if one is willing to throw enough cores at the problem.
well, this probably isn't something we could afford ourselves, but if there's enough interest in a batch computing infrastructure, it would be worth talking to external organisations about it.
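on throwing cores at it: the full-history dump already ships as many separate chunk files, so the easy parallelism is one worker process per file. a rough python sketch (the file names and worker count below are made up):

    import bz2
    from multiprocessing import Pool

    def count_revisions(path: str) -> int:
        """Per-file worker: count <revision> elements in one dump chunk.
        A real tool (wikiblame-style attribution, etc.) would do its
        analysis here instead of just counting."""
        n = 0
        with bz2.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                if "<revision>" in line:
                    n += 1
        return n

    if __name__ == "__main__":
        # Hypothetical chunk names; the point is file-level parallelism.
        chunks = ["enwiki-pages-meta-history%d.xml.bz2" % i
                  for i in range(1, 28)]
        with Pool(processes=8) as pool:
            per_file = pool.map(count_revisions, chunks)
        print("total revisions:", sum(per_file))

a weeks-long single-threaded job becomes days or hours depending on how many cores a batch cluster can offer.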
From the outside it often seems like the toolserver is significantly lagged or tools are going down, and from that I have generally assumed that it operates relatively close to capacity a lot of the time.
that is correct. the way it works is we run at or over capacity for a while, until we can afford new hardware, then things are fast for a while, until we reach capacity again. this repeats every year or so. (interestingly, this is exactly how Wikipedia worked in the first few years.)
- river.