Robert Rohde:
> The starting point is providing full-text history availability, and once
> you have that there are a number of different projects (like wikiblame)
> which would desire to pull and process every revision in some way.
okay, so full-text access has been a 'would be nice' thing for a while. i
added an item to this year's shopping list for it.

it seems more useful to provide the text in uncompressed form, instead of
the MediaWiki internal storage format that's almost impossible to work
with. does that seem reasonable?
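as a sketch of what uncompressed access would enable (an illustration, not
existing tooling): a tool could stream page revisions straight out of a
standard MediaWiki XML export. the inline sample below is made up and omits
the real dump's XML namespace for brevity.

```python
# Hypothetical sketch: iterating revisions in an uncompressed MediaWiki
# XML export (the shape produced by Special:Export / dumpBackup.php).
# The sample document is invented; real dumps carry an XML namespace.
import io
import xml.etree.ElementTree as ET

SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <revision><id>1</id><text>first draft</text></revision>
    <revision><id>2</id><text>second draft</text></revision>
  </page>
</mediawiki>"""

def iter_revisions(stream):
    """Yield (title, rev_id, text) tuples, clearing each processed
    element so memory stays flat even on an enwiki-sized dump."""
    title = None
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "revision":
            rev_id = elem.findtext("id")
            text = elem.findtext("text") or ""
            yield title, rev_id, text
            elem.clear()  # discard the processed revision subtree

for title, rev_id, text in iter_revisions(io.StringIO(SAMPLE)):
    print(title, rev_id, len(text))
```

the point of the streaming parse is that a consumer never has to hold the
whole dump in memory, which is what makes full-history processing feasible
at all.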
> Some of the code I've worked with would probably take weeks to run
> single-threaded against enwiki, but that can be made practical if one is
> willing to throw enough cores at the problem.
well, this probably isn't something we could afford ourselves, but if there's
enough interest in a batch computing infrastructure, it's probably worth
talking to external organisations about this.
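for a sense of what 'throwing cores at the problem' looks like in practice,
here's a minimal sketch; process_revision is a hypothetical stand-in for
whatever per-revision analysis a given tool actually runs.

```python
# Hypothetical sketch: fan a per-revision analysis out over a process
# pool, so a weeks-long single-threaded job scales with available cores.
from multiprocessing import Pool

def process_revision(rev):
    # placeholder analysis: count the words in one revision's text
    rev_id, text = rev
    return rev_id, len(text.split())

def analyse(revisions, workers=4):
    with Pool(workers) as pool:
        # chunksize batches tasks to keep per-task IPC overhead low
        return pool.map(process_revision, revisions, chunksize=64)

if __name__ == "__main__":
    revs = [(i, "some wiki text " * i) for i in range(1, 5)]
    print(analyse(revs))
```

this only helps when revisions are independent, which is true for most
per-revision metrics; anything that compares adjacent revisions would need
to be partitioned by page instead.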
> From an exterior point of view it often seems like toolserver is
> significantly lagged or tools are going down, and from that I have
> generally assumed that it operates relatively close to capacity a lot
> of the time.
that is correct. the way it works is: we run at or over capacity for a
while, until we can afford new hardware; then things are fast for a while,
until we reach capacity again. this repeats every year or so.
(interestingly, this is exactly how Wikipedia itself worked in its first
few years.)
- river.