Brion Vibber wrote:
It would take hours just to run a complete dump, and the rendering work involved would be equivalent to a sizeable fraction of our total daily page views. (Best case might be 100 ms per page for 240,000 pages, or roughly 6 hours 40 minutes.)
If we're going to run something like this daily, some sort of incremental updating is a must, though we can probably get away with stuffing the saved per-page data into a database or some such and slurping it back out fairly quickly.
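
(For illustration only, a minimal sketch of the sort of incremental approach described above. It assumes a hypothetical wiki_pages iterator yielding (title, touched-timestamp, wikitext) and a render() function; it is not the actual MediaWiki dump code.)

    import sqlite3

    # Sketch: keep the rendered output of every page in a local cache table,
    # and on each run re-render only the pages touched since the last dump.
    def incremental_dump(wiki_pages, render, cache_path="dump_cache.db"):
        db = sqlite3.connect(cache_path)
        db.execute("CREATE TABLE IF NOT EXISTS cache "
                   "(title TEXT PRIMARY KEY, touched REAL, html TEXT)")
        for title, touched, wikitext in wiki_pages:
            row = db.execute("SELECT touched FROM cache WHERE title=?",
                             (title,)).fetchone()
            if row is None or row[0] < touched:      # new or changed page
                html = render(wikitext)              # the ~100 ms/page step
                db.execute("REPLACE INTO cache (title, touched, html) "
                           "VALUES (?, ?, ?)", (title, touched, html))
        db.commit()
        # Writing the static dump is then just a fast read of the cache.
        return db.execute("SELECT title, html FROM cache ORDER BY title")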
*nod* That's very interesting.
As of this weekend, assuming things go well, we will have geoffrin and suda as database servers, and there's no problem in the short run with having gunther (still semi-on-loan from bomis) continue to run a third replicator. So perhaps gunther could be tasked with generating this daily, so that it doesn't interfere with anything else.
Am I right in assuming that having a replicator attached puts only a small load on the main database?
Although we are not a business, and pageviews therefore don't equal revenue, I still like to make decisions as if they did. For the organization as a whole, with my grand long-term vision of the Wikimedia Foundation being an organization similar in size and scope to, say, the National Geographic Society or Consumers Union, that's not a bad approximation. More pageviews mean more fame, which will ultimately mean more donations, more book sales, whatever.
Therefore, feeding data in a helpful way to major search engines is a very inexpensive form of advertising. It's worth doing, I think.
--Jimbo