Erik Moeller wrote:
As others have noted, there's a difference between offering data (which we do - we've spent a lot of time, money and effort to ensure that stuff like dumps.wikimedia.org works reliably even at enwiki scale) and providing a working environment for the dev community.
Having a primary working environment like Labs makes sense in much the same way that it makes sense to have a primary multimedia repository like Commons (and Wikidata, and in future probably a gadget repository, a Lua script repository, etc.). It enables community network effects and economies of scale that can't easily be replicated and reduces wasteful duplication of effort.
Yes, there's a difference. But in this case, as far as I understand it, a direct cost (or casualty) of setting up Wikimedia Labs is the Toolserver itself. Does Wikimedia need a great testing infrastructure? Yes, of course. (And it's not as though the Toolserver has ever been without its share of issues; I'm not trying to whitewash the past here.) But the question is: if such a Wikimedia testing infrastructure comes at the cost of losing the Toolserver, is that acceptable?
Ryan Lane wrote:
If WMF becomes evil, fork the entire infrastructure into EC2, Rackspace cloud, HP cloud, etc. and bring the community operations people along for the ride. Hell, use the replicated databases in Labs to populate your database in the cloud.
Tim Landscheidt wrote:
But the nice thing about Labs is that you can try out (replicable :-)) replication setups at no cost, and don't have to make upfront investments in hardware, etc., so when the time comes, you can just upload your setup to EC2 or whatever and have a working Wikipedia clone running in a manageable timeframe.
This is not an easy task. Replicating the databases is enormously challenging (they're huge datasets in the case of the big wikis) and they're constantly changing. If you tried to rely on dumps alone, you'd always be out of date by at least two weeks (assuming dumps are working properly). Two weeks on the Internet is a lot of time.
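As a rough illustration (this is just a sketch of mine, not anything provided by the dump infrastructure), here's one way to measure that staleness by scraping the dated dump directories for enwiki; the YYYYMMDD directory layout on dumps.wikimedia.org is an assumption on my part:

    import re
    import urllib.request
    from datetime import datetime, timezone

    # Fetch the enwiki dump index and pull out the dated run directories
    # (assumed to look like 20120601/).
    index_html = urllib.request.urlopen(
        "https://dumps.wikimedia.org/enwiki/").read().decode("utf-8")
    dates = sorted(set(re.findall(r'href="(\d{8})/"', index_html)))

    if dates:
        latest = datetime.strptime(dates[-1], "%Y%m%d").replace(tzinfo=timezone.utc)
        age_days = (datetime.now(timezone.utc) - latest).days
        print(f"Latest enwiki dump run started {dates[-1]}, ~{age_days} days ago")
    else:
        print("No dated dump directories found; the layout may differ")

And that only tells you when the run *started*; a full enwiki run takes days on top of that.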
But more to the point, even if you suddenly had a lot of infrastructure (bandwidth for constantly retrieving the data, space to store it all, and extra memory and CPU to allow users to, y'know, do something with it), and even if you suddenly had staff capable of managing these databases, not every table is even available currently. As far as I'm aware, http://dumps.wikimedia.org doesn't include tables such as "user", "ipblocks", "archive", "watchlist", any tables related to global images or global user accounts, and probably many others. I'm not sure a full audit has ever been done, but this is partially tracked by https://bugzilla.wikimedia.org/show_bug.cgi?id=25602.
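Here's a similarly rough sketch (the table list below is just the one from this mail, and the <wiki>-<run>-<table>.sql.gz filename pattern is an assumption about the dump layout) that checks which of those tables actually appear among the per-table SQL files of a dump run:

    import re
    import urllib.request

    wiki, run = "enwiki", "latest"

    # List the files published for this dump run and collect every table name
    # that has a per-table SQL dump.
    listing = urllib.request.urlopen(
        f"https://dumps.wikimedia.org/{wiki}/{run}/").read().decode("utf-8")
    published = set(re.findall(rf"{wiki}-\w+-(\w+)\.sql\.gz", listing))

    # Tables mentioned above as missing from the public dumps.
    for table in ("user", "ipblocks", "archive", "watchlist"):
        status = "published" if table in published else "NOT published"
        print(f"{table}: {status}")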
So beyond the silly simplicity of the suggestion that one could simply "move to the cloud!", there are technical barriers that currently make doing so impossible.
MZMcBride