Erik Moeller wrote:
As others have noted, there's a difference between offering data
(which we do - we've spent a lot of time, money and effort to ensure
that stuff like dumps.wikimedia.org works reliably even at enwiki
scale) and providing a working environment for the dev community.
Having a primary working environment like Labs makes sense in much the
same way that it makes sense to have a primary multimedia repository
like Commons (and Wikidata, and in future probably a gadget
repository, a Lua script repository, etc.). It enables community
network effects and economies of scale that can't easily be replicated
and reduces wasteful duplication of effort.
Yes, there's a difference. But in this case, as far as I understand it, a
direct cost (or casualty) of setting up Wikimedia Labs is the existence of
the Toolserver. Does Wikimedia need a great testing infrastructure? Yes, of
course. (And it's not as though the Toolserver has ever been without its
share of issues; I'm not trying to white-wash the past here.) But the
question is: if such a Wikimedia testing infrastructure comes at the cost of
losing the Toolserver, is that acceptable?
Ryan Lane wrote:
If WMF becomes evil, fork the entire infrastructure into EC2,
Rackspace cloud, HP cloud, etc. and bring the community operations
people along for the ride. Hell, use the replicated databases in Labs
to populate your database in the cloud.
Tim Landscheidt wrote:
But the nice thing about Labs is that you can try out
(replicable :-)) replication setups at no cost, and don't have
to make upfront investments in hardware, etc., so when the time
comes, you can just upload your setup to EC2 or whatever and
have a working Wikipedia clone running in a manageable timeframe.
This is not an easy task. Replicating the databases is enormously
challenging (they're huge datasets in the case of the big wikis) and
they're constantly changing. If you tried to rely on dumps alone, you'd
always be out of date by at least two weeks (assuming the dumps are working
properly). Two weeks on the Internet is a lot of time.
But more to the point, even if you suddenly had a lot of infrastructure
(bandwidth for constantly retrieving the data, space to store it all, and
extra memory and CPU to allow users to, y'know, do something with it) and
even if you suddenly had staff capable of managing these databases, not
every table is even available currently. As far as I'm aware,
http://dumps.wikimedia.org doesn't include tables such as "user",
"ipblocks", "archive", "watchlist", any tables related to global images
or global user accounts, and probably many others. I'm not sure a full audit
has ever been done, but this is partially tracked by
<https://bugzilla.wikimedia.org/show_bug.cgi?id=25602>.
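To make the gap concrete, here's a minimal sketch of what a "rebuild from
dumps" plan would be missing. The table lists below are illustrative only
(drawn from the examples above), not an actual audit of the dumps:

```python
# Illustrative only: neither set is an authoritative audit of what the
# dumps contain; the names come from the examples mentioned above.
tables_a_clone_needs = {
    "page", "revision", "text", "categorylinks",  # published in the dumps
    "user", "ipblocks", "archive", "watchlist",   # not published
}
tables_in_dumps = {"page", "revision", "text", "categorylinks"}

# Tables a dump-based clone could not rebuild at all:
missing = sorted(tables_a_clone_needs - tables_in_dumps)
print(missing)  # ['archive', 'ipblocks', 'user', 'watchlist']
```

Anything in that difference has to come from somewhere other than
dumps.wikimedia.org, which is exactly the problem.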
So beyond the silly simplicity of the suggestion that one could simply "move
to the cloud!", there are currently technical barriers that make doing so
impossible.
MZMcBride