On Sep 25, 2012, at 11:12 PM, "DaB." WP@daniel.baur4.info wrote:
Hello Erik, At Tuesday 25 September 2012 22:24:33 DaB. wrote:
The initial focus for Labs has been to provide functionality that toolserver doesn't - get root on a VM or set of VMs to install/test arbitrary software/services, and get it ready for production deployment.
It is nice to have root on a (virtual) machine, but I doubt most tools need it [..]
Indeed. However, the reason this is crucial for Labs is that its scope is much wider than Toolserver's.
For example, in the "deployment" project we simulate nearly the entire WMF production cluster (including db hosts, apaches, squids, varnish, scalers, etc.).
This makes one of the very different goals of Labs possible, namely to allow volunteers to contribute to operations (as opposed to the software we run).
Once everything is puppetized, one can basically create a new Labs project, use "wmf-production" as a template, and instantiate a complete WMF cluster. It would not have all the database contents, just the server setup with sufficient sample data; the purpose is to simulate the servers in order to develop new configurations, not to serve as a web site. Give it a subdomain and you'd immediately have stuff like commons.wikimedia.myproject.wmflabs.org.
Back to the subject: does that mean users will have to learn to manage a VM and require a public IP and subdomain? No, not at all. We're confusing Dev Labs with Tool Labs (perhaps we shouldn't name them like that, as if they were isolated projects).
The implementation of Tool Labs isn't decided on afaik, but I believe it will naturally solve itself by being distributed among various projects. Behind the scenes it will likely be a regular Labs project, but abstracted for users (e.g. not an instance-group or even an instance per tool, but everything in one instance-group, with groups of servers for different purposes, just as Toolserver has web servers, SQL servers and login/application servers).
E.g. the tools project in wmflabs would have various web servers and application servers[1]. Users wanting to run queries, bots and long-running/periodic processes would use the application servers. Ideally we'd encourage use of SGE (or something similar) from the beginning so that the application servers are optimally used. It would also make it easy for a process on a web server to start a background process on an application server.
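To make the SGE idea a bit more concrete, here is a rough sketch of how a process on a web server might hand work off to the grid. Nothing here is decided or existing: the job name, the script name and the helper functions are made up for illustration, and it assumes a standard SGE `qsub` is on the PATH of the submitting host. The only real pieces are the `qsub` options (`-N` for the job name, `-b y` to treat the command as a binary rather than a job script, `-j y` and `-o` to merge stderr into a single log file).

```python
import subprocess
from typing import List, Optional


def build_qsub_command(job_name: str,
                       command: List[str],
                       log_path: Optional[str] = None) -> List[str]:
    """Build the argv for submitting `command` as an SGE job.

    Hypothetical helper: only assembles the argument list, does not run it.
    """
    argv = ["qsub", "-N", job_name, "-b", "y"]
    if log_path is not None:
        # Merge stderr into stdout and write both to one log file.
        argv += ["-j", "y", "-o", log_path]
    return argv + list(command)


def submit(job_name: str,
           command: List[str],
           log_path: Optional[str] = None) -> str:
    """Submit the job and return whatever qsub prints (assumes qsub is installed)."""
    result = subprocess.run(
        build_qsub_command(job_name, command, log_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; calling submit() needs a working grid.
    print(build_qsub_command("update-stats", ["python", "update_stats.py"]))
```

The web request would return right away and the scheduler would pick whichever application server has capacity, which is the "optimally used" part.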
Access to the wmf wiki replicated dbs is public across the entire wmflabs network so that's a given within the toollabs project as well.
-- Krinkle
[1] The "bots" project exists already. It doesn't have SGE yet but it's a first step. There is also a generic "webtools" project being set up as we speak. Perhaps these two could be merged so that users have shared project storage for bots generating data to be used by web tools and vice versa.