Perhaps it is useful to summarize reasons why toolserver users are not able to change to tool/bot labs. I added my main reasons. Perhaps others can add their reasons, too? (Maybe we should also add this list to the wiki page)
temporary blockers
- no replication of wikimedia wiki databases
** joining of user databases with wiki databases
- no support for script execution dependency (on ts: currently done by sge)
- no support for servlets

missing support blockers
- no support for new users not familiar with unix based systems
- no transparent updating of packages with security problems/bugs

permanent blockers
- license problems (I wrote code at work for my company and reuse parts for my bot framework. I do not have the right to declare this code as open source, which is required by the Labs policy.)
- no DaB.
Merlissimo wrote:
Perhaps it is useful to summarize reasons why toolserver users are not able to change to tool/bot labs. I added my main reasons. Perhaps others can add their reasons, too? (Maybe we should also add this list to the wiki page)
temporary blockers
- no replication of wikimedia wiki databases
** joining of user databases with wiki databases
- no support for script execution dependency (on ts: currently done by sge)
- no support for servlets
missing support blockers
- no support for new users not familiar with unix based systems
- no transparent updating of packages with security problems/bugs
permanent blockers
- license problems (I wrote code at work for my company and reuse parts for my bot framework. I do not have the right to declare this code as open source, which is required by the Labs policy.)
- no DaB.
I think I'd add "general direction of centralizing everything under a single Wikimedia Foundation is a bad idea" as a permanent blocker. Maybe there's a reasonable case for why deprecating the Toolserver and creating Wikimedia Labs is a great idea, but I don't see it yet.
I don't see why each (Wikimedia) chapter shouldn't have its own replica of the databases. We want free content to be free (and re-used and re-mixed and whatever else). If you're going to invest in infrastructure, I think it makes more sense to bolster replication support than try to compete with the Toolserver.
That said, pooled resources can sometimes be a smart move to save on investments such as hardware. Chapters working together is not a bad thing (I believe some chapters donated to Wikimedia Deutschland for Toolserver support in the past). But the broader point is that users should be very cautious of the general direction that a Wikimedia (Foundation) Labs is headed and ask whether it's really a good idea if it means the destruction of free-standing projects such as the Toolserver.
MZMcBride
(anonymous) wrote:
[...] I think I'd add "general direction of centralizing everything under a single Wikimedia Foundation is a bad idea" as a permanent blocker. Maybe there's a reasonable case for why deprecating the Toolserver and creating Wikimedia Labs is a great idea, but I don't see it yet.
I don't see why each (Wikimedia) chapter shouldn't have its own replica of the databases. We want free content to be free (and re-used and re-mixed and whatever else). If you're going to invest in infrastructure, I think it makes more sense to bolster replication support than try to compete with the Toolserver.
That said, pooled resources can sometimes be a smart move to save on investments such as hardware. Chapters working together is not a bad thing (I believe some chapters donated to Wikimedia Deutschland for Toolserver support in the past). But the broader point is that users should be very cautious of the general direction that a Wikimedia (Foundation) Labs is headed and ask whether it's really a good idea if it means the destruction of free-standing projects such as the Toolserver.
IMHO you have to differentiate between data and function. It makes no sense to build artificial obstacles when setting up some tool that can only be reasonably used with the live dataset. On the other hand, preparing for a day where WMF turns rogue is never wrong.
But the nice thing about Labs is that you can try out (replicable :-)) replication setups at no cost, and don't have to make upfront investments in hardware, etc., so when the time comes, you can just upload your setup to EC2 or whatever and have a working Wikipedia clone running in a manageable timeframe.
Tim
At the risk of outing myself as "naive": I do not see this as a problem like MZMcBride does. I think the foundation should have earned our trust by now and them locking down the data does not seem like a credible threat to me. In any case:
a) you can download dumps to access the data independently from WMF
b) the replication to the TS is already "at the mercy" of WMF. The TS does not make the data any free-er.
Best, Dschwen
I think I'd add "general direction of centralizing everything under a single Wikimedia Foundation is a bad idea" as a permanent blocker. Maybe there's a reasonable case for why deprecating the Toolserver and creating Wikimedia Labs is a great idea, but I don't see it yet.
I don't see why each (Wikimedia) chapter shouldn't have its own replica of the databases. We want free content to be free (and re-used and re-mixed and whatever else). If you're going to invest in infrastructure, I think it makes more sense to bolster replication support than try to compete with the Toolserver.
On Wed, Sep 26, 2012 at 10:15 AM, MZMcBride z@mzmcbride.com wrote:
I think I'd add "general direction of centralizing everything under a single Wikimedia Foundation is a bad idea" as a permanent blocker.
As others have noted, there's a difference between offering data (which we do - we've spent a lot of time, money and effort to ensure that stuff like dumps.wikimedia.org works reliably even at enwiki scale) and providing a working environment for the dev community.
Having a primary working environment like Labs makes sense in much the same way that it makes sense to have a primary multimedia repository like Commons (and Wikidata, and in future probably a gadget repository, a Lua script repository, etc.). It enables community network effects and economies of scale that can't easily be replicated and reduces wasteful duplication of effort.
That said, I'd love to make more real-time data feeds available for third parties in general. The analytics team is currently looking into offering a sensible alternative to the IRC feed for edit metadata, for example.
Erik
As others have noted, there's a difference between offering data (which we do - we've spent a lot of time, money and effort to ensure that stuff like dumps.wikimedia.org works reliably even at enwiki scale) and providing a working environment for the dev community.
Having a primary working environment like Labs makes sense in much the same way that it makes sense to have a primary multimedia repository like Commons (and Wikidata, and in future probably a gadget repository, a Lua script repository, etc.). It enables community network effects and economies of scale that can't easily be replicated and reduces wasteful duplication of effort.
I'd like to go a little further on this point.
One of the goals of Labs is to have a fully virtualized clone of our entire infrastructure that is also completely puppetized in a way that's reusable by third parties. If you're worried about WMF, then you should participate in Labs. You should help puppetize and should help make everything usable by non-WMF entities.
Bringing community operations members back into the operations of the site is another one of the goals of Labs. If we have enough community operations people, then the projects aren't dependent on the knowledge of the staff to survive.
If WMF becomes evil, fork the entire infrastructure into EC2, Rackspace cloud, HP cloud, etc. and bring the community operations people along for the ride. Hell, use the replicated databases in Labs to populate your database in the cloud.
- Ryan
Erik Moeller wrote:
As others have noted, there's a difference between offering data (which we do - we've spent a lot of time, money and effort to ensure that stuff like dumps.wikimedia.org works reliably even at enwiki scale) and providing a working environment for the dev community.
Having a primary working environment like Labs makes sense in much the same way that it makes sense to have a primary multimedia repository like Commons (and Wikidata, and in future probably a gadget repository, a Lua script repository, etc.). It enables community network effects and economies of scale that can't easily be replicated and reduces wasteful duplication of effort.
Yes, there's a difference. But in this case, as far as I understand it, a direct cost (or casualty) of setting up Wikimedia Labs is the existence of the Toolserver. Does Wikimedia need a great testing infrastructure? Yes, of course. (And it's not as though the Toolserver has ever been without its share of issues; I'm not trying to white-wash the past here.) But the question is: if such a Wikimedia testing infrastructure comes at the cost of losing the Toolserver, is that acceptable?
Ryan Lane wrote:
If WMF becomes evil, fork the entire infrastructure into EC2, Rackspace cloud, HP cloud, etc. and bring the community operations people along for the ride. Hell, use the replicated databases in Labs to populate your database in the cloud.
Tim Landscheidt wrote:
But the nice thing about Labs is that you can try out (replicable :-)) replication setups at no cost, and don't have to make upfront investments in hardware, etc., so when the time comes, you can just upload your setup to EC2 or whatever and have a working Wikipedia clone running in a manageable timeframe.
This is not an easy task. Replicating the databases is enormously challenging (they're huge datasets in the cases of the big wikis) and they're constantly changing. If you tried to rely on dumps alone, you'd always be out of date by at least two weeks (assuming dumps are working properly). Two weeks on the Internet is a lot of time.
But more to the point, even if you suddenly had a lot of infrastructure (bandwidth for constantly retrieving the data, space to store it all, and extra memory and CPU to allow users to, y'know, do something with it) and even if you suddenly had staff capable of managing these databases, not every table is even available currently. As far as I'm aware, http://dumps.wikimedia.org doesn't include tables such as "user", "ipblocks", "archive", "watchlist", any tables related to global images or global user accounts, and probably many others. I'm not sure a full audit has ever been done, but this is partially tracked by https://bugzilla.wikimedia.org/show_bug.cgi?id=25602.
So beyond the silly simplicity of the suggestion that one could simply "move to the cloud!", there are currently technical impossibilities to doing so.
MZMcBride
Yes, there's a difference. But in this case, as far as I understand it, a direct cost (or casualty) of setting up Wikimedia Labs is the existence of the Toolserver. Does Wikimedia need a great testing infrastructure? Yes, of course. (And it's not as though the Toolserver has ever been without its share of issues; I'm not trying to white-wash the past here.) But the question is: if such a Wikimedia testing infrastructure comes at the cost of losing the Toolserver, is that acceptable?
This is a straw man argument. The mere existence of Labs doesn't mean the loss of the Toolserver.
Labs is more than just a testing infrastructure. It's an infrastructure for creating things, for enabling volunteer operations, for bringing operations and development together, for integrating other projects, and for providing free hosting to projects that may not have it otherwise. Labs just also happens to need some of the same features as the Toolserver.
Again, as I've mentioned, Labs' purpose isn't to be a Toolserver replacement. Its vision is much, much larger than what the Toolserver can do.
Ryan Lane wrote:
If WMF becomes evil, fork the entire infrastructure into EC2, Rackspace cloud, HP cloud, etc. and bring the community operations people along for the ride. Hell, use the replicated databases in Labs to populate your database in the cloud.
Tim Landscheidt wrote:
But the nice thing about Labs is that you can try out (replicable :-)) replication setups at no cost, and don't have to make upfront investments in hardware, etc., so when the time comes, you can just upload your setup to EC2 or whatever and have a working Wikipedia clone running in a manageable timeframe.
This is not an easy task. Replicating the databases is enormously challenging (they're huge datasets in the cases of the big wikis) and they're constantly changing. If you tried to rely on dumps alone, you'd always be out of date by at least two weeks (assuming dumps are working properly). Two weeks on the Internet is a lot of time.
But more to the point, even if you suddenly had a lot of infrastructure (bandwidth for constantly retrieving the data, space to store it all, and extra memory and CPU to allow users to, y'know, do something with it) and even if you suddenly had staff capable of managing these databases, not every table is even available currently. As far as I'm aware, http://dumps.wikimedia.org doesn't include tables such as "user", "ipblocks", "archive", "watchlist", any tables related to global images or global user accounts, and probably many others. I'm not sure a full audit has ever been done, but this is partially tracked by https://bugzilla.wikimedia.org/show_bug.cgi?id=25602.
So beyond the silly simplicity of the suggestion that one could simply "move to the cloud!", there are currently technical impossibilities to doing so.
These are the same impossibilities you'd face forking any single CC project online. We're not allowed by our privacy policy (and very likely by law) to provide that information. It's absurd to fault us on this. I guess we're being evil by not being evil.
We've provided every single other piece of the puzzle required for forking.
- Ryan
You may not have meant for it to lead to the end of the Toolserver, but apparently that's how WMDE is taking it, and it sounds like that's going to be the inevitable result. To say otherwise is rather naive at this point, given the size of the threads talking about this.
---- User:Hersfold hersfoldwiki@gmail.com
On 9/26/2012 6:06 PM, Ryan Lane wrote:
Yes, there's a difference. But in this case, as far as I understand it, a direct cost (or casualty) of setting up Wikimedia Labs is the existence of the Toolserver. Does Wikimedia need a great testing infrastructure? Yes, of course. (And it's not as though the Toolserver has ever been without its share of issues; I'm not trying to white-wash the past here.) But the question is: if such a Wikimedia testing infrastructure comes at the cost of losing the Toolserver, is that acceptable?
This is a straw man argument. The mere existence of Labs doesn't mean the loss of the Toolserver.
Labs is more than just a testing infrastructure. It's an infrastructure for creating things, for enabling volunteer operations, for bringing operations and development together, for integrating other projects, and for providing free hosting to projects that may not have it otherwise. Labs just also happens to need some of the same features as the Toolserver.
Again, as I've mentioned, Labs' purpose isn't to be a Toolserver replacement. Its vision is much, much larger than what the Toolserver can do.
Ryan Lane wrote:
If WMF becomes evil, fork the entire infrastructure into EC2, Rackspace cloud, HP cloud, etc. and bring the community operations people along for the ride. Hell, use the replicated databases in Labs to populate your database in the cloud.
Tim Landscheidt wrote:
But the nice thing about Labs is that you can try out (replicable :-)) replication setups at no cost, and don't have to make upfront investments in hardware, etc., so when the time comes, you can just upload your setup to EC2 or whatever and have a working Wikipedia clone running in a manageable timeframe.
This is not an easy task. Replicating the databases is enormously challenging (they're huge datasets in the cases of the big wikis) and they're constantly changing. If you tried to rely on dumps alone, you'd always be out of date by at least two weeks (assuming dumps are working properly). Two weeks on the Internet is a lot of time.
But more to the point, even if you suddenly had a lot of infrastructure (bandwidth for constantly retrieving the data, space to store it all, and extra memory and CPU to allow users to, y'know, do something with it) and even if you suddenly had staff capable of managing these databases, not every table is even available currently. As far as I'm aware, http://dumps.wikimedia.org doesn't include tables such as "user", "ipblocks", "archive", "watchlist", any tables related to global images or global user accounts, and probably many others. I'm not sure a full audit has ever been done, but this is partially tracked by https://bugzilla.wikimedia.org/show_bug.cgi?id=25602.
So beyond the silly simplicity of the suggestion that one could simply "move to the cloud!", there are currently technical impossibilities to doing so.
These are the same impossibilities you'd face forking any single CC project online. We're not allowed by our privacy policy (and very likely by law) to provide that information. It's absurd to fault us on this. I guess we're being evil by not being evil.
We've provided every single other piece of the puzzle required for forking.
- Ryan
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
On Wed, Sep 26, 2012 at 6:29 PM, Hersfold hersfoldwiki@gmail.com wrote:
You may not have meant for it to lead to the end of the Toolserver, but apparently that's how WMDE is taking it, and it sounds like that's going to be the inevitable result. To say otherwise is rather naive at this point, given the size of the threads talking about this.
I'll be honest, I don't really care about the politics behind any of this, and I'm going to ignore anything more related to that. WMDE dropping Toolserver is their decision and it doesn't affect how Labs will operate in the future.
Labs is adding infrastructure needed to support Toolserver users. If there's anything the Toolserver community needs that isn't in our current roadmap, I'm more than happy to work through those issues with the community. The environment isn't going to be exactly the same, so tools and bots may need to be modified. We can provide the necessary resources, access, and training to integrate into the new environment. WMDE will be providing resources to help with migrations.
Overall the environment provided by Labs has the ability to be much more flexible and much more powerful than Toolserver. I hope everyone migrates over, but I'll understand if anyone feels like it's too much work.
- Ryan
Ryan wrote:
Again, as I've mentioned, Labs purpose isn't a Toolserver replacement. It's vision is much, much larger than what the Toolserver can do.
Which in the meantime will allow us to do a much, much narrower set of things for Wikimedia projects than the Toolserver can do.
Of course, maybe in 5 or 10 years users will be able to reinvent from scratch, or readapt, what has been done on the Toolserver over these years, and in a much better way. Before then, en.wiki might have been locked due to the drop in editor activity.
But yes, you're right, we should be using a better terminology: the Toolserver isn't being "replaced", it's being killed/terminated/discontinued/trashed/<insert favourite word here>.
Hersfold wrote:
You may not have meant for it to lead to the end of the Toolserver, but apparently that's how WMDE is taking it, and it sounds like that's going to be the inevitable result. To say otherwise is rather naive at this point, given the size of the threads talking about this.
+1 (except that it's not WMDE).
Ryan wrote:
I'll be honest, I don't really care about the politics behind any of this, and I'm going to ignore anything more related to that. WMDE dropping Toolserver is their decision [...]
Ridiculous. Your boss said that it's the WMF's decision to terminate the Toolserver just a few mails ago: «for our part, we will not continue to support the current arrangement (DB replication, hosting in our data-center, etc.) indefinitely». http://lists.wikimedia.org/pipermail/toolserver-l/2012-September/005294.html
[...] and it doesn't affect how Labs will operate in the future. [...] WMDE will be providing resources to help with migrations.
Can Pavel confirm this? Or are you the one who decides about WMDE budget now?
In general, I'm really amazed by this approach "it doesn't affect us" etc. Is Wikimedia Labs supposed to advance Wikimedia's mission and help Wikimedia projects or not? Can this be done ignoring the context? Do you really think that trashing all tools and services currently on Toolserver rather than ensuring they mostly will continue operating makes any sense for the scope of Wikimedia Labs? I wish someone guesstimated the value of Toolserver's current tools and services in terms of developing work hours and the cost for migration. I'm quite sure that by requiring a huge effort for migration and therefore trashing most stuff you'll be losing millions of dollars of value for your Wikimedia Labs. Too bad that also Wikimedia projects will lose the corresponding value.
Finally, I'm greatly re-evaluating the wisdom of those users who across the years insistently used things like appspot.com, heroku.com or their own websites where possible for their Wikimedia tools. They are extremely unreliable and limited by what's possible with dumps, API and screenscraping, but at least they don't rely on a single person in the WMF not pressing the huge red button.
Nemo
(anonymous) wrote:
[...] Ryan Lane wrote:
If WMF becomes evil, fork the entire infrastructure into EC2, Rackspace cloud, HP cloud, etc. and bring the community operations people along for the ride. Hell, use the replicated databases in Labs to populate your database in the cloud.
Tim Landscheidt wrote:
But the nice thing about Labs is that you can try out (replicable :-)) replication setups at no cost, and don't have to make upfront investments in hardware, etc., so when the time comes, you can just upload your setup to EC2 or whatever and have a working Wikipedia clone running in a manageable timeframe.
This is not an easy task. Replicating the databases is enormously challenging (they're huge datasets in the cases of the big wikis) and they're constantly changing. If you tried to rely on dumps alone, you'd always be out of date by at least two weeks (assuming dumps are working properly). Two weeks on the Internet is a lot of time.
I don't know if this is not an easy task, but you are probably right. So what? If a scenario of WMF turning rogue couldn't bear losing two weeks of edits while saving almost a decade, we should work on ways to produce incremental dumps.
But more to the point, even if you suddenly had a lot of infrastructure (bandwidth for constantly retrieving the data, space to store it all, and extra memory and CPU to allow users to, y'know, do something with it) and even if you suddenly had staff capable of managing these databases, not every table is even available currently. As far as I'm aware, http://dumps.wikimedia.org doesn't include tables such as "user", "ipblocks", "archive", "watchlist", any tables related to global images or global user accounts, and probably many others. I'm not sure a full audit has ever been done, but this is partially tracked by https://bugzilla.wikimedia.org/show_bug.cgi?id=25602.
The first part is easy: You go to some supplier and buy bandwidth, space, memory and CPU. There is even staff for hire.
The second part is simple as well: What do you need "ipblocks" or "watchlist" in a Wikipedia clone for? It certainly is neither free content nor the content users use Wikipedia for.
So beyond the silly simplicity of the suggestion that one could simply "move to the cloud!", there are currently technical impossibilities to doing so.
And it would be far more helpful if you could stop spreading FUD and instead show what actual impediments there are, for example in a Labs project.
Tim
On Wed, 26 Sep 2012 at 23:38 +0000, Tim Landscheidt wrote:
(anonymous) wrote:
[...] Ryan Lane wrote:
If WMF becomes evil, fork the entire infrastructure into EC2, Rackspace cloud, HP cloud, etc. and bring the community operations people along for the ride. Hell, use the replicated databases in Labs to populate your database in the cloud.
Tim Landscheidt wrote:
But the nice thing about Labs is that you can try out (replicable :-)) replication setups at no cost, and don't have to make upfront investments in hardware, etc., so when the time comes, you can just upload your setup to EC2 or whatever and have a working Wikipedia clone running in a manageable timeframe.
This is not an easy task. Replicating the databases is enormously challenging (they're huge datasets in the cases of the big wikis) and they're constantly changing. If you tried to rely on dumps alone, you'd always be out of date by at least two weeks (assuming dumps are working properly). Two weeks on the Internet is a lot of time.
I don't know if this is not an easy task, but you are probably right. So what? If a scenario of WMF turning rogue couldn't bear losing two weeks of edits while saving almost a decade, we should work on ways to produce incremental dumps.
In fact there are (experimental) adds/changes dumps, so while it might not be a 5 minute procedure to get that data into your copy, and deletions and suppressions wouldn't be covered, the amount of data that would be lost would be pretty small.
Ariel
temporary blockers
- no replication of wikimedia wiki databases
** joining of user databases with wiki databases
We currently have no plans for having the user databases on the same servers as the replicated databases. Direct joins will not be possible, so tools will need to be modified.
- no support for script execution dependency (on ts: currently done by sge)
There's less of a need for this in Labs. If whatever you are running is really expensive, you can have your own instance. That said, I was looking at integrating a global queuing system. It won't be SGE, though.
If someone is really keen on SGE, then I recommend they work with us to puppetize it. Thankfully, open grid engine is already packaged in ubuntu, which should make that much easier.
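For readers unfamiliar with what "script execution dependency" means in practice, here is a minimal sketch of the run-B-only-after-A chaining that a grid engine provides through job holds, done by hand in Python. This is an illustration, not a proposal for the Labs queuing system; all job names and commands are invented for the example.

```python
# Minimal dependency-ordered script execution: each job runs only after
# its prerequisites have finished. Job names/commands are illustrative.
import subprocess
import sys
from graphlib import TopologicalSorter  # stdlib since Python 3.9

def run_jobs(jobs, deps):
    """jobs: name -> argv list; deps: name -> set of prerequisite names.
    Runs each job after its prerequisites; returns names in run order."""
    ts = TopologicalSorter(deps)
    for name in jobs:
        ts.add(name)  # jobs without listed prerequisites still get scheduled
    executed = []
    for name in ts.static_order():  # prerequisites always come first
        if name in jobs:
            subprocess.run(jobs[name], check=True)  # abort chain on failure
            executed.append(name)
    return executed

# Example chain: "report" must wait for both "fetch" and "parse".
noop = [sys.executable, "-c", "pass"]  # stand-in for a real script
order = run_jobs(
    {"fetch": noop, "parse": noop, "report": noop},
    {"parse": {"fetch"}, "report": {"fetch", "parse"}},
)
```

A real queuing system adds persistence, per-job resource limits, and multi-host scheduling on top of this ordering logic, which is the part that takes the engineering effort discussed above.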
- no support for servlets
I'm not sure what you mean by servlet?
missing support blockers
- no support for new users not familar with unix based systems
Can you describe how this is handled in Toolserver currently?
- no transparent updating of packages with security problems/bug
Ubuntu has unattended-upgrades. It's generally enabled on instances.
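For reference, enabling this on Ubuntu typically amounts to installing the unattended-upgrades package and turning on periodic runs, e.g. via /etc/apt/apt.conf.d/20auto-upgrades:

```
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```

By default only packages from the security pocket are upgraded automatically; which origins are allowed can be tuned in the unattended-upgrades configuration.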
permanent blockers
- license problems (i wrote code at work for my company and reuse parts for
my bot framework. I have not the right to declare this code as open source which is needed by labs policy.)
This will continue to be a permanent blocker.
You can't decide that on your own, but you can ask your employer if you can open source the code.
- no DaB.
I'd love DaB to help us improve Labs.
Everything about Labs is fully open. Anyone can help build it, even the production portions.
- Ryan
On 26/09/12 20:25, Ryan Lane wrote:
temporary blockers
- no replication of wikimedia wiki databases
** joining of user databases with wiki databases
We currently have no plans for having the user databases on the same servers as the replicated databases. Direct joins will not be possible, so tools will need to be modified.
-50
It's such a useful feature that it would be worth setting up local mysql slaves to have them. I know, the all-powerful labs environment is unable to run a mysql instance, but we could use MySQL Cluster, trading memory (available) to get joins (denied).
- no support for script execution dependency (on ts: currently done by sge)
There's less of a need for this in Labs. If whatever you are running is really expensive, you can have your own instance. That said, I was looking at integrating a global queuing system. It won't be SGE, though.
If someone is really keen on SGE, then I recommend they work with us to puppetize it. Thankfully, open grid engine is already packaged in ubuntu, which should make that much easier.
SGE is a strong queue system. We have people and tools already trained to use it. It would be my first option. That said, if the presented alternative has the same user interface, it shouldn't be a problem. For instance, I don't have an opinion about which of the SGE forks would be preferable.
- no support for servlets
I'm not sure what you mean by servlet?
J2EE, I guess.
- no DaB.
I'd love DaB to help us improve Labs.
Everything about Labs is fully open. Anyone can help build it, even the production portions.
- Ryan
Would it be worth our efforts? I sometimes wonder why we should work on that (yes, I'm pessimistic right now). Take, for instance, the squid in front of *.beta.wmflabs.org. It was configured by Petan and me. We had absolutely no support from the WMF. The squid wasn't purging correctly. It worked in production, so there was a config error somewhere. We begged to see the squid config for months. But as it was in the private repository, no, it couldn't be shown, just in case it had something secret in it (very unlikely for a squid config). Yes, we will clean them up and publish, eventually. Months passed (not to mention that publishing the config had been requested years earlier). It could have been quickly reviewed before handing it out, and we weren't going to abuse it if there really was something weird in there. Replicating the WMF setup was done without ever seeing that same setup.

I finally fixed it. I was quite proud of having solved it. Where is that file right now? It vanished. The file was lost in one of the multiple corruptions of labs instances. It was replaced with a copy of the cluster config (which was finally published in the meantime). So it feels like wasted effort now. I'd have liked to save a local copy at least.
It's not enough to leave tools there and say "It is fully open. Anyone can help build it"
We currently have no plans for having the user databases on the same servers as the replicated databases. Direct joins will not be possible, so tools will need to be modified.
-50
It's such a useful feature that it would be worth setting up local mysql slaves to have them. I know, the all-powerful labs environment is unable to run a mysql instance, but we could use MySQL Cluster, trading memory (available) to get joins (denied).
I'm not the one setting up the databases. If you want information about why this won't be available, talk to Asher (binasher in #wikimedia-operations on Freenode). Maybe he can be convinced otherwise.
Of course, in the production cluster we don't do joins this way. We handle the joins in the app logic, which is a more appropriate way of doing this.
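To make concrete what "handle the joins in the app logic" means for tool authors: fetch the keys from your own database first, then query the replica for just those keys and stitch the results together in code. The sketch below uses sqlite3 in-memory databases to stand in for the two separate MySQL servers; all table and column names are invented for the example.

```python
# Application-level join across two separate database servers, as a
# substitute for a cross-server SQL JOIN. sqlite3 in-memory DBs stand in
# for the wiki replica and a tool's own user database.
import sqlite3

replica = sqlite3.connect(":memory:")  # stands in for the wiki replica
userdb = sqlite3.connect(":memory:")   # stands in for the tool's own DB

replica.execute("CREATE TABLE page (page_id INTEGER, page_title TEXT)")
replica.executemany("INSERT INTO page VALUES (?, ?)",
                    [(1, "Main_Page"), (2, "Sandbox")])
userdb.execute("CREATE TABLE notes (page_id INTEGER, note TEXT)")
userdb.executemany("INSERT INTO notes VALUES (?, ?)",
                   [(2, "needs cleanup")])

# Step 1: pull the keys of interest from the tool's own database.
notes = dict(userdb.execute("SELECT page_id, note FROM notes"))

# Step 2: query the replica for only those keys, join in the application.
placeholders = ",".join("?" * len(notes))
rows = replica.execute(
    f"SELECT page_id, page_title FROM page WHERE page_id IN ({placeholders})",
    list(notes),
).fetchall()
joined = [(title, notes[pid]) for pid, title in rows]
```

For large key sets the `IN (...)` query would be batched, but the two-step shape stays the same.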
SGE is a strong queue system. We have people and tools already trained to use it. It would be my first option. That said, if the presented alternative has the same user interface, it shouldn't be a problem. For instance, I don't have an opinion about which of the SGE forks would be preferable.
In general in Labs we don't have a large need for a queuing system right now. If Toolserver folks need it very badly, it's possible to add, someone just needs to put the effort into it. It likely wouldn't be amazingly hard to puppetize this to run in a single project. Making things multi-project is difficult and takes effort. Anyone can do the single-project version in a project, multi-project will likely take engineering effort.
- no support for servlets
I'm not sure what you mean by servlet?
J2EE, I guess.
Well, if it's available in the ubuntu repos, or if it's open source, then it's available in Labs.
I'd love DaB to help us improve Labs.
Everything about Labs is fully open. Anyone can help build it, even the production portions.
Would it be worth our efforts? I sometimes wonder why we should work on that (yes, I'm pessimistic right now). Take the squid in front of *.beta.wmflabs.org: it was configured by Petan and me, with absolutely no support from the WMF. The squid wasn't purging correctly. It worked in production, so there was a config error somewhere. We begged to see the squid config for months, but as it was in the private repository, no, it couldn't be shown, just in case it contained something secret (very unlikely for a squid config). Yes, we will clean them up and publish, eventually. Months passed (not to mention that publishing the config had been requested years ago). It could have been quickly reviewed before handing it out, and we weren't going to abuse it if something really weird was in there. Replicating the WMF setup was done without ever viewing that same setup. I finally fixed it, and I was quite proud of having solved it.
And you should be. Your changes kept that project moving along for months until I broke it.
Where is that file right now? It vanished. The file was lost in one of the multiple corruptions of Labs instances and replaced with a copy of the cluster config (which was finally published in the meantime). So it feels like wasted effort now. I'd have liked to save a local copy at least.
To be fair, there's only been a single occurrence of instance corruption, which was due to a bug in KVM.
Also, yes, the squid configuration was finally published because one of the devs spent the time to do so. I was working on stabilizing things most of that time.
Does this mean your efforts were wasted? Of course not. Your efforts helped keep the project running, which is important. Just because your file was replaced with the production copy doesn't mean the work put into it was for nothing.
It's not enough to leave tools there and say "It is fully open. Anyone can help build it"
We're also putting effort into making the migration happen, but we're focusing our efforts in different places. We can't do everything, which is why I'm trying to encourage others to help out. If we each work on separate pieces, it'll go much quicker.
- Ryan
In general in Labs we don't have a large need for a queuing system right now.
Of course, because nobody is using it right now. I suppose Toolserver didn't need it when it had only a few users consuming its resources.
Does this mean your efforts were wasted? Of course not. Your efforts helped keep the project running, which is important. Just because your file was replaced with the production copy doesn't mean the work put into it was for nothing.
Amazing! We should suggest this approach to the editor engagement team, the post-edit feedback could say "Your valuable edit is now visible to the world. It will probably be reverted in a few minutes, but you can still be proud of it, knowing that the new revision written by someone else is much better!".
Nemo
On Thu, Sep 27, 2012 at 1:36 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
In general in Labs we don't have a large need for a queuing system right now.
Of course, because nobody is using it right now. I suppose Toolserver didn't need it when it had only a few users consuming its resources.
I should know better than to feed a troll, but Labs is relatively heavily used. At this moment there are 233 virtual machines running across 125 projects. It's actively used by quite a number of bots (which have already moved from the Toolserver). It's being used by the following teams:
* Analytics
* Editor-engagement
* Visual editor
* Global education
* QA
* Mobile
* Pediapress
* Localization
* Wikidata
* Operations
* Fundraising
* Core services
Many of those teams host multiple active projects.
Additionally, we have a number of volunteer-driven projects. Here are a few choice ones:
* Bots
* Deployment-prep
* Maps (for OpenStreetMaps)
* Wikistats
* Wikitrust
* Signwriting
* Phabricator
* Metavidwiki
* Huggle
* Glam
* Wiki loves monuments
* Blamemaps
* Counter vandalism network
It was used extensively during Google summer of code by the students and mentors. It's also used very heavily during hackathons; most projects demo at the end with Labs.
These projects aren't in great need of a queue because they don't fight against each other for shared resources. When bots and tools are added that need to do expensive, long-running queries against a set of common databases we'll likely need some form of queuing system, but it hasn't been a high priority since we haven't been working on Toolserver like features.
- Ryan
Additionally, we have a number of volunteer driven projects. Here's a
few choice ones:
- Bots
- Deployment-prep
- Maps (for OpenStreetMaps)
- Wikistats
- Wikitrust
- Signwriting
- Phabricator
- Metavidwiki
- Huggle
- Glam
- Wiki loves monuments
- Blamemaps
- Counter vandalism network
Where can we find more information about these projects, especially OSM and WLM?
Am 27.09.2012 17:21, schrieb Andrei Cipu:
Where can we find more information about these projects, especially OSM and WLM?
For OSM you can look at: https://labsconsole.wikimedia.org/wiki/Nova_Resource:Maps
None of those instances has a public IP or a planet import at the moment, so it is far from usable, but that's also what Ryan said.
We don't know how efficiently a PostgreSQL server runs in a virtualized environment, but after talking with some PostgreSQL experts I'm a little sceptical. We also don't know whether the foundation will maintain an external high-performance db-server for OSM rendering, or whether we, as a non-Wikimedia project, will end up at the bottom of the priority list or out of scope.
In particular, we don't know when Labs will be ready for production. We may need it at the beginning of next year, or sooner, to roll out a CPB project (sponsored by the German chapter) and to do other map things.
Greetings Tim alias Kolossos
The long-term plan is to have OSM in production. OSM in Labs is meant for puppetization, test, and development. I think we even have the hardware for OSM in production. Someone just needs to put the effort in for puppetization.
- Ryan
On Thu, Sep 27, 2012 at 10:23 AM, Kolossos tim.alder@s2002.tu-chemnitz.de wrote:
Am 27.09.2012 17:21, schrieb Andrei Cipu:
Where can we find more information about these projects, especially OSM and WLM?
For OSM you can look at: https://labsconsole.wikimedia.org/wiki/Nova_Resource:Maps
None of those instances has a public IP or a planet import at the moment, so it is far from usable, but that's also what Ryan said.
We don't know how efficiently a PostgreSQL server runs in a virtualized environment, but after talking with some PostgreSQL experts I'm a little sceptical. We also don't know whether the foundation will maintain an external high-performance db-server for OSM rendering, or whether we, as a non-Wikimedia project, will end up at the bottom of the priority list or out of scope.
In particular, we don't know when Labs will be ready for production. We may need it at the beginning of next year, or sooner, to roll out a CPB project (sponsored by the German chapter) and to do other map things.
Greetings Tim alias Kolossos
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Here! I am interested in that! Let me know!
On Fri, Sep 28, 2012 at 7:03 AM, Ryan Lane rlane@wikimedia.org wrote:
The long-term plan is to have OSM in production. OSM in Labs is meant for puppetization, test, and development. I think we even have the hardware for OSM in production. Someone just needs to put the effort in for puppetization
On Fri, Sep 28, 2012 at 7:03 AM, Ryan Lane rlane@wikimedia.org wrote:
The long-term plan is to have OSM in production. OSM in Labs is meant for puppetization, test, and development. I think we even have the hardware for OSM in production. Someone just needs to put the effort in for puppetization.
We are working on it and hope to have something configured in Labs soonish (e.g. next month or so) and ready for production.
We obviously don't want to rely on third parties for stuff in the mobile apps and want to further improve and customize stuff for our various uses.
We can still use Labs and the Toolserver for testing and developing things, such as new styles.
Cheers, Katie
- Ryan
On Thu, Sep 27, 2012 at 10:23 AM, Kolossos tim.alder@s2002.tu-chemnitz.de wrote:
Am 27.09.2012 17:21, schrieb Andrei Cipu:
Where can we find more information about these projects, especially OSM and WLM?
For OSM you can look at: https://labsconsole.wikimedia.org/wiki/Nova_Resource:Maps
None of those instances has a public IP or a planet import at the moment, so it is far from usable, but that's also what Ryan said.
We don't know how efficiently a PostgreSQL server runs in a virtualized environment, but after talking with some PostgreSQL experts I'm a little sceptical. We also don't know whether the foundation will maintain an external high-performance db-server for OSM rendering, or whether we, as a non-Wikimedia project, will end up at the bottom of the priority list or out of scope.
In particular, we don't know when Labs will be ready for production. We may need it at the beginning of next year, or sooner, to roll out a CPB project (sponsored by the German chapter) and to do other map things.
Greetings Tim alias Kolossos
On 27/09/12 17:21, Andrei Cipu wrote:
Additionally, we have a number of volunteer driven projects. Here's a
few choice ones:
- Bots
- Deployment-prep
- Maps (for OpenStreetMaps)
- Wikistats
- Wikitrust
- Signwriting
- Phabricator
- Metavidwiki
- Huggle
- Glam
- Wiki loves monuments
- Blamemaps
- Counter vandalism network
Where can we find more information about these projects, especially OSM and WLM?
You can go to https://labsconsole.wikimedia.org/, list the projects, take a look at the project members, and bug them to tell you their evil plans :)
The project for WLM is actually one for making a tool for judging the images (Wlmjudging). There are a couple of VMs; I don't know whether it has produced anything yet. I'd ask Ynhockey about it.
All "real" work for Wiki Loves Monuments is done in the toolserver (plus http://wlm.wikimedia.org/ which has a copy of the data produced at TS).
Ryan wrote:
I should know better than to feed a troll, but Labs is relatively heavily used.
I'm sorry if that was perceived as trolling. I know most of that stuff, I like it, and I've advertised some of those projects quite a lot myself, but it seems to be nowhere close to a few hundred users doing very expensive and maybe silly things like those DaB. mentioned: queries lasting days, grep or sed run over hundreds of GiB of data, scripts taking 4 GiB of memory, etc.; not to mention all sorts of queries that can be triggered by any of the tens of thousands of users of the web tools on the TS. That said, of course you know better what the capacity of Labs is; what I can be sure of is that it's not infinite, as you almost pretend. Anyway, the problem is not (yet) whether typical TS stuff will have enough resources, but whether it will be possible at all (current answer: no, never! change how you do things instead!).
Nemo
On 27/09/12 01:07, Ryan Lane wrote:
We currently have no plans for having the user databases on the same servers as the replicated databases. Direct joins will not be possible, so tools will need to be modified.
-50
It's such a useful feature that it would be worth setting up local MySQL slaves to have it. I know, the all-powerful Labs environment is unable to run a MySQL instance, but we could use MySQL Cluster, trading memory (available) for joins (denied).
I'm not the one setting up the databases. If you want information about why this won't be available, talk to Asher (binasher in #wikimedia-operations on Freenode). Maybe he can be convinced otherwise.
Of course, in the production cluster we don't do joins this way. We handle the joins in the app logic, which is a more appropriate way of doing this.
I disagree. In production you can just create a new table in the wiki db. We can't create new tables there on the Toolserver (the dbs are a mirror of what there is in production). Thus, we create a new db on the same server and use a cross-db join instead of joining against a new table.
Joining several wiki databases to each other is probably stranger, with the exception of Commons, which is joined to the others more often, as the Commons images "are also at the local wikis".
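For readers unfamiliar with the Toolserver pattern described above, here is a toy demonstration. SQLite (from the Python standard library) stands in for MySQL, and ATTACH plays the role of having the user database on the same server as the replicated wiki database; the table names and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")            # plays the "wiki" database
conn.execute("CREATE TABLE page (page_id INTEGER, page_title TEXT)")
conn.executemany("INSERT INTO page VALUES (?, ?)",
                 [(1, "Berlin"), (2, "Paris")])

conn.execute("ATTACH ':memory:' AS userdb")   # plays the user database
conn.execute("CREATE TABLE userdb.notes (page_id INTEGER, note TEXT)")
conn.execute("INSERT INTO userdb.notes VALUES (1, 'checked')")

# The cross-database join Toolserver users rely on: one SQL statement,
# no application-side merging.
rows = conn.execute("""
    SELECT p.page_title, n.note
    FROM page AS p
    JOIN userdb.notes AS n ON n.page_id = p.page_id
""").fetchall()
print(rows)  # [('Berlin', 'checked')]
```

On the Toolserver the equivalent is simply qualifying tables with their database name (e.g. joining a wiki table against a table in your own u_* database on the same server).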
On Thu, Sep 27, 2012 at 7:58 PM, Platonides platonides@gmail.com wrote:
On 27/09/12 01:07, Ryan Lane wrote:
We currently have no plans for having the user databases on the same servers as the replicated databases. Direct joins will not be possible, so tools will need to be modified.
-50
It's such a useful feature that it would be worth setting up local MySQL slaves to have it. I know, the all-powerful Labs environment is unable to run a MySQL instance, but we could use MySQL Cluster, trading memory (available) for joins (denied).
I'm not the one setting up the databases. If you want information about why this won't be available, talk to Asher (binasher in #wikimedia-operations on Freenode). Maybe he can be convinced otherwise.
Of course, in the production cluster we don't do joins this way. We handle the joins in the app logic, which is a more appropriate way of doing this.
I disagree. In production you can just create a new table in the wiki db. We can't create new tables there on the Toolserver (the dbs are a mirror of what there is in production). Thus, we create a new db on the same server and use a cross-db join instead of joining against a new table.
Joining several wiki databases to each other is probably stranger, with the exception of Commons, which is joined to the others more often, as the Commons images "are also at the local wikis".
Which brings us to the next point: will the Commons database be replicated to all clusters, like on the Toolserver?
Bryan
On Wed, Sep 26, 2012 at 2:25 PM, Ryan Lane rlane@wikimedia.org wrote:
We currently have no plans for having the user databases on the same servers as the replicated databases. Direct joins will not be possible, so tools will need to be modified.
This is unfortunate, and a huge step backwards from the situation on the toolserver.
For example, the project I maintain on toolserver (the enwiki WP 1.0 assessment data) has user database tables with several million rows of data about articles, from which it needs to select the data for pages from fixed categories on the wiki, which themselves could have a few thousand members. The natural way to do this is to join against the categorylinks table. Any non-join solution is going to be much, much less efficient.
A key role of the toolserver setup was that it allowed these sorts of joins. Web hosting is cheap and data about the live wiki is already available in non-joinable form through the API with no replag.
- Carl
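To make Carl's point concrete, here is a toy version of the join he describes, again with SQLite standing in for MySQL. The cl_from/cl_to column names follow the real MediaWiki categorylinks schema; the assessments table, the category name, and all data are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
db.execute("CREATE TABLE assessments (a_page INTEGER, a_class TEXT)")
db.executemany("INSERT INTO categorylinks VALUES (?, ?)",
               [(10, "GA-Class_articles"), (11, "GA-Class_articles"),
                (12, "B-Class_articles")])
db.executemany("INSERT INTO assessments VALUES (?, ?)",
               [(10, "GA"), (11, "GA"), (12, "B")])

# One join pulls the assessment rows for every member of the category;
# without it, the tool must fetch the category members and then issue
# separate lookups against its own tables.
rows = db.execute("""
    SELECT a.a_page, a.a_class
    FROM assessments AS a
    JOIN categorylinks AS c ON c.cl_from = a.a_page
    WHERE c.cl_to = 'GA-Class_articles'
    ORDER BY a.a_page
""").fetchall()
print(rows)  # [(10, 'GA'), (11, 'GA')]
```

With millions of user rows and categories of a few thousand members, letting the database server do this with indexes is exactly what the Toolserver setup made possible.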
(anonymous) wrote:
We currently have no plans for having the user databases on the same servers as the replicated databases. Direct joins will not be possible, so tools will need to be modified.
This is unfortunate, and a huge step backwards from the situation on the toolserver.
For example, the project I maintain on toolserver (the enwiki WP 1.0 assessment data) has user database tables with several million rows of data about articles, from which it needs to select the data for pages from fixed categories on the wiki, which themselves could have a few thousand members. The natural way to do this is to join against the categorylinks table. Any non-join solution is going to be much, much less efficient.
A key role of the toolserver setup was that it allowed these sorts of joins. Web hosting is cheap and data about the live wiki is already available in non-joinable form through the API with no replag.
Even more: if Labs replication isn't bound by Toolserver tradition, it would be *very* nice not to fragment the data according to the different WMF clusters, plus Commons or not, plus (separate) user databases or not, but to have one cluster where users can join as logic suggests. As the Toolserver already merges Commons onto other clusters, this seems to be possible with MySQL.
Tim
On 01/10/12 13:03, Tim Landscheidt wrote:
Even more: if Labs replication isn't bound by Toolserver tradition, it would be *very* nice not to fragment the data according to the different WMF clusters, plus Commons or not, plus (separate) user databases or not, but to have one cluster where users can join as logic suggests. As the Toolserver already merges Commons onto other clusters, this seems to be possible with MySQL.
Tim
It's possible, you just need bigger servers which can hold all dbs. (plus some master/slave replication for user tables)
I suppose it will use the same clusters as the WMF. After all, there's a reason the WMF clusters needed to be split.
(anonymous) wrote:
Even more: if Labs replication isn't bound by Toolserver tradition, it would be *very* nice not to fragment the data according to the different WMF clusters, plus Commons or not, plus (separate) user databases or not, but to have one cluster where users can join as logic suggests. As the Toolserver already merges Commons onto other clusters, this seems to be possible with MySQL.
It's possible, you just need bigger servers which can hold all dbs. (plus some master/slave replication for user tables)
I suppose it will use the same clusters as the WMF. After all, there's a reason the WMF clusters needed to be split.
Sure, but tools outside production probably don't need that extra millisecond those clusters are aiming for. Take the Analytics team, for example: they happily consider different tools because they have different goals.
Tim
Hi. I'm not happy with the decisions discussed here either, and I don't want to support that decision with this mail, but if I understand it right, at least part of what you describe might be wrong.
If I read it right, the Labs environment is basically a kind of private cloud, using lots of virtual machines as servers on a hardware setup of fewer, more powerful physical machines.
If that's true, and provided the corresponding admins would allow it, then some of your points are wrong. (Comments inline below.)
Am 26.09.2012 16:10, schrieb Merlissimo:
temporary blockers *[...]
- no support for script execution dependency (on ts: currently done by
sge)
If scripts run on virtual machines in the Labs environment, it is possible 1) to run several scripts on the same machine, keeping the dependencies in the execution, or 2) to run several scripts on different machines, with the execution dependency modelled via web APIs/interfaces. For some combinations that's overkill in complexity, but sometimes it might be useful, too.
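Option 1) can be as simple as a driver script that chains the jobs and stops the chain when one fails; this is only a sketch of the idea (my own, not anything Labs or SGE provides), standing in for what SGE's job-dependency support does automatically.

```python
import subprocess
import sys

# Toy dependency chain: run each job in order, and refuse to start a job
# if the one it depends on failed. The job commands here are placeholders.
jobs = [
    [sys.executable, "-c", "print('step 1: fetch data')"],
    [sys.executable, "-c", "print('step 2: process data')"],
]

for cmd in jobs:
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit("upstream job failed; skipping dependent jobs")
print("all jobs completed")
```

This obviously gives up SGE's scheduling, accounting, and cross-host features; it only illustrates that same-machine dependency ordering is not blocked by the platform itself.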
- no support for servlets
Most likely wrong, as it should be possible to use virtual machines running Java with a servlet container such as Tomcat or Jetty on them.
missing support blockers
- no support for new users not familiar with unix based systems
- no transparent updating of packages with security problems/bug
+1
permanent blockers
- license problems (i wrote code at work for my company and reuse
parts for my bot framework. I have not the right to declare this code as open source which is needed by labs policy.)
Well... yes, but I doubt this is a big issue for most Toolserver users at all, as an open-source licence declaration was already a necessary condition for tools when I joined the Toolserver.
- no DaB.
+10 (well... or more)
And after reading a little bit more about labs some points to add:
- as DaB already pointed out: there is no OSM database, and it's not even possible to use OSM data, since all content used has to be under a CC licence, which is no longer true for OSM.
- OSM, which was a (not sure how it was called exactly) partner project for WMDE, or at least one acknowledged as supportable by WMDE, is neither Wikimedia nor MediaWiki, and therefore cannot be worked on at Labs, since projects not directly incorporated into MediaWiki are excluded (especially as the content isn't CC, see above).
regards Peter
P.S.: Interestingly, there are very strict rules about personal data, but passwords only have to be "hashed": someone could use unsalted md5, or even the hash function myHash(s) { return substr(s, 0, 12) }, which is a hash function, but neither cryptographic nor secure...
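For contrast, a properly salted password hash using only the Python standard library might look like the following; the parameters are illustrative, not a recommendation for any particular deployment.

```python
import hashlib
import hmac
import os

# Unlike the toy myHash above, this uses a random per-user salt and a
# deliberately slow key-derivation function (PBKDF2 from the stdlib).
def hash_password(password, salt=None):
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)  # constant-time compare

salt, digest = hash_password("correct horse")
print(verify("correct horse", salt, digest))  # True
print(verify("wrong guess", salt, digest))    # False
```

The point is simply that "hashed" alone is too weak a policy requirement; salting and a slow KDF are the minimum that should be spelled out.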