On 17/09/13 00:11, Tim Landscheidt wrote:
Ryan Lane <rlane@wikimedia.org> wrote:
[...]
I'm more than happy to recommend a number of cloud services and am more than willing to give advice on how to configure and run tools and bots from those services. It's even possible to reuse the work we're doing in the tools project, or in the Wikimedia infrastructure via our puppet repository since our infrastructure is Open Source.
Very nice idea. How do I get the MySQL replication stream? I got several offers of donations if the Toolserver were to continue; the only problem is the replication data. But because the data is open source, it shouldn't be a problem then, should it?
Assuming you found a non-profit, host your infrastructure somewhere that doesn't cause legal issues, and every person who has access to the data stream signs an NDA, it's likely doable. [...]
The NDA isn't necessary. According to https://wikitech.wikimedia.org/wiki/Tool_Labs/Database_plan, the data set at the LabsDB stage is free of non-public data (modulo MariaDB account information, which should probably not go off-site even with an NDA :-)).
So we could (and IMHO should) provide DB dumps/bin logs at dumps.wikimedia.org or somewhere similar to anyone who can download them.
Tim
The labs DB servers do contain the non-public data. It's just not viewable. So there aren't bin logs for just the public data. You *could* make a dump of the database (probably by creating a new tool, as mysqldump would simply dump the view definition)... assuming that in doing so you don't kill the labs filesystem. ;)
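
To illustrate the point about views, here is a minimal sketch of what such a tool might do: select the rows through the view and write them out as INSERT statements, so that only the columns the view exposes end up in the dump. It uses Python's built-in sqlite3 as a stand-in for MariaDB, which is an assumption made only so the example runs anywhere. The table `user`, the view `user_public`, and the helpers `sql_literal` and `dump_view_rows` are hypothetical names, not part of the actual labs setup.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal (numbers, text, NULL only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape single quotes by doubling them, as standard SQL requires.
    return "'" + str(value).replace("'", "''") + "'"


def dump_view_rows(conn, view, target_table):
    """Return INSERT statements for the rows visible through `view`.

    The data is read with a SELECT on the view, so columns the view
    hides never reach the output.
    """
    cur = conn.execute(f'SELECT * FROM "{view}"')
    columns = [d[0] for d in cur.description]
    col_list = ", ".join(f'"{c}"' for c in columns)
    statements = []
    for row in cur:
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(
            f'INSERT INTO "{target_table}" ({col_list}) VALUES ({values});'
        )
    return statements


# Hypothetical schema: the base table holds a private column,
# the view exposes only the public ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT, user_email TEXT);
INSERT INTO user VALUES (1, 'Alice', 'alice@example.org');
INSERT INTO user VALUES (2, 'O''Brien', 'ob@example.org');
CREATE VIEW user_public AS SELECT user_id, user_name FROM user;
""")

dump = dump_view_rows(conn, "user_public", "user")
for statement in dump:
    print(statement)
```

A real tool would also have to stream large tables in chunks rather than build one list in memory, which is where the concern about the filesystem comes in.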