Ryan Lane<rlane(a)wikimedia.org> wrote:
[...]
>> I'm more than happy to recommend a number of cloud services and am
>> more than willing to give advice on how to configure and run tools
>> and bots from those services. It's even possible to reuse the work
>> we're doing in the tools project, or in the Wikimedia infrastructure
>> via our puppet repository since our infrastructure is Open Source.
> Very nice idea – how do I get the MySQL replication stream? I have
> received several offers of donations if the Toolserver were to continue;
> the only problem is the replication data. But because the data is open
> source, it shouldn't be a problem then, should it?
Assuming you found a non-profit, host your infrastructure somewhere that
doesn't cause legal issues, and every person who has access to the data
stream signs an NDA, it's likely doable. [...]
The NDA isn't necessary. According to
https://wikitech.wikimedia.org/wiki/Tool_Labs/Database_plan,
the data set at the LabsDB stage is free of non-public data
(modulo MariaDB account information, which should probably
not go off-site even with an NDA :-)).
So we could (and IMHO should) provide DB dumps/binlogs at
dumps.wikimedia.org or somewhere similar, for anyone to
download.
Tim
The Labs DB servers do contain the non-public data; it's just not viewable.
So there are no binlogs containing only the public data.
You *could* make a dump of the database (probably by writing a new tool,
as mysqldump would simply dump the view definitions)... assuming that
in doing so you don't kill the Labs filesystem. ;)