[Labs-l] A (21) day in the Labs
Tim Landscheidt
tim at tim-landscheidt.de
Mon Apr 8 19:28:26 UTC 2013
"Marc A. Pelletier" <marc at uberbox.org> wrote:
> [...]
> The database replication is also well on its way; you can find the
> current roadmap at:
> https://wikitech.wikimedia.org/wiki/Tool_Labs/Database_plan
> [...]
To quote from there:
| Overview
| * All public wikis will be replicated to the LabsDB servers,
| with private user data redacted.
| * First, data will be replicated to a special set of
| database servers (PreLabsDBDBS) that use triggers to
| rewrite or remove private data. They will write row based
| binlogs. Production shards will map 1:1 with mysql
| instances, unlike on toolserver where some are combined
| via a custom replication engine.
| * Triggers will be created with the help of the redactatron
| schema review tool.
| * The actual labs databases will replicate from the above
| mentioned databases. Users will access data via views
| that only include reviewed tables and columns to ensure
| that unreviewed tables (such as from a new extension)
| aren't exposed without prior review.
| * Replicated data will be stored on flash storage, while
| each system will have a traditional disk array attached to
| store labs project data. Users will be able to join
| project tables against wiki tables, but only within the
| current shard.
| * The labs team will integrate these databases with labs,
| automating database creation and access on a per-project
| basis.
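To make the "views that only include reviewed tables and columns" point from the plan concrete, here is a rough sketch using SQLite as a stand-in for the real MySQL servers. The whitelist, the `_p` view suffix, the simplified `user` schema and the `new_extension_data` table are all assumptions for illustration, not the actual Labs setup.

```python
import sqlite3

# Hypothetical whitelist of reviewed tables and columns (illustrative only).
REVIEWED = {"user": ["user_id", "user_name"]}


def build_public_views(conn, reviewed):
    """Create one view per reviewed table, exposing only reviewed columns."""
    for table, columns in reviewed.items():
        cols = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE VIEW "{table}_p" AS SELECT {cols} FROM "{table}"')


conn = sqlite3.connect(":memory:")
# Simplified stand-in schema; the e-mail column plays the private data.
conn.execute(
    "CREATE TABLE user (user_id INTEGER, user_name TEXT, user_email TEXT)"
)
# A table from a new, not yet reviewed extension: it gets no view at all.
conn.execute("CREATE TABLE new_extension_data (id INTEGER, secret TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'Alice', 'alice@example.org')")

build_public_views(conn, REVIEWED)

# Columns visible through the view, and the list of views that exist.
view_columns = [d[0] for d in conn.execute('SELECT * FROM "user_p"').description]
view_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'")
]
print(view_columns, view_names)
```

Users would only be granted access to the views, so anything not listed in the whitelist stays invisible until someone reviews it.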
This means that JOINs, for example between wikis and Commons
or Wikidata, will not be possible. WTF? One of the stated
goals of Tool Labs is "Provide a location for analytics
work", so any changes here should /enhance/ the possibilities
the Toolserver offers and not shrink them. Cross-wiki JOINs
are BTW one of the top items on the "Needed Toolserver
features" list.
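For illustration, this is roughly what a tool has to do when the wiki and Commons live on separate servers and no cross-server JOIN is available: query each side and join in application code. It is only a sketch; two in-memory SQLite databases stand in for the two shards, and the `imagelinks` and `image` tables are cut down to two columns each with made-up rows.

```python
import sqlite3


def make_server(schema, insert, rows):
    """Create an in-memory database standing in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert, rows)
    return conn


# Stand-in for a wiki's shard (simplified imagelinks table, made-up rows).
wiki = make_server(
    "CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT)",
    "INSERT INTO imagelinks VALUES (?, ?)",
    [(1, "Foo.jpg"), (2, "Bar.png"), (3, "Missing.gif")],
)

# Stand-in for the Commons shard (simplified image table, made-up rows).
commons = make_server(
    "CREATE TABLE image (img_name TEXT, img_size INTEGER)",
    "INSERT INTO image VALUES (?, ?)",
    [("Foo.jpg", 1000), ("Bar.png", 2000)],
)


def client_side_join(wiki_conn, commons_conn):
    """Emulate imagelinks JOIN image ON il_to = img_name across two servers."""
    links = wiki_conn.execute(
        "SELECT il_from, il_to FROM imagelinks ORDER BY il_from"
    ).fetchall()
    names = sorted({name for _, name in links})
    placeholders = ", ".join("?" for _ in names)
    sizes = dict(
        commons_conn.execute(
            f"SELECT img_name, img_size FROM image "
            f"WHERE img_name IN ({placeholders})",
            names,
        )
    )
    # Inner-join semantics: drop links whose file is not on Commons.
    return [(page, name, sizes[name]) for page, name in links if name in sizes]


result = client_side_join(wiki, commons)
print(result)
```

This works for small result sets, but every such tool ends up shipping rows over the network and reimplementing the join, which is exactly the work a database with both data sets on one instance would do for it.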
I'm all for the "lazy sysadmin" paradigm, but I think that
shouldn't preclude usable databases. River's trainwreck is
freely available
(https://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/trainwreck)
and open source, and the effort to port it to Ubuntu and set
it up would be a valuable investment (as would setting up any
other replication engine that provides the *needed*
functionality; many-to-one/many replication isn't something
only Tool Labs requires).
Tim