[Labs-l] A (21) day in the Labs

Tim Landscheidt tim at tim-landscheidt.de
Mon Apr 8 19:28:26 UTC 2013


"Marc A. Pelletier" <marc at uberbox.org> wrote:

> [...]

> The database replication is also well on its way; you can find the
> current roadmap at:

> https://wikitech.wikimedia.org/wiki/Tool_Labs/Database_plan

> [...]

To quote from there:

| Overview

| * All public wikis will be replicated to the LabsDB servers,
|   with private user data redacted.

| * First, data will be replicated to a special set of
|   database servers (PreLabsDBDBS) that use triggers to
|   rewrite or remove private data.  They will write row based
|   binlogs.  Production shards will map 1:1 with mysql
|   instances, unlike on toolserver where some are combined
|   via a custom replication engine.

| * Triggers will be created with the help of the redactatron
|   schema review tool.

| * The actual labs databases will replicate from the above
|   mentioned databases.  Users will access data via views
|   that only include reviewed tables and columns to ensure
|   that unreviewed tables (such as from a new extension)
|   aren't exposed without prior review.

| * Replicated data will be stored on flash storage, while
|   each system will have a traditional disk array attached to
|   store labs project data.  Users will be able to join
|   project tables against wiki tables, but only within the
|   current shard.

| * The labs team will integrate these databases with labs,
|   automating database creation and access on a per-project
|   basis.
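For anyone who hasn't seen the trigger-plus-view approach before, here is a minimal sketch of the idea.  It uses Python's built-in sqlite3 module as a stand-in for MySQL.  The table is a hypothetical, heavily simplified version of MediaWiki's user table.  Both steps (the redacting trigger on the intermediate server and the reviewed-columns view on the labs server) are collapsed into one database for brevity.  It is an illustration of the concept, not the actual LabsDB setup or anything the redactatron produces.

```python
import sqlite3

# Hypothetical, simplified stand-in for MediaWiki's user table.
SCHEMA = """
CREATE TABLE user (
    user_id       INTEGER PRIMARY KEY,
    user_name     TEXT NOT NULL,
    user_email    TEXT,   -- private
    user_password TEXT    -- private
);

-- Intermediate-server step (simplified): blank the private columns
-- as soon as a replicated row lands.
CREATE TRIGGER redact_user AFTER INSERT ON user
BEGIN
    UPDATE user SET user_email = NULL, user_password = NULL
    WHERE user_id = NEW.user_id;
END;

-- Labs-server step (simplified): users query a view that lists
-- only reviewed columns.
CREATE VIEW user_public AS
    SELECT user_id, user_name FROM user;
"""


def build_redacted_db() -> sqlite3.Connection:
    """Create an in-memory database with the redaction trigger and view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_redacted_db()
    # Simulate a replicated row that still carries private data.
    conn.execute(
        "INSERT INTO user VALUES (1, 'Example', 'a@example.org', 'hunter2')"
    )
    print(conn.execute("SELECT * FROM user").fetchall())         # [(1, 'Example', None, None)]
    print(conn.execute("SELECT * FROM user_public").fetchall())  # [(1, 'Example')]
```

A column added later, say by a new extension, stays invisible through user_public until someone deliberately extends the view.  That is the "prior review" property the plan describes.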

This means that JOINs, for example between a wiki and Commons
or Wikidata, will not be possible.  WTF?  One of the stated
goals of Tool Labs is "Provide a location for analytics
work", so any changes here should /enhance/ the possibilities
the Toolserver offers, not shrink them.  This is, BTW, one of
the top items on the "Needed Toolserver features" list.
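To make the limitation concrete: in a plain setup, a JOIN can only reference tables that live on the same database instance.  The sketch below models two separate instances as two independent in-memory SQLite connections.  SQLite stands in for MySQL, and the imagelinks/image schemas are simplified, hypothetical versions of the MediaWiki tables.  A query run on the enwiki instance simply cannot see the Commons table.

```python
import sqlite3

# Two separate "instances" (hypothetical, simplified schemas):
# one holding enwiki, one holding commonswiki.
enwiki = sqlite3.connect(":memory:")
enwiki.execute("CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT)")

commons = sqlite3.connect(":memory:")
commons.execute("CREATE TABLE image (img_name TEXT, img_size INTEGER)")

# A typical cross-wiki analytics query: which enwiki pages use which
# Commons files, and how large those files are.
CROSS_WIKI_QUERY = """
SELECT il.il_from, img.img_size
FROM imagelinks AS il
JOIN commonswiki.image AS img ON img.img_name = il.il_to
"""


def run_on(conn: sqlite3.Connection, sql: str):
    """Return the query's rows, or the error message if this instance can't run it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


if __name__ == "__main__":
    # enwiki's instance knows nothing about commonswiki's tables,
    # so the JOIN fails before any row is read.
    print(run_on(enwiki, CROSS_WIKI_QUERY))
```

The same query would work if both tables sat on one instance, which is exactly what a strict per-shard layout rules out.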

I'm all for the "lazy sysadmin" paradigm, but I think that
shouldn't preclude usable databases.  River's trainwreck is
freely available
(https://svn.wikimedia.org/svnroot/mediawiki/trunk/tools/trainwreck)
and open source, and the effort to port it to Ubuntu and set
it up is a valuable investment (or the setup of any other
replication engine that provides the *needed* functionality;
many-to-one/many isn't something only Tool Labs requires).
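As a toy illustration of what many-to-one replication buys, the sketch below copies the tables of several source instances into one target, each under its own schema name, after which the cross-wiki JOIN works.  It uses sqlite3 as a stand-in and takes a one-off snapshot instead of following a binlog.  It is in no way a description of how trainwreck actually works.  The function name replicate_many_to_one and the schemas are hypothetical.

```python
import sqlite3


def replicate_many_to_one(sources: dict[str, sqlite3.Connection]) -> sqlite3.Connection:
    """Toy many-to-one 'replication': snapshot every table of every source
    into one target connection, using one attached schema per wiki.

    Wiki and table names are interpolated into SQL, so they must be trusted.
    """
    target = sqlite3.connect(":memory:")
    # Attach all per-wiki schemas before writing any rows; SQLite may
    # refuse ATTACH while a transaction is open.
    for wiki in sources:
        target.execute(f"ATTACH DATABASE ':memory:' AS {wiki}")
    for wiki, src in sources.items():
        tables = src.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        for (table,) in tables:
            cur = src.execute(f"SELECT * FROM {table}")
            cols = [d[0] for d in cur.description]
            # Untyped columns are enough for this illustration.
            target.execute(f"CREATE TABLE {wiki}.{table} ({', '.join(cols)})")
            marks = ", ".join("?" for _ in cols)
            target.executemany(
                f"INSERT INTO {wiki}.{table} VALUES ({marks})", cur.fetchall()
            )
    target.commit()
    return target


if __name__ == "__main__":
    enwiki = sqlite3.connect(":memory:")
    enwiki.execute("CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT)")
    enwiki.executemany("INSERT INTO imagelinks VALUES (?, ?)",
                       [(10, "Foo.jpg"), (11, "Bar.png")])
    commons = sqlite3.connect(":memory:")
    commons.execute("CREATE TABLE image (img_name TEXT, img_size INTEGER)")
    commons.execute("INSERT INTO image VALUES ('Foo.jpg', 12345)")

    merged = replicate_many_to_one({"enwiki": enwiki, "commonswiki": commons})
    rows = merged.execute(
        "SELECT il.il_from, img.img_size "
        "FROM enwiki.imagelinks AS il "
        "JOIN commonswiki.image AS img ON img.img_name = il.il_to"
    ).fetchall()
    print(rows)  # [(10, 12345)]
```

A real engine has to apply changes continuously and cope with schema changes; the snapshot here only shows why having all wikis on one instance makes the JOIN possible.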

Tim



