Hi Tim and all,
"Marc A. Pelletier" <marc@uberbox.org> wrote:
> [...]
> The database replication is also well on its way; you can find the
> current roadmap at:
> https://wikitech.wikimedia.org/wiki/Tool_Labs/Database_plan
> [...]
To quote from there:
| Overview
| * All public wikis will be replicated to the LabsDB servers,
| with private user data redacted.
| * First, data will be replicated to a special set of data-
| base servers (PreLabsDBDBS) that use triggers to rewrite
| or remove private data. They will write row based bin-
| logs. Production shards will map 1:1 with mysql in-
| stances, unlike on toolserver where some are combined via
| a custom replication engine.
| * Triggers will be created with the help of the redactatron
| schema review tool.
| * The actual labs databases will replicate from the above
| mentioned databases. Users will access data via views
| that only include reviewed tables and columns to ensure
| that unreviewed tables (such as from a new extension)
| aren't exposed without prior review.
| * Replicated data will be stored on flash storage, while
| each system will have a traditional disk array attached to
| store labs project data. Users will be able to join
| project tables against wiki tables, but only within the
| current shard.
| * The labs team will integrate these databases with labs,
| automating database creation and access on a per-project
| basis.
This means that JOINs between, for example, a wiki and
Commons or Wikidata will not be possible unless both happen
to sit on the same shard. WTF? One of the stated goals of
Tool Labs is "Provide a location for analytics work", so any
changes here should /enhance/ the possibilities the
Toolserver offers, not shrink them. Such JOINs are BTW one
of the top items on the "Needed Toolserver features" list.
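
To make the cost concrete, here is a rough sketch of what a tool
would have to do once a wiki and Commons live on different
instances: fetch rows from one, then look up the matching rows on
the other and stitch them together client-side. Everything in it
is an assumption for illustration only: two in-memory SQLite
databases stand in for the two MySQL instances, the tables are
cut-down versions of imagelinks and image, and the function name
is made up.

```python
import sqlite3


def make_db(script):
    """Create an in-memory database standing in for one MySQL instance."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


# Hypothetical stand-in for a wiki shard (heavily simplified schema).
wiki = make_db("""
    CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
    INSERT INTO imagelinks VALUES
        (1, 'Foo.jpg'), (2, 'Bar.png'), (3, 'Local.gif');
""")

# Hypothetical stand-in for the shard holding Commons.
commons = make_db("""
    CREATE TABLE image (img_name TEXT PRIMARY KEY, img_size INTEGER);
    INSERT INTO image VALUES ('Foo.jpg', 1000), ('Bar.png', 2000);
""")


def commons_sizes_for_wiki_pages(wiki_conn, commons_conn):
    """Emulate imagelinks JOIN image ON il_to = img_name across two servers."""
    # Step 1: pull the left-hand side of the join from the wiki shard.
    links = wiki_conn.execute(
        "SELECT il_from, il_to FROM imagelinks ORDER BY il_from"
    ).fetchall()
    names = sorted({name for _, name in links})
    if not names:
        return []

    # Step 2: look up the matching rows on the other instance.
    placeholders = ",".join("?" * len(names))
    sizes = dict(commons_conn.execute(
        "SELECT img_name, img_size FROM image "
        f"WHERE img_name IN ({placeholders})",
        names,
    ))

    # Step 3: stitch both sides together in the client (inner join).
    return [(page, name, sizes[name]) for page, name in links if name in sizes]


result = commons_sizes_for_wiki_pages(wiki, commons)
```

With both databases on one server, the same result is a single
query joining the two tables directly, with no round trips and no
key lists held in the tool's memory. For large result sets the
client-side version also has to batch the IN list, which the
sketch above does not attempt.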