Am 03.06.19 um 22:50 schrieb John:
With the introduction of the actor and comment tables,
I have seen a notable
spike in query times. Take a look at Jessie’s query. A query that used to take a
couple of seconds now takes almost 5 minutes. It also spanned what 8-9 different
joins now? In database design you should almost never have perfect normalization
due to performance issues. I know this point will be ignored, but it’s at least
worth mentioning.
The MediaWiki schema is full of denormalization for performance reasons.
Factoring comments and actors out of the revision table adds quite a bit of
complexity, especially for the migration phase - you can be sure that this was
not done because it looks nicer in an ER diagram.
This change was driven by an urgent need to reduce storage requirements of the
revision table, as well as a need to store additional information (machine
readable records along with human readable comments, representations of
different kinds of anonymous or remote users, etc).
Prior discussion, for reference:
*
https://phabricator.wikimedia.org/T161671
*
https://phabricator.wikimedia.org/T167246
*
https://phabricator.wikimedia.org/T153333
*
https://phabricator.wikimedia.org/T6714
*
https://phabricator.wikimedia.org/T95144
Wikimedia is running one of the top ten websites on about 1% of the budget other
companies "up there" have for the job. We are *very* keen on optimizing for
performance in terms of request time, but also have to optimize for performance
in terms of replication bandwidth, the time it takes to apply schema changes to
existing tables, how much hardware we can maintain, etc.
This normalization was done because after long discussion, it was found to be
the optimal solution in terms of overall performance of the website, considering
all factors. Otherwise, it would not have been done.
Of course, what is optimal for the Wikimedia sites may not be optimal for things
on ToolForge. It's impossible to optimize a schema for arbitrary queries, we can
only optimize for the ones that have most impact on most requests.
If you are interested in how such decisions are made, I invite you all to
participate in the RFC process
<https://www.mediawiki.org/wiki/Requests_for_comment/Process>. New RFCs, IRC
discussions, last calls for comments, and similar information is published in
the weekly TechCom radar email on wikitech-l.
--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation