Am 03.06.19 um 22:50 schrieb John:
With the introduction of the actor and comment tables, I have seen a notable spike in query times. Take a look at Jessie’s query. A query that used to take a couple of seconds now takes almost 5 minutes. It also spanned what 8-9 different joins now? In database design you should almost never have perfect normalization due to performance issues. I know this point will be ignored, but it’s at least worth mentioning.
The MediaWiki schema is full of denormalization for performance reasons. Factoring comments and actors out of the revision table adds quite a bit of complexity, especially for the migration phase - you can be sure that this was not done because it looks nicer in an ER diagram.
This change was driven by an urgent need to reduce storage requirements of the revision table, as well as a need to store additional information (machine readable records along with human readable comments, representations of different kinds of anonymous or remote users, etc).
Prior discussion, for reference: * https://phabricator.wikimedia.org/T161671 * https://phabricator.wikimedia.org/T167246 * https://phabricator.wikimedia.org/T153333 * https://phabricator.wikimedia.org/T6714 * https://phabricator.wikimedia.org/T95144
Wikimedia is running one of the top ten websites on about 1% of the budget other companies "up there" have for the job. We are *very* keen on optimizing for performance in terms of request time, but also have to optimize for performance in terms of replication bandwidth, the time it takes to apply schema changes to existing tables, how much hardware we can maintain, etc.
This normalization was done because after long discussion, it was found to be the optimal solution in terms of overall performance of the website, considering all factors. Otherwise, it would not have been done.
Of course, what is optimal for the Wikimedia sites may not be optimal for things on ToolForge. It's impossible to optimize a schema for arbitrary queries, we can only optimize for the ones that have most impact on most requests.
If you are interested in how such decisions are made, I invite you all to participate in the RFC process https://www.mediawiki.org/wiki/Requests_for_comment/Process. New RFCs, IRC discussions, last calls for comments, and similar information is published in the weekly TechCom radar email on wikitech-l.