If your code doesn't read or write to databases in MediaWiki, feel free to ignore this email.
Hello,
We have made changes to MediaWiki's Rdbms library that simplify and improve how to get database connections and how to perform UPDATE queries. We encourage you to adopt these and provide feedback here or via Phabricator.
== 1. Get a database connection ==
What changed:
To get a connection to the primary database, you previously needed the following:
$lbFactory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
$lbFactory->getMainLB()->getConnection( DB_PRIMARY );
And for a replica connection:
$lbFactory->getMainLB()->getConnection( DB_REPLICA );
You may now adopt the simpler $lbFactory->getPrimaryDatabase() and $lbFactory->getReplicaDatabase().
In other words, instead of needing knowledge of the internal LoadBalancer, its getConnection methods, and the DB_ constants, you only call getPrimaryDatabase() or getReplicaDatabase().
We won’t deprecate LoadBalancer::getConnection() any time soon. If your code depends on other LoadBalancer methods, feel free to keep your code as-is. This change is aimed at the most common use case where you only need a database connection.
The optional parameters have gotten more usable as well. getPrimaryDatabase() only takes $domain, which means you no longer need to skip over the non-applicable $groups parameter. getReplicaDatabase() takes a string for $group (instead of an array in LB::getConnection).
If you maintain service classes that inject a LoadBalancer object, we recommend transitioning to inject and type against LBFactory instead. This removes exposure to LB altogether in most cases.
Currently, getPrimaryDatabase/getReplicaDatabase do not yet support connection to external clusters (LB::getExternalLB). This is coming soon (see below).
Reasoning:
We have been reducing complexity around LB/LBF for a while (T326274 https://phabricator.wikimedia.org/T326274). The distinction between these is arbitrary in practice, each wears too many hats, and are not well encapsulated internally.
Most importantly, LB is mostly internal to the Rdbms library. Most developers only need a connection to perform read or write queries. The added coupling can lead to mistakes, misunderstandings, and can feel overwhelming. The LB documentation includes jargon aimed at Rdbms maintainer, such as references to internal methods like isReadyForRoundOperations, setTempTablesOnlyMode, getAnyOpenConnection, appendShutdownCPIndexAsQuery, etc.
The getConnection() signature is fragile and has led to major outages due to $domain being set incorrectly. E.g. if you need to connect to another wiki, you have to set $domain in *both* getMainLB() and getConnection(), otherwise *fireworks in production*. We keep this internally now to prevent future outages.
Lastly, this change makes it possible to add a narrow typehint dedicated to read-only connections on replica databases, separate from primary database connections (more below).
Example adoption:
-
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/891357 -
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/8904...
Adaption is being tracked in T330641 https://phabricator.wikimedia.org/T330641
What is next:
getPrimaryDatabase() and getReplicaDatabase() don't support "external" databases yet (e.g. x1 at WMF https://wikitech.wikimedia.org/wiki/MariaDB#Extension_storage). This is coming soon. Details and discussion at T330590 https://phabricator.wikimedia.org/T330590.
In addition to reducing service dependencies to the ILBFactory typehint, we are working on an even narrower interface (that ILBFactory extends) called IConnectionProvider https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/libs/rdbms/lbfactory/IConnectionProvider.php which only has four methods. It’s marked @internal for now, but that will change after we hammer out some details.
== 2. IReadableDatabase typehint for replica connections ==
What changed:
When you typehint a parameter or class member to IDatabase where a replica database is expected (DB_REPLICA), you may change it to the narrower IReadableDatabas interface.
LBFactory::getReplicaDatabase returns IReadableDatabase.
Reasoning:
MediaWiki developers have at times performed write queries on a replica connection by mistake, which reached production. While we have effective protections against data corruption, our only option in production is to throw a fatal error, which leads to an outage. With this change we move detection to the very start of the development cycle, verified fully by static analysis in CI, regardless of test coverage!
The new interface also helps you to self-document in a standard way whether your service class only reads or also writes to the database. And in IDEs we will no longer autocomplete write methods on what are known to be replica connection objects. It’ll also make the coupling between the rest of MediaWiki and extension with the rdbms library looser https://en.wikipedia.org/wiki/Loose_coupling#In_programming.
Note that the IDatabase::query() is not part of IReadableDatabase. This method is discouraged as it executes raw SQL that may include write queries. If you call this method, continue to typehint IDatabase until you transition to safe methods like select() or SelectQueryBuilder.
Example adoption:
-
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/89...
== 3. Introducing UpdateQueryBuilder ==
What changed:
In 2020, we introduced a chainable SelectQueryBuilder https://www.mediawiki.org/wiki/Manual:Database_access#SelectQueryBuilder for SELECT queries. To complement this, we now added a class called UpdateQueryBuilder, for UPDATE queries. The following existing code uses update() with unnamed positional parameters:
$db->update( 'image', [ 'img_name' => 'foo' ], [ 'img_name' => 'bar' ], __METHOD__ );
You may now use the fluent style like so:
$db->newUpdateQueryBuilder()
->update( 'image' )
->set( [ 'img_name' => 'foo' ] )
->where( [ 'img_name' => 'bar' ])
->caller( __METHOD__ ) ->execute();
Reasoning:
Similar to the SelectQueryBuilder, the fluent style is more readable. It simplifies building of conditions, is less prone to mistakes, and is more aligned with widely-used frameworks such as Doctrine (PHP) and Jooq (Java).
We have had at least two data corruption incidents in production due to passing the wrong order of $set and $where condition (swapping them by mistake). This reduces the chance of such data corruptions happening, which alone was worth developing the builder for, before we get to the improved ergonomics.
Example adoption:
-
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/889627/5/includes/Category...
Adaption is being tracked in T330640 https://phabricator.wikimedia.org/T330640.
What is next:
We will continue to develop query builders for DELETE and INSERT queries.
The work on migrating calls from ::select() to SelectQueryBuilder is on-going (T311866 https://phabricator.wikimedia.org/T311866).
Thank you
mediawiki-l@lists.wikimedia.org