It has been a while since I gave an update on the state of abstracting schema and schema changes in mediawiki. So here's a really long one.

So far around half of the mediawiki core tables have been migrated to abstract schema (plus lots of extensions lika Wikibase, Babel, Linter, BetaFeatures, etc.). Special thanks to Tgr for reviewing most of the patches and Sam Reed and James Forrester for doing the extensions.

With the growing number of schemas being abstracted, this is going to affect your development if you work on schema and schema changes in core or any of the extensions. So If you do, please read Manual:Schema changes in mediawiki.org

You might think that abstraction is just migrating SQL to JSON but it's much more, we are making the database schema of mediawiki much more consistent, We are basically addressing several long standing issues like T164898 and T42626 as well.

Improvement aspects

First aspect is drifts between different DBMSes. Sqlite schema is being produced by regex replacement (this code) which is less than great but at least it comes from one place. For Postgres, its schema and MySQL/Sqlite has drifted so drastically, that fixing it so far required 76 schema changes fixing issues ranging from missing indexes to missing PKs, extra AUTO_INCREMENT where it shouldn't be, missing DEFAULT values, drifting data types and much more.  You can follow the fixes of Postgres in here.

The second aspect is the inconsistency in the schema itself. How do we model strings? VARCHAR? VARBINARY()? VARCHAR() BINARY? (all three are different things). You'd be surprised how inconsistent our MySQL is. So far, we are migrating all VARCHAR() BINARY fields to VARBINARY() (so far ten schema changes).

Another inconsistency is timestamps. In MySQL, around half of them are BINARY(14) and the other half VARBINARY(14) (but in Postgres all are TIMESTAMPTZ), there is even a ticket about it. It makes sense to migrate all of them to BINARY(14) but not all timestamps are 14 characters, e.g. expiry fields accept "infinity" as value and it's a valid timestamp in Postgres ¯\_(ツ)_/¯ When you turn an expiry field to BINARY(14), "infinity" becomes "      infinity" and as the result mediawiki doesn't recognize it as infinity ("infinity" != "      infinity"). There are several ways to move forward handling expiry fields, you can follow the discussion in this gerrit patch.

Another fun aspect: Booleans. MySQL doesn't have boolean, it translates them to TINYINT(1) but other DBMSes don't have TINYINT, they have SMALLINT and BOOL though (and we mostly use SMALLINT for them), we decided to go with SMALLINT for these cases (which is different than what Doctrine DBAL does, it uses BOOL, so we introduced our own custom type for booleans).

Last but not least: ENUMs. MySQL and Postgres support that but Sqlite doesn't. Doctrine DBAL doesn't support ENUM at all (as it's an anti-pattern) while core has eight fields that are ENUM. There's an RFC to discourage using it in general. Feel free to comment on it.

A miscellaneous note: The directories that hold the archive of sql patches of schema change are exploding (some of the sql patches are even orphan but we can't find them because there are so many of them). So I started a RFC to clean that mess up: Drop support for database upgrade older than two LTS releases

What's next?

How can I help?
Glad you asked! You can follow the abstract-schema hashtag in gerrit and review patches or you can make them yourself (get yourself familiar using the documentations). If you maintain an extension feel free to migrate its table(s) (and track it in this ticket). If you use Postgres for mediawiki, please help us with testing the improvements for Postgres.

Thanks for reading this long email!

Amir (he/him)