Hello, There has been a lot of progress in abstract schema and abstract schema changes initiative since last time https://lists.wikimedia.org/pipermail/wikitech-l/2020-October/093954.html I gave an update on it. So here's another one.
*Abstract Schema* So far, more than 90% (51 out 56) of tables of mediawiki core are now migrated to abstract schema.
This means much smaller schema drifts between MySQL and Postgres. We have done more than 250 schema changes in Postgres to fix these drifts. Including 56 index rename, 66 data type change, setting default to 43 fields and changing nullability of 29 fields. To compare, that's more schema changes done on Postgres from 2014 until 2020. Once we have migrated all tables, we can close this four-year old ticket https://phabricator.wikimedia.org/T164898.
Similar improvement has happened on standardizing timestamp fields in MySQL https://phabricator.wikimedia.org/T42626, once all tables are migrated, we can call this eight-year old ticket done too.
One nice thing about having an abstract schema is that you can generate documentation automatically, This page is completely made https://www.mediawiki.org/w/index.php?title=User:Ladsgroup/Test&oldid=4379555 automatically from tables.json. We can make it generated in doc.wikimedia.org on every merge. And also we can make the database layout diagram https://www.mediawiki.org/w/index.php?title=Manual:Database_layout/diagram&action=render created automatically.
Another nice thing. When you have an abstract schema, you can easily write tests and enforce database conventions. For example, you can write a test to make sure all tables have exactly five columns (because five is your lucky number). We haven't written such a test but now there's a test that enforces a uniform prefix for columns and indexes of tables in core https://phabricator.wikimedia.org/T270033. We are currently fixing its violations to standardize our schema even more.
I'm planning to make reporting on drifts between the abstract schema and our production completely automated and make it accessible to DBAs for further investigations which is now much easier thanks to abstract schema. You can follow the progress of that work in this ticket. https://phabricator.wikimedia.org/T104459
*Abstract Schema Changes* Now we have a new maintenance script, it produces schema change sql files (aka ALTER TABLE files) based on snapshot of abstract schema of before and after of a table. Here's an example of an index rename. https://gerrit.wikimedia.org/r/c/mediawiki/core/+/651176 It would make creating schema change patches much easier (a little bit of work but you don't need to know internals of Postgres anymore, it's also less prone to mistakes)
With approval of RFC to drop support of upgrading from versions older than two LTS releases, we can now drop hundreds and hundreds of sql files. It would give us room to breath and audit our sql files to find orphan ones and improve abstract schema change work. That is currently blocked on this patch landing. https://gerrit.wikimedia.org/r/c/mediawiki/core/+/648576
We will work on reshaping the schema changes in general since its current checks system is less than optimal, its tests are not very updated and so much more to do.
*What can we do?* Glad you asked :D The biggest request I have from people is to migrate their extensions to abstract schema. There's a list of WMF-deployed extensions that their schema has not migrated yet https://phabricator.wikimedia.org/T261912. This is doubly important as we want to build a reporting system for drifts in production and it's not possible to report these drifts for extensions that their schema has not migrated yet. So if you or your team maintain an extension from that list, prioritize migrating that please. Reedy wrote a great script https://github.com/Ladsgroup/db-analyzor-tools/blob/master/db_abstractor.py that takes a sql file and produces its equivalent abstract schema and it gives you a good starting point (PR is welcome!). Feel free to add me as a reviewer to patches of migrating extensions to abstract schema.
Another thing is that if you use postgres for mediawiki, you help testing our postgres schema by trying master (make sure to take a backup first) and see if everything is alright.
*Thank you!* I would really like to thank Ammarpad for migrating lots of tables of core to abstract schema and handling all sorts of edge cases and doing most of the work of using uniform prefix tests and fixes. Thanks to James Forrester for reviewing lots of patches. Thanks Reedy for the script and also abstracting lots of tables in extensions and also Tgr for helping in reviews and getting the project going. Also a big thank you to DBAs for doing a lot more schema changes in production https://phabricator.wikimedia.org/tag/blocked-on-schema-change/. You rock! An apology is also warranted for breaking update.php on master twice (caused by yours truly).
Until next update!
wikitech-l@lists.wikimedia.org