Hello, If you missed the previous updates, there's the first https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/HSVI63YWZP7OOXB46WP2NRH2RWNPYA2I/and the second https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/message/TMFVXO46J57BWQHQGZXXICXNY35EVOMI/one. This is the last one I'm sending but in a good way.
We now finished migrating all 57 core tables to abstract schema. Now tables.sql is empty for MySQL https://gerrit.wikimedia.org/g/mediawiki/core/+/c60ccf4e6d4932dddc2efd72a8abf6e56243f086/maintenance/tables.sql. We will soon remove the tables.sql files and links to them.
We have also cleaned more than hundreds of old schema change files. https://phabricator.wikimedia.org/T272199 That enabled us to actually look for unused sql files and drop tens of unused ones that have not been used since 2002 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/668552, 2004 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/668761, or 2005 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/670176 (and much more) and they got lost in the sheer number of our sql patch files.
The next update is exciting for me. With abstraction in place, we can now have a proper tracking of drifts between schema in paper and our production (This is a follow up from a major incident in 2018). Now we have https://drift-tracker.toolforge.org/ that keeps track of these drifts. Our schema has been around for more than twenty years and we have hundreds of database hosts, making sure everything is using the right database schema (and stays correct) is impossible manually and we have been finding and fixing these drifts since 2018, see this comment onwards https://phabricator.wikimedia.org/T104459#4314828.). Of course more work in improving the tracker is welcome (here's the list https://phabricator.wikimedia.org/project/board/5350/)
If you want to enjoy the benefits of abstract schema [1] in extension(s) you or your team maintains, Please abstract the schema of your extension. There's a long list of WMF deployed extensions that are not using abstract schema https://phabricator.wikimedia.org/T261912 and some even already have a patch that only needs reviewing. Once that's done, we can add that to drift tracking and have a more comprehensive list of potential issues. If you need help with the abstraction work, just ping me.
This also helped us resolve several long-standing tickets like T104459 https://phabricator.wikimedia.org/T104459 (5.5 years old), T62962 https://phabricator.wikimedia.org/T62962 (7 years old), and T42626 https://phabricator.wikimedia.org/T42626 (soon reaching its ninth birthday) and will help us to address even more tech debt in future.
There is more to be done, improving the abstract schema *change* system, finding a home of schema documentation, improving the drift tracker and making it more automated, so much more. But the biggest chunk of work is now finally done.
I really would like to thank Ammarpad for great work on abstracting the tables and handling all sorts of edge cases, James Forrester and Tgr for their reviews which without them this wouldn't be possible and Sam Reed who wrote a script to speed up migration https://github.com/Ladsgroup/db-analyzor-tools/blob/master/db_abstractor.py. This was a team work to its core.
[1] In more details, by abstracting you will have automated checks for dirfits of the schema of extension(s) you maintain and production. You will have Postgres support for free. Also, you can have automated documentation generation, ability to test the schema itself, and have better consistency of your data types (like one datatype for timestamps).
Until the next adventure.
mediawiki-l@lists.wikimedia.org